- 19 9月, 2012 9 次提交
-
-
由 Suresh Siddha 提交于
Decouple non-lazy/eager fpu restore policy from the existence of the xsave feature. Introduce a synthetic CPUID flag to represent the eagerfpu policy. "eagerfpu=on" boot paramter will enable the policy. Requested-by: NH. Peter Anvin <hpa@zytor.com> Requested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1347300665-6209-2-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
Fundamental model of the current Linux kernel is to lazily init and restore FPU instead of restoring the task state during context switch. This changes that fundamental lazy model to the non-lazy model for the processors supporting xsave feature. Reasons driving this model change are: i. Newer processors support optimized state save/restore using xsaveopt and xrstor by tracking the INIT state and MODIFIED state during context-switch. This is faster than modifying the cr0.TS bit which has serializing semantics. ii. Newer glibc versions use SSE for some of the optimized copy/clear routines. With certain workloads (like boot, kernel-compilation etc), application completes its work with in the first 5 task switches, thus taking upto 5 #DNA traps with the kernel not getting a chance to apply the above mentioned pre-load heuristic. iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit and thus will not work correctly in the presence of lazy restore. Non-lazy state restore is needed for enabling such features. Some data on a two socket SNB system: * Saved 20K DNA exceptions during boot on a two socket SNB system. * Saved 50K DNA exceptions during kernel-compilation workload. * Improved throughput of the AVX based checksumming function inside the kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts pair. Also now kernel_fpu_begin/end() relies on the patched alternative instructions. So move check_fpu() which uses the kernel_fpu_begin/end() after alternative_instructions(). Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com Merge 32-bit boot fix from, Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: NeilBrown <neilb@suse.de> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
use kernel_fpu_begin/end() instead of unconditionally accessing cr0 and saving/restoring just the few used xmm/ymm registers. This has some advantages like: * If the task's FPU state is already active, then kernel_fpu_begin() will just save the user-state and avoiding the read/write of cr0. In general, cr0 accesses are much slower. * Manual save/restore of xmm/ymm registers will affect the 'modified' and the 'init' optimizations brought in the by xsaveopt/xrstor infrastructure. * Foward compatibility with future vector register extensions will be a problem if the xmm/ymm registers are manually saved and restored (corrupting the extended state of those vector registers). With this patch, there was no significant difference in the xor throughput using AVX, measured during boot. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-5-git-send-email-suresh.b.siddha@intel.com Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
kvm's guest fpu save/restore should be wrapped around kernel_fpu_begin/end(). This will avoid for example taking a DNA in kvm_load_guest_fpu() when it tries to load the fpu immediately after doing unlazy_fpu() on the host side. More importantly this will prevent the host process fpu from being corrupted. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-4-git-send-email-suresh.b.siddha@intel.com Cc: Avi Kivity <avi@redhat.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
Few lines below we do drop_fpu() which is more safer. Remove the unnecessary user_fpu_end() in save_xstate_sig(), which allows the drop_fpu() to ignore any pending exceptions from the user-space and drop the current fpu. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-3-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
No need to save the state with unlazy_fpu(), that is about to get overwritten by the state from the signal frame. Instead use drop_fpu() and continue to restore the new state. Also fold the stop_fpu_preload() into drop_fpu(). Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1345842782-24175-2-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied to/from the fpstate in the task struct. And in the case of signal delivery for x86_64 binaries, if the fpstate is live in the CPU registers, then the live state is copied directly to the user sigframe. Otherwise fpstate in the task struct is copied to the user sigframe. During restore, fpstate in the user sigframe is restored directly to the live CPU registers. Historically, different code paths led to different bugs. For example, x86_64 code path was not preemption safe till recently. Also there is lot of code duplication for support of new features like xsave etc. Unify signal handling code paths for x86 and x86_64 kernels. New strategy is as follows: Signal delivery: Both for 32/64-bit frames, align the core math frame area to 64bytes as needed by xsave (this where the main fpu/extended state gets copied to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave frames). If the state is live, copy the register state directly to the user frame. If not live, copy the state in the thread struct to the user frame. And for 32-bit [f]xsave frames, construct the fsave header separately before the actual [f]xsave area. Signal return: As the 32-bit frames with [f]xstate has an additional 'fsave' header, copy everything back from the user sigframe to the fpstate in the task structure and reconstruct the fxstate from the 'fsave' header (Also user passed pointers may not be correctly aligned for any attempt to directly restore any partial state). At the next fpstate usage, everything will be restored to the live CPU registers. For all the 64-bit frames and the 32-bit fsave frame, restore the state from the user sigframe directly to the live CPU registers. 64-bit signals always restored the math frame directly, so we can expect the math frame pointer to be correctly aligned. For 32-bit fsave frames, there are no alignment requirements, so we can restore the state directly. "lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are with in the noise range with this change. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com [ Merged in compilation fix ] Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
Consolidate x86, x86_64 inline asm routines saving/restoring fpu state using config_enabled(). Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.siddha@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Suresh Siddha 提交于
Use config_enabled() to cleanup the definitions of is_ia32/is_x32. Move the function prototypes to the header file to cleanup ifdefs, and move the x32_setup_rt_frame() code around. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343171129-2747-2-git-send-email-suresh.b.siddha@intel.com Merged in compilation fix from, Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 05 9月, 2012 7 次提交
-
-
由 Mathias Krause 提交于
IOMMU_INIT_POST and IOMMU_INIT_POST_FINISH pass the plain value 0 instead of NULL to __IOMMU_INIT. Fix this and make sparse happy by doing so. Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1346621506-30857-8-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathias Krause 提交于
It's redundant and makes sparse complain about it. Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/1346621506-30857-7-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathias Krause 提交于
Don't remove the __user annotation of the fpstate pointer, but drop the superfluous void * cast instead. This fixes the following sparse warnings: xsave.c:135:15: warning: cast removes address space of expression xsave.c:135:15: warning: incorrect type in argument 1 (different address spaces) xsave.c:135:15: expected void const volatile [noderef] <asn:1>*<noident> [...] Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1346621506-30857-6-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathias Krause 提交于
Stay in sync with the declaration and fix the corresponding sparse warnings. Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Dan Williams <dan.j.williams@intel.com> Link: http://lkml.kernel.org/r/1346621506-30857-5-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathias Krause 提交于
Fix the following sparse warnings by adding appropriate __user casts and annotations: ia32_signal.c:165:38: warning: incorrect type in argument 1 (different address spaces) ia32_signal.c:165:38: expected struct sigaltstack const [noderef] [usertype] <asn:1>*<noident> ia32_signal.c:165:38: got struct sigaltstack * [...] Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1346621506-30857-4-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathias Krause 提交于
The address calculated by VDSO32_SYMBOL() is a pointer into userland. Add the __user annotation to fix related sparse warnings in its users. Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Andy Lutomirski <luto@MIT.EDU> Link: http://lkml.kernel.org/r/1346621506-30857-3-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathias Krause 提交于
Fix the following sparse warnings: sys_ia32.c:293:38: warning: incorrect type in argument 2 (different address spaces) sys_ia32.c:293:38: expected unsigned int [noderef] [usertype] <asn:1>*stat_addr sys_ia32.c:293:38: got unsigned int *stat_addr Ironically, sys_ia32.h was introduced to fix sparse warnings but missed that one. Signed-off-by: NMathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/1346621506-30857-2-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 8月, 2012 1 次提交
-
-
由 Michael S. Tsirkin 提交于
KVM_GET_MSR was missing support for PV EOI, which is needed for migration. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 23 8月, 2012 3 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
When we are finished with return PFNs to the hypervisor, then populate it back, and also mark the E820 MMIO and E820 gaps as IDENTITY_FRAMEs, we then call P2M to set areas that can be used for ballooning. We were off by one, and ended up over-writting a P2M entry that most likely was an IDENTITY_FRAME. For example: 1-1 mapping on 40000->40200 1-1 mapping on bc558->bc5ac 1-1 mapping on bc5b4->bc8c5 1-1 mapping on bc8c6->bcb7c 1-1 mapping on bcd00->100000 Released 614 pages of unused memory Set 277889 page(s) to 1-1 mapping Populating 40200-40466 pfn range: 614 pages added => here we set from 40466 up to bc559 P2M tree to be INVALID_P2M_ENTRY. We should have done it up to bc558. The end result is that if anybody is trying to construct a PTE for PFN bc558 they end up with ~PAGE_PRESENT. CC: stable@vger.kernel.org Reported-by-and-Tested-by: NAndre Przywara <andre.przywara@amd.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Andreas Herrmann 提交于
This issue was recently observed on an AMD C-50 CPU where a patch of maximum size was applied. Commit be62adb4 ("x86, microcode, AMD: Simplify ucode verification") added current_size in get_matching_microcode(). This is calculated as size of the ucode patch + 8 (ie. size of the header). Later this is compared against the maximum possible ucode patch size for a CPU family. And of course this fails if the patch has already maximum size. Cc: <stable@vger.kernel.org> [3.3+] Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1344361461-10076-1-git-send-email-bp@amd64.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Avi Kivity 提交于
The sub-register used to access the stack (sp, esp, or rsp) is not determined by the address size attribute like other memory references, but by the stack segment's B bit (if not in x86_64 mode). Fix by using the existing stack_mask() to figure out the correct mask. This long-existing bug was exposed by a combination of a27685c3 (emulate invalid guest state by default), which causes many more instructions to be emulated, and a seabios change (possibly a bug) which causes the high 16 bits of esp to become polluted across calls to real mode software interrupts. Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 22 8月, 2012 5 次提交
-
-
由 Takuya Yoshikawa 提交于
Although the possible race described in commit 85b70591 KVM: MMU: fix shrinking page from the empty mmu was correct, the real cause of that issue was a more trivial bug of mmu_shrink() introduced by commit 19526396 KVM: MMU: do not iterate over all VMs in mmu_shrink() Here is the bug: if (kvm->arch.n_used_mmu_pages > 0) { if (!nr_to_scan--) break; continue; } We skip VMs whose n_used_mmu_pages is not zero and try to shrink others: in other words we try to shrink empty ones by mistake. This patch reverses the logic so that mmu_shrink() can free pages from the first VM whose n_used_mmu_pages is not zero. Note that we also add comments explaining the role of nr_to_scan which is not practically important now, hoping this will be improved in the future. Signed-off-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
由 Avi Kivity 提交于
Probably a leftover from the early days of self-patching, p6nops are marked __initconst_or_module, which causes them to be discarded in a non-modular kernel. If something later triggers patching, it will overwrite kernel code with garbage. Reported-by: NTomas Racek <tracek@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com> Cc: Michael Tokarev <mjt@tls.msk.ru> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: qemu-devel@nongnu.org Cc: Anthony Liguori <anthony@codemonkey.ws> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Liu, Chuansheng 提交于
When one CPU is going down and this CPU is the last one in irq affinity, current code is setting cpu_all_mask as the new affinity for that irq. But for some systems (such as in Medfield Android mobile) the firmware sends the interrupt to each CPU in the irq affinity mask, averaged, and cpu_all_mask includes all potential CPUs, i.e. offline ones as well. So replace cpu_all_mask with cpu_online_mask. Signed-off-by: Nliu chuansheng <chuansheng.liu@intel.com> Acked-by: NYanmin Zhang <yanmin_zhang@linux.intel.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Richard Weinberger 提交于
This comment is no longer true. We support up to 2^16 CPUs because __ticket_t is an u16 if NR_CPUS is larger than 256. Signed-off-by: NRichard Weinberger <richard@nod.at> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Michal Hocko 提交于
Each page mapped in a process's address space must be correctly accounted for in _mapcount. Normally the rules for this are straightforward but hugetlbfs page table sharing is different. The page table pages at the PMD level are reference counted while the mapcount remains the same. If this accounting is wrong, it causes bugs like this one reported by Larry Woodman: kernel BUG at mm/filemap.c:135! invalid opcode: 0000 [#1] SMP CPU 22 Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi] Pid: 18001, comm: mpitest Tainted: G W 3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2 RIP: 0010:[<ffffffff8112cfed>] [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170 Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20) Call Trace: delete_from_page_cache+0x40/0x80 truncate_hugepages+0x115/0x1f0 hugetlbfs_evict_inode+0x18/0x30 evict+0x9f/0x1b0 iput_final+0xe3/0x1e0 iput+0x3e/0x50 d_kill+0xf8/0x110 dput+0xe2/0x1b0 __fput+0x162/0x240 During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc() shared page tables with the check dst_pte == src_pte. The logic is if the PMD page is the same, they must be shared. This assumes that the sharing is between the parent and child. However, if the sharing is with a different process entirely then this check fails as in this diagram: parent | ------------>pmd src_pte----------> data page ^ other--------->pmd--------------------| ^ child-----------| dst_pte For this situation to occur, it must be possible for Parent and Other to have faulted and failed to share page tables with each other. This is possible due to the following style of race. PROC A PROC B copy_hugetlb_page_range copy_hugetlb_page_range src_pte == huge_pte_offset src_pte == huge_pte_offset !src_pte so no sharing !src_pte so no sharing (time passes) hugetlb_fault hugetlb_fault huge_pte_alloc huge_pte_alloc huge_pmd_share huge_pmd_share LOCK(i_mmap_mutex) find nothing, no sharing UNLOCK(i_mmap_mutex) LOCK(i_mmap_mutex) find nothing, no sharing UNLOCK(i_mmap_mutex) pmd_alloc pmd_alloc LOCK(instantiation_mutex) fault UNLOCK(instantiation_mutex) LOCK(instantiation_mutex) fault UNLOCK(instantiation_mutex) These two processes are not poing to the same data page but are not sharing page tables because the opportunity was missed. When either process later forks, the src_pte == dst pte is potentially insufficient. As the check falls through, the wrong PTE information is copied in (harmless but wrong) and the mapcount is bumped for a page mapped by a shared page table leading to the BUG_ON. This patch addresses the issue by moving pmd_alloc into huge_pmd_share which guarantees that the shared pud is populated in the same critical section as pmd. This also means that huge_pte_offset test in huge_pmd_share is serialized correctly now which in turn means that the success of the sharing will be higher as the racing tasks see the pud and pmd populated together. Race identified and changelog written mostly by Mel Gorman. {akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style] Reported-by: NLarry Woodman <lwoodman@redhat.com> Tested-by: NLarry Woodman <lwoodman@redhat.com> Reviewed-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NMichal Hocko <mhocko@suse.cz> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Ken Chen <kenchen@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 8月, 2012 1 次提交
-
-
由 Mike Frysinger 提交于
Some of the arguments to {g,s}etsockopt are passed in userland pointers. If we try to use the 64bit entry point, we end up sometimes failing. For example, dhcpcd doesn't run in x32: # dhcpcd eth0 dhcpcd[1979]: version 5.5.6 starting dhcpcd[1979]: eth0: broadcasting for a lease dhcpcd[1979]: eth0: open_socket: Invalid argument dhcpcd[1979]: eth0: send_raw_packet: Bad file descriptor The code in particular is getting back EINVAL when doing: struct sock_fprog pf; setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &pf, sizeof(pf)); Diving into the kernel code, we can see: include/linux/filter.h: struct sock_fprog { unsigned short len; struct sock_filter __user *filter; }; net/core/sock.c: case SO_ATTACH_FILTER: ret = -EINVAL; if (optlen == sizeof(struct sock_fprog)) { struct sock_fprog fprog; ret = -EFAULT; if (copy_from_user(&fprog, optval, sizeof(fprog))) break; ret = sk_attach_filter(&fprog, sk); } break; arch/x86/syscalls/syscall_64.tbl: 54 common setsockopt sys_setsockopt 55 common getsockopt sys_getsockopt So for x64, sizeof(sock_fprog) is 16 bytes. For x86/x32, it's 8 bytes. This comes down to the pointer being 32bit for x32, which means we need to do structure size translation. But since x32 comes in directly to sys_setsockopt, it doesn't get translated like x86. After changing the syscall table and rebuilding glibc with the new kernel headers, dhcp runs fine in an x32 userland. Oddly, it seems like Linus noted the same thing during the initial port, but I guess that was missed/lost along the way: https://lkml.org/lkml/2011/8/26/452 [ hpa: tagging for -stable since this is an ABI fix. ] Bugzilla: https://bugs.gentoo.org/423649Reported-by: NMads <mads@ab3.no> Signed-off-by: NMike Frysinger <vapier@gentoo.org> Link: http://lkml.kernel.org/r/1345320697-15713-1-git-send-email-vapier@gentoo.org Cc: H. J. Lu <hjl.tools@gmail.com> Cc: <stable@vger.kernel.org> v3.4..v3.5 Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 17 8月, 2012 2 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
If P2M leaf is completly packed with INVALID_P2M_ENTRY or with 1:1 PFNs (so IDENTITY_FRAME type PFNs), we can swap the P2M leaf with either a p2m_missing or p2m_identity respectively. The old page (which was created via extend_brk or was grafted on from the mfn_list) can be re-used for setting new PFNs. This also means we can remove git commit: 5bc6f988 xen/p2m: Reserve 8MB of _brk space for P2M leafs when populating back which tried to fix this. and make the amount that is required to be reserved much smaller. CC: stable@vger.kernel.org # for 3.5 only. Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Konrad Rzeszutek Wilk 提交于
This reverts commit 00e37bdb. During shutdown of PVHVM guests with more than 2VCPUs on certain machines we can hit the race where the replaced shared_info is not replaced fast enough and the PV time clock retries reading the same area over and over without any any success and is stuck in an infinite loop. Acked-by: NOlaf Hering <olaf@aepfle.de> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 15 8月, 2012 2 次提交
-
-
由 H. Peter Anvin 提交于
This reverts commit bacef661. This commit has been found to cause serious regressions on a number of ASUS machines at the least. We probably need to provide a 1:1 map in addition to the EFI virtual memory map in order for this to work. Signed-off-by: NH. Peter Anvin <hpa@zytor.com> Reported-and-bisected-by: NJérôme Carretero <cJ-ko@zougloub.eu> Cc: Jan Beulich <jbeulich@suse.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120805172903.5f8bb24c@zougloub.eu
-
由 Suresh Siddha 提交于
Recent commit 332afa65 cleaned up a workaround that updates irq_cfg domain for legacy irq's that are handled by the IO-APIC. This was assuming that the recent changes in assign_irq_vector() were sufficient to remove the workaround. But this broke couple of AMD platforms. One of them seems to be sending interrupts to the offline cpu's, resulting in spurious "No irq handler for vector xx (irq -1)" messages when those cpu's come online. And the other platform seems to always send the interrupt to the last logical CPU (cpu-7). Recent changes had an unintended side effect of using only logical cpu-0 in the IO-APIC RTE (during boot for the legacy interrupts) and this broke the legacy interrupts not getting routed to the cpu-7 on the AMD platform, resulting in a boot hang. For now, reintroduce the removed workaround, (essentially not allowing the vector to change for legacy irq's when io-apic starts to handle the irq. Which also addressed the uninteded sife effect of just specifying cpu-0 in the IO-APIC RTE for those irq's during boot). Reported-and-tested-by: NRobert Richter <robert.richter@amd.com> Reported-and-tested-by: NBorislav Petkov <bp@amd64.org> Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1344453412.29170.5.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 14 8月, 2012 4 次提交
-
-
由 Gleb Natapov 提交于
If PMU counter has PEBS enabled it is not enough to disable counter on a guest entry since PEBS memory write can overshoot guest entry and corrupt guest memory. Disabling PEBS during guest entry solves the problem. Tested-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120809085234.GI3341@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Yan, Zheng 提交于
The Westmere-EX uncore is similar to the Nehalem-EX uncore. The differences are: - Westmere-EX uncore has 10 instances of Cbox. The MSRs for Cbox8 and Cbox9 in the Westmere-EX aren't contiguous with Cbox 0~7. - The fvid field in the ZDP_CTL_FVC register in the Mbox is different. It's 5 bits in the Nehalem-EX, 6 bits in the Westmere-EX. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1344229882-3907-3-git-send-email-zheng.z.yan@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Yan, Zheng 提交于
This patch includes following fixes and update: - Only some events in the Sbox and Mbox can use the match/mask registers, add code to check this. - The format definitions for xbr_mm_cfg and xbr_match registers in the Rbox are wrong, xbr_mm_cfg should use 32 bits, xbr_match should use 64 bits. - Cleanup the Rbox code. Compute the addresses extra registers in the enable_event function instead of the hw_config function. This simplifies the code in nhmex_rbox_alter_er(). Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1344229882-3907-2-git-send-email-zheng.z.yan@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
Fix the following section mismatch: WARNING: arch/x86/kernel/cpu/built-in.o(.text+0x7ad9): Section mismatch in reference from the function uncore_types_exit() to the function .init.text:uncore_type_exit() The function uncore_types_exit() references the function __init uncore_type_exit(). This is often because uncore_types_exit lacks a __init annotation or the annotation of uncore_type_exit is wrong. caused by 14371cce ("perf: Add generic PCI uncore PMU device support"). Cc: Zheng Yan <zheng.z.yan@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1339741902-8449-8-git-send-email-zheng.z.yan@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 11 8月, 2012 1 次提交
-
-
由 Andrew Boie 提交于
GCC built with nonstandard options can enable -fpic by default. We never want this for 32-bit kernels and it will break the build. [ hpa: Notably the Android toolchain apparently does this. ] Change-Id: Iaab7d66e598b1c65ac4a4f0229eca2cd3d0d2898 Signed-off-by: NAndrew Boie <andrew.p.boie@intel.com> Link: http://lkml.kernel.org/r/1344624546-29691-1-git-send-email-andrew.p.boie@intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 09 8月, 2012 1 次提交
-
-
由 Suresh Siddha 提交于
Clear AVX, AVX2 features along with clearing XSAVE feature bits, as part of the parsing "noxsave" parameter. Fixes the kernel boot panic with "noxsave" boot parameter. We could have checked cpu_has_osxsave along with cpu_has_avx etc, but Peter mentioned clearing the feature bits will be better for uses like static_cpu_has() etc. Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343755754.2041.2.camel@sbsiddha-desk.sc.intel.com Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 05 8月, 2012 1 次提交
-
-
由 Gleb Natapov 提交于
When MSR_KVM_PV_EOI_EN was added to msrs_to_save array KVM_SAVE_MSRS_BEGIN was not updated accordingly. Signed-off-by: NGleb Natapov <gleb@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NAvi Kivity <avi@redhat.com>
-
- 03 8月, 2012 1 次提交
-
-
由 Thomas Renninger 提交于
Otherwise you could run into: WARN_ON in numa_register_memblks(), because node_possible_map is zero References: https://bugzilla.novell.com/show_bug.cgi?id=757888 On this machine (ProLiant ML570 G3) the SRAT table contains: - No processor affinities - One memory affinity structure (which is set disabled) CC: Per Jessen <per@opensuse.org> CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: NThomas Renninger <trenn@suse.de> Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 02 8月, 2012 2 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
When we release pages back during bootup: Freeing 9d-100 pfn range: 99 pages freed Freeing 9cf36-9d0d2 pfn range: 412 pages freed Freeing 9f6bd-9f6bf pfn range: 2 pages freed Freeing 9f714-9f7bf pfn range: 171 pages freed Freeing 9f7e0-9f7ff pfn range: 31 pages freed Freeing 9f800-100000 pfn range: 395264 pages freed Released 395979 pages of unused memory We then try to populate those pages back. In the P2M tree however the space for those leafs must be reserved - as such we use extend_brk. We reserve 8MB of _brk space, which means we can fit over 1048576 PFNs - which is more than we should ever need. Without this, on certain compilation of the kernel we would hit: (XEN) domain_crash_sync called from entry.S (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff818aad3b>] (XEN) RFLAGS: 0000000000000206 EM: 1 CONTEXT: pv guest (XEN) rax: ffffffff81a7c000 rbx: 000000000000003d rcx: 0000000000001000 (XEN) rdx: ffffffff81a7b000 rsi: 0000000000001000 rdi: 0000000000001000 (XEN) rbp: ffffffff81801cd8 rsp: ffffffff81801c98 r8: 0000000000100000 (XEN) r9: ffffffff81a7a000 r10: 0000000000000001 r11: 0000000000000003 (XEN) r12: 0000000000000004 r13: 0000000000000004 r14: 000000000000003d (XEN) r15: 00000000000001e8 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000125803000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffffffff81801c98: .. which is extend_brk hitting a BUG_ON. Interestingly enough, most of the time we are not going to hit this b/c the _brk space is quite large (v3.5): ffffffff81a25000 B __brk_base ffffffff81e43000 B __brk_limit = ~4MB. vs earlier kernels (with this back-ported), the space is smaller: ffffffff81a25000 B __brk_base ffffffff81a7b000 B __brk_limit = 344 kBytes. where we would certainly hit this and hit extend_brk. Note that git commit c3d93f88 (xen: populate correct number of pages when across mem boundary (v2)) exposed this bug). [v1: Made it 8MB of _brk space instead of 4MB per Jan's suggestion] CC: stable@vger.kernel.org #only for 3.5 Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Avi Kivity 提交于
Commit b2da15ac ("KVM: VMX: Optimize %ds, %es reload") broke i386 in the following scenario: vcpu_load ... vmx_save_host_state vmx_vcpu_run (ds.rpl, es.rpl cleared by hardware) interrupt push ds, es # pushes bad ds, es schedule vmx_vcpu_put vmx_load_host_state reload ds, es (with __USER_DS) pop ds, es # of other thread's stack iret # other thread runs interrupt push ds, es schedule # back in vcpu thread pop ds, es # now with rpl=0 iret ... vcpu_put resume_userspace iret # clears ds, es due to mismatched rpl (instead of resume_userspace, we might return with SYSEXIT and then take an exception; when the exception IRETs we end up with cleared ds, es) Fix by avoiding the optimization on i386 and reloading ds, es on the lightweight exit path. Reported-by: NChris Clayron <chris2553@googlemail.com> Signed-off-by: NAvi Kivity <avi@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-