- 02 Nov 2017, 1 commit
-
-
Committed by Juergen Gross
Instead of trying to execute any NMI via the bare metal's NMI trap handler, use a Xen-specific one for PV domains, as we already do for e.g. debug traps. Since in a PV domain the NMI is handled via the normal kernel stack, this is the correct thing to do. It will enable us to get rid of the very fragile and questionable dependencies between the bare-metal NMI handler and Xen assumptions believed to be broken anyway.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5baf5c0528d58402441550c5770b98e7961e7680.1509609304.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
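A minimal sketch of the substitution, following the existing debug-trap precedent; the entry name xen_xennmi mirrors xen_xendebug, and the helper wrapping the check is purely illustrative:

    /*
     * Illustrative sketch: while converting IDT gates into Xen trap-table
     * entries, redirect the NMI vector to a Xen-specific entry that runs
     * on the normal kernel stack, the same pattern used for #DB.
     */
    static void xen_fixup_trap_addr(unsigned long *addr)    /* hypothetical helper */
    {
        if (*addr == (unsigned long)debug)
            *addr = (unsigned long)xen_xendebug;
        else if (*addr == (unsigned long)nmi)
            *addr = (unsigned long)xen_xennmi;
    }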
-
- 10 Oct 2017, 1 commit
-
-
Committed by Zhenzhong Duan
As xen_cpuhp_setup() is called by both PV and PVHVM guests, the cpuhp state name "x86/xen/hvm_guest" is confusing. Rename it.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 28 Sep 2017, 3 commits
-
-
Committed by Zhenzhong Duan
When booting a PV guest with large memory (e.g. 240GB), the Xen-provided initial mapping overlaps with the kernel module virtual space. When the mapping in this space is cleared by xen_cleanhighmap(), in certain cases a 2MB mapping can be left behind. This is because Xen initializes the mapping with 4MB alignment, while xen_cleanhighmap() finishes at a 2MB boundary. When a module is then loaded right on top of that 2MB space, the following warning is hit:

    WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
    Call Trace:
    [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
    [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
    [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
    [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
    [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
    [<ffffffff8103ca54>] module_alloc+0x64/0x70
    [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
    [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
    [<ffffffff810ac9a7>] move_module+0x27/0x150
    [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
    [<ffffffff810af0a8>] load_module+0x78/0x640
    [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
    [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
    [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b

The 2MB mapping is then cleared, and we finally oops when a page in that space is accessed:

    BUG: unable to handle kernel paging request at ffff880022600000
    IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
    PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
    Oops: 0002 [#1] SMP
    Call Trace:
    [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
    [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
    [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
    [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
    [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
    [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
    [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
    [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
    [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
    [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
    [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
    [<ffffffff81510c10>] do_page_fault+0x140/0x470
    [<ffffffff8150d7d5>] page_fault+0x25/0x30

Fix this by calling xen_cleanhighmap() with the page-table mapping range aligned to 4MB. The unnecessary call of xen_cleanhighmap() in DEBUG mode is also removed.

-v2: add comment about Xen alignment from Juergen.

References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
[boris: added 'xen/mmu' tag to commit subject]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
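A sketch of the call-site fix under the description above; PMD_SIZE is 2MB here, so two PMDs make up Xen's 4MB granularity, and the addr/size variables are illustrative:

    /*
     * Xen builds the initial mapping in 4MB steps, so clean up in 4MB
     * steps too: rounding the end up keeps a trailing 2MB PMD from
     * surviving at the top of the range.
     */
    xen_cleanhighmap(addr, roundup(addr + size, PMD_SIZE * 2));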
-
Committed by Josh Poimboeuf
Add unwind hint annotations to the Xen head code so the ORC unwinder can read head_64.o. hypercall_page needs empty annotations at 32-byte intervals to match the 'xen_hypercall_*' ELF functions at those locations.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/70ed2eb516fe9266be766d953f93c2571bca88cc.1505764066.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Josh Poimboeuf
Mark the ends of the startup_xen and hypercall_page code sections.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3a80a394d30af43d9cefa1a29628c45ed8420c97.1505764066.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 16 Sep 2017, 1 commit
-
-
Committed by Arnd Bergmann
gcc-4.6 causes a harmless link-time warning:

    WARNING: vmlinux.o(.text.unlikely+0x48e): Section mismatch in reference from the function xen_find_pt_base() to the function .init.text:m2p()
    The function xen_find_pt_base() references the function __init m2p().
    This is often because xen_find_pt_base lacks a __init annotation or the annotation of m2p is wrong.

Newer compilers inline this function, so it never shows up, but marking it __init is the right way to avoid the warning.
Fixes: 70e61199 ("xen: move p2m list if conflicting with e820 map")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
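The shape of the fix, as a sketch (body abridged; the point is only the annotation):

    /* Both caller and callee now live in .init.text, so the reference is
     * legal whether or not the compiler inlines m2p(). */
    static phys_addr_t __init xen_find_pt_base(pmd_t *pmd)
    {
        phys_addr_t pa = 0;
        /* ... walk the Xen-provided page tables, calling __init m2p() ... */
        return pa;
    }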
-
- 13 Sep 2017, 1 commit
-
-
Committed by Juergen Gross
With the removal of lguest, some of the paravirt functions are no longer needed:

    ->read_cr4()
    ->store_idt()
    ->set_pmd_at()
    ->set_pud_at()
    ->pte_update()

Remove them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 01 Sep 2017, 1 commit
-
-
Committed by Juergen Gross
When running as a Xen pv-guest, the exception frame on the stack contains %r11 and %rcx in addition to the other data pushed by the processor. Instead of having a paravirt op called for each exception type, prepend the Xen-specific code to each exception entry. When running as a Xen pv-guest, use the exception entry with the prepended instructions; otherwise use the entry without the Xen-specific code.
[ tglx: Merged through tip to avoid ugly merge conflict ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: boris.ostrovsky@oracle.com
Cc: luto@amacapital.net
Link: http://lkml.kernel.org/r/20170831174249.26853-1-jg@pfupf.net
-
- 31 Aug 2017, 3 commits
-
-
Committed by Wei Liu
No functional change because MMU_NORMAL_PT_UPDATE is in fact 0. Set it to make the code consistent with similar code in mmu_pv.c
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
-
Committed by Juergen Gross
The function xen_set_domain_pte() is used nowhere in the kernel. Remove it.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
-
Committed by Juergen Gross
Remove the last tests for XENFEAT_auto_translated_physmap in pure PV-domain specific paths. PVH V1 is gone and the feature will always be "false" in PV guests.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 29 Aug 2017, 2 commits
-
-
Committed by Thomas Gleixner
The union inside of desc_struct allows access to the raw u32 parts of the descriptors. This raw access is about to go away. Replace the few code parts which access those fields.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064958.120214366@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Thomas Gleixner
The first 32 bits of the gate struct are the same for 32- and 64-bit kernels. The 32-bit version uses desc_struct and no designated data structure, so we need different accessors for 32- and 64-bit kernels. Aside from that, the macros which are necessary to build the 32-bit gate descriptor are horrible to read. Unify the gate structs and switch all code fiddling with them over.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20170828064957.861974317@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 24 Aug 2017, 1 commit
-
-
Committed by Juergen Gross
Xen's paravirt patch function xen_patch() does some special-casing for irq_ops functions to apply relocations when those functions can be patched inline instead of being called. Unfortunately none of the special-case function replacements is small enough to be patched inline, so the special case never applies. As xen_patch() calls paravirt_patch_default() in all cases, it can just be dropped. xen-asm.h doesn't seem necessary without xen_patch(), as the only thing left in it would be the definition of XEN_EFLAGS_NMI, used only once. So move that definition and remove xen-asm.h.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: rusty@rustcorp.com.au
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170816173157.8633-2-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 15 Aug 2017, 1 commit
-
-
Committed by Andy Lutomirski
When I cleaned up the Xen SYSCALL entries, I inadvertently changed the reported segment registers. Before my patch, regs->ss was __USER(32)_DS and regs->cs was __USER(32)_CS. After the patch, they are FLAT_USER_CS/DS(32). This had a couple of unfortunate effects. It confused the opportunistic fast-return logic. It also significantly increased the risk of triggering a nasty glibc bug:

    https://sourceware.org/bugzilla/show_bug.cgi?id=21269

Update the Xen entry code to change it back.
Reported-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Fixes: 8a9949bc ("x86/xen/64: Rearrange the SYSCALL entries")
Link: http://lkml.kernel.org/r/daba8351ea2764bb30272296ab9ce08a81bd8264.1502775273.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 11 Aug 2017, 2 commits
-
-
Committed by Juergen Gross
A Xen HVM guest running with KASLR enabled will die rather soon today because the shared info page mapping uses __va() too early. This was introduced by commit a5d5f328 ("xen: allocate page for shared info page from low memory"). In order to fix this, use early_memremap() to get a temporary virtual address for shared info until __va() can be used safely.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
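A condensed sketch of the two-phase mapping; the variable pa, standing for the shared info page's physical address, is illustrative:

    /* At early boot, before the direct map is usable: map via fixmap. */
    HYPERVISOR_shared_info = early_memremap(pa, PAGE_SIZE);

    /* Later, once __va() is safe (the direct mapping is set up): */
    early_memunmap(HYPERVISOR_shared_info, PAGE_SIZE);
    HYPERVISOR_shared_info = __va(pa);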
-
Committed by Juergen Gross
Instead of calling xen_hvm_init_shared_info() on boot and resume, split it up into a boot-time function searching for the pfn to use and a mapping function doing the hypervisor mapping call.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
- 10 Aug 2017, 1 commit
-
-
Committed by Andy Lutomirski
Xen's raw SYSCALL entries are much less weird than native ones. Rather than fudging them to look like native entries, use the Xen-provided stack frame directly. This lets us eliminate entry_SYSCALL_64_after_swapgs and two uses of the SWAPGS_UNSAFE_STACK paravirt hook. The SYSENTER code would benefit from similar treatment. This makes one change to the native code path: the compat instruction that clears the high 32 bits of %rax is moved slightly later. I'd be surprised if this affects performance at all.
Tested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/7c88ed36805d36841ab03ec3b48b4122c4418d71.1502164668.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 23 Jul 2017, 2 commits
-
-
Committed by Juergen Gross
Commit dc6416f1 ("xen/x86: Call cpu_startup_entry(CPUHP_AP_ONLINE_IDLE) from xen_play_dead()") introduced an error leading to a stack overflow of the idle task when a cpu was brought offline/online many times: by calling cpu_startup_entry() instead of returning at the end of xen_play_dead(), do_idle() would be entered again and again. Don't use cpu_startup_entry(), but cpuhp_online_idle() instead, allowing xen_play_dead() to return.
Cc: <stable@vger.kernel.org> # 4.12
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
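A condensed sketch of the fixed function, based on the PV variant of that era (error handling trimmed; treat details as illustrative):

    static void xen_pv_play_dead(void)
    {
        play_dead_common();
        HYPERVISOR_vcpu_op(VCPUOP_down, xen_vcpu_nr(smp_processor_id()), NULL);
        cpu_bringup();  /* we come back here after VCPUOP_up */
        /*
         * Was cpu_startup_entry(CPUHP_AP_ONLINE_IDLE), which entered a
         * fresh do_idle() loop, growing the idle stack on every
         * offline/online cycle. cpuhp_online_idle() just marks the CPU
         * online and returns into the existing idle loop.
         */
        cpuhp_online_idle(CPUHP_AP_ONLINE_IDLE);
    }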
-
Committed by Vitaly Kuznetsov
CONFIG_BOOTPARAM_HOTPLUG_CPU0 allows offlining CPU0, but Xen HVM guests BUG() in xen_teardown_timer(). Remove the BUG_ON(); this is probably a leftover from ancient times when CPU0 hotplug was impossible, and it works just fine for HVM.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
- 21 Jul 2017, 2 commits
-
-
Committed by Kirill A. Shutemov
Most things are in place and we can enable support for 5-level paging. The patch makes XEN_PV and XEN_PVH dependent on !X86_5LEVEL; neither is ready to work with 5-level paging yet.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-9-kirill.shutemov@linux.intel.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Kirill A. Shutemov
XEN_ELFNOTE_INIT_P2M has to be 512GB for both 4- and 5-level paging; (PUD_SIZE * PTRS_PER_PUD) expresses this. Unfortunately, we cannot use P4D_SIZE, which would otherwise fit here: with the current header structure it cannot be used in assembly if the p4d level is folded.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 18 Jul 2017, 1 commit
-
-
Committed by Tom Lendacky
Xen does not currently support SME for PV guests. Clear the SME CPU capability in order to avoid any ambiguity.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: <xen-devel@lists.xen.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Toshimitsu Kani <toshi.kani@hpe.com>
Cc: kasan-dev@googlegroups.com
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3b605622a9fae5e588e5a13967120a18ec18071b.1500319216.git.thomas.lendacky@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
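One plausible form of the change, assuming it lands in the PV capability setup path (the call-site name is from memory and may differ):

    static void __init xen_init_capabilities(void)
    {
        /* Xen PV does not support SME; hide the feature bit so nothing
         * later (e.g. early SME setup) is confused by its presence. */
        setup_clear_cpu_cap(X86_FEATURE_SME);
    }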
-
- 13 Jul 2017, 1 commit
-
-
Committed by Xunlei Pang
As Eric said, "what we need to do is move the variable vmcoreinfo_note out of the kernel's .bss section. And modify the code to regenerate and keep this information in something like the control page. Definitely something like this needs a page all to itself, and ideally far away from any other kernel data structures. I clearly was not watching closely the data someone decided to keep this silly thing in the kernel's .bss section."

This patch allocates extra pages for these vmcoreinfo_XXX variables. One advantage is that it improves the safety of vmcoreinfo, because vmcoreinfo is now kept far away from other kernel data structures.
Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
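A sketch of the allocation change, condensed from the shape of the resulting code (the variable names and VMCOREINFO_NOTE_SIZE are the upstream ones, but treat the exact structure as illustrative):

    static int __init crash_save_vmcoreinfo_init(void)
    {
        /* Dedicated pages instead of static .bss storage, so random
         * kernel data corruption is less likely to take the crash
         * metadata with it. */
        vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL);
        if (!vmcoreinfo_data)
            return -ENOMEM;

        vmcoreinfo_note = alloc_pages_exact(VMCOREINFO_NOTE_SIZE,
                                            GFP_KERNEL | __GFP_ZERO);
        if (!vmcoreinfo_note) {
            free_page((unsigned long)vmcoreinfo_data);
            vmcoreinfo_data = NULL;
            return -ENOMEM;
        }

        /* ... the usual VMCOREINFO_SYMBOL()/VMCOREINFO_SIZE() exports ... */
        return 0;
    }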
-
- 05 Jul 2017, 2 commits
-
-
Committed by Andy Lutomirski
We can use PCID if the CPU has PCID and PGE and we're not on Xen. By itself, this has no effect. A followup patch will start using PCID.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6327ecd907b32f79d5aa0d466f04503bbec5df88.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
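A sketch from memory of the gating check; the Xen PV side separately clears the feature bit (a PV guest cannot set CR4.PCIDE itself):

    static void setup_pcid(struct cpuinfo_x86 *c)
    {
        if (cpu_has(c, X86_FEATURE_PCID)) {
            if (cpu_has(c, X86_FEATURE_PGE)) {
                /* Usable: flip CR4.PCIDE on. By itself this has no
                 * effect; a follow-up patch assigns real PCIDs. */
                cr4_set_bits(X86_CR4_PCIDE);
            } else {
                /* No PGE: pretend PCID doesn't exist so nothing
                 * tries to use it. */
                clear_cpu_cap(c, X86_FEATURE_PCID);
            }
        }
    }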
-
Committed by Andy Lutomirski
x86's lazy TLB mode used to be fairly weak -- it would switch to init_mm the first time it tried to flush a lazy TLB. This meant an unnecessary CR3 write and, if the flush was remote, an unnecessary IPI.

Rewrite it entirely. When we enter lazy mode, we simply remove the CPU from mm_cpumask. This means that we need a way to figure out whether we've missed a flush when we switch back out of lazy mode. I use the tlb_gen machinery to track whether a context is up to date.

Note to reviewers: this patch, by itself, looks a bit odd. I'm using an array of length 1 containing (ctx_id, tlb_gen) rather than just storing tlb_gen, and making it an array isn't necessary yet. I'm doing this because the next few patches add PCID support, and, with PCID, we need ctx_id, and the array will end up with a length greater than 1. Making it an array now means that there will be less churn and therefore less stress on your eyeballs.

NB: This is dubious but, AFAICT, still correct on Xen and UV. xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this patch changes the way that mm_cpumask() works. This should be okay, since Xen *also* iterates all online CPUs to find all the CPUs it needs to twiddle. The UV tlbflush code is rather dated and should be changed.

Here are some benchmark results, done on a Skylake laptop at 2.3 GHz (turbo off, intel_pstate requesting max performance) under KVM with the guest using idle=poll (to avoid artifacts when bouncing between CPUs). I haven't done any real statistics here -- I just ran them in a loop and picked the fastest results that didn't look like outliers. Unpatched means commit a4eb8b99, so all the bookkeeping overhead is gone.

MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity. In an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU. This is intended to be a nearly worst-case test.

    patched:   13.4µs
    unpatched: 21.6µs

Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores), nrounds = 100, 256M data:

    patched:   1.1 seconds or so
    unpatched: 1.9 seconds or so

The speedup on Vitaly's test appears to be because it spends a lot of time blocked on mmap_sem, and this patch avoids sending IPIs to blocked CPUs.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Banman <abanman@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
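A condensed sketch of the new lazy-mode entry described above (simplified; the per-CPU (ctx_id, tlb_gen) bookkeeping that detects missed flushes lives on the switch-back path):

    void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
    {
        /* init_mm never gets lazy-flushed; nothing to opt out of. */
        if (this_cpu_read(cpu_tlbstate.loaded_mm) == &init_mm)
            return;

        /*
         * Drop out of mm_cpumask so remote flushes skip this CPU. When
         * we later switch back to the mm, comparing the recorded
         * (ctx_id, tlb_gen) pair against the mm's current tlb_gen tells
         * us whether a flush was missed while lazy.
         */
        cpumask_clear_cpu(smp_processor_id(), mm_cpumask(mm));
    }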
-
- 03 Jul 2017, 1 commit
-
-
Committed by Gustavo A. R. Silva
Remove the unnecessary variable mfn in function xen_foreach_remap_area() and refactor the code. The variable mfn at line 518 (mfn = xen_remap_buf.mfns[i];) is only being used to store a value to be passed as an argument to the xen_update_mem_tables() function. This value can be passed directly, which makes variable mfn unnecessary. Also, the value assigned to mfn at line 534 (mfn = xen_remap_mfn;) is never used.
Addresses-Coverity-ID: 1260110
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
- 30 Jun 2017, 1 commit
-
-
Committed by Josh Poimboeuf
In preparation for an objtool rewrite which will have broader checks, whitelist functions and files which cause problems because they do unusual things with the stack. These whitelists serve as a TODO list for which functions and files don't yet have undwarf unwinder coverage. Eventually most of the whitelists can be removed in favor of manual CFI hint annotations or objtool improvements.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
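The two whitelist mechanisms of that era, sketched; xen_cpuid is believed to be one of the whitelisted functions (it open-codes the XEN_EMULATE_PREFIX instruction sequence, which objtool cannot follow), but treat the example as illustrative:

    #include <linux/frame.h>

    /* Per-function whitelist: objtool skips stack validation here. */
    STACK_FRAME_NON_STANDARD(xen_cpuid);

    /*
     * Whole object files are whitelisted from the Makefile instead:
     *   OBJECT_FILES_NON_STANDARD_xen-asm_64.o := y
     */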
-
- 25 Jun 2017, 1 commit
-
-
Committed by Juergen Gross
In a HVM guest the kernel allocates the page for mapping the shared info structure via extend_brk() today. This leads to a drop in performance, as the underlying EPT entry has to be split up into 4kB entries because the single shared info page is located in hypervisor memory. The issue was detected using the libmicro munmap test: unmapping 8kB of memory was faster by nearly a factor of two when no pv interfaces were active in the HVM guest. So instead of taking a page from memory which might be mapped via large EPT entries, use a page which is already mapped via a 4kB EPT entry: we can take a page from the first 1MB of memory, as the video memory at 640kB disallows using larger EPT entries.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
- 23 Jun 2017, 2 commits
-
-
Committed by Daniel Kiper
The current approach, wholesale efi struct initialization from a local 'efi_xen' template, is not robust. Usually if a new member is defined then it is properly initialized in drivers/firmware/efi/efi.c, but not in arch/x86/xen/efi.c. The effect is that the Xen initialization clears any fields the generic code might have set and the Xen code does not know about yet. I saw this happen a few times, so let's initialize only the EFI struct members used by Xen and maintain no local duplicate, to avoid such issues in the future.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: jgross@suse.com
Cc: linux-efi@vger.kernel.org
Cc: matt@codeblueprint.co.uk
Cc: stable@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1498128697-12943-3-git-send-email-daniel.kiper@oracle.com
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
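A sketch of the safer member-wise pattern; the xen_efi_* runtime-service wrappers are the real ones from drivers/xen/efi.c of that era, but the function wrapping the assignments here is illustrative:

    static void __init xen_efi_set_runtime_services(void)  /* illustrative name */
    {
        /* Override only the services Xen mediates; leave every other
         * member exactly as drivers/firmware/efi/efi.c initialized it. */
        efi.get_time                 = xen_efi_get_time;
        efi.set_time                 = xen_efi_set_time;
        efi.get_wakeup_time          = xen_efi_get_wakeup_time;
        efi.set_wakeup_time          = xen_efi_set_wakeup_time;
        efi.get_variable             = xen_efi_get_variable;
        efi.get_next_variable        = xen_efi_get_next_variable;
        efi.set_variable             = xen_efi_set_variable;
        efi.query_variable_info      = xen_efi_query_variable_info;
        efi.update_capsule           = xen_efi_update_capsule;
        efi.query_capsule_caps       = xen_efi_query_capsule_caps;
        efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
    }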
-
Committed by Thomas Gleixner
All implementations of apic->cpu_mask_to_apicid_and() AND the two incoming cpumasks to search for the target. Move that operation to the call site and rename the callback to cpu_mask_to_apicid().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235446.641575516@linutronix.de
-
- 20 Jun 2017, 1 commit
-
-
Committed by Christoph Hellwig
ARM and x86 had duplicated versions of the dma_ops structure; the only difference is that x86 hasn't wired up the set_dma_mask, mmap, and get_sgtable ops yet. On x86 all of them are identical to the generic version, so while they aren't needed they are also harmless. All the symbols used only by xen_swiotlb_dma_ops can now be marked static as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 13 Jun 2017, 7 commits
-
-
Committed by Ankur Arora
On PVH and PVHVM, at failure in the VCPUOP_register_vcpu_info hypercall we limit the number of cpus to MAX_VIRT_CPUS. However, if this failure occurred for a cpu beyond MAX_VIRT_CPUS, we continue to function with > MAX_VIRT_CPUS. This leads to problems at the next save/restore cycle, when there are > MAX_VIRT_CPUS threads going into stop_machine() but, coming back up, there's valid state for only the first MAX_VIRT_CPUS. This patch pulls the excess CPUs down via cpu_down(), as sketched below.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
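A hedged sketch of the pull-down loop; the per-cpu xen_vcpu test stands in for "registration failed", and the wrapper name is illustrative (the exact condition in the real patch may differ):

    static int xen_offline_excess_cpus(void)    /* illustrative name */
    {
        int cpu, rc;

        for_each_online_cpu(cpu) {
            /* A NULL xen_vcpu means VCPUOP_register_vcpu_info failed
             * and the cpu has no usable state past MAX_VIRT_CPUS. */
            if (per_cpu(xen_vcpu, cpu))
                continue;

            rc = cpu_down(cpu);
            if (rc)
                pr_warn("Xen: failed to offline CPU %d: %d\n", cpu, rc);
        }
        return 0;
    }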
-
Committed by Ankur Arora
The hypercall VCPUOP_register_vcpu_info can fail. This failure is handled by making per_cpu(xen_vcpu, cpu) point to its shared_info slot, and cpus without one (cpu >= MAX_VIRT_CPUS) get NULL. For PVH/PVHVM, this is not enough, because we also need to pull these VCPUs out of circulation.

Fix for PVH/PVHVM: on registration failure in the cpuhp prepare callback (xen_cpu_up_prepare_hvm()), return an error to the cpuhp state machine so it can fail the CPU init.

Fix for PV: the registration happens before smp_init(), so in the failure case we clamp setup_max_cpus and limit the number of VCPUs that smp_init() will bring up to MAX_VIRT_CPUS. This is functionally correct, but it makes the code a bit simpler if we get rid of this explicit clamping: for VCPUs that don't have a valid xen_vcpu, fail the CPU init in the cpuhp prepare callback (xen_cpu_up_prepare_pv()).
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
Committed by Ankur Arora
If CONFIG_SMP is disabled, xen_setup_vcpu_info_placement() is called from xen_setup_shared_info(). This is fine as far as boot goes, but it means that we also call it in the restore path. This results in an OOPS because we assign to pv_mmu_ops.read_cr2, which is __ro_after_init. Also, though less problematically, this means we call xen_vcpu_setup() twice at restore: once from the vcpu info placement call and a second time from xen_vcpu_restore(). Fix by calling xen_setup_vcpu_info_placement() at boot only.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
Committed by Ankur Arora
When Xen restores a PVHVM or PVH guest, its shared_info only holds up to 32 CPUs. The hypercall VCPUOP_register_vcpu_info allows us to set up per-page areas for VCPUs. This means we can boot PVH* guests with more than 32 VCPUs. During restore, the per-cpu structure is allocated freshly by the hypervisor (vcpu_info_mfn is set to INVALID_MFN) so that the newly restored guest can make a VCPUOP_register_vcpu_info hypercall.

However, we end up triggering this condition in Xen:

    /* Run this command on yourself or on other offline VCPUS. */
    if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )

which means we are unable to set up the per-cpu VCPU structures for running VCPUS. The Linux PV code paths make this work by iterating over cpu_possible in xen_vcpu_restore() with:

    1) is the target CPU up (VCPUOP_is_up hypercall)?
    2) if yes, then VCPUOP_down to pause it
    3) VCPUOP_register_vcpu_info
    4) if it was down, then VCPUOP_up to bring it back up

(see the condensed sketch after this entry). With Xen commit 192df6f9122d ("xen/x86: allow HVM guests to use hypercalls to bring up vCPUs") this is available for non-PV guests as well. As such, first check if VCPUOP_is_up is actually possible before trying this dance. As most of this dance code is done already in xen_vcpu_restore(), let's make it callable on PV, PVH and PVHVM.
Based-on-patch-by: Konrad Wilk <konrad.wilk@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
Committed by Ankur Arora
Largely mechanical changes to aid unification of the xen_vcpu_restore() logic for PV, PVH and PVHVM.

xen_vcpu_setup(): the only change in logic is that clamp_max_cpus() is now handled inside the "if (!xen_have_vcpu_info_placement)" block.
xen_vcpu_restore(): code movement from enlighten_pv.c to enlighten.c.
xen_vcpu_info_reset(): pulls together all the code where xen_vcpu is set to default.
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
-
Committed by Kirill A. Shutemov
With CONFIG_X86_5LEVEL=y, level 4 is no longer the top level of the page tables. Let's give these variables more generic names: init_top_pgt and early_top_pgt.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170606113133.22974-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Andy Lutomirski
The kernel has several code paths that read CR3. Most of them assume that CR3 contains the PGD's physical address, whereas some of them awkwardly use PHYSICAL_PAGE_MASK to mask off low bits. Add explicit mask macros for CR3 and convert all of the CR3 readers. This will keep them from breaking when PCID is enabled.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: xen-devel <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/883f8fb121f4616c1c1427ad87350bb2f5ffeca1.1497288170.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
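The new masks and the common accessor, reconstructed from memory of the result (64-bit case; the top bit is excluded from the address because it is reserved for later PCID "no flush" use):

    /* CR3's low 12 bits carry the PCID once CR4.PCIDE is set, so they
     * do not belong in the page address. */
    #define CR3_ADDR_MASK   0x7FFFFFFFFFFFF000ull
    #define CR3_PCID_MASK   0xFFFull

    /* Most readers only want the PGD's physical address: */
    static inline unsigned long read_cr3_pa(void)
    {
        return __read_cr3() & CR3_ADDR_MASK;
    }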
-
- 05 Jun 2017, 1 commit
-
-
Committed by Andy Lutomirski
Lazy TLB state is currently managed in a rather baroque manner. AFAICT, there are three possible states:

- Non-lazy. This means that we're running a user thread or a kernel thread that has called use_mm(). current->mm == current->active_mm == cpu_tlbstate.active_mm and cpu_tlbstate.state == TLBSTATE_OK.

- Lazy with user mm. We're running a kernel thread without an mm and we're borrowing an mm_struct. We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0). The current cpu is set in mm_cpumask(current->active_mm). CR3 points to current->active_mm->pgd. The TLB is up to date.

- Lazy with init_mm. This happens when we call leave_mm(). We have current->mm == NULL, current->active_mm == cpu_tlbstate.active_mm, but that mm is only relevant insofar as the scheduler is tracking it for refcounting. cpu_tlbstate.state != TLBSTATE_OK. The current cpu is clear in mm_cpumask(current->active_mm). CR3 points to swapper_pg_dir, i.e. init_mm->pgd.

This patch simplifies the situation. Other than perf, x86 stops caring about current->active_mm at all. We have cpu_tlbstate.loaded_mm pointing to the mm that CR3 references. The TLB is always up to date for that mm. leave_mm() just switches us to init_mm. There are no longer any special cases for mm_cpumask, and switch_mm() switches mms without worrying about laziness.

After this patch, cpu_tlbstate.state serves only to tell the TLB flush code whether it may switch to init_mm instead of doing a normal flush.

This makes fairly extensive changes to xen_exit_mmap(), which used to look a bit like black magic.

Perf is unchanged. With or without this change, perf may behave a bit erratically if it tries to read user memory in kernel thread context. We should build on this patch to teach perf to never look at user memory when cpu_tlbstate.loaded_mm != current->mm.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
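A condensed sketch of the simplified leave_mm() after this change (details trimmed):

    void leave_mm(int cpu)
    {
        struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);

        /* Already lazy with init_mm: nothing to do. */
        if (loaded_mm == &init_mm)
            return;

        /* Lazy mode is now literally "run on init_mm's page tables". */
        switch_mm(NULL, &init_mm, NULL);
    }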
-