- 09 9月, 2017 1 次提交
-
-
由 Naoya Horiguchi 提交于
_PAGE_PSE is used to distinguish between a truly non-present (_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and should be treated as present. But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would cause confusion between one of those PMDs undergoing a THP split, and a soft-dirty PMD. Dropping _PAGE_PSE check in pmd_present() does not work well, because it can hurt optimization of tlb handling in thp split. Thus, we need to move the bit. In the current kernel, bits 1-4 are not used in non-present format since commit 00839ee3 ("x86/mm: Move swap offset/type up in PTE to work around erratum"). So let's move _PAGE_SWP_SOFT_DIRTY to bit 1. Bit 7 is used as reserved (always clear), so please don't use it for other purpose. Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.comSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NZi Yan <zi.yan@cs.rutgers.edu> Acked-by: NDave Hansen <dave.hansen@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: David Nellans <dnellans@nvidia.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 9月, 2017 4 次提交
-
-
由 Andy Lutomirski 提交于
While debugging a problem, I thought that using cr4_set_bits_and_update_boot() to restore CR4.PCIDE would be helpful. It turns out to be counterproductive. Add a comment documenting how this works. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Lutomirski 提交于
When Linux brings a CPU down and back up, it switches to init_mm and then loads swapper_pg_dir into CR3. With PCID enabled, this has the side effect of masking off the ASID bits in CR3. This can result in some confusion in the TLB handling code. If we bring a CPU down and back up with any ASID other than 0, we end up with the wrong ASID active on the CPU after resume. This could cause our internal state to become corrupt, although major corruption is unlikely because init_mm doesn't have any user pages. More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion in the next context switch. The result of *that* is a failure to resume from suspend with probability 1 - 1/6^(cpus-1). Fix it by reinitializing cpu_tlbstate on resume and CPU bringup. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Reported-by: NJiri Kosina <jikos@kernel.org> Fixes: 10af6235 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID") Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
Patch series "mm,fork,security: introduce MADV_WIPEONFORK", v4. If a child process accesses memory that was MADV_WIPEONFORK, it will get zeroes. The address ranges are still valid, they are just empty. If a child process accesses memory that was MADV_DONTFORK, it will get a segmentation fault, since those address ranges are no longer valid in the child after fork. Since MADV_DONTFORK also seems to be used to allow very large programs to fork in systems with strict memory overcommit restrictions, changing the semantics of MADV_DONTFORK might break existing programs. The use case is libraries that store or cache information, and want to know that they need to regenerate it in the child process after fork. Examples of this would be: - systemd/pulseaudio API checks (fail after fork) (replacing a getpid check, which is too slow without a PID cache) - PKCS#11 API reinitialization check (mandated by specification) - glibc's upcoming PRNG (reseed after fork) - OpenSSL PRNG (reseed after fork) The security benefits of a forking server having a re-inialized PRNG in every child process are pretty obvious. However, due to libraries having all kinds of internal state, and programs getting compiled with many different versions of each library, it is unreasonable to expect calling programs to re-initialize everything manually after fork. A further complication is the proliferation of clone flags, programs bypassing glibc's functions to call clone directly, and programs calling unshare, causing the glibc pthread_atfork hook to not get called. It would be better to have the kernel take care of this automatically. The patchset also adds MADV_KEEPONFORK, to undo the effects of a prior MADV_WIPEONFORK. This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO: https://man.openbsd.org/minherit.2 This patch (of 2): MPX only seems to be available on 64 bit CPUs, starting with Skylake and Goldmont. Move VM_MPX into the 64 bit only portion of vma->vm_flags, in order to free up a VMA flag. Link: http://lkml.kernel.org/r/20170811212829.29186-2-riel@redhat.comSigned-off-by: NRik van Riel <riel@redhat.com> Acked-by: NDave Hansen <dave.hansen@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Colm MacCártaigh <colm@allcosts.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
A non-default huge page size can be encoded in the flags argument of the mmap system call. The definitions for these encodings are in arch specific header files. However, all architectures use the same values. Consolidate all the definitions in the primary user header file (uapi/linux/mman.h). Include definitions for all known huge page sizes. Use the generic encoding definitions in hugetlb_encode.h as the basis for these definitions. Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 9月, 2017 5 次提交
-
-
由 Ingo Molnar 提交于
Andrei Vagin reported a CRIU regression and bisected it back to: 90f6225f ("x86/idt: Move IST stack based traps to table init") This table init conversion loses the system-gate property of X86_TRAP_BP and erroneously moves it from DPL3 to DPL0. Fix it. Reported-by: NAndrei Vagin <avagin@virtuozzo.com> Signed-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: dvlasenk@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: peterz@infradead.org Cc: brgerst@gmail.com Cc: rostedt@goodmis.org Cc: bp@alien8.de Cc: luto@kernel.org Cc: jpoimboe@redhat.com Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: torvalds@linux-foundation.org Cc: tip-bot for Jacob Shin <tipbot@zytor.com> Link: http://lkml.kernel.org/r/20170901082630.xvyi5bwk6etmppqc@gmail.com
-
由 Jérôme Glisse 提交于
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Changed since v1 (Linus Torvalds) - remove now useless kvm_arch_mmu_notifier_invalidate_page() Signed-off-by: NJérôme Glisse <jglisse@redhat.com> Tested-by: NMike Galbraith <efault@gmx.de> Tested-by: NAdam Borowski <kilobyte@angband.pl> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Juergen Gross 提交于
When running as Xen pv-guest the exception frame on the stack contains %r11 and %rcx additional to the other data pushed by the processor. Instead of having a paravirt op being called for each exception type prepend the Xen specific code to each exception entry. When running as Xen pv-guest just use the exception entry with prepended instructions, otherwise use the entry without the Xen specific code. [ tglx: Merged through tip to avoid ugly merge conflict ] Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: boris.ostrovsky@oracle.com Cc: luto@amacapital.net Link: http://lkml.kernel.org/r/20170831174249.26853-1-jg@pfupf.net
-
由 Thomas Gleixner 提交于
The seperation of the EISA init missed to include linux/io.h which breaks the build with some special configurations. Reported-by: NIngo Molnar <mingo@kernel.org> Fixes: f7eaf6e0 ("x86/boot: Move EISA setup to a separate file") Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Eric Dumazet 提交于
Saves 4 bytes replacing following instructions : lea rax, [rsi + rdx * 8 + offsetof(...)] mov rax, qword ptr [rax] cmp rax, 0 by : mov rax, [rsi + rdx * 8 + offsetof(...)] test rax, rax Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 8月, 2017 11 次提交
-
-
由 Jiri Slaby 提交于
Commit 87e81786 ("x86/idt: Move early IDT setup out of 32-bit asm") switched early_ignore_irq to use ENTRY. ENTRY aligns the code, so there is no need for one more ALIGN right before the function. And add one \n after the function to separate it from the data. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Link: http://lkml.kernel.org/r/20170831121653.28917-1-jslaby@suse.cz
-
由 Wei Liu 提交于
No functional change because MMU_NORMAL_PT_UPDATE is in fact 0. Set it to make the code consistent with similar code in mmu_pv.c Signed-off-by: NWei Liu <wei.liu2@citrix.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Juergen Gross 提交于
The function xen_set_domain_pte() is used nowhere in the kernel. Remove it. Signed-off-by: NJuergen Gross <jgross@suse.com> Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Juergen Gross 提交于
Remove the last tests for XENFEAT_auto_translated_physmap in pure PV-domain specific paths. PVH V1 is gone and the feature will always be "false" in PV guests. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Vitaly Kuznetsov 提交于
Add Hyper-V tracing subsystem and trace hyperv_mmu_flush_tlb_others(). Tracing is done the same way we do xen_mmu_flush_tlb_others(). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: NStephen Hemminger <sthemmin@microsoft.com> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Simon Xiao <sixiao@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Link: http://lkml.kernel.org/r/20170802160921.21791-10-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Vitaly Kuznetsov 提交于
Hyper-V hosts may support more than 64 vCPUs, we need to use HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX hypercalls in this case. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: NStephen Hemminger <sthemmin@microsoft.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Simon Xiao <sixiao@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Link: http://lkml.kernel.org/r/20170802160921.21791-9-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Naoya Horiguchi 提交于
x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice There's a potential bug in how we select the KASLR kernel address n the early boot code. The KASLR boot code currently chooses the kernel image's physical memory location from E820_TYPE_RAM regions by walking over all e820 entries. E820_TYPE_RAM includes EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA as well, so those regions can end up hosting the kernel image. According to the UEFI spec, all memory regions marked as EfiBootServicesCode and EfiBootServicesData are available as free memory after the first call to ExitBootServices(). I.e. so such regions should be usable for the kernel, per spec. In real life however, we have workarounds for broken x86 firmware, where we keep such regions reserved until SetVirtualAddressMap() is done. See the following code in should_map_region(): static bool should_map_region(efi_memory_desc_t *md) { ... /* * Map boot services regions as a workaround for buggy * firmware that accesses them even when they shouldn't. * * See efi_{reserve,free}_boot_services(). */ if (md->type =3D=3D EFI_BOOT_SERVICES_CODE || md->type =3D=3D EFI_BOOT_SERVICES_DATA) return false; This workaround suppressed a boot crash, but potential issues still remain because no one prevents the regions from overlapping with kernel image by KASLR. So let's make sure that EFI_BOOT_SERVICES_{CODE|DATA} regions are never chosen as kernel memory for the workaround to work fine. Furthermore, EFI_LOADER_{CODE|DATA} regions are also excluded because they can be used after ExitBootServices() as defined in EFI spec. As a result, we choose kernel address only from EFI_CONVENTIONAL_MEMORY which is the only memory type we know to be safely free. Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: Junichi Nomura <j-nomura@ce.jp.nec.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fanc.fnst@cn.fujitsu.com Cc: izumi.taku@jp.fujitsu.com Link: http://lkml.kernel.org/r/20170828074444.GC23181@hori1.linux.bs1.fc.nec.co.jp [ Rewrote/fixed/clarified the changelog and the in code comments. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Hans de Goede 提交于
When booting 4.13 on a VirtualBox VM on a Skylake host the following error shows up in the logs: [ 0.000000] [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0xb2 (or later) This is caused by apic_check_deadline_errata() only checking CPU model and not the X86_FEATURE_TSC_DEADLINE_TIMER flag (which VirtualBox does NOT export to the guest), combined with VirtualBox not exporting the micro-code version to the guest. This commit adds a check for X86_FEATURE_TSC_DEADLINE_TIMER to apic_check_deadline_errata(), silencing this error on VirtualBox VMs. Signed-off-by: NHans de Goede <hdegoede@redhat.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Frank Mehnert <frank.mehnert@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Thayer <michael.thayer@oracle.com> Cc: Michal Necasek <michal.necasek@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: bd9240a1 ("x86/apic: Add TSC_DEADLINE quirk due to errata") Link: http://lkml.kernel.org/r/20170830105811.27539-1-hdegoede@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Vitaly Kuznetsov 提交于
There's a subtle bug in how some of the paravirt guest code handles page table freeing on x86: On x86 software page table walkers depend on the fact that remote TLB flush does an IPI: walk is performed lockless but with interrupts disabled and in case the page table is freed the freeing CPU will get blocked as remote TLB flush is required. On other architectures which don't require an IPI to do remote TLB flush we have an RCU-based mechanism (see include/asm-generic/tlb.h for more details). In virtualized environments we may want to override the ->flush_tlb_others callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has been doing this for years and the upcoming remote TLB flush for Hyper-V will do it too. This is not safe, as software page table walkers may step on an already freed page. Fix the bug by enabling the RCU-based page table freeing mechanism, CONFIG_HAVE_RCU_TABLE_FREE=y. Testing with kernbench and mmap/munmap microbenchmarks, and neither showed any noticeable performance impact. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Acked-by: NJuergen Gross <jgross@suse.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: KY Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com [ Rewrote/fixed/clarified the changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jan Beulich 提交于
The lack of newlines in preceding format strings is a clear indication that these were meant to be continuations of one another, and indeed output ends up quite a bit more compact (and readable) that way. Switch other plain printk()-s in the function instances to pr_info(), as requested. Signed-off-by: NJan Beulich <jbeulich@suse.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/59A7D72B0200007800175E4E@prv-mh.provo.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Stephen reported a merge conflict with the XEN tree. That also shows that the IDT cleanup forgot to remove the now unused trace_{trap} defines. Remove them. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com>
-
- 29 8月, 2017 19 次提交
-
-
由 Peter Zijlstra 提交于
Move the 'max_precise' capability into generic x86 code where it belongs. This fixes a sysfs splat on !Intel systems where we fail to set x86_pmu_caps_group.atts. Reported-and-tested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory") Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address. Add a new sample type for physical address. perf already has a facility to collect data virtual address. This patch introduces a function to convert the virtual address to physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided. - For kernel direct mapping addresses, virt_to_phys is used to convert the virtual addresses to physical address. - For user virtual addresses, __get_user_pages_fast is used to walk the pages tables for user physical address. - This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added. The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or privileged user. Tested-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: mpe@ellerman.id.au Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
I just noticed that hw.itrace_started and hw.config are aliased to the same location. Now, the PT driver happens to use both, which works out fine by sheer luck: - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the program order, although there are no compiler barriers to ensure that, - to the perf_log_itrace_start() hw.itrace_start looks set at the same time as when it is intended to be set because both stores happen in the same path, - hw.config is never reset to zero in the PT driver. Now, the use of hw.config by the PT driver makes more sense (it being a HW PMU) than messing around with itrace_started, which is an awkward API to begin with. This patch replaces hw.itrace_started with an attach_state bit and an API call for the PMU drivers to use to communicate the condition. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jan H. Schönherr 提交于
If a zero for the number of lines manages to slip through, scroll() may underflow some offset calculations, causing accesses outside the video memory. Make the check in __putstr() more pessimistic to prevent that. Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1503858223-14983-1-git-send-email-jschoenh@amazon.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jan H. Schönherr 提交于
The current slack space is not enough for LZ4, which has a worst case overhead of 0.4% for data that cannot be further compressed. With an LZ4 compressed kernel with an embedded initrd, the output is likely to overwrite the input. Increase the slack space to avoid that. Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1503842124-29718-1-git-send-email-jschoenh@amazon.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Colin Ian King 提交于
Don't populate arrays on the stack, instead make them static. Makes the object code smaller by 76 bytes: Before: text data bss dec hex filename 4217 1540 128 5885 16fd arch/x86/platform/intel-mid/pwr.o After: text data bss dec hex filename 3981 1700 128 5809 16b1 arch/x86/platform/intel-mid/pwr.o Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170825163206.23250-1-colin.king@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Slaby 提交于
ALIGN+GLOBAL is effectively what ENTRY() does, so use ENTRY() which is dedicated for exactly this purpose -- global functions. Note that stub32_clone() is a C-like leaf function -- it has a standard call frame -- it only switches one argument and continues by jumping into C. Since each ENTRY() should be balanced by some END*() marker, we add a corresponding ENDPROC() to stub32_clone() too. Besides that, x86's custom GLOBAL macro is going to die very soon. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170824080624.7768-2-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Slaby 提交于
Functions in math-emu are annotated as ENTRY() symbols, but their ends are not annotated at all. But these are standard functions called from C, with proper stack register update etc. Omitting the ends means: * the annotations are not paired and we cannot deal with such functions e.g. in objtool * the symbols are not marked as functions in the object file * there are no sizes of the functions in the object file So fix this by adding ENDPROC() to each such case in math-emu. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170824080624.7768-1-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Slaby 提交于
Similarly to the 32-bit code, efi_pe_entry body() is somehow squashed into startup_64(). In the old days, we forced startup_64() to start at offset 0x200 and efi_pe_entry() to start at 0x210. But this requirement was removed long time ago, in: 99f857db ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_64() into a separate function, we do so. We also annotate the function appropriatelly by ENTRY+ENDPROC. ABI offsets are preserved: 0000000000000000 T startup_32 0000000000000200 T startup_64 0000000000000390 T efi64_stub_entry On the top-level, it looked like: .org 0x200 ENTRY(startup_64) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr GLOBAL(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq preferred_addr(%rax), %rax jmp *%rax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_64) ENDPROC(startup_64) And it is now converted into: .org 0x200 ENTRY(startup_64) ... ; a lot of assembly (startup_64) ENDPROC(startup_64) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) leaq startup_64(%rax), %rax jmp *%rax ENDPROC(efi_pe_entry) #endif Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-2-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Slaby 提交于
The efi_pe_entry() body is somehow squashed into startup_32(). In the old days, we forced startup_32() to start at offset 0x00 and efi_pe_entry() to start at 0x10. But this requirement was removed long time ago, in: 99f857db ("x86, build: Dynamically find entry points in compressed startup code") The way it is now makes the code less readable and illogical. Given we can now safely extract the inlined efi_pe_entry() body from startup_32() into a separate function, we do so and we separate it to two functions as they are marked already: efi_pe_entry() + efi32_stub_entry(). We also annotate the functions appropriatelly by ENTRY+ENDPROC. ABI offset is preserved: 0000 128 FUNC GLOBAL DEFAULT 6 startup_32 0080 60 FUNC GLOBAL DEFAULT 6 efi_pe_entry 00bc 68 FUNC GLOBAL DEFAULT 6 efi32_stub_entry On the top-level, it looked like this: ENTRY(startup_32) #ifdef CONFIG_EFI_STUB ; start of inlined jmp preferred_addr ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal preferred_addr(%eax), %eax jmp *%eax preferred_addr: #endif ; end of inlined ... ; a lot of assembly (startup_32) ENDPROC(startup_32) And it is now converted into: ENTRY(startup_32) ... ; a lot of assembly (startup_32) ENDPROC(startup_32) #ifdef CONFIG_EFI_STUB ENTRY(efi_pe_entry) ... ; a lot of assembly (efi_pe_entry) ENDPROC(efi_pe_entry) ENTRY(efi32_stub_entry) ... ; a lot of assembly (efi32_stub_entry) leal startup_32(%eax), %eax jmp *%eax ENDPROC(efi32_stub_entry) #endif Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ard.biesheuvel@linaro.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170824073327.4129-1-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Mike Galbraith bisected a boot crash back to the following commit: 7a46ec0e ("locking/refcounts, x86/asm: Implement fast refcount overflow protection") The crash/hang pattern is: > Symptom is a few splats as below, with box finally hanging. Network > comes up, but neither ssh nor console login is possible. > > ------------[ cut here ]------------ > WARNING: CPU: 4 PID: 0 at net/netlink/af_netlink.c:374 netlink_sock_destruct+0x82/0xa0 > ... > __sk_destruct() > rcu_process_callbacks() > __do_softirq() > irq_exit() > smp_apic_timer_interrupt() > apic_timer_interrupt() We are at -rc7 already, and the code has grown some dependencies, so instead of a plain revert disable the config temporarily, in the hope of getting real fixes. Reported-by: NMike Galbraith <efault@gmx.de> Tested-by: NMike Galbraith <efault@gmx.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/tip-7a46ec0e2f4850407de5e1d19a44edee6efa58ec@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
set_intr_gate() is an internal function of the IDT code. The only user left is the KVM code which replaces the pagefault handler eventually. Provide an explicit update_intr_gate() function and make set_intr_gate() static. While at it replace the magic number 14 in the KVM code with the proper trap define. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.663008004@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The only users of alloc_intr_gate() are hypervisors, which both check the used_vectors bitmap whether they have allocated the gate already. Move that check into alloc_intr_gate() and simplify the users. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NK. Y. Srinivasan <kys@microsoft.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.580830286@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
None of this is performance sensitive in any way - so debloat the kernel. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.502052875@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The IDT related inlines are not longer used. Remove them. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.422083717@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Move the gate intialization from interrupt init to the IDT code so all IDT related operations are at a single place. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.340209198@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Replace the APIC/SMP vector gate initialization with the table based mechanism. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.260177013@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Initialize the regular traps with a table. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.182128165@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Initialize the IST based traps via a table. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170828064959.091328949@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-