- 02 4月, 2016 2 次提交
-
-
由 Nadav Amit 提交于
The recently introduced batched invalidations mechanism uses its own mechanism for shootdown. However, it does wrong accounting of interrupts (e.g., inc_irq_stat is called for local invalidations), trace-points (e.g., TLB_REMOTE_SHOOTDOWN for local invalidations) and may break some platforms as it bypasses the invalidation mechanisms of Xen and SGI UV. This patch reuses the existing TLB flushing mechnaisms instead. We use NULL as mm to indicate a global invalidation is required. Fixes 72b252ae ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages") Signed-off-by: NNadav Amit <namit@vmware.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nadav Amit 提交于
TLB_REMOTE_SEND_IPI was recently introduced, but it counts bytes instead of pages. In addition, it does not report correctly the case in which flush_tlb_page flushes a page. Fix it to be consistent with other TLB counters. Fixes: 5b74283a ("x86, mm: trace when an IPI is about to be sent") Signed-off-by: NNadav Amit <namit@vmware.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 3月, 2016 1 次提交
-
-
由 Peter Zijlstra 提交于
Patch 5a50f529 ("perf/x86/ibs: Fix race with IBS_STARTING state") closed a big hole while opening another, smaller hole. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 5a50f529 ("perf/x86/ibs: Fix race with IBS_STARTING state") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 3月, 2016 7 次提交
-
-
由 Huang Rui 提交于
Bit 11 of CPUID 8000_0007 edx is processor feedback interface. Bit 12 of CPUID 8000_0007 edx is accumulated power. Print proper names in proc/cpuinfo Reported-and-tested-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NHuang Rui <ray.huang@amd.com> Cc: Tony Li <tony.li@amd.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: "Len Brown" <lenb@kernel.org> Link: http://lkml.kernel.org/r/1458871720-3209-1-git-send-email-ray.huang@amd.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
We have #ifndef __ASSEMBLY__ ... #endif #ifndef __ASSEMBLY__ ... #endif Merge the two. No functionality change. Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1459189217-25532-1-git-send-email-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Vladimir Zapolskiy 提交于
The list of CPU model specific registers contains two copies of TDP registers, remove the one, which is out of numerical order in the list. Fixes: 6a35fc2d ("cpufreq: intel_pstate: get P1 from TAR when available") Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Len Brown <len.brown@intel.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
It is cpu_core_id anyway. Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1458917557-8757-3-git-send-email-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Peter Zijlstra 提交于
Avoid allocating the AMD NB event constraints data structure when not needed. This gets rid of x86_max_cores usage and avoids allocating this on AMD Core Perfctr supporting hardware (which has separate MSRs for NB events). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: aherrmann@suse.com Cc: Rui Huang <ray.huang@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: jencce.kernel@gmail.com Link: http://lkml.kernel.org/r/20160320124629.GY6375@twins.programming.kicks-ass.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Peter Zijlstra 提交于
It turns out AMD gets x86_max_cores wrong when there are compute units. The issue is that Linux assumes: nr_logical_cpus = nr_cores * nr_siblings But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings to 2 as well. Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core properly, according to the new nomenclature. Fixes: 1f12e32f ("x86/topology: Create logical package id") Reported-by: NXiong Zhou <jencce.kernel@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andreas Herrmann <aherrmann@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Dan Williams 提交于
Update the definition of memcpy_from_pmem() to return 0 or a negative error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe(). Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: NIngo Molnar <mingo@kernel.org> Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 26 3月, 2016 3 次提交
-
-
由 Srinivas Pandruvada 提交于
There are several reports of freeze on enabling HWP (Hardware PStates) feature on Skylake-based systems by the Intel P-states driver. The root cause is identified as the HWP interrupts causing BIOS code to freeze. HWP interrupts use the thermal LVT which can be handled by Linux natively, but on the affected Skylake-based systems SMM will respond to it by default. This is a problem for several reasons: - On the affected systems the SMM thermal LVT handler is broken (it will crash when invoked) and a BIOS update is necessary to fix it. - With thermal interrupt handled in SMM we lose all of the reporting features of the arch/x86/kernel/cpu/mcheck/therm_throt driver. - Some thermal drivers like x86-package-temp depend on the thermal threshold interrupts signaled via the thermal LVT. - The HWP interrupts are useful for debugging and tuning performance (if the kernel can handle them). The native handling of thermal interrupts needs to be enabled because of that. This requires some way to tell SMM that the OS can handle thermal interrupts. That can be done by using _OSC/_PDC in processor scope very early during ACPI initialization. The meaning of _OSC/_PDC bit 12 in processor scope is whether or not the OS supports native handling of interrupts for Collaborative Processor Performance Control (CPPC) notifications. Since on HWP-capable systems CPPC is a firmware interface to HWP, setting this bit effectively tells the firmware that the OS will handle thermal interrupts natively going forward. For details on _OSC/_PDC refer to: http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html To implement the _OSC/_PDC handshake as described, introduce a new function, acpi_early_processor_osc(), that walks the ACPI namespace looking for ACPI processor objects and invokes _OSC for them with bit 12 in the capabilities buffer set and terminates the namespace walk on the first success. Also modify intel_thermal_interrupt() to clear HWP status bits in the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents them from firing continuously). Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject & changelog, function rename ] Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Alexander Potapenko 提交于
Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot will allow KASAN store allocation/deallocation stack traces for memory chunks. The stack traces are stored in a hash table and referenced by handles which reside in the kasan_alloc_meta and kasan_free_meta structures in the allocated memory chunks. IRQ stack traces are cut below the IRQ entry point to avoid unnecessary duplication. Right now stackdepot support is only enabled in SLAB allocator. Once KASAN features in SLAB are on par with those in SLUB we can switch SLUB to stackdepot as well, thus removing the dependency on SLUB stack bookkeeping, which wastes a lot of memory. This patch is based on the "mm: kasan: stack depots" patch originally prepared by Dmitry Chernenkov. Joonsoo has said that he plans to reuse the stackdepot code for the mm/page_owner.c debugging facility. [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t] [aryabinin@virtuozzo.com: comment style fixes] Signed-off-by: NAlexander Potapenko <glider@google.com> Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Potapenko 提交于
KASAN needs to know whether the allocation happens in an IRQ handler. This lets us strip everything below the IRQ entry point to reduce the number of unique stack traces needed to be stored. Move the definition of __irq_entry to <linux/interrupt.h> so that the users don't need to pull in <linux/ftrace.h>. Also introduce the __softirq_entry macro which is similar to __irq_entry, but puts the corresponding functions to the .softirqentry.text section. Signed-off-by: NAlexander Potapenko <glider@google.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrey Konovalov <adech.fo@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 3月, 2016 1 次提交
-
-
由 Huang Rui 提交于
randconfig builds can sometimes disable CONFIG_CPU_SUP_INTEL while enabling the AMD power reporting PMU driver, resulting in this build failure: arch/x86/kernel/cpu/perf_event.h:663:31: error: 'events_sysfs_show' undeclared here (not in a function) To fix it, move events_sysfs_show() outside of #ifdef CONFIG_CPU_SUP_INTEL. Reported-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: Nbuild test robot <lkp@intel.com> Signed-off-by: NHuang Rui <ray.huang@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sherry Hurwitz <sherry.hurwitz@amd.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: kbuild-all@01.org Cc: linux-next@vger.kernel.org Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1458875905-4278-1-git-send-email-ray.huang@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 23 3月, 2016 5 次提交
-
-
由 Prarit Bhargava 提交于
After e76b027e ("x86,vdso: Use LSL unconditionally for vgetcpu") native_read_tscp() is unused in the kernel. The function can be removed like native_read_tsc() was. Signed-off-by: NPrarit Bhargava <prarit@redhat.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1458687968-9106-1-git-send-email-prarit@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Ard Biesheuvel 提交于
Replace the arch specific versions of search_extable() and sort_extable() with calls to the generic ones, which now support relative exception tables as well. Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NH. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Vyukov 提交于
kcov provides code coverage collection for coverage-guided fuzzing (randomized testing). Coverage-guided fuzzing is a testing technique that uses coverage feedback to determine new interesting inputs to a system. A notable user-space example is AFL (http://lcamtuf.coredump.cx/afl/). However, this technique is not widely used for kernel testing due to missing compiler and kernel support. kcov does not aim to collect as much coverage as possible. It aims to collect more or less stable coverage that is function of syscall inputs. To achieve this goal it does not collect coverage in soft/hard interrupts and instrumentation of some inherently non-deterministic or non-interesting parts of kernel is disbled (e.g. scheduler, locking). Currently there is a single coverage collection mode (tracing), but the API anticipates additional collection modes. Initially I also implemented a second mode which exposes coverage in a fixed-size hash table of counters (what Quentin used in his original patch). I've dropped the second mode for simplicity. This patch adds the necessary support on kernel side. The complimentary compiler support was added in gcc revision 231296. We've used this support to build syzkaller system call fuzzer, which has found 90 kernel bugs in just 2 months: https://github.com/google/syzkaller/wiki/Found-Bugs We've also found 30+ bugs in our internal systems with syzkaller. Another (yet unexplored) direction where kcov coverage would greatly help is more traditional "blob mutation". For example, mounting a random blob as a filesystem, or receiving a random blob over wire. Why not gcov. Typical fuzzing loop looks as follows: (1) reset coverage, (2) execute a bit of code, (3) collect coverage, repeat. A typical coverage can be just a dozen of basic blocks (e.g. an invalid input). In such context gcov becomes prohibitively expensive as reset/collect coverage steps depend on total number of basic blocks/edges in program (in case of kernel it is about 2M). Cost of kcov depends only on number of executed basic blocks/edges. On top of that, kernel requires per-thread coverage because there are always background threads and unrelated processes that also produce coverage. With inlined gcov instrumentation per-thread coverage is not possible. kcov exposes kernel PCs and control flow to user-space which is insecure. But debugfs should not be mapped as user accessible. Based on a patch by Quentin Casasnovas. [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode'] [akpm@linux-foundation.org: unbreak allmodconfig] [akpm@linux-foundation.org: follow x86 Makefile layout standards] Signed-off-by: NDmitry Vyukov <dvyukov@google.com> Reviewed-by: NKees Cook <keescook@chromium.org> Cc: syzkaller <syzkaller@googlegroups.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tavis Ormandy <taviso@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@google.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: David Drysdale <drysdale@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Lutomirski 提交于
x86's is_compat_task always checked the current syscall type, not the task type. It has no non-arch users any more, so just remove it to avoid confusion. On x86, nothing should really be checking the task ABI. There are legitimate users for the syscall ABI and for the mm ABI. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paolo Bonzini 提交于
This happens when doing the reboot test from virt-tests: [ 131.833653] BUG: unable to handle kernel NULL pointer dereference at (null) [ 131.842461] IP: [<ffffffffa0950087>] kvm_page_track_is_active+0x17/0x60 [kvm] [ 131.850500] PGD 0 [ 131.852763] Oops: 0000 [#1] SMP [ 132.007188] task: ffff880075fbc500 ti: ffff880850a3c000 task.ti: ffff880850a3c000 [ 132.138891] Call Trace: [ 132.141639] [<ffffffffa092bd11>] page_fault_handle_page_track+0x31/0x40 [kvm] [ 132.149732] [<ffffffffa093380f>] paging64_page_fault+0xff/0x910 [kvm] [ 132.172159] [<ffffffffa092c734>] kvm_mmu_page_fault+0x64/0x110 [kvm] [ 132.179372] [<ffffffffa06743c2>] handle_exception+0x1b2/0x430 [kvm_intel] [ 132.187072] [<ffffffffa067a301>] vmx_handle_exit+0x1e1/0xc50 [kvm_intel] ... Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: 3d0c27adSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 3月, 2016 17 次提交
-
-
由 Rik van Riel 提交于
The async pagefault wake code can run from the idle task in exception context, so everything here needs to be made non-preemptible. Conversion to a simple wait queue and raw spinlock does the trick. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Lan Tianyu 提交于
The barrier also orders the write to mode from any reads to the page tables done and so update the comment. Signed-off-by: NLan Tianyu <tianyu.lan@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Lan Tianyu 提交于
Update spte before increasing tlbs_dirty to make sure no tlb flush in lost after spte is zapped. This pairs with the barrier in the kvm_flush_remote_tlbs(). Signed-off-by: NLan Tianyu <tianyu.lan@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Lan Tianyu 提交于
Signed-off-by: NLan Tianyu <tianyu.lan@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Lan Tianyu 提交于
There is already a barrier inside of kvm_flush_remote_tlbs() which can help to make sure everyone sees our modifications to the page tables and see changes to vcpu->mode here. So remove the smp_mb in the kvm_mmu_commit_zap_page() and update the comment. Signed-off-by: NLan Tianyu <tianyu.lan@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation: CPUID.7.0.ECX[3]:PKU. X86_FEATURE_OSPKE is software support for pkeys, enumerated with CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of CR4.PKE(bit 22). This patch disables CPUID:PKU without ept, because pkeys is not yet implemented for shadow paging. Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
Protection keys define a new 4-bit protection key field (PKEY) in bits 62:59 of leaf entries of the page tables, the PKEY is an index to PKRU register(16 domains), every domain has 2 bits(write disable bit, access disable bit). Static logic has been produced in update_pkru_bitmask, dynamic logic need read pkey from page table entries, get pkru value, and deduce the correct result. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
PKEYS defines a new status bit in the PFEC. PFEC.PK (bit 5), if some conditions is true, the fault is considered as a PKU violation. pkru_mask indicates if we need to check PKRU.ADi and PKRU.WDi, and does cache some conditions for permission_fault. [ Huaitong: Xiao helps to modify many sections. ] Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
Currently XSAVE state of host is not restored after VM-exit and PKRU is managed by XSAVE so the PKRU from guest is still controlling the memory access even if the CPU is running the code of host. This is not safe as KVM needs to access the memory of userspace (e,g QEMU) to do some emulation. So we save/restore PKRU when guest/host switches. Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
KVM will use it to switch pkru between guest and host. CC: Ingo Molnar <mingo@redhat.com> CC: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
This patch adds pkeys support for xsave state. Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
Pkeys is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging, mode with TDP. To emulate this behavior, pkeys needs to be manually disabled when guest switches to non-paging mode. Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Huaitong Han 提交于
This patch removes magic number with enum cpuid_leafs. Signed-off-by: NHuaitong Han <huaitong.han@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This will help in the implementation of PKRU, where the PK bit of the page fault error code cannot be computed in advance (unlike I/D, R/W and U/S). Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Old KVM guests invoke single-context invvpid without actually checking whether it is supported. This was fixed by commit 518c8aee ("KVM: VMX: Make sure single type invvpid is supported before issuing invvpid instruction", 2010-08-01) and the patch after, but pre-2.6.36 kernels lack it including RHEL 6. Reported-by: jmontleo@redhat.com Tested-by: jmontleo@redhat.com Cc: stable@vger.kernel.org Fixes: 99b83ac8Reviewed-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
A guest executing an invalid invvpid instruction would hang because the instruction pointer was not updated. Reported-by: jmontleo@redhat.com Tested-by: jmontleo@redhat.com Cc: stable@vger.kernel.org Fixes: 99b83ac8Reviewed-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
A guest executing an invalid invept instruction would hang because the instruction pointer was not updated. Cc: stable@vger.kernel.org Fixes: bfd0a56bReviewed-by: NDavid Matlack <dmatlack@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 3月, 2016 4 次提交
-
-
由 Srinivas Pandruvada 提交于
Added Broadwell-H and Broadwell-Server. Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1458517938-25308-1-git-send-email-srinivas.pandruvada@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
The ev_sel_ext in PCU_MSR_PMON_CTL is locked on some CPU models, so despite it being documented in the SDM, if we write 1 to that bit then we can get a #GP fault. Which #GP the perf fuzzer happily triggered in Peter Zijlstra's testing. Also, there are no public events which use that bit, so remove ev_sel_ext bit support for PCU. Reported-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NKan Liang <kan.liang@intel.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1458500301-3594-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Huang Rui 提交于
Introduce an AMD accumlated power reporting mechanism for the Family 15h, Model 60h processor that can be used to calculate the average power consumed by a processor during a measurement interval. The feature support is indicated by CPUID Fn8000_0007_EDX[12]. This feature will be implemented both in hwmon and perf. The current design provides one event to report per package/processor power consumption by counting each compute unit power value. Here the gory details of how the computation is done: * Tsample: compute unit power accumulator sample period * Tref: the PTSC counter period (PTSC: performance timestamp counter) * N: the ratio of compute unit power accumulator sample period to the PTSC period * Jmax: max compute unit accumulated power which is indicated by MSR_C001007b[MaxCpuSwPwrAcc] * Jx/Jy: compute unit accumulated power which is indicated by MSR_C001007a[CpuSwPwrAcc] * Tx/Ty: the value of performance timestamp counter which is indicated by CU_PTSC MSR_C0010280[PTSC] * PwrCPUave: CPU average power i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007. N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]]. ii. Read the full range of the cumulative energy value from the new MSR MaxCpuSwPwrAcc. Jmax = value returned. iii. At time x, software reads CpuSwPwrAcc and samples the PTSC. Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC. iv. At time y, software reads CpuSwPwrAcc and samples the PTSC. Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC. v. Calculate the average power consumption for a compute unit over time period (y-x). Unit of result is uWatt: if (Jy < Jx) // Rollover has occurred Jdelta = (Jy + Jmax) - Jx else Jdelta = Jy - Jx PwrCPUave = N * Jdelta * 1000 / (Ty - Tx) Simple example: root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' make -j4 CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CHK include/generated/timeconst.h CHK include/generated/bounds.h CHK include/generated/asm-offsets.h CALL scripts/checksyscalls.sh CHK include/generated/compile.h SKIPPED include/generated/compile.h Building modules, stage 2. Kernel: arch/x86/boot/bzImage is ready (#40) MODPOST 4225 modules Performance counter stats for 'system wide': 183.44 mWatts power/power-pkg/ 341.837270111 seconds time elapsed root@hr-zp:/home/ray/tip# ./tools/perf/perf stat -a -e 'power/power-pkg/' sleep 10 Performance counter stats for 'system wide': 0.18 mWatts power/power-pkg/ 10.012551815 seconds time elapsed Suggested-by: NPeter Zijlstra <peterz@infradead.org> Suggested-by: NIngo Molnar <mingo@kernel.org> Suggested-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: jacob.w.shin@gmail.com Link: http://lkml.kernel.org/r/1457502306-2559-1-git-send-email-ray.huang@amd.com [ Fixed the modular build. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Huang Rui 提交于
AMD CPU family 15h model 0x60 introduces a mechanism for measuring accumulated power. It is used to report the processor power consumption and support for it is indicated by CPUID Fn8000_0007_EDX[12]. Signed-off-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wan Zongshun <Vincent.Wan@amd.com> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com [ Resolved conflict and moved the synthetic CPUID slot to 19. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-