- 05 9月, 2015 1 次提交
-
-
由 Mel Gorman 提交于
An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush] Signed-off-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 8月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
As of cf991de2 ("x86/asm/msr: Make wrmsrl_safe() a function"), wrmsrl_safe is a function, but wrmsrl is still a macro. The wrmsrl macro performs invalid shifts if the value argument is 32 bits. This makes it unnecessarily awkward to write code that puts an unsigned long into an MSR. To make this work, syscall_init needs tweaking to stop passing a function pointer to wrmsrl. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Willy Tarreau <w@1wt.eu> Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 8月, 2015 3 次提交
-
-
由 Andrey Ryabinin 提交于
Current definition of KASAN_SHADOW_OFFSET in include/linux/kasan.h will not work for upcomming arm64, so move it to the arch header. Signed-off-by: NAndrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexey Klimov <klimov.linux@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Keitel <dkeitel@codeaurora.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Yury <yury.norov@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1439444244-26057-2-git-send-email-ryabinin.a.a@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Huang Rui 提交于
MWAITX can enable a timer and a corresponding timer value specified in SW P0 clocks. The SW P0 frequency is the same as TSC. The timer provides an upper bound on how long the instruction waits before exiting. This way, a delay function in the kernel can leverage that MWAITX timer of MWAITX. When a CPU core executes MWAITX, it will be quiesced in a waiting phase, diminishing its power consumption. This way, we can save power in comparison to our default TSC-based delays. A simple test shows that: $ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc $ sleep 10000s $ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc Results: * TSC-based default delay: 485115 uWatts average power * MWAITX-based delay: 252738 uWatts average power Thus, that's about 240 milliWatts less power consumption. The test method relies on the support of AMD CPU accumulated power algorithm in fam15h_power for which patches are forthcoming. Suggested-by: NAndy Lutomirski <luto@amacapital.net> Suggested-by: NBorislav Petkov <bp@suse.de> Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NHuang Rui <ray.huang@amd.com> [ Fix delay truncation. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Andreas Herrmann <herrmann.der.user@gmail.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Li <tony.li@amd.com> Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Huang Rui 提交于
AMD Carrizo processors (Family 15h, Models 60h-6fh) added a new feature called MWAITX (MWAIT with extensions) as an extension to MONITOR/MWAIT. This new instruction controls a configurable timer which causes the core to exit wait state on timer expiration, in addition to "normal" MWAIT condition of reading from a monitored VA. Compared to MONITOR/MWAIT, there are minor differences in opcode and input parameters: MWAITX ECX[1]: enable timer if set MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks == TSC. The software P0 frequency is the same as the TSC frequency. MWAIT MWAITX opcode 0f 01 c9 | 0f 01 fb ECX[0] value of RFLAGS.IF seen by instruction ECX[1] unused/#GP if set | enable timer if set ECX[31:2] unused/#GP if set EAX unused (reserve for hint) EBX[31:0] unused | max wait time (SW P0 == TSC) MONITOR MONITORX opcode 0f 01 c8 | 0f 01 fa EAX (logical) address to monitor ECX #GP if not zero Max timeout = EBX/(TSC frequency) Signed-off-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dirk Brandewie <dirk.j.brandewie@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <bitbucket@online.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Li <tony.li@amd.com> Link: http://lkml.kernel.org/r/1439201994-28067-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 21 8月, 2015 1 次提交
-
-
由 Ingo Molnar 提交于
Some in-flight code makes use of the old rdtscll() (now removed), provide a wrapper for a kernel cycle to smooth the transition to rdtsc(). ( We use the safest variant, rdtsc_ordered(), which has barriers - this adds another incentive to remove the wrapper in the future. ) Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 8月, 2015 6 次提交
-
-
由 Boris Ostrovsky 提交于
Map shared data structure that will hold CPU registers, VPMU context, V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills this information in its handler and passes it to the guest for further processing. Set up PMU VIRQ. Now that perf infrastructure will assume that PMU is available on a PV guest we need to be careful and make sure that accesses via RDPMC instruction don't cause fatal traps by the hypervisor. Provide a nop RDPMC handler. For the same reason avoid issuing a warning on a write to APIC's LVTPC. Both of these will be made functional in later patches. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Boris Ostrovsky 提交于
Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more. Most definitions in this file are used in p2m.c only. Move those into p2m.c. set_phys_range_identity() is already declared in arch/x86/include/asm/xen/page.h, add __init annotation there. MAX_REMAP_RANGES isn't used at all, just delete it. The only define left is P2M_PER_PAGE which is moved to page.h as well. Signed-off-by: NJuergen Gross <jgross@suse.com> Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
64 bit pv-domains under Xen are limited to 512 GB of RAM today. The main reason has been the 3 level p2m tree, which was replaced by the virtual mapped linear p2m list. Parallel to the p2m list which is being used by the kernel itself there is a 3 level mfn tree for usage by the Xen tools and eventually for crash dump analysis. For this tree the linear p2m list can serve as a replacement, too. As the kernel can't know whether the tools are capable of dealing with the p2m list instead of the mfn tree, the limit of 512 GB can't be dropped in all cases. This patch replaces the hard limit by a kernel parameter which tells the kernel to obey the 512 GB limit or not. The default is selected by a configuration parameter which specifies whether the 512 GB limit should be active per default for domUs (domain save/restore/migration and crash dump analysis are affected). Memory above the domain limit is returned to the hypervisor instead of being identity mapped, which was wrong anyway. The kernel configuration parameter to specify the maximum size of a domain can be deleted, as it is not relevant any more. Signed-off-by: NJuergen Gross <jgross@suse.com> Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Juergen Gross 提交于
Use the newest headers from the xen tree to get some new structure layouts. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com> Acked-by: NKonrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
由 Julien Grall 提交于
Currently, the event channel rebind code is gated with the presence of the vector callback. The virtual interrupt controller on ARM has the concept of per-CPU interrupt (PPI) which allow us to support per-VCPU event channel. Therefore there is no need of vector callback for ARM. Xen is already using a free PPI to notify the guest VCPU of an event. Furthermore, the xen code initialization in Linux (see arch/arm/xen/enlighten.c) is requesting correctly a per-CPU IRQ. Introduce new helper xen_support_evtchn_rebind to allow architecture decide whether rebind an event is support or not. It will always return true on ARM and keep the same behavior on x86. This is also allow us to drop the usage of xen_have_vector_callback entirely in the ARM code. Signed-off-by: NJulien Grall <julien.grall@citrix.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 18 8月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
This reverts commit: 2c7577a7 ("sched/x86_64: Don't save flags on context switch") It was a nice speedup. It's also not quite correct: SYSENTER enables interrupts too early. We can re-add this optimization once the SYSENTER code is beaten into shape, which should happen in 4.3 or 4.4. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # v3.19 Link: http://lkml.kernel.org/r/85f56651f59f76624e80785a8fd3bdfdd089a818.1439838962.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 8月, 2015 1 次提交
-
-
由 Len Brown 提交于
Both the per-APIC flag ".wait_for_init_deassert", and the global atomic_t "init_deasserted" are dead code -- remove them. For all APIC types, "wait_for_master()" prevents an AP from proceeding until the BSP has set cpu_callout_mask, making "init_deasserted" {unnecessary}: BSP: <de-assert INIT> ... BSP: {set init_deasserted} AP: wait_for_master() set cpu_initialized_mask wait for cpu_callout_mask BSP: test cpu_initialized_mask BSP: set cpu_callout_mask AP: test cpu_callout_mask AP: {wait for init_deasserted} ... AP: <touch APIC> Deleting the {dead code} above is necessary to enable some parallelism in a future patch. Signed-off-by: NLen Brown <len.brown@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com> Link: http://lkml.kernel.org/r/de4b3a9bab894735e285870b5296da25ee6a8a5a.1439739165.git.len.brown@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 8月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
VMX encodes access rights differently from LAR, and the latter is most likely what x86 people think of when they think of "access rights". Rename them to avoid confusion. Cc: kvm@vger.kernel.org Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 8月, 2015 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commits 9a036b93 ("x86/signal/64: Remove 'fs' and 'gs' from sigcontext") and c6f20629 ("x86/signal/64: Fix SS handling for signals delivered to 64-bit programs"). They were cleanups, but they break dosemu by changing the signal return behavior (and removing 'fs' and 'gs' from the sigcontext struct - while not actually changing any behavior - causes build problems). Reported-and-tested-by: NStas Sergeev <stsp@list.ru> Acked-by: NAndy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 8月, 2015 5 次提交
-
-
由 Ashok Raj 提交于
kexec could boot a kernel that could be legacy with no knowledge of LMCE. Hence we should make sure we clear LMCE optin before kexec reboot. Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1439396985-12812-9-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ashok Raj 提交于
Remove unused function declarations. Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1439396985-12812-8-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
This used to flush out MCEs logged during early boot and which were in the MCA registers from a previous system run. No need for that now, since we've moved to a genpool. Suggested-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439396985-12812-7-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Chen, Gong 提交于
Use unified genpool to save Action Optional error events and put Action Optional error handling in the same notification chain as MCE error decoding. Signed-off-by: NChen, Gong <gong.chen@linux.intel.com> [ Fold in subsequent patch from Boris for early boot logging. ] Signed-off-by: NTony Luck <tony.luck@intel.com> [ Correct a lot. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1439396985-12812-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Boris reported that gcc version 4.4.4 20100503 (Red Hat 4.4.4-2) fails to build linux-next kernels that have this fresh commit via the locking tree: 11276d53 ("locking/static_keys: Add a new static_key interface") The problem appears to be that even though @key and @branch are compile time constants, it doesn't see the following expression as an immediate value: &((char *)key)[branch] More recent GCCs don't appear to have this problem. In particular, Red Hat backported the 'asm goto' feature into 4.4, 'normal' 4.4 compilers will not have this feature and thus not run into this asm. The workaround is to supply both values to the asm as immediates and do the addition in asm. Suggested-by: NH. Peter Anvin <hpa@zytor.com> Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 8月, 2015 1 次提交
-
-
由 Will Deacon 提交于
Since the following commit: 536fa402 ("compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release()") smp_store_release() supports byte accesses, so use that in writer unlock and remove the conditional macro override. Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NWaiman Long <Waiman.Long@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: paulmck@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1438880084-18856-6-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 8月, 2015 2 次提交
-
-
由 Thomas Gleixner 提交于
We can spare the irq_desc lookup in the interrupt entry code if we store the descriptor pointer in the vector array instead the interrupt number. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/20150802203609.717724106@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
VECTOR_UNDEFINED is a misnomer. The vector is defined, but unused. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/20150802203609.477282494@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 05 8月, 2015 7 次提交
-
-
由 Xiao Guangrong 提交于
We have abstracted the data struct and functions which are used to check reserved bit on guest page tables, now we extend the logic to check zero bits on shadow page tables The zero bits on sptes include not only reserved bits on hardware but also the bits that SPTEs willnever use. For example, shadow pages will never use GB pages unless the guest uses them too. Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiao Guangrong 提交于
These two fields, rsvd_bits_mask and bad_mt_xwr, in "struct kvm_mmu" are used to check if reserved bits set on guest ptes, move them to a data struct so that the approach can be applied to check host shadow page table entries as well Signed-off-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andy Lutomirski 提交于
They are no longer used. Good riddance! Deleting the TIF_ macros is really nice. It was never clear why there were so many variants. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/22c61682f446628573dde0f1d573ab821677e06da.1438378274.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Denys Vlasenko 提交于
With this config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os gcc-4.7.2 generates many copies of these tiny functions: __arch_hweight32 (35 copies): 55 push %rbp e8 66 9b 4a 00 callq __sw_hweight32 48 89 e5 mov %rsp,%rbp 5d pop %rbp c3 retq __arch_hweight64 (8 copies): 55 push %rbp e8 5e c2 8a 00 callq __sw_hweight64 48 89 e5 mov %rsp,%rbp 5d pop %rbp c3 retq See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 This patch fixes this via s/inline/__always_inline/ To avoid touching 32-bit case where such change was not tested to be a win, reformat __arch_hweight64() to have completely disjoint 64-bit and 32-bit implementations. IOW: made #ifdef / 32 bits and 64 bits instead of having #ifdef / #else / #endif inside a single function body. Only 64-bit __arch_hweight64() is __always_inline'd. text data bss dec filename 86971120 17195912 36659200 140826232 vmlinux.before 86970954 17195912 36659200 140826066 vmlinux Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Graf <tgraf@suug.ch> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1438697716-28121-2-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Denis V. Lunev 提交于
Hypervisor Top Level Functional Specification v3.1/4.0 notes that cpuid (0x40000003) EDX's 10th bit should be used to check that Hyper-V guest crash MSR's functionality available. This patch should fix this recognition. Currently the code checks EAX register instead of EDX. Signed-off-by: NAndrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Vitaly Kuznetsov 提交于
Full kernel hang is observed when kdump kernel starts after a crash. This hang happens in vmbus_negotiate_version() function on wait_for_completion() as Hyper-V host (Win2012R2 in my testing) never responds to CHANNELMSG_INITIATE_CONTACT as it thinks the connection is already established. We need to perform some mandatory minimalistic cleanup before we start new kernel. Reported-by: NK. Y. Srinivasan <kys@microsoft.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Vitaly Kuznetsov 提交于
When general-purpose kexec (not kdump) is being performed in Hyper-V guest the newly booted kernel fails with an MCE error coming from the host. It is the same error which was fixed in the "Drivers: hv: vmbus: Implement the protocol for tearing down vmbus state" commit - monitor pages remain special and when they're being written to (as the new kernel doesn't know these pages are special) bad things happen. We need to perform some minimalistic cleanup before booting a new kernel on kexec. To do so we need to register a special machine_ops.shutdown handler to be executed before the native_machine_shutdown(). Registering a shutdown notification handler via the register_reboot_notifier() call is not sufficient as it happens to early for our purposes. machine_ops is not being exported to modules (and I don't think we want to export it) so let's do this in mshyperv.c The minimalistic cleanup consists of cleaning up clockevents, synic MSRs, guest os id MSR, and hypercall MSR. Kdump doesn't require all this stuff as it lives in a separate memory space. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 04 8月, 2015 3 次提交
-
-
由 Andi Kleen 提交于
Add new MSRs (LBR_INFO) and some new MSR bits used by the Intel Skylake PMU driver. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285767-27027-4-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
PEBSv3 has a raw TSC time stamp in its memory buffer that later needs to to be converted to perf_clock. Add a native_sched_clock_from_tsc() that works the same as native_sched_clock(), but starts with an already given TSC value. Paravirt is ignored, it will just get the native clock. But there isn't a para virtualized PEBS anyway. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285767-27027-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Intel PT chapter in the new Intel Architecture SDM adds several packets corresponding enable bits and registers that control packet generation. Also, additional bits in the Intel PT CPUID leaf were added to enumerate presence and parameters of these new packets and features. The packets and enables are: * CYC: cycle accurate mode, provides the number of cycles elapsed since previous CYC packet; its presence and available threshold values are enumerated via CPUID; * MTC: mini time counter packets, used for tracking TSC time between full TSC packets; its presence and available resolution options are enumerated via CPUID; * PSB packet period is now configurable, available period values are enumerated via CPUID. This patch adds corresponding bit and register definitions, pmu driver capabilities based on CPUID enumeration, new attribute format bits for the new featurens and extends event configuration validation function to take these into account. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1438262131-12725-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 8月, 2015 3 次提交
-
-
由 Konstantin Khlebnikov 提交于
These functions check should_resched() before unlocking spinlock/bh-enable: preempt_count always non-zero => should_resched() always returns false. cond_resched_lock() worked iff spin_needbreak is set. This patch adds argument "preempt_offset" to should_resched(). preempt_count offset constants for that: PREEMPT_DISABLE_OFFSET - offset after preempt_disable() PREEMPT_LOCK_OFFSET - offset after spin_lock() SOFTIRQ_DISABLE_OFFSET - offset after local_bh_distable() SOFTIRQ_LOCK_OFFSET - offset after spin_lock_bh() Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Graf <agraf@suse.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: bdb43806 ("sched: Extract the basic add/sub preempt_count modifiers") Link: http://lkml.kernel.org/r/20150715095204.12246.98268.stgit@buzzSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
There are various problems and short-comings with the current static_key interface: - static_key_{true,false}() read like a branch depending on the key value, instead of the actual likely/unlikely branch depending on init value. - static_key_{true,false}() are, as stated above, tied to the static_key init values STATIC_KEY_INIT_{TRUE,FALSE}. - we're limited to the 2 (out of 4) possible options that compile to a default NOP because that's what our arch_static_branch() assembly emits. So provide a new static_key interface: DEFINE_STATIC_KEY_TRUE(name); DEFINE_STATIC_KEY_FALSE(name); Which define a key of different types with an initial true/false value. Then allow: static_branch_likely() static_branch_unlikely() to take a key of either type and emit the right instruction for the case. This means adding a second arch_static_branch_jump() assembly helper which emits a JMP per default. In order to determine the right instruction for the right state, encode the branch type in the LSB of jump_entry::key. This is the final step in removing the naming confusion that has led to a stream of avoidable bugs such as: a833581e ("x86, perf: Fix static_key bug in load_mm_cr4()") ... but it also allows new static key combinations that will give us performance enhancements in the subsequent patches. Tested-by: Rabin Vincent <rabin@rab.in> # arm Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> # ppc Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # s390 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andrey Konovalov 提交于
Replace ACCESS_ONCE() macro in smp_store_release() and smp_load_acquire() with WRITE_ONCE() and READ_ONCE() on x86, arm, arm64, ia64, metag, mips, powerpc, s390, sparc and asm-generic since ACCESS_ONCE() does not work reliably on non-scalar types. WRITE_ONCE() and READ_ONCE() were introduced in the following commits: 230fa253 ("kernel: Provide READ_ONCE and ASSIGN_ONCE") 43239cbe ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)") Signed-off-by: NAndrey Konovalov <andreyknvl@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDavidlohr Bueso <dbueso@suse.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: NRalf Baechle <ralf@linux-mips.org> Cc: Alexander Duyck <alexander.h.duyck@redhat.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@suse.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1438528264-714-1-git-send-email-andreyknvl@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 31 7月, 2015 3 次提交
-
-
由 Brian Gerst 提交于
Rename v86flags to veflags, and v86mask to veflags_mask. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-9-git-send-email-brgerst@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Brian Gerst 提交于
Make it clearer that this is the pointer to the userspace vm86 state area. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-8-git-send-email-brgerst@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Brian Gerst 提交于
vm86.h was being implicitly included in alot of places via processor.h, which in turn got it from math_emu.h. Break that chain and explicitly include vm86.h in all files that need it. Also remove unused vm86 field from math_emu_info. Signed-off-by: NBrian Gerst <brgerst@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438148483-11932-7-git-send-email-brgerst@gmail.com [ Fixed build failure. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-