- 21 Sep, 2009: 3 commits
-
-
Committed by Ingo Molnar
Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out of its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring and analysis facility.

Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support, the 'counter' name is less and less appropriate.

All in all, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names (in an ABI-compatible fashion).

The word 'event' is also a bit shorter than 'counter', which makes it slightly more convenient to write and handle as well.

Thanks go to Stephane Eranian, who first observed this misnomer and suggested a rename.

User-space tooling and ABI compatibility are not affected - this patch should be function-invariant. (Also, defconfigs were not touched, to keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming.

We tried to time this change to the point where the amount of pending patches is smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch, and some stylistic fallout will be fixed up in a subsequent patch.

(NOTE: 'counters' are still the proper terminology when we deal with hardware registers, and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch.)

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Ingo Molnar
In preparation for the renames, to avoid a namespace clash.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Dave noticed that we leak the PMU resource reservations when we fail the hardware counter init.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: David Miller <davem@davemloft.net>
LKML-Reference: <1252483487.7746.164.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 20 Sep, 2009: 5 commits
-
-
Committed by Jaswinder Singh Rajput
fix the following 'make includecheck' warning:

  arch/x86/kernel/cpu/common.c: linux/smp.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1252087783.6385.10.camel@ht.satnam>
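As a minimal illustration (hypothetical file contents, not the actual common.c), this is the pattern the warning flags and such a fix removes:

    #include <linux/smp.h>
    #include <linux/sched.h>
    #include <linux/smp.h>  /* duplicate: 'make includecheck' flags it; the fix deletes this line */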
-
Committed by Jaswinder Singh Rajput
fix the following 'make includecheck' warning:

  arch/x86/mm/kmemcheck/shadow.c: linux/module.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247065179.4382.51.camel@ht.satnam>
-
Committed by Jaswinder Singh Rajput
fix the following 'make includecheck' warning:

  arch/x86/kernel/traps.c: asm/traps.h is included more than once.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1247065094.4382.49.camel@ht.satnam>
-
Committed by Kay Sievers
This allows subsystems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty and ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions.

This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complaint.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
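A minimal sketch of how a driver can report such a mode through the devnode() hook this change extends (the exact argument type has varied across kernel versions, mode_t vs. umode_t; the function body here is illustrative):

    /* sketch: world-accessible node mode for a 'null'-style device */
    static char *null_devnode(struct device *dev, mode_t *mode)
    {
            if (mode)
                    *mode = 0666;   /* instead of the 0600 default */
            return NULL;            /* keep the default node name */
    }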
-
Committed by Arjan van de Ven
The "end of a C state" trace point currently happens before the code runs that corrects the TSC for having stopped during idle.

The result of this is that the timestamp of the end-of-C-state event is garbage on cpus where the TSC stops during idle.

This patch moves the end point of the C state to after the timekeeping engine of the kernel has been corrected.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: fweisbec@gmail.com
Cc: peterz@infradead.org
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090919133533.139c2a46@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 19 Sep, 2009: 2 commits
-
-
Committed by Arjan van de Ven
This patch converts the existing power tracer into an event tracer, so that power events (C states and frequency changes) can be tracked via "perf".

This also removes the perl script that was used to demo the tracer; its functionality is being replaced entirely with timechart.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090912130542.6d314860@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Markus Metzger
Draining the BTS buffer on a buffer overflow interrupt takes too long, resulting in a kernel lockup when tracing the kernel.

Restructure perf_counter sampling into sample creation and sample output.

Prepare a single reference sample for BTS sampling and update the from and to address fields when draining the BTS buffer. Drain the entire BTS buffer between a single perf_output_begin() / perf_output_end() pair.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090915130023.A16204@sedona.ch.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
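A rough sketch of the restructured drain loop; the helper names follow the perf_counter code of that era, but their exact signatures should be read as assumptions:

    /* one reference sample, one output section for the whole buffer */
    if (perf_output_begin(&handle, counter, header.size * count, 1, 1))
            return;

    for (at = buffer_base; at < buffer_top; at++) {
            /* only the from/to addresses differ per BTS record */
            data.ip   = at->from;
            data.addr = at->to;
            perf_output_sample(&handle, &header, &data, counter);
    }

    perf_output_end(&handle);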
-
- 18 Sep, 2009: 1 commit
-
-
Committed by Suresh Siddha
A recent enhancement of the rb-tree based lookup exposed a bug in the lookup mechanism of reserve_memtype(), which ensures that there are no conflicting memtype requests for the memory range.

memtype_rb_search() returns an entry whose start address is <= the new start address, and from there we traverse the linear linked list to check whether there are any conflicts with the existing mappings. As the rbtree is keyed on the start address of the memory range, it is quite possible that we have several overlapping mappings whose start address is much less than the newly requested start but whose end is >= the newly requested end. This results in conflicting memtype mappings.

The same bug exists in the old code, which traverses the linear linked list starting from cached_entry, but the new rb-tree code exposes it fairly easily.

For now, don't use memtype_rb_search() and always start the search from the head of the linear linked list in reserve_memtype(). On most systems the linear linked list grows to only a few tens of entries (as we track the memory type of RAM pages using struct page), so we should be OK for now.

We still retain the rbtree and use it to speed up free_memtype(), which doesn't have the same bug (as we know exactly what we are searching for in free_memtype). Also use list_for_each_entry_from() in free_memtype() so that we start the search from the rb-tree lookup result.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <1253136483.4119.12.camel@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
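A sketch of the conservative conflict scan this falls back to; the field and list names follow the memtype code of the time, but treat the details as assumptions:

    /* an entry that starts before the new range can still overlap it
     * if its end reaches past the new start, so scan from the head */
    list_for_each_entry(entry, &memtype_list, nd) {
            if (entry->start < new_end && entry->end > new_start &&
                entry->type != req_type)
                    return -EBUSY;  /* conflicting memtype request */
    }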
-
- 16 Sep, 2009: 6 commits
-
-
Committed by Kurt Roeckx
Fixes bugzilla #13780

From: Kurt Roeckx <kurt@roeckx.be>
Signed-off-by: Dave Jones <davej@redhat.com>
-
Committed by Peter Zijlstra
Sysbench thinks SD_BALANCE_WAKE is too aggressive and kbuild doesn't really mind too much; SD_BALANCE_NEWIDLE picks up most of the slack.

On a dual socket, quad core, dual thread Nehalem system:

  sysbench (--num_threads=16):
    SD_BALANCE_WAKE-: 13982 tx/s
    SD_BALANCE_WAKE+: 15688 tx/s

  kbuild (-j16):
    SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
    SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )
    (same within noise)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Feng Tang
get/set_wallclock() already have a set of platform-dependent implementations (default, EFI, paravirt), and MRST will add another variant. Moving them to platform ops simplifies the existing code and minimizes the effort to integrate new variants.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
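The resulting indirection looks roughly like this (struct and function names as in arch/x86 of that period; treat the exact layout as illustrative):

    struct x86_platform_ops {
            unsigned long (*get_wallclock)(void);
            int (*set_wallclock)(unsigned long nowtime);
    };

    /* the default keeps the existing CMOS/RTC behaviour; EFI,
     * paravirt and later MRST simply install their own hooks */
    struct x86_platform_ops x86_platform = {
            .get_wallclock = mach_get_cmos_time,
            .set_wallclock = mach_set_rtc_mmss,
    };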
-
Committed by Andreas Herrmann
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
-
Committed by Peter Zijlstra
Silly percpu bits don't respect static..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
-
Committed by Thomas Gleixner
init_IRQ() and x86_late_time_init() are missing __init annotations. The x86 platform ops variables are annotated, but the annotation needs to be put between the variable name and the "=" of the initializer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
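For example (a sketch of the placement rule, not the exact patch hunks):

    /* function annotation goes before the name: */
    void __init init_IRQ(void)
    {
            /* ... */
    }

    /* variable annotation goes between the name and the '=': */
    struct x86_init_ops x86_init __initdata = {
            /* ... */
    };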
-
- 15 Sep, 2009: 10 commits
-
-
Committed by Peter Zijlstra
APERF/MPERF support for cpu_power.

APERF/MPERF is arch-defined to be a relative scale of work capacity per logical cpu; this is assumed to include SMT and Turbo mode.

APERF/MPERF are specified to both reset to 0 when either counter wraps, which is highly inconvenient, since that'll give a blip when it happens. The manual specifies writing 0 to the counters after each read, but that's 1) too expensive, and 2) destroys the possibility of sharing these counters with other users, so we live with the blip - the other existing user does too.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
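A minimal sketch of the sampling idea (the MSR names are the architectural ones; the delta-ratio bookkeeping here is illustrative):

    u64 aperf, mperf, ratio;

    rdmsrl(MSR_IA32_APERF, aperf);
    rdmsrl(MSR_IA32_MPERF, mperf);

    /* relative work capacity since the previous sample; both
     * counters reset together on wrap, hence the occasional blip */
    ratio = div64_u64((aperf - prev_aperf) * SCHED_LOAD_SCALE,
                      mperf - prev_mperf);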
-
Committed by Peter Zijlstra
Move some of the aperf/mperf code out from the cpufreq driver thingy so that other people can enjoy it too.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
Move the APERFMPERF capability into an X86_FEATURE flag so that it can be used outside of the acpi cpufreq driver.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
If we're looking to place a new task, we might as well find the idlest position _now_, not 1 tick ago.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Mike Galbraith
Make the idle balancer more aggressive, to improve an x264 encoding workload provided by Jason Garrett-Glaser:

  NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 252.82 fps, 22096.60 kb/s
    encoded 600 frames, 250.69 fps, 22096.60 kb/s
    encoded 600 frames, 245.76 fps, 22096.60 kb/s

  NO_NEXT_BUDDY LB_BIAS
    encoded 600 frames, 344.44 fps, 22096.60 kb/s
    encoded 600 frames, 346.66 fps, 22096.60 kb/s
    encoded 600 frames, 352.59 fps, 22096.60 kb/s

  NO_NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 425.75 fps, 22096.60 kb/s
    encoded 600 frames, 425.45 fps, 22096.60 kb/s
    encoded 600 frames, 422.49 fps, 22096.60 kb/s

Peter pointed out that this is better done via newidle_idx, not via LB_BIAS: newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago.

Worst-case latencies are improved as well, as no buddies means less vruntime spread (as per prior lkml discussions).

This change improves kbuild-peak parallelism as well.

Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx; restore that and set them to 0 to make wake balancing more aggressive.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Peter Zijlstra
The problem with wake_idle() is that it doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction.

To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self().

Modify sched_balance_self() to:

  - update_shares() when walking up the domain tree (it only called
    it for the top domain, but it should have done this anyway),
    which allows us to remove this ugly bit from try_to_wake_up().

  - do wake_affine() on the smallest domain that contains both this
    (the waking) and the prev (the wakee) cpu for WAKE invocations.

Then use the top-down balance steps it had to replace wake_idle().

This leads to the disappearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced by SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.

Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning. Enabling SD_BALANCE_WAKE conditionally on NUMA distance seems like a good additional feature: Magny-Cours and small Nehalem systems would want this enabled; systems with slow interconnects would not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Andi Kleen
Fix compilation error in arch/x86/kernel/cpu/mcheck/mce-severity.c when CONFIG_DEBUG_FS is disabled, introduced in commit 5be9ed25.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Committed by Borislav Petkov
Now that decoding is done in-kernel, suppress the mcelog message part.

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
-
Committed by Borislav Petkov
Move the NB decoder along with the required defines to the EDAC MCE core. Add registration routines for further decoding of the MCE info in the AMD64 EDAC module.

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
-
- 13 Sep, 2009: 1 commit
-
-
Committed by Jiri Olsa
Only 24 bytes need to be reserved on the stack for the function graph tracer on x86_64.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <20090729085837.GB4998@jolsa.lab.eng.brq.redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 11 Sep, 2009: 4 commits
-
-
Committed by Michal Hocko
Currently we do not include the randomized stack size when calculating mmap_base in arch_pick_mmap_layout for the topdown case. This can cause mmap_base to start inside the area reserved for the stack, because the stack is randomized by up to 1GB on 64-bit (8MB on 32-bit) while the minimum gap is only 128MB. If the stack then grows down to mmap_base, the stack can silently overwrite an mmap region.

Let's include the maximum stack randomization size in MIN_GAP, which is used as the lower bound for the gap in mmap.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Stable Team <stable@kernel.org>
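The core of the fix, roughly (stack_maxrandom_size() is the helper this change introduces; the surrounding mmap_base() logic is simplified):

    #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())
    #define MAX_GAP (TASK_SIZE/6*5)

    static unsigned long mmap_base(void)
    {
            unsigned long gap = current->signal->rlim[RLIMIT_STACK].rlim_cur;

            if (gap < MIN_GAP)
                    gap = MIN_GAP;  /* now covers worst-case stack randomization */
            else if (gap > MAX_GAP)
                    gap = MAX_GAP;

            return PAGE_ALIGN(TASK_SIZE - gap - mmap_rnd());
    }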
-
Committed by Ben Hutchings
As reported in <http://bugs.debian.org/511703> and <http://bugs.debian.org/515982>, kernels with paravirt-alternatives enabled crash in text_poke_early() on at least some 486-class processors.

The problem is that text_poke_early() itself uses inline functions affected by paravirt-alternatives and so will modify instructions that have already been prefetched. Pentium and later processors will invalidate the prefetched instructions in this case, but 486-class processors do not.

Change sync_core() to limit prefetching on 486-class (and 386-class) processors, and move the call to sync_core() above the call to the modifiable local_irq_restore().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
LKML-Reference: <1252547631.3423.134.camel@localhost>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Committed by Steven Rostedt
The dynamic function tracer relies on the macro P6_NOP5 always being an atomic NOP. If for some reason it is changed to be two operations (like a nop2 nop3), the kernel can fault when the function tracer modifies the code.

This patch adds a comment to note that the P6_NOPs are expected to be atomic. This will hopefully prevent anyone from changing that.

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
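For reference, a sketch of the instruction in question (the byte encoding is the standard 5-byte 'nopl'; treat the exact define as illustrative of asm/nops.h):

    /* one atomic 5-byte instruction: nopl 0x0(%rax,%rax,1) */
    #define P6_NOP5 ".byte 0x0f,0x1f,0x44,0x00,0\n"

    /* NOT equivalent for the tracer: a nop2 + nop3 pair could be
     * observed half-patched by another CPU and fault */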
-
Committed by Jeremy Fitzhardinge
Split __phys_addr out into its own file so we can disable -fstack-protector in a fine-grained fashion. Also it doesn't have terribly much to do with the rest of ioremap.c.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 10 Sep, 2009: 8 commits
-
-
Committed by Avi Kivity
Debug registers may only be accessed from cpl 0. Unfortunately, the vmx code will emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
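The shape of the fix, as a sketch (helper names follow the kvm code of that period; treat them and the exact injected exception as assumptions):

    /* in the DR-access exit handler: refuse emulation from guest
     * userspace and inject a fault, as real hardware would */
    if (vmx_get_cpl(vcpu) > 0) {
            kvm_inject_gp(vcpu, 0);
            return 1;
    }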
-
Committed by Gleb Natapov
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Marcelo Tosatti
kvm_mmu_slot_remove_write_access already calls it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Gleb Natapov
No need to call it before each kvm_(set|get)_msr_common().

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Avi Kivity
Only reload debug register 6 if we're running with the guest's debug registers. Saves around 150 cycles from the guest lightweight exit path.

dr6 contains a couple of bits that are updated on #DB, so intercept that unconditionally and update those bits then.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Committed by Avi Kivity
Instead of saving the debug registers from the processor to a kvm data structure, rely on the debug registers stored in the thread structure. This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.

Signed-off-by: Avi Kivity <avi@redhat.com>
-
Committed by Marcelo Tosatti
Commit b8bcfe99 made paravirt pte updates synchronous in interrupt context. Unfortunately the KVM pv mmu code caches the lazy/nonlazy mode internally, so a pte update from interrupt context during a lazy mmu operation can be batched while it should be performed synchronously.

https://bugzilla.redhat.com/show_bug.cgi?id=518022

Drop the internal mode variable and use paravirt_get_lazy_mode(), which returns the correct state.

Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
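A sketch of the changed behaviour; paravirt_get_lazy_mode() and PARAVIRT_LAZY_MMU are the real interfaces, while the surrounding helper and flush call are illustrative of the KVM pv-mmu code:

    #include <asm/paravirt.h>

    static void kvm_deferred_mmu_op(void *buf, int len)
    {
            /* ... queue the hypercall arguments ... */

            /* query the authoritative state instead of a locally
             * cached flag, so an update from interrupt context in
             * the middle of a lazy section is flushed synchronously */
            if (paravirt_get_lazy_mode() != PARAVIRT_LAZY_MMU)
                    mmu_queue_flush(state);  /* illustrative flush helper */
    }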
-
Committed by Glauber Costa
The use of __pa() to calculate the address of a C-visible symbol is wrong and can lead to unpredictable results. See arch/x86/include/asm/page.h for details.

It should be replaced with __pa_symbol(), which does the correct math here by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
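The difference, in miniature (the MSR name is from the KVM paravirt ABI; the surrounding call is a simplified sketch of the kvmclock setup):

    /* wrong: __pa() assumes a direct-mapping address and ignores
     * kernel relocation */
    /* wrmsrl(MSR_KVM_WALL_CLOCK, __pa(&wall_clock)); */

    /* right: __pa_symbol() does the relocation-aware math for
     * kernel symbols */
    wrmsrl(MSR_KVM_WALL_CLOCK, __pa_symbol(&wall_clock));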
-