- 14 12月, 2009 5 次提交
-
-
由 Hidetoshi Seto 提交于
It looks better to have a common function. No change in functionality. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <4B25FDDC.407@jp.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
-
由 Cyrill Gorcunov 提交于
Add check if APIC is not disabled since thermal monitoring depends on it. As only apic gets disabled we should not try to install "thermal monitor" vector, print out that thermal monitoring is enabled and etc... Note that "Intel Correct Machine Check Interrupts" already has such a check. Also I decided to not add cpu_has_apic check into mcheck_intel_therm_init since even if it'll call apic_read on disabled apic -- it's safe here and allow us to save a few code bytes. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Yinghai Lu 提交于
This fixes the following breakage of the commit 75f1cdf1: - GART systems that don't AGP with broken BIOS and more than 4GB memory are forced to use swiotlb. They can allocate aperture by hand and use GART. - GART systems without GAP must disable GART on shutdown. - swiotlb usage is forced by the boot option, gart_iommu_hole_init() is not called, so we disable GART early_gart_iommu_check(). Signed-off-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1260759135-6450-3-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 FUJITA Tomonori 提交于
The commit 75f1cdf1 introduced a bug that we initialize SWIOTLB right after dma32_free_bootmem so we wrongly steal memory area allocated for GART with broken BIOS earlier. This moves swiotlb initialization before dma32_free_bootmem(). Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: yinghai@kernel.org LKML-Reference: <1260759135-6450-2-git-send-email-fujita.tomonori@lab.ntt.co.jp> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Joe Perches 提交于
Stephen Rothwell reported these warnings: arch/x86/mm/mmio-mod.c: In function 'print_pte': arch/x86/mm/mmio-mod.c:100: warning: too many arguments for format arch/x86/mm/mmio-mod.c:106: warning: too many arguments for format The 'fmt' was left out accidentally. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NJoe Perches <joe@perches.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus <torvalds@linux-foundation.org> LKML-Reference: <1260775443.18538.16.camel@Joe-Laptop.home> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 13 12月, 2009 1 次提交
-
-
由 Cliff Wickman 提交于
Interrupt vector 0xec has been doubly defined in irq_vectors.h It seems arbitrary whether LOCAL_PENDING_VECTOR or UV_BAU_MESSAGE is the higher number. As long as they are unique. If they are not unique we'll hit a BUG in alloc_system_vector(). Signed-off-by: NCliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> LKML-Reference: <E1NJ9Pe-0004P7-0Q@eag09.americas.sgi.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 12月, 2009 3 次提交
-
-
由 Mike Travis 提交于
When there are a large number of processors in a system, there is an excessive amount of messages sent to the system console. It's estimated that with 4096 processors in a system, and the console baudrate set to 56K, the startup messages will take about 84 minutes to clear the serial port. This set of patches limits the number of repetitious messages which contain no additional information. Much of this information is obtainable from the /proc and /sysfs. Some of the messages are also sent to the kernel log buffer as KERN_DEBUG messages so dmesg can be used to examine more closely any details specific to a problem. The new cpu bootup sequence for system_state == SYSTEM_BOOTING: Booting Node 0, Processors #1 #2 #3 #4 #5 #6 #7 Ok. Booting Node 1, Processors #8 #9 #10 #11 #12 #13 #14 #15 Ok. ... Booting Node 3, Processors #56 #57 #58 #59 #60 #61 #62 #63 Ok. Brought up 64 CPUs After the system is running, a single line boot message is displayed when CPU's are hotplugged on: Booting Node %d Processor %d APIC 0x%x Status of the following lines: CPU: Physical Processor ID: printed once (for boot cpu) CPU: Processor Core ID: printed once (for boot cpu) CPU: Hyper-Threading is disabled printed once (for boot cpu) CPU: Thermal monitoring enabled printed once (for boot cpu) CPU %d/0x%x -> Node %d: removed CPU %d is now offline: only if system_state == RUNNING Initializing CPU#%d: KERN_DEBUG Signed-off-by: NMike Travis <travis@sgi.com> LKML-Reference: <4B219E28.8080601@sgi.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Mike Travis 提交于
Print only once that the system is supporting x2apic mode. Signed-off-by: NMike Travis <travis@sgi.com> Acked-by: NCyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <4B226E92.5080904@sgi.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Borislav Petkov 提交于
The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 11 12月, 2009 8 次提交
-
-
由 Jason Wessel 提交于
On an SMP system the kgdb_single_step flag has the possibility to indefinitely hang the system in the case. Consider the case where, CPU 1 has the schedule lock and CPU 0 is set to single step, there is no way for CPU 0 to run another task. The easy way to observe the problem is to make 2 cpus busy, and run the kgdb test suite. You will see that it hangs the system very quickly. while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done & while [ 1 ] ; do find /proc > /dev/null 2>&1 ; done & echo V1 > /sys/module/kgdbts/parameters/kgdbts The side effect of this patch is that there is the possibility to miss a breakpoint in the case that a single step operation was executed to step over a breakpoint in common code. The trade off of the missed breakpoint is preferred to hanging the kernel. This can be fixed in the future by using kprobes or another strategy to step over planted breakpoints with out of line execution. CC: Ingo Molnar <mingo@elte.hu> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
It is possible for the user_mode_vm(regs) check to return true on the i368 arch for a non master kgdb cpu or when the master kgdb cpu handles the NMI watch dog exception. The solution is simply to select the correct gdb_ss location based on the check to user_mode_vm(regs). CC: Ingo Molnar <mingo@elte.hu> Acked-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Roel Kluin 提交于
The for loop starts with a breakno of 0, and ends when it's 4. so this test is always true. Signed-off-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Al Viro 提交于
New helper - sys_mmap_pgoff(); switch syscalls to using it. Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Yinghai Lu 提交于
Jens found the following crash/regression: [ 0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80 [ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page and [ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE and bisected it to b24c2a92 ("x86: Move find_smp_config() earlier and avoid bootmem usage"). It turns out the BIOS is using the first 64k for mptable, without reserving it. So try to find good range for the real-mode trampoline instead of hard coding it, in case some bios tries to use that range for sth. Reported-by: NJens Axboe <jens.axboe@oracle.com> Signed-off-by: NYinghai Lu <yinghai@kernel.org> Tested-by: NJens Axboe <jens.axboe@oracle.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <4B21630A.6000308@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Prarit Bhargava 提交于
The per_cpu cpuid4_info shared_map can contain stale data when CPUs are added and removed. The stale data can lead to a NULL pointer derefernce panic on a remove of a CPU that has had siblings previously removed. This patch resolves the panic by verifying a cpu is actually online before adding it to the shared_cpu_map, only examining cpus that are part of the same lower level cache, and by updating other siblings lowest level cache maps when a cpu is added. Signed-off-by: NPrarit Bhargava <prarit@redhat.com> LKML-Reference: <20091209183336.17855.98708.sendpatchset@prarit.bos.redhat.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Cyrill Gorcunov 提交于
Ralf Hildebrandt reported this boot warning: | Running a vanilla 2.6.32 as Xen DomU, I'm getting: | | [ 0.000999] CPU: Physical Processor ID: 0 | [ 0.000999] CPU: Processor Core ID: 1 | [ 0.000999] Performance Events: AMD PMU driver. | [ 0.000999] ------------[ cut here ]------------ | [ 0.000999] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_dummy So we need to check if APIC functionality is available, and not just in the P6 driver but elsewhere as well. Reported-by: NRalf Hildebrandt <Ralf.Hildebrandt@charite.de> Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091210165634.GF5086@lenovo> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Xiao Guangrong 提交于
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <4B20BAA6.7010609@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 10 12月, 2009 8 次提交
-
-
由 Christoph Hellwig 提交于
While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NKyle McMartin <kyle@mcmartin.ca> Acked-by: NUlrich Drepper <drepper@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Joerg Roedel 提交于
The device change notifier is initialized in the dma_ops initialization path. But this path is never executed for iommu=pt. Move the notifier initialization to IOMMU hardware init code to fix this. Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
-
由 Joerg Roedel 提交于
The data structure changes to use dev->archdata.iommu field broke the iommu=pt mode because in this case the dev->archdata.iommu was left uninitialized. This moves the inititalization of the devices into the main init function and fixes the problem. Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
-
由 Joe Perches 提交于
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Remove #define NAME - Remove NAME from pr_<level> Signed-off-by: NJoe Perches <joe@perches.com> LKML-Reference: <009cb214c45ef932df0242856228f4739cc91408.1260383912.git.joe@perches.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Joe Perches 提交于
- Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Strip "kmmio: " from pr_<level>s Signed-off-by: NJoe Perches <joe@perches.com> LKML-Reference: <7aa509f8a23933036d39f54bd51e9acc52068049.1260383912.git.joe@perches.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Joe Perches 提交于
- Add pr_fmt(fmt) "pit: " fmt - Strip pit: prefixes from pr_debug Signed-off-by: NJoe Perches <joe@perches.com> LKML-Reference: <bbd4de532f18bb7c11f64ba20d224c08291cb126.1260383912.git.joe@perches.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Joe Perches 提交于
- Added #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Stripped PERCPU: from a pr_warning Signed-off-by: NJoe Perches <joe@perches.com> LKML-Reference: <7ead24eccbea8f2b11795abad3e2893a98e1e111.1260383912.git.joe@perches.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Joe Perches 提交于
- Added #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Converted a few printk(KERN_INFO to pr_info( - Stripped "es7000_mipcfg" from pr_debug Signed-off-by: NJoe Perches <joe@perches.com> LKML-Reference: <3b4375af246dec5941168858910210937c110af9.1260383912.git.joe@perches.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 09 12月, 2009 5 次提交
-
-
由 Andy Isaacson 提交于
Robert Hancock observes that DMI_BOARD_NAME is often more useful than DMI_PRODUCT_NAME, especially on standalone motherboards. So, print both. Signed-off-by: NAndy Isaacson <adi@hexapodia.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Robert Hancock <hancockrwd@gmail.com> Cc: Richard Zidlicky <rz@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20091208083021.GB27174@hexapodia.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Andy Isaacson 提交于
Unify x86_32 and x86_64 implementations of __show_regs() header, standardizing on the x86_64 format string in the process. Also, 32-bit will now call print_modules. Signed-off-by: NAndy Isaacson <adi@hexapodia.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Robert Hancock <hancockrwd@gmail.com> Cc: Richard Zidlicky <rz@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20091208082942.GA27174@hexapodia.org> [ v2: resolved conflict ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Frederic Weisbecker 提交于
Currently, when ptrace needs to modify a breakpoint, like disabling it, changing its address, type or len, it calls modify_user_hw_breakpoint(). This latter will perform the heavy and racy task of unregistering the old breakpoint and registering a new one. This is racy as someone else might steal the reserved breakpoint slot under us, which is undesired as the breakpoint is only supposed to be modified, sometimes in the middle of a debugging workflow. We don't want our slot to be stolen in the middle. So instead of unregistering/registering the breakpoint, just disable it while we modify its breakpoint fields and re-enable it after if necessary. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Prasad <prasad@linux.vnet.ibm.com> LKML-Reference: <1260347148-5519-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Joe Perches 提交于
- Use #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Remove "microcode: " prefix from each pr_<level> - Fix duplicated KERN_ERR prefix - Coalesce pr_<level> format strings - Add a space after an exclamation point No other change in output. Signed-off-by: NJoe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> LKML-Reference: <1260340250.27677.191.camel@Joe-Laptop.home> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Hidetoshi Seto 提交于
Commit cebe1820 had an unnecessary, wrong change: &mce_banks[i].attr is equivalent to the former bank_attrs[i], not to mce_attrs[i]. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: NAndi Kleen <andi@firstfloor.org> LKML-Reference: <4B1E05CC.4040703f@jp.fujitsu.com> Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 08 12月, 2009 2 次提交
-
-
由 Jan Beulich 提交于
mce_timer must be passed to setup_timer() in all cases, no matter whether it is going to be actually used. Otherwise, when the CPU gets brought down, its call to del_timer_sync() will never return, as the timer won't have a base associated, and hence lock_timer_base() will loop infinitely. Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: <stable@kernel.org> LKML-Reference: <4B1DB831.2030801@jp.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Masami Hiramatsu 提交于
Delete empty or incomplete inat-tables.c if gen-insn-attr-x86.awk failed, because it causes a build error if user tries to build kernel next time. Reported-by: NArkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: systemtap <systemtap@sources.redhat.com> Cc: DLE <dle-develop@lists.sourceforge.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091207170033.19230.37688.stgit@dhcp-100-2-132.bos.redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 12月, 2009 2 次提交
-
-
由 Thomas Gleixner 提交于
apic_noop is used to provide dummy apic functions. It's installed when the CPU has no APIC or when the APIC is disabled on the kernel command line. The apic_noop implementation of apic_write() warns when the CPU has an APIC or when the APIC is not disabled. That's bogus. The warning should only happen when the CPU has an APIC _AND_ the APIC is not disabled. apic_noop.apic_read() has the correct check. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: <stable@kernel.org> # in <= .32 this typo resides in native_apic_write_dummy() LKML-Reference: <alpine.LFD.2.00.0912071255420.3089@localhost.localdomain> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 OGAWA Hirofumi 提交于
At least, insn.c and inat.c is needed for kprobe for now. So, this compile those only if KPROBES is enabled. Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <878wdg8icq.fsf@devron.myhome.or.jp> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 06 12月, 2009 6 次提交
-
-
由 Jean Delvare 提交于
Fix the following warning: arch/x86/tools/test_get_len.c: In function "main": arch/x86/tools/test_get_len.c:116: warning: unused variable "c" Signed-off-by: NJean Delvare <khali@linux-fr.org> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Shaun Patterson 提交于
Signed-off-by: NShaun Patterson <shaunpatterson@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: pq@iki.fi LKML-Reference: <1260027694.10074.170.camel@linux-4lgc.site> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Frederic Weisbecker 提交于
When we enter in irq, two things can happen to preserve the link to the previous frame pointer: - If we were in an irq already, we don't switch to the irq stack as we are inside. We just need to save the previous frame pointer and to link the new one to the previous. - Otherwise we need another level of indirection. We enter the irq with the previous stack. We save the previous bp inside and make bp pointing to its saved address. Then we switch to the irq stack and push bp another time but to the new stack. This makes two levels to dereference instead of one. In the second case, the current stacktrace code omits the second level and loses the frame pointer accuracy. The stack that follows will then be considered as unreliable. Handling that makes the perf callchain happier. Before: 43.94% [k] _raw_read_lock | --- _read_lock | |--60.53%-- send_sigio | __kill_fasync | kill_fasync | evdev_pass_event | evdev_event | input_pass_event | input_handle_event | input_event | synaptics_process_byte | psmouse_handle_byte | psmouse_interrupt | serio_interrupt | i8042_interrupt | handle_IRQ_event | handle_edge_irq | handle_irq | __irqentry_text_start | ret_from_intr | | | |--30.43%-- __select | | | |--17.39%-- 0x454f15 | | | |--13.04%-- __read | | | |--13.04%-- vread_hpet | | | |--13.04%-- _xcb_lock_io | | | --13.04%-- 0x7f630878ce8 After: 50.00% [k] _raw_read_lock | --- _read_lock | |--98.97%-- send_sigio | __kill_fasync | kill_fasync | evdev_pass_event | evdev_event | input_pass_event | input_handle_event | input_event | | | |--96.88%-- synaptics_process_byte | | psmouse_handle_byte | | psmouse_interrupt | | serio_interrupt | | i8042_interrupt | | handle_IRQ_event | | handle_edge_irq | | handle_irq | | __irqentry_text_start | | ret_from_intr | | | | | |--39.78%-- __const_udelay | | | | | | | |--91.89%-- ath5k_hw_register_timeout | | | | ath5k_hw_noise_floor_calibration | | | | ath5k_hw_reset | | | | ath5k_reset | | | | ath5k_config | | | | ieee80211_hw_config | | | | | | | | | |--88.24%-- ieee80211_scan_work | | | | | worker_thread | | | | | kthread | | | | | child_rip | | | | | | | | | --11.76%-- ieee80211_scan_completed | | | | ieee80211_scan_work | | | | worker_thread | | | | kthread | | | | child_rip | | | | | | | --8.11%-- ath5k_hw_noise_floor_calibration | | | ath5k_hw_reset | | | ath5k_reset | | | ath5k_config Note: This does not only affect perf events but also x86-64 stacktraces. They were considered as unreliable once we quit the irq stack frame. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
-
由 Frederic Weisbecker 提交于
While dumping a stacktrace, the end of the exception stack won't link the frame pointer to the previous stack. The interrupted stack will then be considered as unreliable and ignored by perf, as the frame pointer is unreliable itself. This happens because we overwrite the frame pointer that links to the interrupted frame with the address of the exception stack. This is done in order to reserve space inside. But rbp has been chosen here only because it is not a scratch register, so that the address of the exception stack remains in rbp after calling do_debug(), we can then release the exception stack space without the need to retrieve its address again. But we can pick another non-scratch register to do that, so that we preserve the link to the interrupted stack frame in the stacktraces. Just randomly choose r12. Every registers are saved just before and restored just after calling do_debug(). And r12 is not used in the middle, which makes it a perfect candidate. Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw Before: 44.18% [k] _raw_read_lock | | --- |--6.31%-- waitid | |--4.26%-- writev | |--3.63%-- __select | |--3.15%-- __waitpid | | | |--28.57%-- 0x8b52e00000139f | | | |--28.57%-- 0x8b52e0000013c6 | | | |--14.29%-- 0x7fde786dc000 | | | |--14.29%-- 0x62696c2f7273752f | | | --14.29%-- 0x1ea9df800000000 | |--3.00%-- __poll After: 43.94% [k] _raw_read_lock | --- _read_lock | |--60.53%-- send_sigio | __kill_fasync | kill_fasync | evdev_pass_event | evdev_event | input_pass_event | input_handle_event | input_event | synaptics_process_byte | psmouse_handle_byte | psmouse_interrupt | serio_interrupt | i8042_interrupt | handle_IRQ_event | handle_edge_irq | handle_irq | __irqentry_text_start | ret_from_intr | | | |--30.43%-- __select | | | |--17.39%-- 0x454f15 | | | |--13.04%-- __read | | | |--13.04%-- vread_hpet | | | |--13.04%-- _xcb_lock_io | | | --13.04%-- 0x7f630878ce87 Note: it does not only affect perf events but also other stacktraces in x86-64. They were considered as unreliable once we quit the debug stack frame. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>
-
由 Frederic Weisbecker 提交于
Dumping the callchains from breakpoint events with perf gives strange results: 3.75% perf [kernel] [k] _raw_read_unlock | --- _raw_read_unlock perf_callchain perf_prepare_sample __perf_event_overflow perf_swevent_overflow perf_swevent_add perf_bp_event hw_breakpoint_exceptions_notify notifier_call_chain __atomic_notifier_call_chain atomic_notifier_call_chain notify_die do_debug debug munmap We are infected with all the debug stack. Like the nmi stack, the debug stack is undesired as it is part of the profiling path, not helpful for the user. Ignore it. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
-
由 Frederic Weisbecker 提交于
struct perf_event::event callback was called when a breakpoint triggers. But this is a rather opaque callback, pretty tied-only to the breakpoint API and not really integrated into perf as it triggers even when we don't overflow. We prefer to use overflow_handler() as it fits into the perf events rules, being called only when we overflow. Reported-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
-