1. 09 12月, 2009 1 次提交
    • F
      hw-breakpoints: Modify breakpoints without unregistering them · 44234adc
      Frederic Weisbecker 提交于
      Currently, when ptrace needs to modify a breakpoint, like disabling
      it, changing its address, type or len, it calls
      modify_user_hw_breakpoint(). This latter will perform the heavy and
      racy task of unregistering the old breakpoint and registering a new
      one.
      
      This is racy as someone else might steal the reserved breakpoint
      slot under us, which is undesired as the breakpoint is only
      supposed to be modified, sometimes in the middle of a debugging
      workflow. We don't want our slot to be stolen in the middle.
      
      So instead of unregistering/registering the breakpoint, just
      disable it while we modify its breakpoint fields and re-enable it
      after if necessary.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      LKML-Reference: <1260347148-5519-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      44234adc
  2. 06 12月, 2009 5 次提交
    • F
      x86: Fixup wrong irq frame link in stacktraces · af2d8289
      Frederic Weisbecker 提交于
      When we enter in irq, two things can happen to preserve the link
      to the previous frame pointer:
      
      - If we were in an irq already, we don't switch to the irq stack
        as we are inside. We just need to save the previous frame
        pointer and to link the new one to the previous.
      
      - Otherwise we need another level of indirection. We enter the irq with
        the previous stack. We save the previous bp inside and make bp
        pointing to its saved address. Then we switch to the irq stack and
        push bp another time but to the new stack. This makes two levels to
        dereference instead of one.
      
      In the second case, the current stacktrace code omits the second level
      and loses the frame pointer accuracy. The stack that follows will then
      be considered as unreliable.
      
      Handling that makes the perf callchain happier.
      Before:
      
      43.94%  [k] _raw_read_lock
                  |
                  --- _read_lock
                     |
                     |--60.53%-- send_sigio
                     |          __kill_fasync
                     |          kill_fasync
                     |          evdev_pass_event
                     |          evdev_event
                     |          input_pass_event
                     |          input_handle_event
                     |          input_event
                     |          synaptics_process_byte
                     |          psmouse_handle_byte
                     |          psmouse_interrupt
                     |          serio_interrupt
                     |          i8042_interrupt
                     |          handle_IRQ_event
                     |          handle_edge_irq
                     |          handle_irq
                     |          __irqentry_text_start
                     |          ret_from_intr
                     |          |
                     |          |--30.43%-- __select
                     |          |
                     |          |--17.39%-- 0x454f15
                     |          |
                     |          |--13.04%-- __read
                     |          |
                     |          |--13.04%-- vread_hpet
                     |          |
                     |          |--13.04%-- _xcb_lock_io
                     |          |
                     |           --13.04%-- 0x7f630878ce8
      
      After:
      
          50.00%  [k] _raw_read_lock
                  |
                  --- _read_lock
                     |
                     |--98.97%-- send_sigio
                     |          __kill_fasync
                     |          kill_fasync
                     |          evdev_pass_event
                     |          evdev_event
                     |          input_pass_event
                     |          input_handle_event
                     |          input_event
                     |          |
                     |          |--96.88%-- synaptics_process_byte
                     |          |          psmouse_handle_byte
                     |          |          psmouse_interrupt
                     |          |          serio_interrupt
                     |          |          i8042_interrupt
                     |          |          handle_IRQ_event
                     |          |          handle_edge_irq
                     |          |          handle_irq
                     |          |          __irqentry_text_start
                     |          |          ret_from_intr
                     |          |          |
                     |          |          |--39.78%-- __const_udelay
                     |          |          |          |
                     |          |          |          |--91.89%-- ath5k_hw_register_timeout
                     |          |          |          |          ath5k_hw_noise_floor_calibration
                     |          |          |          |          ath5k_hw_reset
                     |          |          |          |          ath5k_reset
                     |          |          |          |          ath5k_config
                     |          |          |          |          ieee80211_hw_config
                     |          |          |          |          |
                     |          |          |          |          |--88.24%-- ieee80211_scan_work
                     |          |          |          |          |          worker_thread
                     |          |          |          |          |          kthread
                     |          |          |          |          |          child_rip
                     |          |          |          |          |
                     |          |          |          |           --11.76%-- ieee80211_scan_completed
                     |          |          |          |                     ieee80211_scan_work
                     |          |          |          |                     worker_thread
                     |          |          |          |                     kthread
                     |          |          |          |                     child_rip
                     |          |          |          |
                     |          |          |           --8.11%-- ath5k_hw_noise_floor_calibration
                     |          |          |                     ath5k_hw_reset
                     |          |          |                     ath5k_reset
                     |          |          |                     ath5k_config
      
      Note: This does not only affect perf events but also x86-64
      stacktraces. They were considered as unreliable once we quit
      the irq stack frame.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      af2d8289
    • F
      x86: Fixup wrong debug exception frame link in stacktraces · b625b3b3
      Frederic Weisbecker 提交于
      While dumping a stacktrace, the end of the exception stack won't link
      the frame pointer to the previous stack.
      
      The interrupted stack will then be considered as unreliable and ignored
      by perf, as the frame pointer is unreliable itself.
      
      This happens because we overwrite the frame pointer that links to the
      interrupted frame with the address of the exception stack. This is
      done in order to reserve space inside.
      But rbp has been chosen here only because it is not a scratch register,
      so that the address of the exception stack remains in rbp after calling
      do_debug(), we can then release the exception stack space without the
      need to retrieve its address again.
      
      But we can pick another non-scratch register to do that, so that we
      preserve the link to the interrupted stack frame in the stacktraces.
      
      Just randomly choose r12. Every registers are saved just before and
      restored just after calling do_debug(). And r12 is not used in the
      middle, which makes it a perfect candidate.
      
      Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw
      
      Before:
          44.18%  [k] _raw_read_lock
                  |
                  |
                  ---  |--6.31%-- waitid
                       |
                       |--4.26%-- writev
                       |
                       |--3.63%-- __select
                       |
                       |--3.15%-- __waitpid
                       |          |
                       |          |--28.57%-- 0x8b52e00000139f
                       |          |
                       |          |--28.57%-- 0x8b52e0000013c6
                       |          |
                       |          |--14.29%-- 0x7fde786dc000
                       |          |
                       |          |--14.29%-- 0x62696c2f7273752f
                       |          |
                       |           --14.29%-- 0x1ea9df800000000
                       |
                       |--3.00%-- __poll
      
      After:
      
          43.94%  [k] _raw_read_lock
                  |
                  --- _read_lock
                     |
                     |--60.53%-- send_sigio
                     |          __kill_fasync
                     |          kill_fasync
                     |          evdev_pass_event
                     |          evdev_event
                     |          input_pass_event
                     |          input_handle_event
                     |          input_event
                     |          synaptics_process_byte
                     |          psmouse_handle_byte
                     |          psmouse_interrupt
                     |          serio_interrupt
                     |          i8042_interrupt
                     |          handle_IRQ_event
                     |          handle_edge_irq
                     |          handle_irq
                     |          __irqentry_text_start
                     |          ret_from_intr
                     |          |
                     |          |--30.43%-- __select
                     |          |
                     |          |--17.39%-- 0x454f15
                     |          |
                     |          |--13.04%-- __read
                     |          |
                     |          |--13.04%-- vread_hpet
                     |          |
                     |          |--13.04%-- _xcb_lock_io
                     |          |
                     |           --13.04%-- 0x7f630878ce87
      
      Note: it does not only affect perf events but also other stacktraces in
      x86-64. They were considered as unreliable once we quit the debug
      stack frame.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      b625b3b3
    • F
      x86/perf: Exclude the debug stack from the callchains · 7f33f9c5
      Frederic Weisbecker 提交于
      Dumping the callchains from breakpoint events with perf gives strange
      results:
      
      3.75%             perf  [kernel]           [k] _raw_read_unlock
                             |
                             --- _raw_read_unlock
                                 perf_callchain
                                 perf_prepare_sample
                                 __perf_event_overflow
                                 perf_swevent_overflow
                                 perf_swevent_add
                                 perf_bp_event
                                 hw_breakpoint_exceptions_notify
                                 notifier_call_chain
                                 __atomic_notifier_call_chain
                                 atomic_notifier_call_chain
                                 notify_die
                                 do_debug
                                 debug
                                 munmap
      
      We are infected with all the debug stack. Like the nmi stack, the debug
      stack is undesired as it is part of the profiling path, not helpful for
      the user.
      
      Ignore it.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
      7f33f9c5
    • F
      hw-breakpoints: Use overflow handler instead of the event callback · b326e956
      Frederic Weisbecker 提交于
      struct perf_event::event callback was called when a breakpoint
      triggers. But this is a rather opaque callback, pretty
      tied-only to the breakpoint API and not really integrated into perf
      as it triggers even when we don't overflow.
      
      We prefer to use overflow_handler() as it fits into the perf events
      rules, being called only when we overflow.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
      b326e956
    • F
      hw-breakpoints: Drop callback and task parameters from modify helper · 2f0993e0
      Frederic Weisbecker 提交于
      Drop the callback and task parameters from modify_user_hw_breakpoint().
      For now we have no user that need to modify a breakpoint to the point
      of changing its handler or its task context.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
      2f0993e0
  3. 02 12月, 2009 1 次提交
  4. 27 11月, 2009 1 次提交
  5. 26 11月, 2009 3 次提交
  6. 25 11月, 2009 2 次提交
    • T
      x86: Rename global percpu symbol dr7 to cpu_dr7 · 28b4e0d8
      Tejun Heo 提交于
      Percpu symbols now occupy the same namespace as other global
      symbols and as such short global symbols without subsystem
      prefix tend to collide with local variables.  dr7 percpu
      variable used by x86 was hit by this. Rename it to cpu_dr7.
      
      The rename also makes it more consistent with its fellow
      cpu_debugreg percpu variable.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>,
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20091125115856.GA17856@elte.hu>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      28b4e0d8
    • S
      perf_events, x86: Fix validate_event bug · 1261a02a
      Stephane Eranian 提交于
      The validate_event() was failing on valid event combinations. The
      function was assuming that if x86_schedule_event() returned 0, it
      meant error. But x86_schedule_event() returns the counter index and
      0 is a perfectly valid value. An error is returned if the function
      returns a negative value.
      
      Furthermore, validate_event() was also failing for event groups
      because the event->pmu was not set until after
      hw_perf_event_init().
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: peterz@infradead.org
      Cc: paulus@samba.org
      Cc: perfmon2-devel@lists.sourceforge.net
      Cc: eranian@gmail.com
      LKML-Reference: <4b0bdf36.1818d00a.07cc.25ae@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      --
       arch/x86/kernel/cpu/perf_event.c |    4 ++--
       1 file changed, 2 insertions(+), 2 deletions(-)
      1261a02a
  7. 24 11月, 2009 2 次提交
  8. 23 11月, 2009 1 次提交
  9. 14 11月, 2009 1 次提交
    • I
      hw-breakpoints, x86: Fix modular KVM build · 68efa37d
      Ingo Molnar 提交于
      This build error:
      
      arch/x86/kvm/x86.c:3655: error: implicit declaration of function 'hw_breakpoint_restore'
      
      Happens because in the CONFIG_KVM=m case there's no 'CONFIG_KVM' define
      in the kernel - it's CONFIG_KVM_MODULE in that case.
      
      Make the prototype available unconditionally.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      LKML-Reference: <1258114575-32655-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      68efa37d
  10. 12 11月, 2009 2 次提交
  11. 11 11月, 2009 1 次提交
  12. 10 11月, 2009 3 次提交
  13. 08 11月, 2009 2 次提交
    • J
      x86/PCI: Adjust GFP mask handling for coherent allocations · eb647138
      Jan Beulich 提交于
      Rather than forcing GFP flags and DMA mask to be inconsistent,
      GFP flags should be determined even for the fallback device
      through dma_alloc_coherent_mask()/dma_alloc_coherent_gfp_flags().
      
      This restores 64-bit behavior as it was prior to commits
      8965eb19 and
      4a367f3a (not sure why there are
      two of them), where GFP_DMA was forced on for 32-bit, but not
      for 64-bit, with the slight adjustment that afaict even 32-bit
      doesn't need this without CONFIG_ISA.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Acked-by: NTakashi Iwai <tiwai@suse.de>
      LKML-Reference: <4AF18187020000780001D8AA@vpn.id2.novell.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      eb647138
    • F
      hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Frederic Weisbecker 提交于
      This patch rebase the implementation of the breakpoints API on top of
      perf events instances.
      
      Each breakpoints are now perf events that handle the
      register scheduling, thread/cpu attachment, etc..
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons of this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoints setting remains tricky and still needs some per
        thread breakpoints references.
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event " rename
      - The ptrace regression have been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD doesn't anymore cover every cases of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      24f1e32c
  14. 07 11月, 2009 1 次提交
  15. 04 11月, 2009 2 次提交
    • S
      x86, fs: Fix x86 procfs stack information for threads on 64-bit · 89240ba0
      Stefani Seibold 提交于
      This patch fixes two issues in the procfs stack information on
      x86-64 linux.
      
      The 32 bit loader compat_do_execve did not store stack
      start. (this was figured out by Alexey Dobriyan).
      
      The stack information on a x64_64 kernel always shows 0 kbyte
      stack usage, because of a missing implementation of the KSTK_ESP
      macro which always returned -1.
      
      The new implementation now returns the right value.
      Signed-off-by: NStefani Seibold <stefani@seibold.net>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1257240160.4889.24.camel@wall-e>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      89240ba0
    • P
      x86/hw-breakpoints: Actually flush thread breakpoints in flush_thread(). · 41a48d14
      Paul Mundt 提交于
      flush_thread() tries to do a TIF_DEBUG check before calling in to
      flush_thread_hw_breakpoint() (which subsequently clears the thread flag),
      but for some reason, the x86 code is manually clearing TIF_DEBUG
      immediately before the test, so this path will never be taken.
      
      This kills off the erroneous clear_tsk_thread_flag() and lets
      flush_thread_hw_breakpoint() actually get invoked.
      
      Presumably folks were getting lucky with testing and the
      free_thread_info() -> free_thread_xstate() path was taking care of the
      flush there.
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      Acked-by: N"K.Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      LKML-Reference: <20091005102306.GA7889@linux-sh.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      41a48d14
  16. 02 11月, 2009 2 次提交
  17. 29 10月, 2009 1 次提交
  18. 27 10月, 2009 1 次提交
  19. 26 10月, 2009 1 次提交
    • J
      x86: crash_dump: Fix non-pae kdump kernel memory accesses · 72ed7de7
      Jiri Slaby 提交于
      Non-PAE 32-bit dump kernels may wrap an address around 4G and
      poke unwanted space. ptes there are 32-bit long, and since
      pfn << PAGE_SIZE may exceed this limit, high pfn bits are
      cropped and wrong address mapped by kmap_atomic_pfn in
      copy_oldmem_page.
      
      Don't allow this behavior in non-PAE kdump kernels by checking
      pfns passed into copy_oldmem_page. In the case of failure,
      userspace process gets EFAULT.
      
      [v2]
      - fix comments
      - move ifdefs inside the function
      Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Paul Mundt <lethal@linux-sh.org>
      LKML-Reference: <1256551903-30567-1-git-send-email-jirislaby@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      72ed7de7
  20. 16 10月, 2009 6 次提交
  21. 15 10月, 2009 1 次提交