1. 22 7月, 2010 1 次提交
    • R
      x86: auditsyscall: fix fastpath return value after reschedule · 03275591
      Roland McGrath 提交于
      In the CONFIG_AUDITSYSCALL fast-path for x86 64-bit system calls,
      we can pass a bad return value and/or error indication for the
      system call to audit_syscall_exit().  This happens when
      TIF_NEED_RESCHED was set as the system call returned, so we went
      out to schedule() and came back to the exit-audit fast-path.  The
      fix is to reload the user return value register from the pt_regs
      before using it for audit_syscall_exit().
      
      Both the 32-bit kernel's fast path and the 64-bit kernel's 32-bit
      system call fast paths work slightly differently, so that they
      always leave the fast path entirely to reschedule and don't return
      there, so they don't have the analogous bugs.
      Reported-by: NAlexander Viro <aviro@redhat.com>
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      03275591
  2. 11 12月, 2009 1 次提交
  3. 06 12月, 2009 1 次提交
    • F
      x86: Fixup wrong debug exception frame link in stacktraces · b625b3b3
      Frederic Weisbecker 提交于
      While dumping a stacktrace, the end of the exception stack won't link
      the frame pointer to the previous stack.
      
      The interrupted stack will then be considered as unreliable and ignored
      by perf, as the frame pointer is unreliable itself.
      
      This happens because we overwrite the frame pointer that links to the
      interrupted frame with the address of the exception stack. This is
      done in order to reserve space inside.
      But rbp has been chosen here only because it is not a scratch register,
      so that the address of the exception stack remains in rbp after calling
      do_debug(), we can then release the exception stack space without the
      need to retrieve its address again.
      
      But we can pick another non-scratch register to do that, so that we
      preserve the link to the interrupted stack frame in the stacktraces.
      
      Just randomly choose r12. Every registers are saved just before and
      restored just after calling do_debug(). And r12 is not used in the
      middle, which makes it a perfect candidate.
      
      Example: perf record -g -a -c 1 -f -e mem:$(tasklist_lock_addr):rw
      
      Before:
          44.18%  [k] _raw_read_lock
                  |
                  |
                  ---  |--6.31%-- waitid
                       |
                       |--4.26%-- writev
                       |
                       |--3.63%-- __select
                       |
                       |--3.15%-- __waitpid
                       |          |
                       |          |--28.57%-- 0x8b52e00000139f
                       |          |
                       |          |--28.57%-- 0x8b52e0000013c6
                       |          |
                       |          |--14.29%-- 0x7fde786dc000
                       |          |
                       |          |--14.29%-- 0x62696c2f7273752f
                       |          |
                       |           --14.29%-- 0x1ea9df800000000
                       |
                       |--3.00%-- __poll
      
      After:
      
          43.94%  [k] _raw_read_lock
                  |
                  --- _read_lock
                     |
                     |--60.53%-- send_sigio
                     |          __kill_fasync
                     |          kill_fasync
                     |          evdev_pass_event
                     |          evdev_event
                     |          input_pass_event
                     |          input_handle_event
                     |          input_event
                     |          synaptics_process_byte
                     |          psmouse_handle_byte
                     |          psmouse_interrupt
                     |          serio_interrupt
                     |          i8042_interrupt
                     |          handle_IRQ_event
                     |          handle_edge_irq
                     |          handle_irq
                     |          __irqentry_text_start
                     |          ret_from_intr
                     |          |
                     |          |--30.43%-- __select
                     |          |
                     |          |--17.39%-- 0x454f15
                     |          |
                     |          |--13.04%-- __read
                     |          |
                     |          |--13.04%-- vread_hpet
                     |          |
                     |          |--13.04%-- _xcb_lock_io
                     |          |
                     |           --13.04%-- 0x7f630878ce87
      
      Note: it does not only affect perf events but also other stacktraces in
      x86-64. They were considered as unreliable once we quit the debug
      stack frame.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      b625b3b3
  4. 04 11月, 2009 1 次提交
  5. 15 10月, 2009 1 次提交
  6. 14 10月, 2009 1 次提交
    • S
      function-graph/x86: Replace unbalanced ret with jmp · 194ec341
      Steven Rostedt 提交于
      The function graph tracer replaces the return address with a hook
      to trace the exit of the function call. This hook will finish by
      returning to the real location the function should return to.
      
      But the current implementation uses a ret to jump to the real
      return location. This causes a imbalance between calls and ret.
      That is the original function does a call, the ret goes to the
      handler and then the handler does a ret without a matching call.
      
      Although the function graph tracer itself still breaks the branch
      predictor by replacing the original ret, by using a second ret and
      causing an imbalance, it breaks the predictor even more.
      
      This patch replaces the ret with a jmp to keep the calls and ret
      balanced. I tested this on one box and it showed a 1.7% increase in
      performance. Another box only showed a small 0.3% increase. But no
      box that I tested this on showed a decrease in performance by
      making this change.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091013203425.042034383@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      194ec341
  7. 13 10月, 2009 1 次提交
  8. 23 9月, 2009 1 次提交
  9. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  10. 13 9月, 2009 1 次提交
  11. 30 8月, 2009 1 次提交
  12. 19 6月, 2009 1 次提交
    • S
      function-graph: add stack frame test · 71e308a2
      Steven Rostedt 提交于
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86. Where it passes in the stack
      frame of the parent function, and will test that frame on exit.
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the funtion, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was suppose to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I notice that the parsic arch implements its own push_return_trace.
      This is now a generic function and the ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      71e308a2
  13. 04 6月, 2009 2 次提交
    • A
      x86: fix panic with interrupts off (needed for MCE) · 4ef702c1
      Andi Kleen 提交于
      For some time each panic() called with interrupts disabled
      triggered the !irqs_disabled() WARN_ON in smp_call_function(),
      producing ugly backtraces and confusing users.
      
      This is a common situation with machine checks for example which
      tend to call panic with interrupts disabled, but will also hit
      in other situations e.g. panic during early boot.  In fact it
      means that panic cannot be called in many circumstances, which
      would be bad.
      
      This all started with the new fancy queued smp_call_function,
      which is then used by the shutdown path to shut down the other
      CPUs.
      
      On closer examination it turned out that the fancy RCU
      smp_call_function() does lots of things not suitable in a panic
      situation anyways, like allocating memory and relying on complex
      system state.
      
      I originally tried to patch this over by checking for panic
      there, but it was quite complicated and the original patch
      was also not very popular.  This also didn't fix some of the
      underlying complexity problems.
      
      The new code in post 2.6.29 tries to patch around this by
      checking for oops_in_progress, but that is not enough to make
      this fully safe and I don't think that's a real solution
      because panic has to be reliable.
      
      So instead use an own vector to reboot.  This makes the reboot
      code extremly straight forward, which is definitely a big plus
      in a panic situation where it is important to avoid relying on
      too much kernel state.  The new simple code is also safe to be
      called from interupts off region because it is very very simple.
      
      There can be situations where it is important that panic
      is reliable.  For example on a fatal machine check the panic
      is needed to get the system up again and running as quickly
      as possible.  So it's important that panic is reliable and
      all function it calls simple.
      
      This is why I came up with this simple vector scheme.
      It's very hard to beat in simplicity.  Vectors are not
      particularly precious anymore since all big systems are
      using per CPU vectors.
      
      Another possibility would have been to use an NMI similar
      to kdump, but there is still the problem that NMIs don't
      work reliably on some systems due to BIOS issues.  NMIs
      would have been able to stop CPUs running with interrupts
      off too.  In the sake of universal reliability I opted for
      using a non NMI vector for now.
      
      I put the reboot vector into the highest priority bucket of
      the APIC vectors and moved the 64bit UV_BAU message down
      instead into the next lower priority.
      
      [ Impact: bug fix, fixes an old regression ]
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      4ef702c1
    • A
      x86, mce: implement bootstrapping for machine check wakeups · ccc3c319
      Andi Kleen 提交于
      Machine checks support waking up the mcelog daemon quickly.
      
      The original wake up code for this was pretty ugly, relying on
      a idle notifier and a special process flag. The reason it did
      it this way is that the machine check handler is not subject
      to normal interrupt locking rules so it's not safe
      to call wake_up().  Instead it set a process flag
      and then either did the wakeup in the syscall return
      or in the idle notifier.
      
      This patch adds a new "bootstraping" method as replacement.
      
      The idea is that the handler checks if it's in a state where
      it is unsafe to call wake_up(). If it's safe it calls it directly.
      When it's not safe -- that is it interrupted in a critical
      section with interrupts disables -- it uses a new "self IPI" to trigger
      an IPI to its own CPU. This can be done safely because IPI
      triggers are atomic with some care. The IPI is raised
      once the interrupts are reenabled and can then safely call
      wake_up().
      
      When APICs are disabled the event is just queued and will be picked up
      eventually by the next polling timer. I think that's a reasonable
      compromise, since it should only happen quite rarely.
      
      Contains fixes from Ying Huang.
      
      [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ccc3c319
  14. 03 6月, 2009 1 次提交
    • Y
      perf_counter/x86: Remove the IRQ (non-NMI) handling bits · a3288106
      Yong Wang 提交于
      Remove the IRQ (non-NMI) handling bits as NMI will be used always.
      Signed-off-by: NYong Wang <yong.y.wang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a3288106
  15. 29 5月, 2009 2 次提交
  16. 09 5月, 2009 1 次提交
  17. 18 4月, 2009 1 次提交
    • S
      lockdep, x86: account for irqs enabled in paranoid_exit · 0300e7f1
      Steven Rostedt 提交于
      I hit the check_flags error of lockdep:
      
       WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
       [...]
       hardirqs last  enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
       hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
       softirqs last  enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
       softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0
      
      The check_flags warning of lockdep tells me that lockdep thought interrupts
      were disabled, but they were really enabled.
      
      The numbers in the above parenthesis show the order of events:
      
       12566: softirqs last enabled:  lock_sock_nested
       12567: hardirqs last enabled:  local_bh_enable
       12568: softirqs last disabled: tcp_prequeue_process
       12566: hardirqs last disabled: int3
      
      int3 is a breakpoint!
      
      Examining this further, I have CONFIG_NET_TCPPROBE enabled which adds
      break points into the kernel.
      
      The paranoid_exit of the return of int3 does not account for enabling
      interrupts on return to kernel. This code is a bit tricky since it
      is also used by the nmi handler (when lockdep is off), and we must be
      careful about the swapgs. We can not call kernel code after the swapgs
      has been performed.
      
      [ Impact: fix lockdep check_flags warning + self-turn-off ]
      Acked-by: NPeter Zijlsta <a.p.zijlstra@chello.nl>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0300e7f1
  18. 10 4月, 2009 1 次提交
    • S
      x86, function-graph: only save return values on x86_64 · e71e99c2
      Steven Rostedt 提交于
      Impact: speed up
      
      The return to handler portion of the function graph tracer should only
      need to save the return values. The caller already saved off the
      registers that the callee can modify. The returning function already
      saved the registers it modified. When we call our own trace function
      it too will save the registers that the callee must restore.
      
      There's no reason to save off anything more that the registers used
      to return the values.
      
      Note, I did a complete kernel build with this modification and the
      function graph tracer running on x86_64.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e71e99c2
  19. 07 4月, 2009 1 次提交
  20. 12 3月, 2009 2 次提交
  21. 05 3月, 2009 1 次提交
  22. 25 2月, 2009 2 次提交
  23. 14 2月, 2009 1 次提交
  24. 03 2月, 2009 1 次提交
  25. 31 1月, 2009 1 次提交
  26. 21 1月, 2009 1 次提交
  27. 18 1月, 2009 4 次提交
  28. 16 1月, 2009 1 次提交
    • T
      x86: merge 64 and 32 SMP percpu handling · 9939ddaf
      Tejun Heo 提交于
      Now that pda is allocated as part of percpu, percpu doesn't need to be
      accessed through pda.  Unify x86_64 SMP percpu access with x86_32 SMP
      one.  Other than the segment register, operand size and the base of
      percpu symbols, they behave identical now.
      
      This patch replaces now unnecessary pda->data_offset with a dummy
      field which is necessary to keep stack_canary at its place.  This
      patch also moves per_cpu_offset initialization out of init_gdt() into
      setup_per_cpu_areas().  Note that this change also necessitates
      explicit per_cpu_offset initializations in voyager_smp.c.
      
      With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
      x86_32 and also x86_64 can use assembly PER_CPU macros.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9939ddaf
  29. 11 1月, 2009 1 次提交
  30. 17 12月, 2008 1 次提交
  31. 08 12月, 2008 1 次提交
    • I
      performance counters: x86 support · 241771ef
      Ingo Molnar 提交于
      Implement performance counters for x86 Intel CPUs.
      
      It's simplified right now: the PERFMON CPU feature is assumed,
      which is available in Core2 and later Intel CPUs.
      
      The design is flexible to be extended to more CPU types as well.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      241771ef
  32. 03 12月, 2008 2 次提交