1. 21 11月, 2009 1 次提交
  2. 16 11月, 2009 1 次提交
    • P
      perf_event: Optimize perf_output_lock() · 559fdc3c
      Peter Zijlstra 提交于
      The purpose of perf_output_{un,}lock() is to:
      
       1) avoid publishing incomplete data
          [ possible when publishing a head that is ahead of an entry
            that is still being written ]
      
       2) guarantee fwd progress
          [ a simple refcount on pending writers doesn't need to drop to
            0, making it so would end up implementing something like forced
            quiecent states of RCU ]
      
      To satisfy the above without undue complexity it serializes
      between CPUs, this means that a pending writer can only be the
      same cpu in a nested context, and since (under normal operation)
      a cpu always makes progress we're good -- if the head is only
      published when the bottom  most writer completes.
      
      Now we don't need to disable IRQs in order to serialize between
      CPUs, disabling preemption ought to be sufficient, esp since we
      already deal with nesting due to NMIs.
      
      This avoids potentially expensive (and needless) local IRQ
      disable/enable ops.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1258373161.26714.254.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      559fdc3c
  3. 14 11月, 2009 3 次提交
  4. 13 11月, 2009 1 次提交
  5. 12 11月, 2009 3 次提交
  6. 11 11月, 2009 1 次提交
  7. 08 11月, 2009 2 次提交
    • F
      hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Frederic Weisbecker 提交于
      This patch rebase the implementation of the breakpoints API on top of
      perf events instances.
      
      Each breakpoints are now perf events that handle the
      register scheduling, thread/cpu attachment, etc..
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons of this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoints setting remains tricky and still needs some per
        thread breakpoints references.
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event " rename
      - The ptrace regression have been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD doesn't anymore cover every cases of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      24f1e32c
    • F
      tracing, perf_events: Protect the buffer from recursion in perf · 444a2a3b
      Frederic Weisbecker 提交于
      While tracing using events with perf, if one enables the
      lockdep:lock_acquire event, it will infect every other perf
      trace events.
      
      Basically, you can enable whatever set of trace events through
      perf but if this event is part of the set, the only result we
      can get is a long list of lock_acquire events of rcu read lock,
      and only that.
      
      This is because of a recursion inside perf.
      
      1) When a trace event is triggered, it will fill a per cpu
         buffer and submit it to perf.
      
      2) Perf will commit this event but will also protect some data
         using rcu_read_lock
      
      3) A recursion appears: rcu_read_lock triggers a lock_acquire
         event that will fill the per cpu event and then submit the
         buffer to perf.
      
      4) Perf detects a recursion and ignores it
      
      5) Perf continues its work on the previous event, but its buffer
         has been overwritten by the lock_acquire event, it has then
         been turned into a lock_acquire event of rcu read lock
      
      Such scenario also happens with lock_release with
      rcu_read_unlock().
      
      We could turn the rcu_read_lock() into __rcu_read_lock() to drop
      the lock debugging from perf fast path, but that would make us
      lose the rcu debugging and that doesn't prevent from other
      possible kind of recursion from perf in the future.
      
      This patch adds a recursion protection based on a counter on the
      perf trace per cpu buffers to solve the problem.
      
      -v2: Fixed lost whitespace, added reviewed-by tag
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1257477185-7838-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      444a2a3b
  8. 07 11月, 2009 2 次提交
  9. 06 11月, 2009 2 次提交
    • J
      netfilter: nf_nat: fix NAT issue in 2.6.30.4+ · f9dd09c7
      Jozsef Kadlecsik 提交于
      Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work
      over NAT. The "cause" of the problem was a fix of unacknowledged data
      detection with NAT (commit a3a9f79e).
      However, actually, that fix uncovered a long standing bug in TCP conntrack:
      when NAT was enabled, we simply updated the max of the right edge of
      the segments we have seen (td_end), by the offset NAT produced with
      changing IP/port in the data. However, we did not update the other parameter
      (td_maxend) which is affected by the NAT offset. Thus that could drift
      away from the correct value and thus resulted breaking active FTP.
      
      The patch below fixes the issue by *not* updating the conntrack parameters
      from NAT, but instead taking into account the NAT offsets in conntrack in a
      consistent way. (Updating from NAT would be more harder and expensive because
      it'd need to re-calculate parameters we already calculated in conntrack.)
      Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9dd09c7
    • F
      hw-breakpoint: Move asm-generic/hw_breakpoint.h to linux/hw_breakpoint.h · 2da3e160
      Frederic Weisbecker 提交于
      We plan to make the breakpoints parameters generic among architectures.
      For that it's better to move the asm-generic header to a generic linux
      header.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      2da3e160
  10. 04 11月, 2009 2 次提交
    • F
      perf/core: Add a callback to perf events · 97eaf530
      Frederic Weisbecker 提交于
      A simple callback in a perf event can be used for multiple purposes.
      For example it is useful for triggered based events like hardware
      breakpoints that need a callback to dispatch a triggered breakpoint
      event.
      
      v2: Simplify a bit the callback attribution as suggested by Paul
          Mackerras
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mundt <lethal@linux-sh.org>
      97eaf530
    • A
      perf/core: Provide a kernel-internal interface to get to performance counters · fb0459d7
      Arjan van de Ven 提交于
      There are reasons for kernel code to ask for, and use, performance
      counters.
      For example, in CPU freq governors this tends to be a good idea, but
      there are other examples possible as well of course.
      
      This patch adds the needed bits to do enable this functionality; they
      have been tested in an experimental cpufreq driver that I'm working on,
      and the changes are all that I needed to access counters properly.
      
      [fweisbec@gmail.com: added pid to perf_event_create_kernel_counter so
      that we can profile a particular task too
      
      TODO: Have a better error reporting, don't just return NULL in fail
      case.]
      
      v2: Remove the wrong comment about the fact
          perf_event_create_kernel_counter must be called from a kernel
          thread.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Avi Kivity <avi@redhat.com>
      LKML-Reference: <20090925122556.2f8bd939@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      fb0459d7
  11. 03 11月, 2009 1 次提交
  12. 02 11月, 2009 1 次提交
    • E
      9p: fix readdir corner cases · 3e2796a9
      Eric Van Hensbergen 提交于
      The patch below also addresses a couple of other corner cases in readdir
      seen with a large (e.g. 64k) msize.  I'm not sure what people think of
      my co-opting of fid->aux here.  I'd be happy to rework if there's a better
      way.
      
      When the size of the user supplied buffer passed to readdir is smaller
      than the data returned in one go by the 9P read request, v9fs_dir_readdir()
      currently discards extra data so that, on the next call, a 9P read
      request will be issued with offset < previous offset + bytes returned,
      which voilates the constraint described in paragraph 3 of read(5) description.
      This patch preseves the leftover data in fid->aux for use in the next call.
      Signed-off-by: NJim Garlick <garlick@llnl.gov>
      Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
      3e2796a9
  13. 31 10月, 2009 2 次提交
  14. 30 10月, 2009 2 次提交
  15. 29 10月, 2009 5 次提交
  16. 28 10月, 2009 1 次提交
  17. 24 10月, 2009 1 次提交
  18. 23 10月, 2009 1 次提交
    • S
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time... · 721a669b
      Soeren Sandmann 提交于
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers
      
      Make the hrtimer based events work for sysprof.
      
      Whenever a swevent is scheduled out, the hrtimer is canceled.
      When it is scheduled back in, the timer is restarted. This
      happens every scheduler tick, which means the timer never
      expired because it was getting repeatedly restarted over and
      over with the same period.
      
      To fix that, save the remaining time when disabling; when
      reenabling, use that saved time as the period instead of the
      user-specified sampling period.
      
      Also, move the starting and stopping of the hrtimers to helper
      functions instead of duplicating the code.
      Signed-off-by: NSøren Sandmann Pedersen <sandmann@redhat.com>
      LKML-Reference: <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      721a669b
  19. 22 10月, 2009 2 次提交
    • R
      virtio_blk: Revert serial number support · 3225beab
      Rusty Russell 提交于
      This reverts "Add serial number support for virtio_blk, V4a".
      
      Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
      on virtio config space, so noone could ever use this.
      
      This is coming back later in a cleaner form.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: john cooper <john.cooper@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      3225beab
    • C
      virtio: let header files include virtio_ids.h · e95646c3
      Christian Borntraeger 提交于
      Rusty,
      
      commit 3ca4f5ca
          virtio: add virtio IDs file
      moved all device IDs into a single file. While the change itself is
      a very good one, it can break userspace applications. For example
      if a userspace tool wanted to get the ID of virtio_net it used to
      include virtio_net.h. This does no longer work, since virtio_net.h
      does not include virtio_ids.h.
      This patch moves all "#include <linux/virtio_ids.h>" from the C
      files into the header files, making the header files compatible with
      the old ones.
      
      In addition, this patch exports virtio_ids.h to userspace.
      
      CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      e95646c3
  20. 20 10月, 2009 1 次提交
  21. 16 10月, 2009 1 次提交
  22. 15 10月, 2009 4 次提交
    • I
      events: Harmonize event field names and print output names · 434a83c3
      Ingo Molnar 提交于
      Now that we can filter based on fields via perf record, people
      will start using filter expressions and will expect them to
      be obvious.
      
      The primary way to see which fields are available is by looking
      at the trace output, such as:
      
        gcc-18676 [000]   343.011728: irq_handler_entry: irq=0 handler=timer
        cc1-18677 [000]   343.012727: irq_handler_entry: irq=0 handler=timer
        cc1-18677 [000]   343.032692: irq_handler_entry: irq=0 handler=timer
        cc1-18677 [000]   343.033690: irq_handler_entry: irq=0 handler=timer
        cc1-18677 [000]   343.034687: irq_handler_entry: irq=0 handler=timer
        cc1-18677 [000]   343.035686: irq_handler_entry: irq=0 handler=timer
        cc1-18677 [000]   343.036684: irq_handler_entry: irq=0 handler=timer
      
      While 'irq==0' filters work, the 'handler==<x>' filter expression
      does not work:
      
        $ perf record -R -f -a -e irq:irq_handler_entry --filter handler=timer sleep 1
         Error: failed to set filter with 22 (Invalid argument)
      
      The problem is that while an 'irq' field exists and is recognized
      as a filter field - 'handler' does not exist - its name is 'name'
      in the output.
      
      To solve this, we need to synchronize the printout and the field
      names, wherever possible.
      
      In cases where the printout prints a non-field, we enclose
      that information in square brackets, such as:
      
        perf-1380  [013]   724.903505: softirq_exit: vec=9 [action=RCU]
        perf-1380  [013]   724.904482: softirq_exit: vec=1 [action=TIMER]
      
      This way users can use filter expressions more intuitively: all
      fields that show up as 'primary' (non-bracketed) information is
      filterable.
      
      This patch harmonizes the field names for all irq, bkl, power,
      sched and timer events.
      
      We might in fact think about dropping the print format bit of
      generic tracepoints altogether, and just print the fields that
      are being recorded.
      
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      434a83c3
    • L
      tracing/profile: Add filter support · 6fb2915d
      Li Zefan 提交于
      - Add an ioctl to allocate a filter for a perf event.
      
      - Free the filter when the associated perf event is to be freed.
      
      - Do the filtering in perf_swevent_match().
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4AD69546.8050401@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6fb2915d
    • P
      rcu: Stopgap fix for synchronize_rcu_expedited() for TREE_PREEMPT_RCU · 019129d5
      Paul E. McKenney 提交于
      For the short term, map synchronize_rcu_expedited() to
      synchronize_rcu() for TREE_PREEMPT_RCU and to
      synchronize_sched_expedited() for TREE_RCU.
      
      Longer term, there needs to be a real expedited grace period for
      TREE_PREEMPT_RCU, but candidate patches to date are considerably
      more complex and intrusive.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: npiggin@suse.de
      Cc: jens.axboe@oracle.com
      LKML-Reference: <12555405592331-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      019129d5
    • L
      workqueue: add 'flush_delayed_work()' to run and wait for delayed work · 8c53e463
      Linus Torvalds 提交于
      It basically turns a delayed work into an immediate work, and then waits
      for it to finish, thus allowing you to force (and wait for) an immediate
      flush of a delayed work.
      
      We'll want to use this in the tty layer to clean up tty_flush_to_ldisc().
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      [ Fixed to use 'del_timer_sync()' as noted by Oleg ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8c53e463