1. 01 4月, 2013 2 次提交
  2. 18 3月, 2013 3 次提交
    • N
      perf/cgroup: Add __percpu annotation to perf_cgroup->info · 86e213e1
      Namhyung Kim 提交于
      It's a per-cpu data structure but missed the __percpu annotation.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Link: http://lkml.kernel.org/r/1363600594-11453-1-git-send-email-namhyung@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86e213e1
    • N
      perf: Generate EXIT event only once per task context · d610d98b
      Namhyung Kim 提交于
      perf_event_task_event() iterates pmu list and generate events
      for each eligible pmu context.  But if task_event has task_ctx
      like in EXIT it'll generate events even though the pmu doesn't
      have an eligible one. Fix it by moving the code to proper
      places.
      
      Before this patch:
      
        $ perf record -n true
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.006 MB perf.data (~248 samples) ]
      
        $ perf report -D | tail
        Aggregated stats:
                   TOTAL events:         73
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          4
        cycles stats:
                   TOTAL events:         73
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          4
      
      After this patch:
      
        $ perf report -D | tail
        Aggregated stats:
                   TOTAL events:         70
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          1
        cycles stats:
                   TOTAL events:         70
                    MMAP events:         67
                    COMM events:          2
                    EXIT events:          1
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1363332433-7637-1-git-send-email-namhyung@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d610d98b
    • N
      perf: Reset hwc->last_period on sw clock events · 778141e3
      Namhyung Kim 提交于
      When cpu/task clock events are initialized, their sampling
      frequencies are converted to have a fixed value.  However it
      missed to update the hwc->last_period which was set to 1 for
      initial sampling frequency calibration.
      
      Because this hwc->last_period value is used as a period in
      perf_swevent_ hrtime(), every recorded sample will have an
      incorrected period of 1.
      
        $ perf record -e task-clock noploop 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.158 MB perf.data (~6919 samples) ]
      
        $ perf report -n --show-total-period  --stdio
        # Samples: 4K of event 'task-clock'
        # Event count (approx.): 4000
        #
        # Overhead       Samples        Period  Command  Shared Object              Symbol
        # ........  ............  ............  .......  .............  ..................
        #
            99.95%          3998          3998  noploop  noploop        [.] main
             0.03%             1             1  noploop  libc-2.15.so   [.] init_cacheinfo
             0.03%             1             1  noploop  ld-2.15.so     [.] open_verify
      
      Note that it doesn't affect the non-sampling event so that the
      perf stat still gets correct value with or without this patch.
      
        $ perf stat -e task-clock noploop 1
      
         Performance counter stats for 'noploop 1':
      
               1000.272525 task-clock                #    1.000 CPUs utilized
      
               1.000560605 seconds time elapsed
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1363574507-18808-1-git-send-email-namhyung@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      778141e3
  3. 06 3月, 2013 1 次提交
  4. 28 2月, 2013 2 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
    • T
      events: convert to idr_alloc() · 0e9c3be2
      Tejun Heo 提交于
      Convert to the much saner new idr interface.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e9c3be2
  5. 23 2月, 2013 1 次提交
  6. 09 2月, 2013 1 次提交
    • O
      perf: Introduce hw_perf_event->tp_target and ->tp_list · f22c1bb6
      Oleg Nesterov 提交于
      sys_perf_event_open()->perf_init_event(event) is called before
      find_get_context(event), this means that event->ctx == NULL when
      class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called and thus it
      can't know if this event is per-task or system-wide.
      
      This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
      this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
      The patch also moves ->bp_target up so that it can overlap with the
      new member, this can help the compiler to generate the better code.
      
      trace_uprobe_register() will use it for prefiltering to avoid the
      unnecessary breakpoints in mm's we do not want to trace.
      
      ->tp_target doesn't have its own reference, but we can rely on the
      fact that either sys_perf_event_open() holds a reference, or it is
      equal to event->ctx->task. So this pointer is always valid until
      free_event().
      
      Also add the "struct list_head tp_list" into this union. It is not
      strictly necessary, but it can simplify the next changes and we can
      add it for free.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      f22c1bb6
  7. 03 2月, 2013 1 次提交
  8. 20 11月, 2012 1 次提交
  9. 19 11月, 2012 1 次提交
    • E
      pidns: Use task_active_pid_ns where appropriate · 17cf22c3
      Eric W. Biederman 提交于
      The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
      aka ns_of_pid(task_pid(tsk)) should have the same number of
      cache line misses with the practical difference that
      ns_of_pid(task_pid(tsk)) is released later in a processes life.
      
      Furthermore by using task_active_pid_ns it becomes trivial
      to write an unshare implementation for the the pid namespace.
      
      So I have used task_active_pid_ns everywhere I can.
      
      In fork since the pid has not yet been attached to the
      process I use ns_of_pid, to achieve the same effect.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      17cf22c3
  10. 09 10月, 2012 1 次提交
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov 提交于
      A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
      currently it lost original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes reserved_vm counter from mm_struct.  Seems like nobody
      cares about it, it does not exported into userspace directly, it only
      reduces total_vm showed in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      314e51b9
  11. 05 10月, 2012 2 次提交
  12. 27 9月, 2012 2 次提交
  13. 15 9月, 2012 1 次提交
    • T
      cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them · 8c7f6edb
      Tejun Heo 提交于
      Currently, cgroup hierarchy support is a mess.  cpu related subsystems
      behave correctly - configuration, accounting and control on a parent
      properly cover its children.  blkio and freezer completely ignore
      hierarchy and treat all cgroups as if they're directly under the root
      cgroup.  Others show yet different behaviors.
      
      These differing interpretations of cgroup hierarchy make using cgroup
      confusing and it impossible to co-mount controllers into the same
      hierarchy and obtain sane behavior.
      
      Eventually, we want full hierarchy support from all subsystems and
      probably a unified hierarchy.  Users using separate hierarchies
      expecting completely different behaviors depending on the mounted
      subsystem is deterimental to making any progress on this front.
      
      This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
      for controllers which are lacking in hierarchy support.  The goal of
      this patch is two-fold.
      
      * Move users away from using hierarchy on currently non-hierarchical
        subsystems, so that implementing proper hierarchy support on those
        doesn't surprise them.
      
      * Keep track of which controllers are broken how and nudge the
        subsystems to implement proper hierarchy support.
      
      For now, start with a single warning message.  We can whine louder
      later on.
      
      v2: Fixed a typo spotted by Michal. Warning message updated.
      
      v3: Updated memcg part so that it doesn't generate warning in the
          cases where .use_hierarchy=false doesn't make the behavior
          different from root.use_hierarchy=true.  Fixed a typo spotted by
          Glauber.
      
      v4: Check ->broken_hierarchy after cgroup creation is complete so that
          ->create() can affect the result per Michal.  Dropped unnecessary
          memcg root handling per Michal.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      8c7f6edb
  14. 04 9月, 2012 2 次提交
  15. 10 8月, 2012 2 次提交
    • J
      perf: Add ability to attach user stack dump to sample · c5ebcedb
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
      of the user level stack on sample. The size of the dump is specified by
      sample_stack_user value.
      
      Being able to dump parts of the user stack, starting from the stack
      pointer, will be useful to make a post mortem dwarf CFI based stack
      unwinding.
      
      Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
      architecture provides user stack dump on perf event samples.  This needs
      access to the user stack pointer which is not unified across
      architectures. Enabling this for x86 architecture.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c5ebcedb
    • J
      perf: Add ability to attach user level registers dump to sample · 4018994f
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
      user level registers on sample. Registers we want to dump are specified
      by sample_regs_user bitmask.
      
      Only user level registers are dumped at the moment. Meaning the register
      values of the user space context as it was before the user entered the
      kernel for whatever reason (syscall, irq, exception, or a PMI happening
      in userspace).
      
      The layout of the sample_regs_user bitmap is described in
      asm/perf_regs.h for archs that support register dump.
      
      This is going to be useful to bring Dwarf CFI based stack unwinding on
      top of samples.
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      [ Dump registers ABI specification. ]
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Suggested-by: NStephane Eranian <eranian@google.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4018994f
  16. 31 7月, 2012 1 次提交
  17. 18 6月, 2012 4 次提交
  18. 01 6月, 2012 1 次提交
  19. 23 5月, 2012 1 次提交
    • J
      Revert "sched, perf: Use a single callback into the scheduler" · ab0cce56
      Jiri Olsa 提交于
      This reverts commit cb04ff9a ("sched, perf: Use a single
      callback into the scheduler").
      
      Before this change was introduced, the process switch worked
      like this (wrt. to perf event schedule):
      
           schedule (prev, next)
             - schedule out all perf events for prev
             - switch to next
             - schedule in all perf events for current (next)
      
      After the commit, the process switch looks like:
      
           schedule (prev, next)
             - schedule out all perf events for prev
             - schedule in all perf events for (next)
             - switch to next
      
      The problem is, that after we schedule perf events in, the pmu
      is enabled and we can receive events even before we make the
      switch to next - so "current" still being prev process (event
      SAMPLE data are filled based on the value of the "current"
      process).
      
      Thats exactly what we see for test__PERF_RECORD test. We receive
      SAMPLES with PID of the process that our tracee is scheduled
      from.
      
      Discussed with Peter Zijlstra:
      
       > Bah!, yeah I guess reverting is the right thing for now. Sad
       > though.
       >
       > So by having the two hooks we have a black-spot between them
       > where we receive no events at all, this black-spot covers the
       > hand-over of current and we thus don't receive the 'wrong'
       > events.
       >
       > I rather liked we could do away with both that black-spot and
       > clean up the code a little, but apparently people rely on it.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: acme@redhat.com
      Cc: paulus@samba.org
      Cc: cjashfor@linux.vnet.ibm.com
      Cc: fweisbec@gmail.com
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20120523111302.GC1638@m.brq.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ab0cce56
  20. 09 5月, 2012 2 次提交
  21. 26 4月, 2012 2 次提交
  22. 24 3月, 2012 1 次提交
  23. 23 3月, 2012 1 次提交
    • P
      perf: Fix mmap_page capabilities and docs · c7206205
      Peter Zijlstra 提交于
      Complete the syscall-less self-profiling feature and address
      all complaints, namely:
      
       - capabilities, so we can detect what is actually available at runtime
      
           Add a capabilities field to perf_event_mmap_page to indicate
           what is actually available for use.
      
       - on x86: RDPMC weirdness due to being 40/48 bits and not sign-extending
         properly.
      
       - ABI documentation as to how all this stuff works.
      
      Also improve the documentation for the new features.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vweaver1@eecs.utk.edu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1332433596.2487.33.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c7206205
  24. 05 3月, 2012 3 次提交
    • S
      perf: Add callback to flush branch_stack on context switch · d010b332
      Stephane Eranian 提交于
      With branch stack sampling, it is possible to filter by priv levels.
      
      In system-wide mode, that means it is possible to capture only user
      level branches. The builtin SW LBR filter needs to disassemble code
      based on LBR captured addresses. For that, it needs to know the task
      the addresses are associated with. Because of context switches, the
      content of the branch stack buffer may contain addresses from
      different tasks.
      
      We need a callback on context switch to either flush the branch stack
      or save it. This patch adds a new callback in struct pmu which is called
      during context switches. The callback is called only when necessary.
      That is when a system-wide context has, at least, one event which
      uses PERF_SAMPLE_BRANCH_STACK. The callback is never called for
      per-thread context.
      
      In this version, the Intel x86 code simply flushes (resets) the LBR
      on context switches (fills it with zeroes). Those zeroed branches are
      then filtered out by the SW filter.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1328826068-11713-11-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      d010b332
    • S
      perf: Disable PERF_SAMPLE_BRANCH_* when not supported · 2481c5fa
      Stephane Eranian 提交于
      PERF_SAMPLE_BRANCH_* is disabled for:
      
       - SW events (sw counters, tracepoints)
       - HW breakpoints
       - ALL but Intel x86 architecture
       - AMD64 processors
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1328826068-11713-10-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      2481c5fa
    • S
      perf: Add generic taken branch sampling support · bce38cd5
      Stephane Eranian 提交于
      This patch adds the ability to sample taken branches to the
      perf_event interface.
      
      The ability to capture taken branches is very useful for all
      sorts of analysis. For instance, basic block profiling, call
      counts, statistical call graph.
      
      This new capability requires hardware assist and as such may
      not be available on all HW platforms. On Intel x86 it is
      implemented on top of the Last Branch Record (LBR) facility.
      
      To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
      bit must be set in attr->sample_type.
      
      Sampled taken branches may be filtered by type and/or priv
      levels.
      
      The patch adds a new field, called branch_sample_type, to the
      perf_event_attr structure. It contains a bitmask of filters
      to apply to the sampled taken branches.
      
      Filters may be implemented in HW. If the HW filter does not exist
      or is not good enough, some arch may also implement a SW filter.
      
      The following generic filters are currently defined:
      - PERF_SAMPLE_USER
        only branches whose targets are at the user level
      
      - PERF_SAMPLE_KERNEL
        only branches whose targets are at the kernel level
      
      - PERF_SAMPLE_HV
        only branches whose targets are at the hypervisor level
      
      - PERF_SAMPLE_ANY
        any type of branches (subject to priv levels filters)
      
      - PERF_SAMPLE_ANY_CALL
        any call branches (may incl. syscall on some arch)
      
      - PERF_SAMPLE_ANY_RET
        any return branches (may incl. syscall returns on some arch)
      
      - PERF_SAMPLE_IND_CALL
        indirect call branches
      
      Obviously filter may be combined. The priv level bits are optional.
      If not provided, the priv level of the associated event are used. It
      is possible to collect branches at a priv level different from the
      associated event. Use of kernel, hv priv levels is subject to permissions
      and availability (hv).
      
      The number of taken branch records present in each sample may vary based
      on HW, the type of sampled branches, the executed code. Therefore
      each sample contains the number of taken branches it contains.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1328826068-11713-2-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      bce38cd5
  25. 24 2月, 2012 1 次提交
    • I
      static keys: Introduce 'struct static_key', static_key_true()/false() and... · c5905afb
      Ingo Molnar 提交于
      static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
      
      So here's a boot tested patch on top of Jason's series that does
      all the cleanups I talked about and turns jump labels into a
      more intuitive to use facility. It should also address the
      various misconceptions and confusions that surround jump labels.
      
      Typical usage scenarios:
      
              #include <linux/static_key.h>
      
              struct static_key key = STATIC_KEY_INIT_TRUE;
      
              if (static_key_false(&key))
                      do unlikely code
              else
                      do likely code
      
      Or:
      
              if (static_key_true(&key))
                      do likely code
              else
                      do unlikely code
      
      The static key is modified via:
      
              static_key_slow_inc(&key);
              ...
              static_key_slow_dec(&key);
      
      The 'slow' prefix makes it abundantly clear that this is an
      expensive operation.
      
      I've updated all in-kernel code to use this everywhere. Note
      that I (intentionally) have not pushed through the rename
      blindly through to the lowest levels: the actual jump-label
      patching arch facility should be named like that, so we want to
      decouple jump labels from the static-key facility a bit.
      
      On non-jump-label enabled architectures static keys default to
      likely()/unlikely() branches.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: mathieu.desnoyers@efficios.com
      Cc: davem@davemloft.net
      Cc: ddaney.cavm@gmail.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.huSigned-off-by: NIngo Molnar <mingo@elte.hu>
      c5905afb