1. 08 Aug, 2013 5 commits
  2. 31 Jul, 2013 8 commits
  3. 23 Jul, 2013 15 commits
    • sched: Micro-optimize the smart wake-affine logic · 7d9ffa89
      Committed by Peter Zijlstra
      Smart wake-affine currently uses the node size as the factor, but the
      overhead of the mask operation is high.

      Thus, this patch introduces the 'sd_llc_size' percpu variable, which records
      the size of the highest cache-sharing domain, and makes it the new factor, in
      order to reduce the overhead and make the factor more reasonable.
      Tested-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Tested-by: Michael Wang <wangyun@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Michael Wang <wangyun@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Link: http://lkml.kernel.org/r/51D5008E.6030102@linux.vnet.ibm.com
      [ Tidied up the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Implement smarter wake-affine logic · 62470419
      Committed by Michael Wang
      The wake-affine scheduler feature is currently always trying to pull
      the wakee close to the waker. In theory this should be beneficial if
      the waker's CPU caches hot data for the wakee, and it's also beneficial
      in the extreme ping-pong high context switch rate case.
      
      Testing shows it can benefit hackbench up to 15%.
      
      However, the feature is somewhat blind, from which some workloads
      such as pgbench suffer. It's also time-consuming algorithmically.
      
      Testing shows it can damage pgbench up to 50% - far more than the
      benefit it brings in the best case.
      
      So wake-affine should be smarter and it should realize when to
      stop its thankless effort at trying to find a suitable CPU to wake on.
      
      This patch introduces 'wakee_flips', which will be increased each
      time the task flips (switches) its wakee target.
      
      So a high 'wakee_flips' value means the task has more than one
      wakee, and the bigger the number, the higher the wakeup frequency.
      
      Now, when deciding whether to pull or not, pay attention to a wakee
      with a high 'wakee_flips': pulling such a task may benefit the wakee,
      but it also implies that the waker will face fierce competition later.
      How severe or how brief that competition is depends on the story
      behind 'wakee_flips', but either way the waker suffers.
      
      Furthermore, if waker also has a high 'wakee_flips', that implies that
      multiple tasks rely on it, then waker's higher latency will damage all
      of them, so pulling wakee seems to be a bad deal.
      
      Thus, when 'waker->wakee_flips / wakee->wakee_flips' becomes
      higher and higher, the cost of pulling seems to be worse and worse.
      
      The patch therefore helps the wake-affine feature to stop its pulling
      work when:
      
      	wakee->wakee_flips > factor &&
      	waker->wakee_flips > (factor * wakee->wakee_flips)
      
      The 'factor' here is the number of CPUs in the current CPU's NUMA node,
      so a bigger node leads to more pulling, since the conditions for
      stopping become harder to meet.
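The bookkeeping and the stop condition described above can be sketched in plain C. This is a minimal model under stated assumptions, not the scheduler code: `struct task`, `record_wakee()` and `stop_pulling()` are hypothetical names chosen for illustration, and the sketch covers only what the changelog describes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler's per-task state. */
struct task {
	const struct task *last_wakee;	/* most recent task this one woke */
	unsigned long wakee_flips;	/* times the wakee target changed */
};

/* Bump the waker's counter whenever it wakes a different task than last time. */
static void record_wakee(struct task *waker, const struct task *wakee)
{
	if (waker->last_wakee != wakee) {
		waker->last_wakee = wakee;
		waker->wakee_flips++;
	}
}

/* Return true when wake-affine should stop pulling the wakee to the waker. */
static bool stop_pulling(const struct task *waker, const struct task *wakee,
			 unsigned long factor)
{
	return wakee->wakee_flips > factor &&
	       waker->wakee_flips > factor * wakee->wakee_flips;
}
```

With a larger `factor` both comparisons are harder to satisfy, which is why a bigger node results in more pulling.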
      
      After applying the patch, pgbench shows up to 40% improvements and no regressions.
      
      Tested with 12 cpu x86 server and tip 3.10.0-rc7.
      
      The percentages in the final column highlight the areas with the biggest
      wins; all other areas improved as well:
      
      	pgbench		    base	smart
      
      	| db_size | clients |  tps  |	|  tps  |
      	+---------+---------+-------+   +-------+
      	| 22 MB   |       1 | 10598 |   | 10796 |
      	| 22 MB   |       2 | 21257 |   | 21336 |
      	| 22 MB   |       4 | 41386 |   | 41622 |
      	| 22 MB   |       8 | 51253 |   | 57932 |
      	| 22 MB   |      12 | 48570 |   | 54000 |
      	| 22 MB   |      16 | 46748 |   | 55982 | +19.75%
      	| 22 MB   |      24 | 44346 |   | 55847 | +25.93%
      	| 22 MB   |      32 | 43460 |   | 54614 | +25.66%
      	| 7484 MB |       1 |  8951 |   |  9193 |
      	| 7484 MB |       2 | 19233 |   | 19240 |
      	| 7484 MB |       4 | 37239 |   | 37302 |
      	| 7484 MB |       8 | 46087 |   | 50018 |
      	| 7484 MB |      12 | 42054 |   | 48763 |
      	| 7484 MB |      16 | 40765 |   | 51633 | +26.66%
      	| 7484 MB |      24 | 37651 |   | 52377 | +39.11%
      	| 7484 MB |      32 | 37056 |   | 51108 | +37.92%
      	| 15 GB   |       1 |  8845 |   |  9104 |
      	| 15 GB   |       2 | 19094 |   | 19162 |
      	| 15 GB   |       4 | 36979 |   | 36983 |
      	| 15 GB   |       8 | 46087 |   | 49977 |
      	| 15 GB   |      12 | 41901 |   | 48591 |
      	| 15 GB   |      16 | 40147 |   | 50651 | +26.16%
      	| 15 GB   |      24 | 37250 |   | 52365 | +40.58%
      	| 15 GB   |      32 | 36470 |   | 50015 | +37.14%
      Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/51D50057.9000809@linux.vnet.ibm.com
      [ Improved the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Move h_load calculation to task_h_load() · 68520796
      Committed by Vladimir Davydov
      The bad thing about update_h_load(), which computes the hierarchical load
      factor for task groups, is that it is called for each task group in the
      system before every load balancer run, and since rebalance can be
      triggered very often, this function can eat a great deal of CPU time if
      there are many cpu cgroups in the system.
      
      Although the situation was improved significantly by commit a35b6466
      ('sched, cgroup: Reduce rq->lock hold times for large cgroup
      hierarchies'), the problem still can arise under some kinds of loads,
      e.g. when cpus are switching from idle to busy and back very frequently.
      
      For instance, when I start 1000 processes that wake up every
      millisecond on my 8-CPU host, 'top' and 'perf top' show:
      
      Cpu(s): 17.8%us, 24.3%sy,  0.0%ni, 57.9%id,  0.0%wa,  0.0%hi,  0.0%si
      Events: 243K cycles
        7.57%  [kernel]               [k] __schedule
        7.08%  [kernel]               [k] timerqueue_add
        6.13%  libc-2.12.so           [.] usleep
      
      Then if I create 10000 *idle* cpu cgroups (no processes in them), cpu
      usage increases significantly although the 'wakers' are still executing
      in the root cpu cgroup:
      
      Cpu(s): 19.1%us, 48.7%sy,  0.0%ni, 31.6%id,  0.0%wa,  0.0%hi,  0.7%si
      Events: 230K cycles
       24.56%  [kernel]            [k] tg_load_down
        5.76%  [kernel]            [k] __schedule
      
      This happens because this particular kind of load triggers 'new idle'
      rebalance very frequently, which requires calling update_h_load(),
      which, in turn, calls tg_load_down() for every *idle* cpu cgroup even
      though it is absolutely useless, because idle cpu cgroups have no tasks
      to pull.
      
      This patch tries to improve the situation by making h_load calculation
      proceed only when h_load is really necessary. To achieve this, it
      replaces update_h_load() with update_cfs_rq_h_load(), which computes
      h_load only for a given cfs_rq and all its ancestors, and makes the
      load balancer call this function whenever it considers whether a task
      should be pulled, i.e. it moves h_load calculations directly to
      task_h_load(). So that h_load of the same cfs_rq is not updated
      multiple times (in case several tasks in the same cgroup are considered
      during the same balance run), the patch keeps the time of the last
      h_load update for each cfs_rq and stops the calculation when it finds
      h_load to be up to date.
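The on-demand calculation with a per-cfs_rq timestamp can be illustrated with a toy model. The sketch below is an assumption-laden simplification: `struct group_rq`, its fields, the scaling formula and `update_h_load_lazy()` are hypothetical, and it uses recursion for brevity, which the kernel code need not do.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-CPU cfs_rq in a cgroup tree. */
struct group_rq {
	struct group_rq *parent;	/* NULL for the root */
	unsigned long load;		/* total load queued on this rq */
	unsigned long weight;		/* this group's contribution to its parent */
	unsigned long h_load;		/* cached hierarchical load */
	unsigned long stamp;		/* time of the last h_load update */
};

/*
 * Recompute h_load for one rq and its ancestors, at most once per 'now'.
 * 'now' must differ from the initial stamp value (0 in this sketch).
 */
static unsigned long update_h_load_lazy(struct group_rq *rq, unsigned long now)
{
	if (rq->stamp == now)
		return rq->h_load;	/* already up to date: stop here */

	if (!rq->parent) {
		rq->h_load = rq->load;	/* root: h_load is its own load */
	} else {
		unsigned long up = update_h_load_lazy(rq->parent, now);

		/* Scale the parent's h_load by this group's share of it. */
		rq->h_load = up * rq->weight / (rq->parent->load + 1);
	}
	rq->stamp = now;
	return rq->h_load;
}
```

Only the queried rq and its ancestors are touched, so groups that never have a task considered for pulling are never visited.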
      
      The benefit is that h_load is computed only for those cfs_rq's that
      really need it; in particular, all idle task groups are skipped.
      Although this, in fact, moves h_load calculation under rq lock, it
      should not affect latency much, because the amount of work done under rq
      lock while trying to pull tasks is limited by sched_nr_migrate.
      
      With the patch applied and the setup described above (1000 wakers in
      the root cgroup and 10000 idle cgroups), I get:
      
      Cpu(s): 16.9%us, 24.8%sy,  0.0%ni, 58.4%id,  0.0%wa,  0.0%hi,  0.0%si
      Events: 242K cycles
        7.57%  [kernel]                  [k] __schedule
        6.70%  [kernel]                  [k] timerqueue_add
        5.93%  libc-2.12.so              [.] usleep
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1373896159-1278-1-git-send-email-vdavydov@parallels.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf tools: Add test for converting perf time to/from TSC · 3bd5a5fc
      Committed by Adrian Hunter
      The test uses the newly added cap_usr_time_zero and time_zero of
      perf_event_mmap_page.  TSC from rdtsc is compared with the time
      from 2 perf events.  The test passes if the calculated times are
      all in the correct order.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1372425741-1676-4-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Add ability to calculate TSC from perf sample timestamps · c73deb6a
      Committed by Adrian Hunter
      For modern CPUs, perf clock is directly related to TSC.  TSC
      can be calculated from perf clock and vice versa using a simple
      calculation.  Two of the three components of that calculation
      are already exported in struct perf_event_mmap_page.  This patch
      exports the third.
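As a rough illustration of such a conversion, the sketch below uses a shift, a multiplier and a zero offset. The field names `time_shift`, `time_mult` and `time_zero` are assumed here to match `struct perf_event_mmap_page` (the changelog itself does not name them), and `struct tsc_conv` and both functions are hypothetical helpers. The two directions are inverses only up to integer rounding.

```c
#include <stdint.h>

/* The three conversion parameters (names assumed, see the note above). */
struct tsc_conv {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_zero;
};

/* TSC cycles -> perf clock nanoseconds. */
static uint64_t tsc_to_perf_time(uint64_t cyc, const struct tsc_conv *tc)
{
	uint64_t quot = cyc >> tc->time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << tc->time_shift) - 1);

	/* Split into quotient and remainder to avoid overflowing the multiply. */
	return tc->time_zero + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}

/* perf clock nanoseconds -> TSC cycles. */
static uint64_t perf_time_to_tsc(uint64_t ns, const struct tsc_conv *tc)
{
	uint64_t t = ns - tc->time_zero;
	uint64_t quot = t / tc->time_mult;
	uint64_t rem = t % tc->time_mult;

	return (quot << tc->time_shift) +
	       (rem << tc->time_shift) / tc->time_mult;
}
```

For example, with a shift of 10 and a multiplier of 512, each cycle counts as half a nanosecond on top of `time_zero`.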
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Fix broken union in 'struct perf_event_mmap_page' · 860f085b
      Committed by Adrian Hunter
      The capabilities bits must not be "union'ed" together.
      Put them in a separate struct.
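Why bit-fields must not be placed directly in a union can be shown with a small example. The type and field names below are hypothetical and much smaller than the real structure. In the first layout every bit-field is a separate union member starting at the same position, so the flags alias each other. In the second they sit side by side inside one struct.

```c
/* Broken layout: each bit-field is its own union member, so they overlap. */
union caps_broken {
	unsigned int capabilities;
	unsigned int cap_a : 1,
		     cap_b : 1;
};

/* Fixed layout: the bit-fields share one struct, so each gets its own bit. */
union caps_fixed {
	unsigned int capabilities;
	struct {
		unsigned int cap_a : 1,
			     cap_b : 1;
	} bits;
};

/* Returns 1 if setting cap_a in the broken layout also sets cap_b. */
static int broken_bits_alias(void)
{
	union caps_broken c = { 0 };

	c.cap_a = 1;
	return c.cap_b;
}

/* Returns 1 if setting cap_a in the fixed layout also sets cap_b. */
static int fixed_bits_alias(void)
{
	union caps_fixed c = { 0 };

	c.bits.cap_a = 1;
	return c.bits.cap_b;
}
```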
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1372425741-1676-2-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Update perf_event_type documentation · a5cdd40c
      Committed by Peter Zijlstra
      Due to a discussion with Adrian I had a good look at the perf_event_type record
      layout and found the documentation to be somewhat unclear.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20130716150907.GL23818@dyad.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kprobes/x86: Call out into INT3 handler directly instead of using notifier · 17f41571
      Committed by Jiri Kosina
      In fd4363ff ("x86: Introduce int3 (breakpoint)-based
      instruction patching"), the mechanism that was introduced for
      notifying alternatives code from the int3 exception handler that an
      exception occurred was die_notifier.

      This is however problematic, as early code might be using jump
      labels even before the notifier registration has been performed,
      which will then lead to an oops due to an unhandled exception. One
      such occurrence has been encountered by Fengguang:
      
       int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
       Modules linked in:
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
       task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
       RIP: 0010:[<ffffffff811098cc>]  [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
       RSP: 0000:ffff88000dd03f10  EFLAGS: 00000006
       RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
       RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
       RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
       R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
       R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
       FS:  0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
       Stack:
        ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
        ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
        ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
       Call Trace:
        <IRQ>
        [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
        [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
        [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
        [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
        [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
        <EOI>
        [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
        [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
        [<ffffffff81015f10>] default_idle+0x147/0x282
        [<ffffffff81017026>] arch_cpu_idle+0x3d/0x5d
        [<ffffffff81127d6a>] cpu_idle_loop+0x46d/0x5db
        [<ffffffff81127f5c>] cpu_startup_entry+0x84/0x84
        [<ffffffff8104f4f8>] start_secondary+0x3c8/0x3d5
        [...]
      
      Fix this by directly calling poke_int3_handler() from the int3
      exception handler (analogously to what ftrace has been doing
      already), instead of relying on the notifier, whose registration
      might not yet have been finalized by the time of the first trap.
      Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1307231007490.14024@pobox.suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-core-for-mingo' of... · 4f16d61f
      Committed by Ingo Molnar
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
        * Fix memcpy benchmark for large sizes, from Andi Kleen.
      
        * Support callchain sorting based on addresses, from Andi Kleen.

        * Move weight back to common sort keys, from Andi Kleen.
      
        * Fix named threads support in 'perf script', from David Ahern.
      
        * Handle ENODEV on default cycles event, fix from David Ahern.
      
        * More install tests, from Jiri Olsa.
      
        * Fix build with perl 5.18, from Kirill A. Shutemov.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf tools: Move weight back to common sort keys · f9ea55d0
      Committed by Andi Kleen
      This is a partial revert of Namhyung's patch
      
       afab87b9
       perf sort: Separate out memory-specific sort keys
      
      He wrote:

       For global/local weights, I'm not entirely sure to place them into the
       memory dimension.  But it's the only user at this time.

      Well, TSX is another (in fact the original) user of the flags, and it
      needs them to be common. So move local/global weight back to the common
      sort keys.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Link: http://lkml.kernel.org/r/1374188333-17899-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Add broken install-* tests into tests/make · dbad4189
      Committed by Jiri Olsa
      Adding install-* tests into tests/make. Those tests are
      broken, so they are commented out right away.

      * Nothing gets installed for the install-man, install_doc and
        install_html targets; they just rebuild the documentation.

      * I got the following error for 'install-info':
      
        $ make -f tests/make make_install_info
        - make_install_info: cd . && make -f Makefile DESTDIR=/tmp/tmp.Xi4mb9J1a0 install-info
      
        $ tail -f make_install_info
        ...
        PERF_VERSION = 3.11.rc1.g9b3c2d
        make[2]: *** No rule to make target `user-manual.xml', needed by `user-manual.texi'.  Stop.
        make[1]: *** [install-info] Error 2
      
      * I got the following error for 'install-pdf':
      
        $ make -f tests/make make_install_pdf
        - make_install_pdf: cd . && make -f Makefile DESTDIR=/tmp/tmp.fXseECBbt1 install-pdf
      
        $ tail -f make_install_pdf
        ...
        PERF_VERSION = 3.11.rc1.g9b3c2d
        make[2]: *** No rule to make target `user-manual.xml', needed by `user-manual.pdf'.  Stop.
        make[1]: *** [install-pdf] Error 2
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1374497014-2817-6-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Add 'make install/install-bin' tests into tests/make · c0ec1108
      Committed by Jiri Olsa
      Adding 'make install' and 'make install-bin' tests into tests/make. They
      are run as part of the suite, but can also be run separately:
      
        $ make -f tests/make make_install
        - make_install: cd . && make -f Makefile DESTDIR=/tmp/tmp.LpkYbk5pfs install
          test: test -x /tmp/tmp.LpkYbk5pfs/bin/perf
        $ make -f tests/make make_install_bin
        - make_install_bin: cd . && make -f Makefile DESTDIR=/tmp/tmp.dMxePBMcFT
          install-bin
          test: test -x /tmp/tmp.dMxePBMcFT/bin/perf
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1374497014-2817-5-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Add DESTDIR=TMP_DEST tests/make variable · c9311674
      Committed by Jiri Olsa
      Adding the TMP_DEST tests/make variable to provide the DESTDIR directory
      for installation tests.

      Adding this to existing test targets, since the DESTDIR variable 'should
      not' affect anything other than the install* targets. We can always
      separate this if there's a need for a DESTDIR-free build test.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1374497014-2817-4-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Rename TMP to TMP_O tests/make variable · 8ba7cdea
      Committed by Jiri Olsa
      Renaming TMP to TMP_O tests/make variable to make a name space for other
      temp variables.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1374497014-2817-3-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tests: Run ctags/cscope make tests only with needed binaries · 0659e669
      Committed by Jiri Olsa
      Running tags and cscope make tests only if the 'ctags' and 'cscope'
      binaries are installed, so we don't have false alarm test failures.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1374497014-2817-2-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 22 Jul, 2013 5 commits
    • perf tools: Fix build with perl 5.18 · 575bf1d0
      Committed by Kirill A. Shutemov
      perl.h from new Perl release doesn't like -Wundef and -Wswitch-default:
      
      /usr/lib/perl5/core_perl/CORE/perl.h:548:5: error: "SILENT_NO_TAINT_SUPPORT" is not defined [-Werror=undef]
       #if SILENT_NO_TAINT_SUPPORT && !defined(NO_TAINT_SUPPORT)
           ^
      /usr/lib/perl5/core_perl/CORE/perl.h:556:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
       #if NO_TAINT_SUPPORT
           ^
      In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3471:0,
                       from util/scripting-engines/trace-event-perl.c:30:
      /usr/lib/perl5/core_perl/CORE/sv.h:1455:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
       #if NO_TAINT_SUPPORT
           ^
      In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3472:0,
                       from util/scripting-engines/trace-event-perl.c:30:
      /usr/lib/perl5/core_perl/CORE/regexp.h:436:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
       #if NO_TAINT_SUPPORT
           ^
      In file included from /usr/lib/perl5/core_perl/CORE/hv.h:592:0,
                       from /usr/lib/perl5/core_perl/CORE/perl.h:3480,
                       from util/scripting-engines/trace-event-perl.c:30:
      /usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_siphash_2_4’:
      /usr/lib/perl5/core_perl/CORE/hv_func.h:222:3: error: switch missing default case [-Werror=switch-default]
         switch( left )
         ^
      /usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_superfast’:
      /usr/lib/perl5/core_perl/CORE/hv_func.h:274:5: error: switch missing default case [-Werror=switch-default]
           switch (rem) { \
           ^
      /usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_murmur3’:
      /usr/lib/perl5/core_perl/CORE/hv_func.h:398:5: error: switch missing default case [-Werror=switch-default]
           switch(bytes_in_carry) { /* how many bytes in carry */
           ^
      
      Let's disable the warnings for code which uses perl.h.
      Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1372063394-20126-1-git-send-email-kirill@shutemov.name
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tools: Support callchain sorting based on addresses · 99571ab3
      Committed by Andi Kleen
      With programs with very large functions it can be useful to distinguish
      the callgraph nodes on more than just function names. So for example if
      you have multiple calls to the same function, it ends up being separate
      nodes in the chain.
      
      This patch adds a new key field to the callgraph options, that allows
      comparing nodes on functions (as today, default) and addresses.
      
      Longer term it would be nice to also handle src lines, but that would
      need more changes and address is a reasonable proxy for it today.
      
      For now I reference the global params, as there was no simple way to
      register a params pointer.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-0uskktybf0e7wrnoi5e9b9it@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf bench: Fix memcpy benchmark for large sizes · a198996c
      Committed by Andi Kleen
      The glibc calloc() function has an optimization to not explicitly
      memset() very large calloc allocations that just came from mmap(),
      because they are known to be zero.
      
      This could result in the perf memcpy benchmark reading only from
      the zero page, which gives unrealistic results.
      
      Always call memset explicitly on the source area to avoid this problem.
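A minimal sketch of the idea, assuming a hypothetical helper `alloc_touched()` rather than the benchmark's real allocation code: the buffer is written right after allocation so that every page is backed by real memory. The changelog does not say which fill value is used. This sketch picks a nonzero byte so that an optimizing compiler cannot treat the write as redundant after calloc().

```c
#include <stdlib.h>
#include <string.h>

/* Allocate a source buffer whose pages are really backed by memory. */
static void *alloc_touched(size_t len)
{
	void *buf = calloc(1, len);

	if (!buf)
		return NULL;
	/* calloc() may skip zeroing fresh mmap()ed memory; write it explicitly. */
	memset(buf, 0xA5, len);
	return buf;
}
```

A benchmark that then copies from this buffer reads distinct physical pages instead of repeatedly hitting the shared zero page.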
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Hitoshi Mitake <h.mitake@gmail.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-pzz2qrdq9eymxda0y8yxdn33@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf evsel: Handle ENODEV on default cycles event · 2b821cce
      Committed by David Ahern
      Some systems (e.g., VMs on qemu-0.13 with the default vcpu model) report
      an unsupported CPU model:
      
      Performance Events: unsupported p6 CPU model 2 no PMU driver, software events only.
      
      Subsequent invocations of perf fail with:
      
      The sys_perf_event_open() syscall returned with 19 (No such device) for event (cycles).
      /bin/dmesg may provide additional information.
      No CONFIG_PERF_EVENTS=y kernel support configured?
      
      Add ENODEV to the list of errno values that trigger the fallback to cpu-clock.
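The fallback decision can be pictured as a simple errno check. This is a sketch only: `should_fall_back_to_cpu_clock()` is a hypothetical name, and the presence of ENOENT on the pre-existing list is an assumption, since the changelog does not enumerate that list.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a failed open of the default 'cycles' event should be
 * retried with the software cpu-clock event instead.
 */
static bool should_fall_back_to_cpu_clock(int err)
{
	switch (err) {
	case ENOENT:	/* assumed to be on the list already */
	case ENODEV:	/* the case this change adds */
		return true;
	default:
		return false;
	}
}
```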
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1374190079-28507-1-git-send-email-dsahern@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf script: Fix named threads support · 2eaa1b40
      Committed by David Ahern
      Commit 73994dc1 broke named thread support in perf-script. The thread
      struct in al is the main thread for a multithreaded process. The thread
      struct used for analysis (e.g., dumping events) should be the specific
      thread for the sample.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Link: http://lkml.kernel.org/r/1374185175-28272-1-git-send-email-dsahern@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 19 Jul, 2013 7 commits
    • kprobes/x86: Remove unused text_poke_smp() and text_poke_smp_batch() functions · ea8596bb
      Committed by Masami Hiramatsu
      Since introducing the text_poke_bp() for all text_poke_smp*()
      callers, text_poke_smp*() are now unused. This patch basically
      reverts:
      
        3d55cc8a ("x86: Add text_poke_smp for SMP cross modifying code")
        7deb18dc ("x86: Introduce text_poke_smp_batch() for batch-code modifying")
      
      and related commits.
      
      This patch also fixes a Kconfig dependency issue on STOP_MACHINE
      in the case of CONFIG_SMP && !CONFIG_MODULE_UNLOAD.
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Jiri Kosina <jkosina@suse.cz>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Borislav Petkov <bpetkov@suse.de>
      Link: http://lkml.kernel.org/r/20130718114753.26675.18714.stgit@mhiramat-M0-7522
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kprobes/x86: Use text_poke_bp() instead of text_poke_smp*() · a7b0133e
      Committed by Masami Hiramatsu
      Use text_poke_bp() for optimizing kprobes instead of
      text_poke_smp*(). Since the number of kprobes is usually not so
      large (<100) and text_poke_bp() is much lighter than
      text_poke_smp() [which uses stop_machine()], this just stops
      using batch processing.
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Jiri Kosina <jkosina@suse.cz>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Borislav Petkov <bpetkov@suse.de>
      Link: http://lkml.kernel.org/r/20130718114750.26675.9174.stgit@mhiramat-M0-7522
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kprobes/x86: Remove an incorrect comment about int3 in NMI/MCE · c7e85c42
      Committed by Masami Hiramatsu
      Remove a comment about an int3 issue in NMI/MCE, since
      commit:
      
        3f3c8b8c ("x86: Add workaround to NMI iret woes")
      
      already fixed that. Keeping this incorrect comment can mislead developers.
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Jiri Kosina <jkosina@suse.cz>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Borislav Petkov <bpetkov@suse.de>
      Link: http://lkml.kernel.org/r/20130718114747.26675.84110.stgit@mhiramat-M0-7522
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge branch 'x86/jumplabel' into perf/core · 9bb15425
      Committed by Ingo Molnar
      Upcoming kprobes patches rely on the int3 code-patching machinery introduced by:
      
         fd4363ff x86: Introduce int3 (breakpoint)-based instruction patching
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Merge tag 'perf-core-for-mingo' of... · 5a982132
      Committed by Ingo Molnar
      Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
       * Add missing 'finished_round' event forwarding in 'perf inject', from Adrian Hunter.
      
       * Assorted tidy ups, from Adrian Hunter.
      
       * Fall back to sysfs event names when parsing fails, from Andi Kleen.
      
       * List pmu events in perf list, from Andi Kleen.
      
       * Cleanup some memory allocation/freeing uses, from David Ahern.
      
       * Add option to collapse undesired parts of call graph, from Greg Price.
      
       * Prep work for multi perf data file storage, from Jiri Olsa.
      
       * Add support for more than two files comparison in 'perf diff', from Jiri Olsa
      
       * A few more 'perf test' improvements, from Jiri Olsa
      
       * libtraceevent cleanups, from Namhyung Kim.
      
       * Remove odd build stall in 'perf sched' by moving a large struct initialization
         from a local variable to a global one, from Namhyung Kim.
      
       * Add support for callchains in the gtk UI, from Namhyung Kim.
      
       * Do not apply symfs for an absolute vmlinux path, fix from Namhyung Kim.
      
       * Use default include path notation for libtraceevent, from Robert Richter.
      
       * Fix 'make tools/perf', from Robert Richter.
      
       * Make Power7 events available, from Runzhen Wang.
      
       * Add --objdump option to 'perf top', from Sukadev Bhattiprolu.
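
The 'perf sched' item above mentions moving a large struct initialization from a local variable to a global one. A minimal sketch of that general pattern follows; it is not the code from the patch, and every name in it (`sketch_state`, `sketch_global`, `SKETCH_SLOTS`) is hypothetical. A file-scope object has static storage, so the compiler does not have to emit per-call initialization code for a large local aggregate.

```c
#include <stddef.h>

/* Hypothetical size, chosen only to make the struct "large". */
#define SKETCH_SLOTS (1u << 16)

struct sketch_state {
    const char *name;
    int         verbose;
    long        slots[SKETCH_SLOTS];
};

/*
 * File-scope object with static storage duration. Members that are not
 * named in the initializer (here: slots) are zero-initialized.
 */
static struct sketch_state sketch_global = {
    .name    = "sched",
    .verbose = 0,
};

/* Callers take a pointer instead of building a large local copy. */
static struct sketch_state *sketch_get_state(void)
{
    return &sketch_global;
}

/* Sum all slots; returns 0 for the freshly initialized global. */
static long sketch_sum_slots(const struct sketch_state *s)
{
    long total = 0;
    for (size_t i = 0; i < SKETCH_SLOTS; i++)
        total += s->slots[i];
    return total;
}
```

The alternative, declaring `struct sketch_state s = { ... };` inside a function, puts the whole object on the stack and initializes it on every call.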
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • I
      Merge branch 'linus' into perf/core · e43fff2b
      Ingo Molnar committed
      Merge in a v3.11-rc1-ish branch to go from v3.10 based development
      to a v3.11 based one.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ecb2cf1a
      Linus Torvalds committed
      Pull networking fixes from David Miller:
       "A couple interesting SKB fragment handling fixes, plus the usual small
        bits here and there:
      
         1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
            Tim Gardner.
      
         2) Get rid of a stupid reimplementation of "%*phC" in our sysfs MAC
            address printing helper.
      
         3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
            device can't do checksumming offloads then it shouldn't say it can
            do SG either.  From Haiyang Zhang.
      
         4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.
      
         5) Don't leak DMA mappings on mapping failures, from Neil Horman.
      
         6) We need to reset the transport header of SKBs in ipv4 before we
            attempt to perform early socket demux, just like ipv6 does.  From
            Eric Dumazet.
      
         7) Add missing locking on vxlan device removal, from Stephen
            Hemminger.
      
         8) xen-netfront has to make two passes over an SKB to prepare it for
            transfer.  One pass calculates the number of slots needed, the
            second massages the SKB and fills the slots.  Unfortunately, the
            first pass doesn't calculate the number of slots properly so we
            can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
            work out so well.  Fix from Jan Beulich with help and discussion
            with several others.
      
         9) Fix a similar problem in tun and macvtap, which have to split up
            scatter-gather elements at PAGE_SIZE boundaries.  Don't do
            zerocopy if it would result in a > MAX_SKB_FRAGS skb.  Fixes from
            Jason Wang.
      
        10) On receive, once we've decoded the VLAN state completely, clear
            skb->vlan_tci.  Otherwise demuxed tunnels underneath can trigger
            the VLAN code again, corrupting the packet.  Fix from Eric
            Dumazet"
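
Items 8 and 9 above both come down to counting how many page-sized fragments a buffer needs before committing to building the SKB. A minimal user-space sketch of that check follows. It is not the kernel code: `SKETCH_PAGE_SIZE`, `SKETCH_MAX_FRAGS`, `struct sketch_iov`, `pages_spanned()` and `zerocopy_fits()` are hypothetical names, and the two constants are illustrative values (the kernel's PAGE_SIZE and MAX_SKB_FRAGS depend on architecture and configuration).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only, not taken from any kernel configuration. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_FRAGS 17u

/* Hypothetical stand-in for one scatter-gather element. */
struct sketch_iov {
    uintptr_t base; /* start address of the buffer */
    size_t    len;  /* length in bytes */
};

/* Number of pages touched by the byte range [base, base + len). */
static size_t pages_spanned(uintptr_t base, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = base / SKETCH_PAGE_SIZE;
    uintptr_t last  = (base + len - 1) / SKETCH_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/*
 * Returns true if all elements together need at most SKETCH_MAX_FRAGS
 * page fragments; a caller would fall back to copying otherwise.
 */
static bool zerocopy_fits(const struct sketch_iov *iov, size_t count)
{
    size_t frags = 0;
    for (size_t i = 0; i < count; i++) {
        frags += pages_spanned(iov[i].base, iov[i].len);
        if (frags > SKETCH_MAX_FRAGS)
            return false;
    }
    return true;
}
```

The point of the sketch is the unaligned case: a buffer of exactly 17 pages that starts one byte past a page boundary touches 18 pages, so a count based on length alone would underestimate the fragments needed.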
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        vlan: fix a race in egress prio management
        vlan: mask vlan prio bits
        macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
        tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
        pkt_sched: sch_qfq: remove a source of high packet delay/jitter
        xen-netfront: pull on receive skb may need to happen earlier
        vxlan: add necessary locking on device removal
        hyperv: Fix the NETIF_F_SG flag setting in netvsc
        net: Fix sysfs_format_mac() code duplication.
        be2net: Fix to avoid hardware workaround when not needed
        macvtap: do not assume 802.1Q when send vlan packets
        macvtap: fix the missing ret value of TUNSETQUEUE
        ipv4: set transport header earlier
        mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
        bgmac: add dependency to phylib
        net/irda: fixed style issues in irlan_eth
        ethtool: fixed trailing statements in ethtool
        ndisc: bool initializations should use true and false
        atl1e: unmap partially mapped skb on dma error and free skb