1. 02 9月, 2017 3 次提交
    • R
      perf test powerpc: Fix 'Object code reading' test · 9a805d86
      Ravi Bangoria 提交于
      'Object code reading' test always fails on powerpc guest. Two reasons
      for the failure are:
      
      1. When elf section is too big (size beyond 'unsigned int' max value).
      objdump fails to disassemble from such section. This was fixed with
      commit 0f6329bd7fc ("binutils/objdump: Fix disassemble for huge elf
      sections") in binutils.
      
      2. When the sample is from hypervisor. Hypervisor symbols can not be
      resolved within guest and thus thread__find_addr_map() fails for such
      symbols. Fix this by ignoring hypervisor symbols in the test.
      Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1504170896-7876-1-git-send-email-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9a805d86
    • A
      perf trace: Support syscall name globbing · 27702bcf
      Arnaldo Carvalho de Melo 提交于
      So now we can use:
      
        # perf trace -e pkey_*
         532.784 ( 0.006 ms): pkey/16018 pkey_alloc(init_val: DISABLE_WRITE) = -1 EINVAL Invalid argument
         532.795 ( 0.004 ms): pkey/16018 pkey_mprotect(start: 0x7f380d0a6000, len: 4096, prot: READ|WRITE, pkey: -1) = 0
         532.801 ( 0.002 ms): pkey/16018 pkey_free(pkey: -1                ) = -1 EINVAL Invalid argument
        ^C[root@jouet ~]#
      
      Or '-e epoll*', '-e *msg*', etc.
      
      Combining syscall names with perf events, tracepoints, etc, continues to
      be valid, i.e. this is possible:
      
        # perf probe -L sys_nanosleep
        <SyS_nanosleep@/home/acme/git/linux/kernel/time/hrtimer.c:0>
            0  SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
                              struct timespec __user *, rmtp)
               {
                      struct timespec64 tu;
      
            5         if (get_timespec64(&tu, rqtp))
            6                 return -EFAULT;
      
                      if (!timespec64_valid(&tu))
            9                 return -EINVAL;
      
           11         current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
           12         current->restart_block.nanosleep.rmtp = rmtp;
           13         return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
               }
      
        # perf probe my_probe="sys_nanosleep:12 rmtp"
        Added new event:
          probe:my_probe       (on sys_nanosleep:12 with rmtp)
      
        You can now use it in all perf tools, such as:
      
      	perf record -e probe:my_probe -aR sleep 1
      
        #
        # perf trace -e probe:my_probe/max-stack=5/,*sleep sleep 1
           0.427 ( 0.003 ms): sleep/16690 nanosleep(rqtp: 0x7ffefc245090) ...
           0.430 (         ): probe:my_probe:(ffffffffbd112923) rmtp=0)
                                             sys_nanosleep ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             return_from_SYSCALL_64 ([kernel.kallsyms])
                                             __nanosleep_nocancel (/usr/lib64/libc-2.25.so)
           0.427 (1000.208 ms): sleep/16690  ... [continued]: nanosleep()) = 0
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-elycoi8wy6y0w9dkj7ox1mzz@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      27702bcf
    • A
      perf syscalltbl: Support glob matching on syscall names · 89be3f8a
      Arnaldo Carvalho de Melo 提交于
      With two new methods, one to find the first match, returning its syscall
      id and its index in whatever internal database it keeps the syscall
      into, then one to find the next match, if any.
      
      Implemented only on arches where we actually read the syscall table from
      the kernel sources, i.e. x86-64 for now, all the others use the libaudit
      method for which this returns -1, i.e. just stubs were added, with the
      actual implementation using whatever libaudit functions for matching
      that may be available.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-i0sj4rxk1a63pfe9gl8z8irs@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      89be3f8a
  2. 30 8月, 2017 2 次提交
    • J
      perf report: Calculate the average cycles of iterations · c4ee0625
      Jin Yao 提交于
      The branch history code has a loop detection function. With this, we can
      get the number of iterations by calculating the removed loops.
      
      While it would be nice for knowing the average cycles of iterations.
      This patch adds up the cycles in branch entries of removed loops and
      save the result to the next branch entry (e.g. branch entry A).
      
      Finally it will display the iteration number and average cycles at the
      "from" of branch entry A.
      
      For example:
      perf record -g -j any,save_type ./div
      perf report --branch-history --no-children --stdio
      
      --22.63%--main div.c:42 (RET CROSS_2M)
                compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2)
                |
                 --10.73%--compute_flag div.c:27 (RET CROSS_2M)
                           rand rand.c:28 (cycles:1)
                           rand rand.c:28 (RET CROSS_2M)
                           __random random.c:298 (cycles:1)
                           __random random.c:297 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (RET CROSS_2M)
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c4ee0625
    • I
      Merge tag 'perf-core-for-mingo-4.14-20170829' of... · 1b2f76d7
      Ingo Molnar 提交于
      Merge tag 'perf-core-for-mingo-4.14-20170829' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
       - Fix remote HITM detection for Skylake in 'perf c2c' (Jiri Olsa)
      
       - Fixes for the handling of PERF_RECORD_READ records (Jiri Olsa)
      
       - Fix kprobes blackist symbol lookup in 'perf probe' (Li Bin)
      
       - The PLT header and entry sizes are not the same in !x86, fix it for ARM and
         AARCH64 (Li Bin)
      
       - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace' (Arnaldo Carvalho de Melo)
      
       - Fix CC, AR, LD external definition, allow flex and bison to be
         externally defined and other related Makefile fixes (David Carrillo-Cisneros)
      
       - Sync CPU features kernel ABI headers with tooling headers (Arnaldo Carvalho de Melo)
      
       - Fix path to PMU formats in 'perf stat' documentation (Jack Henschel)
      
       - Fix static build with newer toolchains (Jiri Olsa)
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1b2f76d7
  3. 29 8月, 2017 25 次提交
    • L
      perf symbols: Fix plt entry calculation for ARM and AARCH64 · b2f76050
      Li Bin 提交于
      On x86, the plt header size is as same as the plt entry size, and can be
      identified from shdr's sh_entsize of the plt.
      
      But we can't assume that the sh_entsize of the plt shdr is always the
      plt entry size in all architecture, and the plt header size may be not
      as same as the plt entry size in some architecure.
      
      On ARM, the plt header size is 20 bytes and the plt entry size is 12
      bytes (don't consider the FOUR_WORD_PLT case) that refer to the binutils
      implementation. The plt section is as follows:
      
      Disassembly of section .plt:
      000004a0 <__cxa_finalize@plt-0x14>:
       4a0:   e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
       4a4:   e59fe004        ldr     lr, [pc, #4]    ; 4b0 <_init+0x1c>
       4a8:   e08fe00e        add     lr, pc, lr
       4ac:   e5bef008        ldr     pc, [lr, #8]!
       4b0:   00008424        .word   0x00008424
      
      000004b4 <__cxa_finalize@plt>:
       4b4:   e28fc600        add     ip, pc, #0, 12
       4b8:   e28cca08        add     ip, ip, #8, 20  ; 0x8000
       4bc:   e5bcf424        ldr     pc, [ip, #1060]!        ; 0x424
      
      000004c0 <printf@plt>:
       4c0:   e28fc600        add     ip, pc, #0, 12
       4c4:   e28cca08        add     ip, ip, #8, 20  ; 0x8000
       4c8:   e5bcf41c        ldr     pc, [ip, #1052]!        ; 0x41c
      
      On AARCH64, the plt header size is 32 bytes and the plt entry size is 16
      bytes.  The plt section is as follows:
      
      Disassembly of section .plt:
      0000000000000560 <__cxa_finalize@plt-0x20>:
       560:   a9bf7bf0        stp     x16, x30, [sp,#-16]!
       564:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
       568:   f944be11        ldr     x17, [x16,#2424]
       56c:   9125e210        add     x16, x16, #0x978
       570:   d61f0220        br      x17
       574:   d503201f        nop
       578:   d503201f        nop
       57c:   d503201f        nop
      
      0000000000000580 <__cxa_finalize@plt>:
       580:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
       584:   f944c211        ldr     x17, [x16,#2432]
       588:   91260210        add     x16, x16, #0x980
       58c:   d61f0220        br      x17
      
      0000000000000590 <__gmon_start__@plt>:
       590:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
       594:   f944c611        ldr     x17, [x16,#2440]
       598:   91262210        add     x16, x16, #0x988
       59c:   d61f0220        br      x17
      
      NOTES:
      
      In addition to ARM and AARCH64, other architectures, such as
      s390/alpha/mips/parisc/poperpc/sh/sparc/xtensa also need to consider
      this issue.
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
      Cc: David Tolnay <dtolnay@gmail.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: zhangmengting@huawei.com
      Link: http://lkml.kernel.org/r/1496622849-21877-1-git-send-email-huawei.libin@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b2f76050
    • L
      perf probe: Fix kprobe blacklist checking condition · 2c29461e
      Li Bin 提交于
      The commit 9aaf5a5f ("perf probe: Check kprobes blacklist when
      adding new events"), 'perf probe' supports checking the blacklist of the
      fuctions which can not be probed.  But the checking condition is wrong,
      that the end_addr of the symbol which is the start_addr of the next
      symbol can't be included.
      
      Committer notes:
      
      IOW make it match its kernel counterpart in kernel/kprobes.c:
      
        bool within_kprobe_blacklist(unsigned long addr)
      
      Each entry have as its end address not its end address, but the first
      address _outside_ that symbol, which for related functions, is the first
      address of the next symbol, like these from kernel/trace/trace_probe.c:
      
      0xffffffffbd198df0-0xffffffffbd198e40	print_type_u8
      0xffffffffbd198e40-0xffffffffbd198e90	print_type_u16
      0xffffffffbd198e90-0xffffffffbd198ee0	print_type_u32
      0xffffffffbd198ee0-0xffffffffbd198f30	print_type_u64
      0xffffffffbd198f30-0xffffffffbd198f80	print_type_s8
      0xffffffffbd198f80-0xffffffffbd198fd0	print_type_s16
      0xffffffffbd198fd0-0xffffffffbd199020	print_type_s32
      0xffffffffbd199020-0xffffffffbd199070	print_type_s64
      0xffffffffbd199070-0xffffffffbd1990c0	print_type_x8
      0xffffffffbd1990c0-0xffffffffbd199110	print_type_x16
      0xffffffffbd199110-0xffffffffbd199160	print_type_x32
      0xffffffffbd199160-0xffffffffbd1991b0	print_type_x64
      
      But not always:
      
      0xffffffffbd1997b0-0xffffffffbd1997c0	fetch_kernel_stack_address (kernel/trace/trace_probe.c)
      0xffffffffbd1c57f0-0xffffffffbd1c58b0	__context_tracking_enter   (kernel/context_tracking.c)
      Signed-off-by: NLi Bin <huawei.libin@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: zhangmengting@huawei.com
      Fixes: 9aaf5a5f ("perf probe: Check kprobes blacklist when adding new events")
      Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2c29461e
    • P
      perf/x86: Fix caps/ for !Intel · 5da382eb
      Peter Zijlstra 提交于
      Move the 'max_precise' capability into generic x86 code where it
      belongs. This fixes a sysfs splat on !Intel systems where we fail to set
      x86_pmu_caps_group.atts.
      Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Fixes: 22688d1c20f5 ("x86/perf: Export some PMU attributes in caps/ directory")
      Link: http://lkml.kernel.org/r/20170828104650.2u3rsim4jafyjzv2@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5da382eb
    • K
      perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR · fc7ce9c7
      Kan Liang 提交于
      For understanding how the workload maps to memory channels and hardware
      behavior, it's very important to collect address maps with physical
      addresses. For example, 3D XPoint access can only be found by filtering
      the physical address.
      
      Add a new sample type for physical address.
      
      perf already has a facility to collect data virtual address. This patch
      introduces a function to convert the virtual address to physical address.
      The function is quite generic and can be extended to any architecture as
      long as a virtual address is provided.
      
       - For kernel direct mapping addresses, virt_to_phys is used to convert
         the virtual addresses to physical address.
      
       - For user virtual addresses, __get_user_pages_fast is used to walk the
         pages tables for user physical address.
      
       - This does not work for vmalloc addresses right now. These are not
         resolved, but code to do that could be added.
      
      The new sample type requires collecting the virtual address. The
      virtual address will not be output unless SAMPLE_ADDR is applied.
      
      For security, the physical address can only be exposed to root or
      privileged user.
      Tested-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: mpe@ellerman.id.au
      Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fc7ce9c7
    • A
      perf/core, pt, bts: Get rid of itrace_started · 8d4e6c4c
      Alexander Shishkin 提交于
      I just noticed that hw.itrace_started and hw.config are aliased to the
      same location. Now, the PT driver happens to use both, which works out
      fine by sheer luck:
      
       - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the
          program order, although there are no compiler barriers to ensure that,
      
       - to the perf_log_itrace_start() hw.itrace_start looks set at the same
         time as when it is intended to be set because both stores happen in the
         same path,
      
       - hw.config is never reset to zero in the PT driver.
      
      Now, the use of hw.config by the PT driver makes more sense (it being a
      HW PMU) than messing around with itrace_started, which is an awkward API
      to begin with.
      
      This patch replaces hw.itrace_started with an attach_state bit and an
      API call for the PMU drivers to use to communicate the condition.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8d4e6c4c
    • I
      e0563e04
    • Z
      perf/ftrace: Fix double traces of perf on ftrace:function · 75e83876
      Zhou Chengming 提交于
      When running perf on the ftrace:function tracepoint, there is a bug
      which can be reproduced by:
      
        perf record -e ftrace:function -a sleep 20 &
        perf record -e ftrace:function ls
        perf script
      
                    ls 10304 [005]   171.853235: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853237: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853239: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853240: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853242: ftrace:function:
        __task_pid_nr_ns
                    ls 10304 [005]   171.853244: ftrace:function:
        __task_pid_nr_ns
      
      We can see that all the function traces are doubled.
      
      The problem is caused by the inconsistency of the register
      function perf_ftrace_event_register() with the probe function
      perf_ftrace_function_call(). The former registers one probe
      for every perf_event. And the latter handles all perf_events
      on the current cpu. So when two perf_events on the current cpu,
      the traces of them will be doubled.
      
      So this patch adds an extra parameter "event" for perf_tp_event,
      only send sample data to this event when it's not NULL.
      Signed-off-by: NZhou Chengming <zhouchengming1@huawei.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Cc: huawei.libin@huawei.com
      Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      75e83876
    • M
      perf/core: Fix potential double-fetch bug · f12f42ac
      Meng Xu 提交于
      While examining the kernel source code, I found a dangerous operation that
      could turn into a double-fetch situation (a race condition bug) where the same
      userspace memory region are fetched twice into kernel with sanity checks after
      the first fetch while missing checks after the second fetch.
      
        1. The first fetch happens in line 9573 get_user(size, &uattr->size).
      
        2. Subsequently the 'size' variable undergoes a few sanity checks and
           transformations (line 9577 to 9584).
      
        3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size)
      
        4. Given that 'uattr' can be fully controlled in userspace, an attacker can
           race condition to override 'uattr->size' to arbitrary value (say, 0xFFFFFFFF)
           after the first fetch but before the second fetch. The changed value will be
           copied to 'attr->size'.
      
        5. There is no further checks on 'attr->size' until the end of this function,
           and once the function returns, we lose the context to verify that 'attr->size'
           conforms to the sanity checks performed in step 2 (line 9577 to 9584).
      
        6. My manual analysis shows that 'attr->size' is not used elsewhere later,
           so, there is no working exploit against it right now. However, this could
           easily turns to an exploitable one if careless developers start to use
           'attr->size' later.
      
      To fix this, override 'attr->size' from the second fetch to the one from the
      first fetch, regardless of what is actually copied in.
      
      In this way, it is assured that 'attr->size' is consistent with the checks
      performed after the first fetch.
      Signed-off-by: NMeng Xu <mengxu.gatech@gmail.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Cc: meng.xu@gatech.edu
      Cc: sanidhya@gatech.edu
      Cc: taesoo@gatech.edu
      Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f12f42ac
    • L
      page waitqueue: always add new entries at the end · 9c3a815f
      Linus Torvalds 提交于
      Commit 3510ca20 ("Minor page waitqueue cleanups") made the page
      queue code always add new waiters to the back of the queue, which helps
      upcoming patches to batch the wakeups for some horrid loads where the
      wait queues grow to thousands of entries.
      
      However, I forgot about the nasrt add_page_wait_queue() special case
      code that is only used by the cachefiles code.  That one still continued
      to add the new wait queue entries at the beginning of the list.
      
      Fix it, because any sane batched wakeup will require that we don't
      suddenly start getting new entries at the beginning of the list that we
      already handled in a previous batch.
      
      [ The current code always does the whole list while holding the lock, so
        wait queue ordering doesn't matter for correctness, but even then it's
        better to add later entries at the end from a fairness standpoint ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c3a815f
    • T
      cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs · b339752d
      Tejun Heo 提交于
      When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
      @node.  The assumption seems that if !NUMA, there shouldn't be more than
      one node and thus reporting cpu_online_mask regardless of @node is
      correct.  However, that assumption was broken years ago to support
      DISCONTIGMEM and whether a system has multiple nodes or not is
      separately controlled by NEED_MULTIPLE_NODES.
      
      This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
      cpumask_of_node() will report cpu_online_mask for all possible nodes,
      indicating that the CPUs are associated with multiple nodes which is an
      impossible configuration.
      
      This bug has been around forever but doesn't look like it has caused any
      noticeable symptoms.  However, it triggers a WARN recently added to
      workqueue to verify NUMA affinity configuration.
      
      Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b339752d
    • A
      ARCv2: SMP: Mask only private-per-core IRQ lines on boot at core intc · e8206d2b
      Alexey Brodkin 提交于
      Recent commit a8ec3ee8 "arc: Mask individual IRQ lines during core
      INTC init" breaks interrupt handling on ARCv2 SMP systems.
      
      That commit masked all interrupts at onset, as some controllers on some
      boards (customer as well as internal), would assert interrutps early
      before any handlers were installed.  For SMP systems, the masking was
      done at each cpu's core-intc.  Later, when the IRQ was actually
      requested, it was unmasked, but only on the requesting cpu.
      
      For "common" interrupts, which were wired up from the 2nd level IDU
      intc, this was as issue as they needed to be enabled on ALL the cpus
      (given that IDU IRQs are by default served Round Robin across cpus)
      
      So fix that by NOT masking "common" interrupts at core-intc, but instead
      at the 2nd level IDU intc (latter already being done in idu_of_init())
      
      Fixes: a8ec3ee8 ("arc: Mask individual IRQ lines during core INTC init")
      Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com>
      [vgupta: reworked changelog, removed the extraneous idu_irq_mask_raw()]
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8206d2b
    • H
      fs/select: Fix memory corruption in compat_get_fd_set() · 79de3cbe
      Helge Deller 提交于
      Commit 464d6242 ("select: switch compat_{get,put}_fd_set() to
      compat_{get,put}_bitmap()") changed the calculation on how many bytes
      need to be zeroed when userspace handed over a NULL pointer for a fdset
      array in the select syscall.
      
      The calculation was changed in compat_get_fd_set() wrongly from
      	memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));
      to
      	memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));
      
      The ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need
      to be zeroed in the target fdset array (rounded up to the next full bits
      for an unsigned long).
      
      But the memset() call expects the number of _bytes_ to be zeroed.
      
      This leads to clearing more memory than wanted (on the stack area or
      even at kmalloc()ed memory areas) and to random kernel crashes as we
      have seen them on the parisc platform.
      
      The correct change should have been
      
      	memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);
      
      which is the same as can be archieved with a call to
      
      	zero_fd_set(nr, fdset).
      
      Fixes: 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()"
      Acked-by: N: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79de3cbe
    • A
      perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments · 83bc9c37
      Arnaldo Carvalho de Melo 提交于
      Reuse 'mprotect' beautifiers for 'pkey_mprotect'.
      
      System wide tracing pkey_alloc, pkey_free and pkey_mprotect calls, with
      backtraces:
      
        # perf trace -e pkey_alloc,pkey_mprotect,pkey_free --max-stack=5
           0.000 ( 0.011 ms): pkey/7818 pkey_alloc(init_val: DISABLE_ACCESS|DISABLE_WRITE) = -1 EINVAL Invalid argument
                                             syscall (/usr/lib64/libc-2.25.so)
                                             pkey_alloc (/home/acme/c/pkey)
           0.022 ( 0.003 ms): pkey/7818 pkey_mprotect(start: 0x7f28c3890000, len: 4096, prot: READ|WRITE, pkey: -1) = 0
                                             syscall (/usr/lib64/libc-2.25.so)
                                             pkey_mprotect (/home/acme/c/pkey)
           0.030 ( 0.002 ms): pkey/7818 pkey_free(pkey: -1                               ) = -1 EINVAL Invalid argument
                                             syscall (/usr/lib64/libc-2.25.so)
                                             pkey_free (/home/acme/c/pkey)
      
      The tools/include/uapi/asm-generic/mman-common.h file is used to find
      the access rights defines for the pkey_alloc syscall second argument.
      
      Since we have the detector of changes for the tools/include header files
      versus its kernel origin (include/uapi/asm-generic/mman-common.h), we'll
      get whatever new flag appears for that argument automatically.
      
      This method should be used in other cases where it is easy to generate
      those flags tables because the header has properly namespaced defines
      like PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-3xq5312qlks7wtfzv2sk3nct@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      83bc9c37
    • A
      tools headers: Sync cpu features kernel ABI headers with tooling headers · a2105f8a
      Arnaldo Carvalho de Melo 提交于
      These changes made the tools/arch/x86/include/ headers to drift from its
      kernel origins:
      
        910448bb ("perf/x86/amd/uncore: Rename cpufeatures macro for cache counters")
        5442c269 ("x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag")
        cba4671a ("x86/mm: Disable PCID on 32-bit kernels")
      
      Which was detected while building perf:
      
        make: Entering directory '/home/acme/git/linux/tools/perf'
          BUILD:   Doing 'make -j4' parallel build
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
      
      This sync causes just these perf object files to be rebuilt:
      
        CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
        CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
      
      And the changes in the above changesets don't entail any need for change
      in the above 'perf bench' files.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-456aafouj911a4x4zwt8stkm@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a2105f8a
    • D
      perf tools: Pass full path of FEATURES_DUMP · 70ff7c6c
      David Carrillo-Cisneros 提交于
      When building with an external FEATURES_DUMP, bpf complains
      that features dump file is not found. Fix it by passing full file path.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20170827075442.108534-7-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      70ff7c6c
    • D
      perf tools: Robustify detection of clang binary · 3866058e
      David Carrillo-Cisneros 提交于
      Prior to this patch, make scripts tested for CLANG with ifeq ($(CC),
      clang), failing to detect CLANG binaries with different names. Fix it by
      testing for the existence of __clang__ macro in the list of compiler
      defined macros.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20170827075442.108534-5-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3866058e
    • D
      tools lib: Allow external definition of CC, AR and LD · 12024aac
      David Carrillo-Cisneros 提交于
      Use already defined values for CC, AR and LD when available.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20170827075442.108534-4-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      12024aac
    • D
      perf tools: Allow external definition of flex and bison binary names · 39a59f1e
      David Carrillo-Cisneros 提交于
      Allow user to define flex and bison binary names by passing FLEX and
      BISON variables.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20170827075442.108534-3-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      39a59f1e
    • D
      tools build tests: Don't hardcode gcc name · ba5d1a48
      David Carrillo-Cisneros 提交于
      Use $(CC) instead of harcoded gcc binary name.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20170827075442.108534-2-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ba5d1a48
    • J
      perf report: Group stat values on global event id · 9933183e
      Jiri Olsa 提交于
      There's no big value on displaying counts for every event ID, which is
      one per every CPU. Rather than that, displaying the whole sum for the
      event.
      
        $ perf record -c 100000 -e cycles:u -s test
        $ perf report -T
      
      Before:
        #  PID   TID  cycles:u  cycles:u  cycles:u  cycles:u  ... [20 more columns of 'cycles:u']
          3339  3339         0         0         0         0
          3340  3340         0         0         0         0
          3341  3341         0         0         0         0
          3342  3342         0         0         0         0
      
      Now:
        #  PID   TID  cycles:u
          3339  3339     19678
          3340  3340     18744
          3341  3341     17335
          3342  3342     26414
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-10-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9933183e
    • J
      perf values: Zero value buffers · a1834fc9
      Jiri Olsa 提交于
      We need to make sure the array of value pointers are zero initialized,
      because we use them in realloc later on and uninitialized non zero value
      will cause allocation error and aborted execution.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-9-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a1834fc9
    • J
      perf values: Fix allocation check · f4ef3b7c
      Jiri Olsa 提交于
      Bailing out in case the allocation failed, not the other way round.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-8-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f4ef3b7c
    • J
      perf values: Fix thread index bug · 64eed1de
      Jiri Olsa 提交于
      We are taking wrong index (+1) for first thread, which leaves thread
      with index 0 unused and uninitialized.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-7-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      64eed1de
    • J
      perf report: Add dump_read function · dac7f6b7
      Jiri Olsa 提交于
      Adding dump_read function to gather all the dump output of read
      function. Adding output of enabled and running times and id if enabled
      (3 new lines with '...' prefix below).
      
        $ perf record -s ...
        $ perf report -D
      
        958358311769 0x91f8 [0x40]: PERF_RECORD_READ: 3339 3339 cycles:u 0
        ... time enabled : 958358313731
        ... time running : 958358313731
        ... id           : 80
      
      Committer note:
      
      Do not use 'read' as a variable name as it breaks the build on older
      systems, such as RHEL6:
      
          CC       /tmp/build/perf/util/session.o
        cc1: warnings being treated as errors
        util/session.c: In function 'dump_read':
        util/session.c:1132: error: declaration of 'read' shadows a global declaration
        /usr/include/bits/unistd.h:35: error: shadowed declaration is here
        mv: cannot stat `/tmp/build/perf/util/.session.o.tmp': No such file or directory
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170824162737.7813-6-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      dac7f6b7
    • L
      Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming · 702e9762
      Linus Torvalds 提交于
      Pull c6x tweaks from Mark Salter.
      
      * tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
        c6x: Convert to using %pOF instead of full_name
        c6x: defconfig: Cleanup from old Kconfig options
      702e9762
  4. 28 8月, 2017 10 次提交