1. 10 3月, 2018 1 次提交
  2. 06 3月, 2018 3 次提交
  3. 05 3月, 2018 7 次提交
    • A
      tools headers: Sync x86's cpufeatures.h · 4caea057
      Arnaldo Carvalho de Melo 提交于
      The changes in dd84441a ("x86/speculation: Use IBRS if available
      before calling into firmware") don't need any kind of special treatment
      in the current tools/perf/ codebase, so just update the copy to get rid
      of the perf build warning:
      
        BUILD:   Doing 'make -j4' parallel build
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-mzmuxocrf96v922xkerey3ns@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4caea057
    • A
      tools headers: Sync copy of kvm UAPI headers · d976a6e9
      Arnaldo Carvalho de Melo 提交于
      In 801e459a ("KVM: x86: Add a framework for supporting MSR-based
      features") a new ioctl was introduced, which with this sync of the kvm
      UAPI headers, makes 'perf trace' know about it:
      
        $ cd /tmp/build/perf/trace/beauty/generated/ioctl/
        $ diff -u kvm_ioctl_array.c.old kvm_ioctl_array.c
        --- /tmp/kvm_ioctl_array.c	2018-03-05 11:55:38.409145056 -0300
        +++ /tmp/build/perf/trace/beauty/generated/ioctl/kvm_ioctl_array.c	2018-03-05 11:56:17.456153501 -0300
        @@ -6,6 +6,7 @@
       	[0x04] = "GET_VCPU_MMAP_SIZE",
       	[0x05] = "GET_SUPPORTED_CPUID",
       	[0x09] = "GET_EMULATED_CPUID",
        +	[0x0a] = "GET_MSR_FEATURE_INDEX_LIST",
       	[0x40] = "SET_MEMORY_REGION",
       	[0x41] = "CREATE_VCPU",
       	[0x42] = "GET_DIRTY_LOG",
      
      So when using 'perf trace -e ioctl' that will appear along with the
      others, like in this excerpt of a system wide session:
      
        14.556 ( 0.006 ms): CPU 0/KVM/16077 ioctl(fd: 19<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
        14.565 ( 0.006 ms): CPU 0/KVM/16077 ioctl(fd: 19<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) = 0
        14.573 (         ): CPU 0/KVM/16077 ioctl(fd: 19<anon_inode:kvm-vcpu:0>, cmd: KVM_RUN) ...
        34.075 ( 0.016 ms): gnome-shell/2192 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_BUSY, arg: 0x7ffe4e73e850) = 0
        40.549 ( 0.012 ms): gnome-shell/2192 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_BUSY, arg: 0x7ffe4e73ece0) = 0
        40.625 ( 0.005 ms): gnome-shell/2192 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_BUSY, arg: 0x7ffe4e73e940) = 0
        40.632 ( 0.003 ms): gnome-shell/2192 ioctl(fd: 8</dev/dri/card0>, cmd: DRM_I915_GEM_MADVISE, arg: 0x7ffe4e73e9b0) = 0
      
      This also silences the perf build header copy drift verifier:
      
        make: Entering directory '/home/acme/git/perf/tools/perf'
          BUILD:   Doing 'make -j4' parallel build
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-h31oz5g0mt1dh2s2ajq6o6no@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d976a6e9
    • J
      perf record: Fix crash in pipe mode · cfacbabd
      Jiri Olsa 提交于
      Currently we can crash perf record when running in pipe mode, like:
      
        $ perf record ls | perf report
        # To display the perf.data header info, please use --header/--header-only options.
        #
        perf: Segmentation fault
        Error:
        The - file has no samples!
      
      The callstack of the crash is:
      
          0x0000000000515242 in perf_event__synthesize_event_update_name
        3513            ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]);
        (gdb) bt
        #0  0x0000000000515242 in perf_event__synthesize_event_update_name
        #1  0x00000000005158a4 in perf_event__synthesize_extra_attr
        #2  0x0000000000443347 in record__synthesize
        #3  0x00000000004438e3 in __cmd_record
        #4  0x000000000044514e in cmd_record
        #5  0x00000000004cbc95 in run_builtin
        #6  0x00000000004cbf02 in handle_internal_command
        #7  0x00000000004cc054 in run_argv
        #8  0x00000000004cc422 in main
      
      The reason of the crash is that the evsel does not have ids array
      allocated and the pipe's synthesize code tries to access it.
      
      We don't force evsel ids allocation when we have single event, because
      it's not needed. However we need it when we are in pipe mode even for
      single event as a key for evsel update event.
      
      Fixing this by forcing evsel ids allocation event for single event, when
      we are in pipe mode.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180302161354.30192-1-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cfacbabd
    • A
      perf annotate browser: Be more robust when drawing jump arrows · 9cf195f8
      Arnaldo Carvalho de Melo 提交于
      This first happened with a gcc function, _cpp_lex_token, that has the
      usual jumps:
      
       │1159e6c: ↓ jne    115aa32 <_cpp_lex_token@@base+0xf92>
      
      I.e. jumps to a label inside that function (_cpp_lex_token), and those
      works, but also this kind:
      
       │1159e8b: ↓ jne    c469be <cpp_named_operator2name@@base+0xa72>
      
      I.e. jumps to another function, outside _cpp_lex_token, which are not
      being correctly handled generating as a side effect references to
      ab->offset[] entries that are set to NULL, so to make this code more
      robust, check that here.
      
      A proper fix for will be put in place, looking at the function name
      right after the '<' token and probably treating this like a 'call'
      instruction.
      
      For now just don't draw the arrow.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Tested-by: NIngo Molnar <mingo@kernel.org>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Jin Yao <yao.jin@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Link: https://lkml.kernel.org/n/tip-5tzvb875ep2sel03aeefgmud@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9cf195f8
    • K
      perf top: Fix annoying fallback message on older kernels · 626af862
      Kan Liang 提交于
      On older (e.g. v4.4) kernels, an annoying fallback message can be
      observed in 'perf top':
      
      	┌─Warning:──────────────────────┐
      	│fall back to non-overwrite mode│
      	│                               │
      	│                               │
      	│Press any key...               │
      	└───────────────────────────────┘
      
      The 'perf top' utility has been changed to overwrite mode since commit
      ebebbf08 ("perf top: Switch default mode to overwrite mode").
      
      For older kernels which don't have overwrite mode support, 'perf top'
      will fall back to non-overwrite mode and print out the fallback message
      using ui__warning(), which needs user's input to close.
      
      The fallback message is not critical for end users. Turning it to debug
      message which is printed when running with -vv.
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Fixes: ebebbf08 ("perf top: Switch default mode to overwrite mode")
      Link: http://lkml.kernel.org/r/1519669030-176549-1-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      626af862
    • S
      perf kallsyms: Fix the usage on the man page · f6d3f35e
      Sangwon Hong 提交于
      First, all man pages highlight only perf and subcommands except 'perf
      kallsyms', which includes the full usage. Fix it for commands to
      monopolize underlines.
      
      Second, options can be ommited when executing 'perf kallsyms', so add
      square brackets between <option>.
      Signed-off-by: NSangwon Hong <qpakzk@gmail.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Link: http://lkml.kernel.org/r/1518377864-20353-1-git-send-email-qpakzk@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f6d3f35e
    • D
      tc-testing: skbmod: fix match value of ethertype · 79f3a8e6
      Davide Caratti 提交于
      iproute2 print_skbmod() prints the configured ethertype using format 0x%X:
      therefore, test 9aa8 systematically fails, because it configures action #4
      using ethertype 0x0031, and expects 0x0031 when it reads it back. Changing
      the expected value to 0x31 lets the test result 'not ok' become 'ok'.
      
      tested with:
       # ./tdc.py -e 9aa8
       Test 9aa8: Get a single skbmod action from a list
       All test results:
      
       1..1
       ok 1 9aa8 Get a single skbmod action from a list
      
      Fixes: cf797ac4 ("tc-testing: Add test cases for police and skbmod")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79f3a8e6
  4. 03 3月, 2018 1 次提交
  5. 02 3月, 2018 1 次提交
    • M
      selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable · cd4a6f3a
      Michael Ellerman 提交于
      The subpage_prot syscall is only functional when the system is using
      the Hash MMU. Since commit 5b2b8071 ("powerpc/mm: Invalidate
      subpage_prot() system call on radix platforms") it returns ENOENT when
      the Radix MMU is active. Currently this just makes the test fail.
      
      Additionally the syscall is not available if the kernel is built with
      4K pages, or if CONFIG_PPC_SUBPAGE_PROT=n, in which case it returns
      ENOSYS because the syscall is missing entirely.
      
      So check explicitly for ENOENT and ENOSYS and skip if we see either of
      those.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cd4a6f3a
  6. 28 2月, 2018 2 次提交
  7. 27 2月, 2018 4 次提交
  8. 26 2月, 2018 1 次提交
  9. 25 2月, 2018 1 次提交
  10. 24 2月, 2018 13 次提交
    • S
      tools/kvm_stat: print 'Total' line for multiple events only · 6789af03
      Stefan Raspl 提交于
      The 'Total' line looks a bit weird when we have a single event only. This
      can happen e.g. due to filters. Therefore suppress when there's only a
      single event in the output.
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6789af03
    • S
      tools/kvm_stat: group child events indented after parent · df72ecfc
      Stefan Raspl 提交于
      We keep the current logic that sorts all events (parent and child), but
      re-shuffle the events afterwards, grouping the children after the
      respective parent. Note that the percentage column for child events
      gives the percentage of the parent's total.
      Since we rework the logic anyway, we modify the total average
      calculation to use the raw numbers instead of the (rounded) averages.
      Note that this can result in differing numbers (between total average
      and the sum of the individual averages) due to rounding errors.
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      df72ecfc
    • S
      tools/kvm_stat: separate drilldown and fields filtering · 18e8f410
      Stefan Raspl 提交于
      Drilldown (i.e. toggle display of child trace events) was implemented by
      overriding the fields filter. This resulted in inconsistencies: E.g. when
      drilldown was not active, adding a filter that also matches child trace
      events would not only filter fields according to the filter, but also add
      in the child trace events matching the filter. E.g. on x86, setting
      'kvm_userspace_exit' as the fields filter after startup would result in
      display of kvm_userspace_exit(DCR), although that wasn't previously
      present - not exactly what one would expect from a filter.
      This patch addresses the issue by keeping drilldown and fields filter
      separate. While at it, we also fix a PEP8 issue by adding a blank line
      at one place (since we're in the area...).
      We implement this by adding a framework that also allows to define a
      taxonomy among the debugfs events to identify child trace events. I.e.
      drilldown using 'x' can now also work with debugfs. A respective parent-
      child relationship is only known for S390 at the moment, but could be
      added adjusting other platforms' ARCH.dbg_is_child() methods
      accordingly.
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      18e8f410
    • S
      tools/kvm_stat: eliminate extra guest/pid selection dialog · 516f1190
      Stefan Raspl 提交于
      We can do with a single dialog that takes both, pids and guest names.
      Note that we keep both interactive commands, 'p' and 'g' for now, to
      avoid confusion among users used to a specific key.
      
      While at it, we improve on some minor glitches regarding curses usage,
      e.g. cursor still visible when not supposed to be.
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      516f1190
    • S
      tools/kvm_stat: mark private methods as such · c0e8c21e
      Stefan Raspl 提交于
      Helps quite a bit reading the code when it's obvious when a method is
      intended for internal use only.
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c0e8c21e
    • S
      tools/kvm_stat: fix debugfs handling · 1fd6a708
      Stefan Raspl 提交于
      Te checks for debugfs assumed that debugfs is always mounted at
      /sys/kernel/debug - which is likely, but not guaranteed. This is addressed
      by checking /proc/mounts for the actual location.
      Furthermore, when debugfs was mounted, but the kvm module not loaded, a
      misleading error pointing towards debugfs not present was given.
      To reproduce,
      (a) run kvm_stat with debugfs mounted at a place different from
          /sys/kernel/debug
      (b) run kvm_stat with debugfs mounted but kvm module not loaded
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1fd6a708
    • S
      tools/kvm_stat: print error on invalid regex · 1cd8bfb1
      Stefan Raspl 提交于
      Entering an invalid regular expression did not produce any indication of an
      error so far.
      To reproduce, press 'f' and enter 'foo(' (with an unescaped bracket).
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      1cd8bfb1
    • S
      tools/kvm_stat: fix crash when filtering out all non-child trace events · 3df33a0f
      Stefan Raspl 提交于
      When we apply a filter that will only leave child trace events, we
      receive a ZeroDivisionError when calculating the percentages.
      In that case, provide percentages based on child events only.
      To reproduce, run 'kvm_stat -f .*[\(].*'.
      Signed-off-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      3df33a0f
    • M
      tools/kvm_stat: avoid 'is' for equality checks · 369d5a85
      Marc Hartmayer 提交于
      Use '==' for equality checks and 'is' when comparing identities.
      
      An example where '==' and 'is' behave differently:
      >>> a = 4242
      >>> a == 4242
      True
      >>> a is 4242
      False
      Signed-off-by: NMarc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      369d5a85
    • M
      tools/kvm_stat: use a more pythonic way to iterate over dictionaries · 0eb57800
      Marc Hartmayer 提交于
      If it's clear that the values of a dictionary will be used then use
      the '.items()' method.
      Signed-off-by: NMarc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Tested-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      [Include fix for logging mode by Stefan Raspl]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0eb57800
    • M
      tools/kvm_stat: use a namedtuple for storing the values · 006f1548
      Marc Hartmayer 提交于
      Use a namedtuple for storing the values as it allows to access the
      fields of a tuple via names. This makes the overall code much easier
      to read and to understand. Access by index is still possible as
      before.
      Signed-off-by: NMarc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Tested-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      006f1548
    • M
      tools/kvm_stat: simplify the sortkey function · faa312a5
      Marc Hartmayer 提交于
      The 'sortkey' function references a value in its enclosing
      scope (closure). This is not common practice for a sort key function
      so let's replace it. Additionally, the function 'sorted' has already a
      parameter for reversing the result therefore the inversion of the
      values is unneeded. The check for stats[x][1] is also superfluous as
      it's ensured that this value is initialized with 0.
      Signed-off-by: NMarc Hartmayer <mhartmay@linux.vnet.ibm.com>
      Tested-by: NStefan Raspl <raspl@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      faa312a5
    • D
      bpf: allow xadd only on aligned memory · ca369602
      Daniel Borkmann 提交于
      The requirements around atomic_add() / atomic64_add() resp. their
      JIT implementations differ across architectures. E.g. while x86_64
      seems just fine with BPF's xadd on unaligned memory, on arm64 it
      triggers via interpreter but also JIT the following crash:
      
        [  830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
        [...]
        [  830.916161] Internal error: Oops: 96000021 [#1] SMP
        [  830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
        [  830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
        [  830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
        [  831.003793] pc : __ll_sc_atomic_add+0x4/0x18
        [  831.008055] lr : ___bpf_prog_run+0x1198/0x1588
        [  831.012485] sp : ffff00001ccabc20
        [  831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
        [  831.021087] x27: 0000000000000001 x26: 0000000000000000
        [  831.026387] x25: 000000c168d9db98 x24: 0000000000000000
        [  831.031686] x23: ffff000008203878 x22: ffff000009488000
        [  831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
        [  831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
        [  831.047585] x17: 0000000000000000 x16: 0000000000000000
        [  831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
        [  831.058184] x13: 0000000000000000 x12: 0000000000000000
        [  831.063484] x11: 0000000000000001 x10: 0000000000000000
        [  831.068783] x9 : 0000000000000000 x8 : 0000000000000000
        [  831.074083] x7 : 0000000000000000 x6 : 000580d428000000
        [  831.079383] x5 : 0000000000000018 x4 : 0000000000000000
        [  831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
        [  831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
        [  831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
        [  831.102577] Call trace:
        [  831.105012]  __ll_sc_atomic_add+0x4/0x18
        [  831.108923]  __bpf_prog_run32+0x4c/0x70
        [  831.112748]  bpf_test_run+0x78/0xf8
        [  831.116224]  bpf_prog_test_run_xdp+0xb4/0x120
        [  831.120567]  SyS_bpf+0x77c/0x1110
        [  831.123873]  el0_svc_naked+0x30/0x34
        [  831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)
      
      Reason for this is because memory is required to be aligned. In
      case of BPF, we always enforce alignment in terms of stack access,
      but not when accessing map values or packet data when the underlying
      arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.
      
      xadd on packet data that is local to us anyway is just wrong, so
      forbid this case entirely. The only place where xadd makes sense in
      fact are map values; xadd on stack is wrong as well, but it's been
      around for much longer. Specifically enforce strict alignment in case
      of xadd, so that we handle this case generically and avoid such crashes
      in the first place.
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      ca369602
  11. 23 2月, 2018 1 次提交
    • D
      bpf, arm64: fix out of bounds access in tail call · 16338a9b
      Daniel Borkmann 提交于
      I recently noticed a crash on arm64 when feeding a bogus index
      into BPF tail call helper. The crash would not occur when the
      interpreter is used, but only in case of JIT. Output looks as
      follows:
      
        [  347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
        [...]
        [  347.043065] [fffb850e96492510] address between user and kernel address ranges
        [  347.050205] Internal error: Oops: 96000004 [#1] SMP
        [...]
        [  347.190829] x13: 0000000000000000 x12: 0000000000000000
        [  347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
        [  347.201427] x9 : 0000000000000000 x8 : 0000000000000000
        [  347.206726] x7 : 0000000000000000 x6 : 001c991738000000
        [  347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
        [  347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
        [  347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
        [  347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
        [  347.235221] Call trace:
        [  347.237656]  0xffff000002f3a4fc
        [  347.240784]  bpf_test_run+0x78/0xf8
        [  347.244260]  bpf_prog_test_run_skb+0x148/0x230
        [  347.248694]  SyS_bpf+0x77c/0x1110
        [  347.251999]  el0_svc_naked+0x30/0x34
        [  347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
        [...]
      
      In this case the index used in BPF r3 is the same as in r1
      at the time of the call, meaning we fed a pointer as index;
      here, it had the value 0xffff808fd7cf0500 which sits in x2.
      
      While I found tail calls to be working in general (also for
      hitting the error cases), I noticed the following in the code
      emission:
      
        # bpftool p d j i 988
        [...]
        38:   ldr     w10, [x1,x10]
        3c:   cmp     w2, w10
        40:   b.ge    0x000000000000007c              <-- signed cmp
        44:   mov     x10, #0x20                      // #32
        48:   cmp     x26, x10
        4c:   b.gt    0x000000000000007c
        50:   add     x26, x26, #0x1
        54:   mov     x10, #0x110                     // #272
        58:   add     x10, x1, x10
        5c:   lsl     x11, x2, #3
        60:   ldr     x11, [x10,x11]                  <-- faulting insn (f86b694b)
        64:   cbz     x11, 0x000000000000007c
        [...]
      
      Meaning, the tests passed because commit ddb55992 ("arm64:
      bpf: implement bpf_tail_call() helper") was using signed compares
      instead of unsigned which as a result had the test wrongly passing.
      
      Change this but also the tail call count test both into unsigned
      and cap the index as u32. Latter we did as well in 90caccdd
      ("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here,
      too. Tested on HiSilicon Hi1616.
      
      Result after patch:
      
        # bpftool p d j i 268
        [...]
        38:	ldr	w10, [x1,x10]
        3c:	add	w2, w2, #0x0
        40:	cmp	w2, w10
        44:	b.cs	0x0000000000000080
        48:	mov	x10, #0x20                  	// #32
        4c:	cmp	x26, x10
        50:	b.hi	0x0000000000000080
        54:	add	x26, x26, #0x1
        58:	mov	x10, #0x110                 	// #272
        5c:	add	x10, x1, x10
        60:	lsl	x11, x2, #3
        64:	ldr	x11, [x10,x11]
        68:	cbz	x11, 0x0000000000000080
        [...]
      
      Fixes: ddb55992 ("arm64: bpf: implement bpf_tail_call() helper")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      16338a9b
  12. 22 2月, 2018 5 次提交