1. 23 Jan, 2021 · 2 commits
    • selftests: mlxsw: Add a scale test for physical ports · 5154b1b8
      Danielle Ratson committed
      Query the maximum number of supported physical ports using devlink-resource
      and test that this number can be reached by splitting each of the
      splittable ports to its width. Test that an error is returned in case
      the maximum number is exceeded.
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5154b1b8
    • sch_htb: Hierarchical QoS hardware offload · d03b195b
      Maxim Mikityanskiy committed
      HTB doesn't scale well because of contention on a single lock, and it
      also consumes CPU. This patch adds support for offloading HTB to
      hardware that supports hierarchical rate limiting.
      
      In the offload mode, HTB passes control commands to the driver using
      ndo_setup_tc. The driver has to replicate the whole hierarchy of classes
      and their settings (rate, ceil) in the NIC. Every modification of the
      HTB tree caused by the admin results in ndo_setup_tc being called.
      
      After this setup, the HTB algorithm is done completely in the NIC. An SQ
      (send queue) is created for every leaf class and attached to the
      hierarchy, so that the NIC can calculate and obey aggregated rate
      limits, too. In the future, it can be changed, so that multiple SQs will
      back a single leaf class.
      
      ndo_select_queue is responsible for selecting the right queue that
      serves the traffic class of each packet.
      
      The data path works as follows: a packet is classified by clsact, the
      driver selects a hardware queue according to its class, and the packet
      is enqueued into this queue's qdisc.
      
      This solution addresses two main problems of scaling HTB:
      
      1. Contention by flow classification. Currently the filters are attached
      to the HTB instance as follows:
      
          # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80 \
              classid 1:10
      
      It's possible to move classification to clsact egress hook, which is
      thread-safe and lock-free:
      
          # tc filter add dev eth0 egress protocol ip flower dst_port 80 \
              action skbedit priority 1:10
      
      This way classification still happens in software, but the lock
      contention is eliminated, and it happens before selecting the TX queue,
      allowing the driver to translate the class to the corresponding hardware
      queue in ndo_select_queue.
      
      Note that this is already compatible with non-offloaded HTB and requires
      no changes to the kernel or to iproute2.
      
      2. Contention by handling packets. HTB is not multi-queue, it attaches
      to a whole net device, and handling of all packets takes the same lock.
      When HTB is offloaded, it registers itself as a multi-queue qdisc,
      similarly to mq: HTB is attached to the netdev, and each queue has its
      own qdisc.
      
      Some features of HTB may not be supported by particular hardware: for
      example, the maximum number of classes may be limited, or the
      granularity of the rate and ceil parameters may differ. The offload is
      therefore not enabled by default; a new parameter enables it:
      
          # tc qdisc replace dev eth0 root handle 1: htb offload
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d03b195b
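The data path described in the sch_htb commit (clsact tags the packet with a class, then the driver picks the hardware queue for that class in ndo_select_queue) can be pictured with a toy model. This is only a sketch under stated assumptions: `ToyOffloadedHtb`, `add_leaf`, `select_queue`, and the hash-based fallback for unclassified packets are hypothetical names and behavior chosen for illustration, not the driver's actual logic. The only detail taken from tc itself is that a handle such as `1:10` is a pair of hexadecimal numbers packed as major in the upper 16 bits and minor in the lower 16 bits.

```python
def parse_classid(text):
    """Convert a tc handle such as '1:10' (hex major:minor) to its 32-bit value."""
    major, minor = text.split(":")
    return (int(major, 16) << 16) | int(minor or "0", 16)


class ToyOffloadedHtb:
    """Hypothetical model: one dedicated send queue per HTB leaf class."""

    def __init__(self, num_regular_queues):
        self.num_regular_queues = num_regular_queues
        self.leaf_queue = {}  # class handle -> queue index

    def add_leaf(self, classid):
        # Assumption: leaf queues are numbered after the regular TX queues.
        queue = self.num_regular_queues + len(self.leaf_queue)
        self.leaf_queue[parse_classid(classid)] = queue
        return queue

    def select_queue(self, skb_priority, flow_hash):
        # 'skbedit priority 1:10' stores the class handle in the packet priority.
        queue = self.leaf_queue.get(skb_priority)
        if queue is not None:
            return queue
        # Assumption: unclassified traffic is spread over the regular queues.
        return flow_hash % self.num_regular_queues


htb = ToyOffloadedHtb(num_regular_queues=4)
web_queue = htb.add_leaf("1:10")
print(web_queue, htb.select_queue(parse_classid("1:10"), flow_hash=123))
```

Because the lookup is a plain per-packet mapping with no shared qdisc lock, the sketch also shows why classification at the clsact egress hook followed by queue selection avoids the contention described in the commit message.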
  2. 20 Jan, 2021 · 2 commits
  3. 16 Jan, 2021 · 12 commits
    • GTP: add support for flow based tunneling API · 9ab7e76a
      Pravin B Shelar committed
      This patch adds support for the flow-based tunneling API, to send and
      receive GTP tunnel packets over the tunnel metadata API.
      This allows the device to be integrated with OVS or eBPF using
      flow-based tunneling APIs.
      Signed-off-by: Pravin B Shelar <pbshelar@fb.com>
      Link: https://lore.kernel.org/r/20210110070021.26822-1-pbshelar@fb.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9ab7e76a
    • perf inject: Correct event attribute sizes · 648b054a
      Al Grant committed
      When 'perf inject' reads a perf.data file from an older version of perf,
      it writes event attributes into the output with the original size field,
      but lays them out as if they had the size currently used. Readers see a
      corrupt file. Update the size field to match the layout.
      Signed-off-by: Al Grant <al.grant@foss.arm.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20201124195818.30603-1-al.grant@arm.com
      Signed-off-by: Denis Nikitin <denik@chromium.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      648b054a
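The failure mode in the perf inject commit (a size field that no longer matches the layout actually written) can be reproduced with a toy record format. This sketch does not use the real perf.data layout: the 8-byte and 12-byte record shapes, `write_records`, and `read_values` are assumptions made purely for illustration.

```python
import struct

OLD_SIZE = 8    # toy "old" record: u32 size + u32 value
NEW_SIZE = 12   # toy "current" record: u32 size + u32 value + u32 extra


def write_records(values, size_field):
    """Lay every record out with the current 12-byte shape, stamping size_field."""
    out = b""
    for value in values:
        out += struct.pack("<III", size_field, value, 0)
    return out


def read_values(buf):
    """Walk the buffer, trusting each record's own size field to find the next one."""
    values = []
    offset = 0
    while offset + OLD_SIZE <= len(buf):
        size, value = struct.unpack_from("<II", buf, offset)
        if size < OLD_SIZE:
            raise ValueError(f"corrupt record at offset {offset}")
        values.append(value)
        offset += size
    return values


# Size field matches the layout: the reader recovers both values.
print(read_values(write_records([1, 2], NEW_SIZE)))

# Stale size field with the newer layout: the reader lands mid-record.
try:
    read_values(write_records([1, 2], OLD_SIZE))
except ValueError as err:
    print("reader failed:", err)
```

The fix described in the commit corresponds to the first call: the writer updates the size field so that it agrees with the layout it emits.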
    • perf intel-pt: Fix 'CPU too large' error · 5501e922
      Adrian Hunter committed
      In some cases, the number of cpus (nr_cpus_online) is confused with the
      maximum cpu number (nr_cpus_avail), which results in the error in the
      example below:
      
      Example on system with 8 cpus:
      
       Before:
         # echo 0 > /sys/devices/system/cpu/cpu2/online
         # ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname
         Linux
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.147 MB perf.data ]
         # ./perf script --itrace=e
         Requested CPU 7 too large. Consider raising MAX_NR_CPUS
         0x25908 [0x8]: failed to process type: 68 [Invalid argument]
      
       After:
         # ./perf script --itrace=e
         #
      
      Fixes: 8c727469 ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
      Fixes: 7df4e36a ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5501e922
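The mix-up in the intel-pt commit can be checked with the numbers from its own example: 8 CPUs with cpu2 taken offline. The snippet below is a simplified sketch. `cpu_within_limit` is a hypothetical stand-in for the range check that prints "Requested CPU 7 too large", and the two counts are modeled as plain list lengths.

```python
def cpu_within_limit(cpu, limit):
    """Hypothetical range check: valid CPU numbers are 0 .. limit - 1."""
    return 0 <= cpu < limit


possible_cpus = list(range(8))                      # cpu0 .. cpu7
online_cpus = [c for c in possible_cpus if c != 2]  # cpu2 was taken offline

nr_cpus_online = len(online_cpus)   # 7: a count of online CPUs, not a CPU number bound
nr_cpus_avail = len(possible_cpus)  # 8: bounds the highest CPU number

# Using the online count as the limit wrongly rejects CPU 7.
print(cpu_within_limit(7, nr_cpus_online))
# Using the available count accepts it.
print(cpu_within_limit(7, nr_cpus_avail))
```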
    • perf stat: Take cgroups into account for shadow stats · a1bf2305
      Namhyung Kim committed
      As of now perf stat doesn't consider cgroups when collecting shadow stats
      and metrics, so counter values from different cgroups are saved in the
      same slot.  This results in incorrect numbers when those cgroups have
      different workloads.
      
      For example, let's look at the scenario below: cgroups A and C run the
      same workload, which burns a CPU, while cgroup B runs a light workload.
      
        $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1
      
         Performance counter stats for 'system wide':
      
           3,958,116,522      cycles                A
           6,722,650,929      instructions          A #    2.53  insn per cycle
               1,132,741      cycles                B
                 571,743      instructions          B #    0.00  insn per cycle
           4,007,799,935      cycles                C
           6,793,181,523      instructions          C #    2.56  insn per cycle
      
             1.001050869 seconds time elapsed
      
      When I run 'perf stat' with a single workload, it usually shows an IPC of
      around 1.7.  We can verify it for cgroup A (6,722,650,929 / 3,958,116,522 = 1.698).
      
      But in this case, since cgroups are ignored, the cycles are averaged, so
      a lower value was used for the IPC calculation, resulting in around 2.5.
      
        avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
        IPC (A)  :  6722650929 / 2655683066 = 2.531
        IPC (B)  :      571743 / 2655683066 = 0.0002
        IPC (C)  :  6793181523 / 2655683066 = 2.557
      
      We can simply compare cgroup pointers in the evsel and it'll be NULL
      when cgroups are not specified.  With this patch, I can see correct
      numbers like below:
      
        $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C  sleep 1
      
        Performance counter stats for 'system wide':
      
           4,171,051,687      cycles                A
           7,219,793,922      instructions          A #    1.73  insn per cycle
               1,051,189      cycles                B
                 583,102      instructions          B #    0.55  insn per cycle
           4,171,124,710      cycles                C
           7,192,944,580      instructions          C #    1.72  insn per cycle
      
             1.007909814 seconds time elapsed
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a1bf2305
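The IPC arithmetic in the perf stat commit can be replayed directly from the counter values in its "before" output. The helper names below (`ipc_shared_slot`, `ipc_per_cgroup`) are illustrative, not perf functions. The first models the buggy behavior as the commit describes it (cycles from all cgroups averaged into one slot), and the second divides each cgroup's instructions by its own cycles.

```python
# Counter values copied from the "before" output in the commit message.
cycles = {"A": 3_958_116_522, "B": 1_132_741, "C": 4_007_799_935}
instructions = {"A": 6_722_650_929, "B": 571_743, "C": 6_793_181_523}


def ipc_shared_slot(cycles, instructions):
    """IPC when all cgroups share one cycles slot, so the cycles get averaged."""
    avg_cycles = sum(cycles.values()) / len(cycles)
    return {cg: instructions[cg] / avg_cycles for cg in instructions}


def ipc_per_cgroup(cycles, instructions):
    """IPC when each cgroup's instructions are paired with its own cycles."""
    return {cg: instructions[cg] / cycles[cg] for cg in instructions}


for name, ipc in (("shared slot", ipc_shared_slot(cycles, instructions)),
                  ("per cgroup", ipc_per_cgroup(cycles, instructions))):
    print(name, {cg: round(value, 3) for cg, value in ipc.items()})
```

With the shared slot, cgroups A and C come out near 2.5 and B near zero, matching the misleading output. Computed per cgroup, A and C fall back to roughly 1.7.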
    • perf stat: Introduce struct runtime_stat_data · 3ff1e718
      Namhyung Kim committed
      To pass more info to the saved_value in the runtime_stat, add a new
      struct runtime_stat_data.  Currently it only has a 'ctx' field, but a
      later patch will add more.
      
      Note that we intentionally pass 0 as ctx to clock-related events for
      compatibility.  It was already there in a few places.  So move the code
      into the saved_value_lookup() explicitly and add a comment.
      Suggested-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3ff1e718
    • libperf tests: Fail when failing to get a tracepoint id · 66dd86b2
      Ian Rogers committed
      Permissions are necessary to get a tracepoint id. Fail the test when the
      read fails.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210114180250.3853825-2-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      66dd86b2
    • libperf tests: If a test fails return non-zero · bba2ea17
      Ian Rogers committed
      If a test fails return -1 rather than 0. This is consistent with the
      return value in test-cpumap.c
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210114180250.3853825-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      bba2ea17
    • libperf tests: Avoid uninitialized variable warning · be82fddc
      Ian Rogers committed
      The variable 'bf' is read (for a write call) without being initialized
      triggering a memory sanitizer warning. Use 'bf' in the read and switch
      the write to reading from a string.
      Signed-off-by: Ian Rogers <irogers@google.com>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210114212304.4018119-1-irogers@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      be82fddc
    • perf test: Fix shadow stat test for non-bash shells · a042a82d
      Namhyung Kim committed
      It was using some bash-specific features and failed to parse when
      running with a different shell like below:
      
        root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv
        83: perf stat metrics (shadow stat) test                            :
        --- start ---
        test child forked, pid 3922
        ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
        ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
        ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
        (standard_in) 2: syntax error
        ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
        ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
        ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
        ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
        (standard_in) 2: syntax error
        ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
        ./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found
        test child finished with -1
        ---- end ----
        perf stat metrics (shadow stat) test: FAILED!
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Acked-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Laight <david.laight@aculab.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a042a82d
    • tools headers: Syncronize linux/build_bug.h with the kernel sources · addbdff2
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        3a176b94 ("Revert "kbuild: avoid static_assert for genksyms"")
      
      And silence this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h'
        diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      addbdff2
    • tools headers UAPI: Sync kvm.h headers with the kernel sources · 38c53947
      Arnaldo Carvalho de Melo committed
      To pick up the changes in:
      
        647daca2 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
      
      That doesn't cause any tooling change; it just silences this perf build
      warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      38c53947
    • perf bpf examples: Fix bpf.h header include directive in 5sec.c example · 301f0203
      Arnaldo Carvalho de Melo committed
      It was looking at bpf/bpf.h, which caused this problem:
      
        # perf trace -e tools/perf/examples/bpf/5sec.c
        /home/acme/git/perf/tools/perf/examples/bpf/5sec.c:42:10: fatal error: 'bpf/bpf.h' file not found
        #include <bpf/bpf.h>
                 ^~~~~~~~~~~
        1 error generated.
        ERROR:	unable to compile tools/perf/examples/bpf/5sec.c
        Hint:	Check error message shown above.
        Hint:	You can also pre-compile it into .o using:
             		clang -target bpf -O2 -c tools/perf/examples/bpf/5sec.c
             	with proper -I and -D options.
        event syntax error: 'tools/perf/examples/bpf/5sec.c'
                             \___ Failed to load tools/perf/examples/bpf/5sec.c from source: Error when compiling BPF scriptlet
        #
      
      Change that to plain bpf.h, to make it work again:
      
        # perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s
             0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000)
        # perf trace -e tools/perf/examples/bpf/5sec.c/max-stack=16/ sleep 5s
             0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000)
                                               hrtimer_nanosleep ([kernel.kallsyms])
                                               common_nsleep ([kernel.kallsyms])
                                               __x64_sys_clock_nanosleep ([kernel.kallsyms])
                                               do_syscall_64 ([kernel.kallsyms])
                                               entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                               __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
        # perf trace -e tools/perf/examples/bpf/5sec.c sleep 4s
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      301f0203
  4. 15 Jan, 2021 · 19 commits
  5. 14 Jan, 2021 · 5 commits