1. 18 12月, 2020 1 次提交
  2. 03 11月, 2020 6 次提交
    • A
      tools include UAPI: Update linux/mount.h copy · 42cc0e70
      Arnaldo Carvalho de Melo 提交于
      To pick the changes from:
      
        dab741e0 ("Add a "nosymfollow" mount option.")
      
      That ends up adding support for the new MS_NOSYMFOLLOW mount flag:
      
        $ tools/perf/trace/beauty/mount_flags.sh > before
        $ cp include/uapi/linux/mount.h tools/include/uapi/linux/mount.h
        $ tools/perf/trace/beauty/mount_flags.sh > after
        $ diff -u before after
        --- before	2020-11-03 08:51:28.117997454 -0300
        +++ after	2020-11-03 08:51:38.992218869 -0300
        @@ -7,6 +7,7 @@
         	[32 ? (ilog2(32) + 1) : 0] = "REMOUNT",
         	[64 ? (ilog2(64) + 1) : 0] = "MANDLOCK",
         	[128 ? (ilog2(128) + 1) : 0] = "DIRSYNC",
        +	[256 ? (ilog2(256) + 1) : 0] = "NOSYMFOLLOW",
         	[1024 ? (ilog2(1024) + 1) : 0] = "NOATIME",
         	[2048 ? (ilog2(2048) + 1) : 0] = "NODIRATIME",
         	[4096 ? (ilog2(4096) + 1) : 0] = "BIND",
        $
      
      So now one can use it in --filter expressions for tracepoints.
      
      This silences this perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/mount.h' differs from latest version at 'include/uapi/linux/mount.h'
        diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mattias Nissler <mnissler@chromium.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      42cc0e70
    • A
      tools headers UAPI: Update tools's copy of linux/perf_event.h · a9e27f5f
      Arnaldo Carvalho de Melo 提交于
      The diff is just tabs versus spaces, trivial.
      
      This silences this perf tools build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
        diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a9e27f5f
    • A
      tools kvm headers: Update KVM headers from the kernel sources · aa04899a
      Arnaldo Carvalho de Melo 提交于
      Some should cause changes in tooling, like the one adding LAST_EXCP, but
      the way it is structured end up not making that happen.
      
      The new SVM_EXIT_INVPCID should get used by arch/x86/util/kvm-stat.c,
      in the svm_exit_reasons table.
      
      The tools/perf/trace/beauty part has scripts to catch changes and
      automagically create tables, like tools/perf/trace/beauty/kvm_ioctl.sh,
      but changes are needed to make tools/perf/arch/x86/util/kvm-stat.c catch
      those automatically.
      
      These were handled by the existing scripts:
      
        $ tools/perf/trace/beauty/kvm_ioctl.sh > before
        $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
        $ tools/perf/trace/beauty/kvm_ioctl.sh > after
        $ diff -u before after
        --- before	2020-11-03 08:43:52.910728608 -0300
        +++ after	2020-11-03 08:44:04.273959984 -0300
        @@ -89,6 +89,7 @@
         	[0xbf] = "SET_NESTED_STATE",
         	[0xc0] = "CLEAR_DIRTY_LOG",
         	[0xc1] = "GET_SUPPORTED_HV_CPUID",
        +	[0xc6] = "X86_SET_MSR_FILTER",
         	[0xe0] = "CREATE_DEVICE",
         	[0xe1] = "SET_DEVICE_ATTR",
         	[0xe2] = "GET_DEVICE_ATTR",
        $
        $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
        $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
        $
        $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
        $ diff -u before after
        --- before	2020-11-03 08:45:55.522225198 -0300
        +++ after	2020-11-03 08:46:12.881578666 -0300
        @@ -37,4 +37,5 @@
         	[0x71] = "VDPA_GET_STATUS",
         	[0x73] = "VDPA_GET_CONFIG",
         	[0x76] = "VDPA_GET_VRING_NUM",
        +	[0x78] = "VDPA_GET_IOVA_RANGE",
         };
        $
      
      This addresses these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
        diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
        Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/sie.h' differs from latest version at 'arch/s390/include/uapi/asm/sie.h'
        diff -u tools/arch/s390/include/uapi/asm/sie.h arch/s390/include/uapi/asm/sie.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
        diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
        diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
        Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
        diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aa04899a
    • A
      tools UAPI: Update copy of linux/mman.h from the kernel sources · 97a3863b
      Arnaldo Carvalho de Melo 提交于
        e47168f3 ("powerpc/8xx: Support 16k hugepages with 4k pages")
      
      That don't cause any changes in tooling, just addresses this perf build
      warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
        diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      97a3863b
    • A
      tools headers UAPI: Update fscrypt.h copy · d0448d6a
      Arnaldo Carvalho de Melo 提交于
      To get the changes from:
      
        c7f0207b ("fscrypt: make "#define fscrypt_policy" user-only")
      
      That don't cause any changes in tools/perf, only addresses this perf
      tools build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h'
        diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d0448d6a
    • A
      tools headers UAPI: Sync prctl.h with the kernel sources · ad6330ac
      Arnaldo Carvalho de Melo 提交于
      To get the changes in:
      
        1c101da8 ("arm64: mte: Allow user control of the tag check mode via prctl()")
        af5ce952 ("arm64: mte: Allow user control of the generated random tags via prctl()")
      
      Which don't cause any change in tooling, only addresses this perf build
      warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
        diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ad6330ac
  3. 22 10月, 2020 1 次提交
  4. 12 10月, 2020 3 次提交
    • D
      bpf: Allow for map-in-map with dynamic inner array map entries · 4a8f87e6
      Daniel Borkmann 提交于
      Recent work in f4d05259 ("bpf: Add map_meta_equal map ops") and 134fede4
      ("bpf: Relax max_entries check for most of the inner map types") added support
      for dynamic inner max elements for most map-in-map types. Exceptions were maps
      like array or prog array where the map_gen_lookup() callback uses the maps'
      max_entries field as a constant when emitting instructions.
      
      We recently implemented Maglev consistent hashing into Cilium's load balancer
      which uses map-in-map with an outer map being hash and inner being array holding
      the Maglev backend table for each service. This has been designed this way in
      order to reduce overall memory consumption given the outer hash map allows to
      avoid preallocating a large, flat memory area for all services. Also, the
      number of service mappings is not always known a-priori.
      
      The use case for dynamic inner array map entries is to further reduce memory
      overhead, for example, some services might just have a small number of back
      ends while others could have a large number. Right now the Maglev backend table
      for small and large number of backends would need to have the same inner array
      map entries which adds a lot of unneeded overhead.
      
      Dynamic inner array map entries can be realized by avoiding the inlined code
      generation for their lookup. The lookup will still be efficient since it will
      be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
      The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
      inline code generation and relaxes array_map_meta_equal() check to ignore both
      maps' max_entries. This also still allows to have faster lookups for map-in-map
      when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed.
      
      Example code generation where inner map is dynamic sized array:
      
        # bpftool p d x i 125
        int handle__sys_enter(void * ctx):
        ; int handle__sys_enter(void *ctx)
           0: (b4) w1 = 0
        ; int key = 0;
           1: (63) *(u32 *)(r10 -4) = r1
           2: (bf) r2 = r10
        ;
           3: (07) r2 += -4
        ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
           4: (18) r1 = map[id:468]
           6: (07) r1 += 272
           7: (61) r0 = *(u32 *)(r2 +0)
           8: (35) if r0 >= 0x3 goto pc+5
           9: (67) r0 <<= 3
          10: (0f) r0 += r1
          11: (79) r0 = *(u64 *)(r0 +0)
          12: (15) if r0 == 0x0 goto pc+1
          13: (05) goto pc+1
          14: (b7) r0 = 0
          15: (b4) w6 = -1
        ; if (!inner_map)
          16: (15) if r0 == 0x0 goto pc+6
          17: (bf) r2 = r10
        ;
          18: (07) r2 += -4
        ; val = bpf_map_lookup_elem(inner_map, &key);
          19: (bf) r1 = r0                               | No inlining but instead
          20: (85) call array_map_lookup_elem#149280     | call to array_map_lookup_elem()
        ; return val ? *val : -1;                        | for inner array lookup.
          21: (15) if r0 == 0x0 goto pc+1
        ; return val ? *val : -1;
          22: (61) r6 = *(u32 *)(r0 +0)
        ; }
          23: (bc) w0 = w6
          24: (95) exit
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
      4a8f87e6
    • D
      bpf: Add redirect_peer helper · 9aa1206e
      Daniel Borkmann 提交于
      Add an efficient ingress to ingress netns switch that can be used out of tc BPF
      programs in order to redirect traffic from host ns ingress into a container
      veth device ingress without having to go via CPU backlog queue [0]. For local
      containers this can also be utilized and path via CPU backlog queue only needs
      to be taken once, not twice. On a high level this borrows from ipvlan which does
      similar switch in __netif_receive_skb_core() and then iterates via another_round.
      This helps to reduce latency for mentioned use cases.
      
      Pod to remote pod with redirect(), TCP_RR [1]:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:         122.450         (per CPU:         122.666         122.401         122.333         122.401 )
              MEAN_LATENCY:         121.210         (per CPU:         121.100         121.260         121.320         121.160 )
            STDDEV_LATENCY:         120.040         (per CPU:         119.420         119.910         125.460         115.370 )
               MIN_LATENCY:          46.500         (per CPU:          47.000          47.000          47.000          45.000 )
               P50_LATENCY:         118.500         (per CPU:         118.000         119.000         118.000         119.000 )
               P90_LATENCY:         127.500         (per CPU:         127.000         128.000         127.000         128.000 )
               P99_LATENCY:         130.750         (per CPU:         131.000         131.000         129.000         132.000 )
      
          TRANSACTION_RATE:       32666.400         (per CPU:        8152.200        8169.842        8174.439        8169.897 )
      
      Pod to remote pod with redirect_peer(), TCP_RR:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:          44.449         (per CPU:          43.767          43.127          45.279          45.622 )
              MEAN_LATENCY:          45.065         (per CPU:          44.030          45.530          45.190          45.510 )
            STDDEV_LATENCY:          84.823         (per CPU:          66.770          97.290          84.380          90.850 )
               MIN_LATENCY:          33.500         (per CPU:          33.000          33.000          34.000          34.000 )
               P50_LATENCY:          43.250         (per CPU:          43.000          43.000          43.000          44.000 )
               P90_LATENCY:          46.750         (per CPU:          46.000          47.000          47.000          47.000 )
               P99_LATENCY:          52.750         (per CPU:          51.000          54.000          53.000          53.000 )
      
          TRANSACTION_RATE:       90039.500         (per CPU:       22848.186       23187.089       22085.077       21919.130 )
      
        [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
        [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperfSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
      9aa1206e
    • D
      bpf: Improve bpf_redirect_neigh helper description · dd2ce6a5
      Daniel Borkmann 提交于
      Follow-up to address David's feedback that we should better describe internals
      of the bpf_redirect_neigh() helper.
      Suggested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-2-daniel@iogearbox.net
      dd2ce6a5
  5. 09 10月, 2020 1 次提交
  6. 08 10月, 2020 1 次提交
  7. 03 10月, 2020 3 次提交
  8. 01 10月, 2020 3 次提交
  9. 30 9月, 2020 1 次提交
  10. 29 9月, 2020 3 次提交
  11. 26 9月, 2020 4 次提交
  12. 16 9月, 2020 1 次提交
  13. 15 9月, 2020 2 次提交
    • A
      tools headers UAPI: update linux/in.h copy · 2fa3fc95
      Arnaldo Carvalho de Melo 提交于
      To get the changes from:
      
        645f0897 ("net: Fix some comments")
      
      That don't cause any changes in tooling, its just a typo fix.
      
      This silences this tools/perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
        diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2fa3fc95
    • A
      tools headers UAPI: Sync kvm.h headers with the kernel sources · 8d761d2c
      Arnaldo Carvalho de Melo 提交于
      To pick the changes in:
      
        15e9e35c ("KVM: MIPS: Change the definition of kvm type")
        004a0124 ("arm64/x86: KVM: Introduce steal-time cap")
      
      That do not result in any change in tooling, as the additions are not
      being used in any table generator.
      
      This silences these perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8d761d2c
  14. 11 9月, 2020 1 次提交
  15. 07 9月, 2020 1 次提交
  16. 01 9月, 2020 1 次提交
  17. 29 8月, 2020 2 次提交
    • A
      bpf: Add bpf_copy_from_user() helper. · 07be4c4a
      Alexei Starovoitov 提交于
      Sleepable BPF programs can now use copy_from_user() to access user memory.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NKP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
      07be4c4a
    • A
      bpf: Introduce sleepable BPF programs · 1e6c62a8
      Alexei Starovoitov 提交于
      Introduce sleepable BPF programs that can request such property for themselves
      via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
      to use helpers like bpf_copy_from_user() that might sleep. At present only
      fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
      when they are attached to kernel functions that are known to allow sleeping.
      
      The non-sleepable programs are relying on implicit rcu_read_lock() and
      migrate_disable() to protect life time of programs, maps that they use and
      per-cpu kernel structures used to pass info between bpf programs and the
      kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
      migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
      should not be enclosed in migrate_disable() as well. Therefore
      rcu_read_lock_trace is used to protect the life time of sleepable progs.
      
      There are many networking and tracing program types. In many cases the
      'struct bpf_prog *' pointer itself is rcu protected within some other kernel
      data structure and the kernel code is using rcu_dereference() to load that
      program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
      Instead sleepable bpf programs are allowed with bpf trampoline only. The
      program pointers are hard-coded into generated assembly of bpf trampoline and
      synchronize_rcu_tasks_trace() is used to protect the life time of the program.
      The same trampoline can hold both sleepable and non-sleepable progs.
      
      When rcu_read_lock_trace is held it means that some sleepable bpf program is
      running from bpf trampoline. Those programs can use bpf arrays and preallocated
      hash/lru maps. These map types are waiting on programs to complete via
      synchronize_rcu_tasks_trace();
      
      Updates to trampoline now has to do synchronize_rcu_tasks_trace() and
      synchronize_rcu_tasks() to wait for sleepable progs to finish and for
      trampoline assembly to finish.
      
      This is the first step of introducing sleepable progs. Eventually dynamically
      allocated hash maps can be allowed and networking program types can become
      sleepable too.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NKP Singh <kpsingh@google.com>
      Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
      1e6c62a8
  18. 28 8月, 2020 1 次提交
  19. 26 8月, 2020 4 次提交