1. 02 Feb, 2019 20 commits
    • D
      Merge branch 'bpf-xdp-sample-libbpf' · 473c5daa
      Daniel Borkmann committed
      Maciej Fijalkowski says:
      
      ====================
      This patchset tries to address the situation where:
      * user loads a particular xdp sample application that does stats polling
      * user loads another sample application on the same interface
      * then, user sends SIGINT/SIGTERM to the app that was attached first
      * second application ends up with an unloaded xdp program
      
      1st patch contains a libbpf helper function for getting the map fd by a
      given map name.
      In patch 2 Jesper removes the read_trace_pipe usage from xdp_redirect_cpu, which
      was a blocker for converting this sample to libbpf usage.
      3rd patch updates a bunch of xdp samples to make use of libbpf.
      Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset.
      In patch 5 extack messages are added for cases where dev_change_xdp_fd returns
      with an error, so the user can see why the xdp program was not attached to
      the interface.
      Patch 6 makes the samples behave similarly to iproute2 when loading an
      xdp prog: the "force" flag is introduced.
      Patch 7 introduces the libbpf function that queries the driver from
      userspace for the currently attached xdp prog id.
      Patch 8 uses it in the samples that do polling: the signal handler checks
      the prog id and compares it with the previously stored one.
      
      Thanks!
      
      v1->v2:
      * add a libbpf helper for getting a prog via relative index
      * include xdp_redirect_cpu into conversion
      
      v2->v3: mostly addressing Daniel's/Jesper's comments
      * get rid of the helper from v1->v2
      * feed the xdp_redirect_cpu with program name instead of number
      
      v3->v4:
      * fix help message in xdp_sample_pkts
      
      v4->v5:
      * in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link returned
        with 0
      * add extack messages in dev_change_xdp_fd
      * check the return value of bpf_get_link_xdp_id when exiting from sample progs
      
      v5->v6:
      * rebase
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      473c5daa
    • M
      samples/bpf: Check the prog id before exiting · 3b7a8ec2
      Maciej Fijalkowski committed
      Check the program id within the signal handler on polling xdp samples
      that were previously converted to libbpf usage. Avoid the situation of
      unloading a program that was not attached by the sample that is exiting.
      Also handle the case where bpf_get_link_xdp_id didn't exit with an error
      but no xdp program was found on the interface.
      Reported-by: Michal Papaj <michal.papaj@intel.com>
      Reported-by: Jakub Spizewski <jakub.spizewski@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      3b7a8ec2
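
      A minimal sketch of the exit-time decision described in this commit, in
      plain C. It does not call libbpf: decide_on_exit() and the exit_action
      enum are hypothetical names, and the caller is assumed to pass in the
      result of the prog id query together with the id it stored right after
      attaching its own program.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Possible outcomes when a polling sample is about to exit. */
      enum exit_action {
          EXIT_DETACH,        /* our program is still attached: safe to unload */
          EXIT_KEEP_FOREIGN,  /* another program replaced ours: leave it alone */
          EXIT_NOTHING_FOUND, /* query succeeded but no program is attached */
          EXIT_QUERY_FAILED,  /* the id query itself returned an error */
      };

      /*
       * Hypothetical helper, not part of libbpf or the samples.
       * query_err: return value of the prog id query (0 on success)
       * curr_id:   id reported for the interface (0 means none attached)
       * stored_id: id recorded right after this sample attached its program
       */
      enum exit_action decide_on_exit(int query_err, uint32_t curr_id,
                                      uint32_t stored_id)
      {
          /* Assumption: a successfully attached program has a non-zero id. */
          assert(stored_id != 0);

          if (query_err)
              return EXIT_QUERY_FAILED;
          if (!curr_id)
              return EXIT_NOTHING_FOUND;
          if (curr_id == stored_id)
              return EXIT_DETACH;
          return EXIT_KEEP_FOREIGN;
      }
      ```

      Under these assumptions a sample would detach only on EXIT_DETACH and
      merely report the situation in the other three cases.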
    • M
      libbpf: Add a support for getting xdp prog id on ifindex · 50db9f07
      Maciej Fijalkowski committed
      Since we have dedicated netlink attributes for xdp setup on a
      particular interface, it is now possible to retrieve the program id that
      is currently attached to the interface. The use case is targeted for
      sample xdp programs, which will store the program id just after loading
      the bpf program onto the iface. On shutdown, the sample will make sure that
      it can unload the program by querying the iface again and verifying that
      both program ids match.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      50db9f07
    • M
      samples/bpf: Add a "force" flag to XDP samples · 743e568c
      Maciej Fijalkowski committed
      Make xdp samples consistent with iproute2 behavior and set the
      XDP_FLAGS_UPDATE_IF_NOEXIST by default when setting the xdp program on
      an interface. Provide an option for the user to force the program loading,
      which as a result will not include the mentioned flag in the
      bpf_set_link_xdp_fd call.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      743e568c
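
      The flag handling described above can be sketched as a small pure
      function. xdp_attach_flags() is a hypothetical name, and the constant is
      redefined locally (mirroring the value in linux/if_link.h) only so that
      the sketch builds without kernel headers.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Mirrors the uapi value in linux/if_link.h; defined here so the
       * sketch builds without kernel headers. */
      #ifndef XDP_FLAGS_UPDATE_IF_NOEXIST
      #define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
      #endif

      /*
       * Hypothetical helper: compute the flags a sample would pass when
       * attaching. mode_flags carries any other XDP flags the user selected.
       */
      uint32_t xdp_attach_flags(uint32_t mode_flags, bool force)
      {
          uint32_t flags;

          /* The caller picks the mode only; this bit is managed here. */
          assert(!(mode_flags & XDP_FLAGS_UPDATE_IF_NOEXIST));

          /* Default: refuse to replace a program that is already attached. */
          flags = mode_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;

          /* "force": drop the flag so an existing program may be replaced. */
          if (force)
              flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;

          return flags;
      }
      ```

      The result is what a sample would then hand to bpf_set_link_xdp_fd as its
      flags argument.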
    • M
      xdp: Provide extack messages when prog attachment failed · 01dde20c
      Maciej Fijalkowski committed
      In order to provide more meaningful messages to the user when loading
      an xdp program onto a network interface fails, let's add extack
      messages within dev_change_xdp_fd.
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      01dde20c
    • M
      samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4} · 6a545761
      Maciej Fijalkowski committed
      There is a common problem with xdp samples that happens when the user wants
      to run a particular sample and some bpf program is already loaded. The
      default 64kb RLIMIT_MEMLOCK resource limit will cause the following error
      (assuming that the failing xdp sample was converted to libbpf
      usage):
      
      libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
      Couldn't load basic 'r0 = 0' BPF program.
      libbpf: failed to load object './xdp_sample_pkts_kern.o'
      
      Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK
      to RLIM_INFINITY.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      6a545761
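
      The fix amounts to one setrlimit() call before any BPF object is loaded.
      The sketch below uses only the standard <sys/resource.h> interface;
      raise_memlock_limit() and memlock_unlimited are hypothetical names, and
      the call is expected to fail without sufficient privileges.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <sys/resource.h>

      /* Limit value the commit describes: no cap on locked memory. */
      static const struct rlimit memlock_unlimited = {
          RLIM_INFINITY, /* soft limit */
          RLIM_INFINITY, /* hard limit */
      };

      /* Returns 0 on success, -1 if the limit could not be raised. */
      int raise_memlock_limit(void)
      {
          struct rlimit now;

          if (setrlimit(RLIMIT_MEMLOCK, &memlock_unlimited)) {
              perror("setrlimit(RLIMIT_MEMLOCK)");
              return -1;
          }

          /* Postcondition: the soft limit now reads back as unlimited. */
          if (getrlimit(RLIMIT_MEMLOCK, &now) == 0)
              assert(now.rlim_cur == RLIM_INFINITY);

          return 0;
      }
      ```

      A sample would call this early in main() and bail out on a non-zero
      return, before attempting to load its object file.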
    • M
      samples/bpf: Convert XDP samples to libbpf usage · bbaf6029
      Maciej Fijalkowski committed
      Some of the XDP samples that attach the bpf program to the interface
      via libbpf's bpf_set_link_xdp_fd still use bpf_load.c for
      loading and manipulating the ebpf program and maps. Convert them to do
      this through libbpf and remove bpf_load from the picture.

      While at it, remove what looks like a debug leftover in
      xdp_redirect_map_user.c.

      In xdp_redirect_cpu, change the way that the program to be loaded onto
      the interface is chosen: the user now needs to pass the program's section
      name instead of the relative number. In case of a typo, print out the
      section names to choose from.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      bbaf6029
    • J
      samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe · 7313798b
      Jesper Dangaard Brouer committed
      The sample xdp_redirect_cpu is not using the helper bpf_trace_printk.
      Thus it makes no sense that the --debug option is reading
      from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe.
      Simply remove it.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7313798b
    • M
      libbpf: Add a helper for retrieving a map fd for a given name · f3cea32d
      Maciej Fijalkowski committed
      XDP samples mostly interact with eBPF maps through their file
      descriptors. In the case of an eBPF program that contains multiple maps it
      might be tiresome to iterate through them and call bpf_map__fd for each
      one. Add a helper mostly based on bpf_object__find_map_by_name, but
      instead of returning the struct bpf_map pointer, return the map fd.
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f3cea32d
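
      To illustrate the idea of a name-to-fd lookup, here is a toy model that
      works on a plain array rather than on a real bpf_object. struct toy_map
      and toy_find_map_fd_by_name() are hypothetical, and the -1 "not found"
      value is this sketch's own convention, not necessarily what libbpf
      returns.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Toy stand-in for a loaded map: a name plus its file descriptor. */
      struct toy_map {
          const char *name;
          int fd;
      };

      /*
       * Returns the fd of the first map called `name`, or -1 if none matches.
       * Hypothetical helper operating on a plain array, not on a bpf_object.
       */
      int toy_find_map_fd_by_name(const struct toy_map *maps, size_t n,
                                  const char *name)
      {
          assert(maps && name);

          for (size_t i = 0; i < n; i++)
              if (strcmp(maps[i].name, name) == 0)
                  return maps[i].fd;

          return -1;
      }
      ```

      The point of such a helper is that callers go from a map name to an fd
      in one call instead of iterating over every map themselves.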
    • S
      bpf: powerpc64: add JIT support for bpf line info · 6f20c71d
      Sandipan Das committed
      This adds support for generating bpf line info for
      JITed programs.
      Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      6f20c71d
    • D
      Merge branch 'bpf-spinlocks' · 2863debf
      Daniel Borkmann committed
      Alexei Starovoitov says:
      
      ====================
      Many algorithms need to read and modify several variables atomically.
      Until now it was hard to impossible to implement such algorithms in BPF.
      Hence introduce support for bpf_spin_lock.
      
      The api consists of 'struct bpf_spin_lock', which should be placed
      inside a hash/array/cgroup_local_storage element,
      and the bpf_spin_lock/unlock() helper functions.
      
      Example:
      struct hash_elem {
          int cnt;
          struct bpf_spin_lock lock;
      };
      struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
      if (val) {
          bpf_spin_lock(&val->lock);
          val->cnt++;
          bpf_spin_unlock(&val->lock);
      }
      
      The api also adds the BPF_F_LOCK flag for the lookup/update bpf syscall
      commands, which allows user space to read/write map elements under lock.
      
      Together these primitives allow race free access to map elements
      from bpf programs and from user space.
      
      Key restriction: root only.
      Key requirement: maps must be annotated with BTF.
      
      This concept was discussed at Linux Plumbers Conference 2018.
      Thank you everyone who participated and helped to iron out details
      of api and implementation.
      
      Patch 1: bpf_spin_lock support in the verifier, BTF, hash, array.
      Patch 2: bpf_spin_lock in cgroup local storage.
      Patches 3,4,5: tests
      Patch 6: BPF_F_LOCK flag to lookup/update
      Patches 7,8,9: tests
      
      v6->v7:
      - fixed this_cpu->__this_cpu per Peter's suggestion and added Ack.
      - simplified bpf_spin_lock and load/store overlap check in the verifier
        as suggested by Andrii
      - rebase
      
      v5->v6:
      - adopted arch_spinlock approach suggested by Peter
      - switched to spin_lock_irqsave equivalent as the simplest way
        to avoid deadlocks in rare case of nested networking progs
        (cgroup-bpf prog in preempt_disable vs clsbpf in softirq sharing
        the same map with bpf_spin_lock)
        bpf_spin_lock is only allowed in networking progs that don't
        have arbitrary entry points unlike tracing progs.
      - rebase and split test_verifier tests
      
      v4->v5:
      - disallow bpf_spin_lock for tracing progs due to insufficient preemption checks
      - socket filter progs cannot use bpf_spin_lock due to missing preempt_disable
      - fix atomic_set_release. Spotted by Peter.
      - fixed hash_of_maps
      
      v3->v4:
      - fix BPF_EXIST | BPF_NOEXIST check patch 6. Spotted by Jakub. Thanks!
      - rebase
      
      v2->v3:
      - fixed build on ia64 and archs where qspinlock is not supported
      - fixed missing lock init during lookup w/o BPF_F_LOCK. Spotted by Martin
      
      v1->v2:
      - addressed several issues spotted by Daniel and Martin in patch 1
      - added test11 to patch 4 as suggested by Daniel
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2863debf
    • A
      selftests/bpf: test for BPF_F_LOCK · ba72a7b4
      Alexei Starovoitov committed
      Add C based test that runs 4 bpf programs in parallel
      that update the same hash and array maps.
      And another 2 threads that read from these two maps
      via lookup(key, value, BPF_F_LOCK) api
      to make sure the user space sees consistent value in both
      hash and array elements while user space races with kernel bpf progs.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      ba72a7b4
    • A
      libbpf: introduce bpf_map_lookup_elem_flags() · df5d22fa
      Alexei Starovoitov committed
      Introduce
      int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
      helper to lookup array/hash/cgroup_local_storage elements with BPF_F_LOCK flag.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      df5d22fa
    • A
      tools/bpf: sync uapi/bpf.h · e44ac9a2
      Alexei Starovoitov committed
      add BPF_F_LOCK definition to tools/include/uapi/linux/bpf.h
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e44ac9a2
    • A
      bpf: introduce BPF_F_LOCK flag · 96049f3a
      Alexei Starovoitov committed
      Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
      and for map_update() helper function.
      In all these cases take a lock of existing element (which was provided
      in BTF description) before copying (in or out) the rest of map value.
      
      Implementation details that are part of uapi:
      
      Array:
      The array map takes the element lock for lookup/update.
      
      Hash:
      hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
      If old element exists it takes the element lock and updates the element in place.
      If element doesn't exist it allocates new one and inserts into hash table
      while holding the bucket lock.
      In rare case the hashmap has to take both the bucket lock and the element lock
      to update old value in place.
      
      Cgroup local storage:
      It is similar to array. update in place and lookup are done with lock taken.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      96049f3a
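
      A rough sketch of how update flags might be validated once BPF_F_LOCK
      can be combined with the existing flags. This is an illustration under
      stated assumptions, not the kernel code: the TOY_ constants mirror the
      values in include/uapi/linux/bpf.h, toy_update_flags_ok() is a
      hypothetical name, and rejecting BPF_F_LOCK for values without a lock is
      assumed here.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Values mirror include/uapi/linux/bpf.h; redefined with a TOY_ prefix
       * so the sketch builds without kernel headers. */
      #define TOY_BPF_ANY     0ULL /* create new element or update existing */
      #define TOY_BPF_NOEXIST 1ULL /* create only if it does not exist */
      #define TOY_BPF_EXIST   2ULL /* update only if it already exists */
      #define TOY_BPF_F_LOCK  4ULL /* take the element's spin lock */

      static_assert((TOY_BPF_F_LOCK & (TOY_BPF_NOEXIST | TOY_BPF_EXIST)) == 0,
                    "lock bit must not overlap the existence flags");

      /* Hypothetical check: are these update flags acceptable for the map? */
      bool toy_update_flags_ok(uint64_t flags, bool value_has_spin_lock)
      {
          /* With the lock bit masked off, only ANY, NOEXIST or EXIST remain
           * valid; NOEXIST | EXIST (3) is rejected. */
          if ((flags & ~TOY_BPF_F_LOCK) > TOY_BPF_EXIST)
              return false;

          /* Asking for the lock only makes sense if the value embeds one. */
          if ((flags & TOY_BPF_F_LOCK) && !value_has_spin_lock)
              return false;

          return true;
      }
      ```

      Masking the lock bit before the range check is what lets BPF_F_LOCK be
      combined with BPF_EXIST or BPF_NOEXIST while the contradictory pair
      still fails.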
    • A
      selftests/bpf: add bpf_spin_lock C test · ab963beb
      Alexei Starovoitov committed
      add bpf_spin_lock C based test that requires latest llvm with BTF support
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      ab963beb
    • A
      selftests/bpf: add bpf_spin_lock verifier tests · b4d4556c
      Alexei Starovoitov committed
      add bpf_spin_lock tests to test_verifier.c that don't require
      latest llvm with BTF support
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      b4d4556c
    • A
      tools/bpf: sync include/uapi/linux/bpf.h · 7dac3ae4
      Alexei Starovoitov committed
      sync bpf.h
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7dac3ae4
    • A
      bpf: add support for bpf_spin_lock to cgroup local storage · e16d2f1a
      Alexei Starovoitov committed
      Allow 'struct bpf_spin_lock' to reside inside cgroup local storage.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e16d2f1a
    • A
      bpf: introduce bpf_spin_lock · d83525ca
      Alexei Starovoitov committed
      Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
      bpf program serialize access to other variables.
      
      Example:
      struct hash_elem {
          int cnt;
          struct bpf_spin_lock lock;
      };
      struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
      if (val) {
          bpf_spin_lock(&val->lock);
          val->cnt++;
          bpf_spin_unlock(&val->lock);
      }
      
      Restrictions and safety checks:
      - bpf_spin_lock is only allowed inside HASH and ARRAY maps.
      - BTF description of the map is mandatory for safety analysis.
      - bpf program can take one bpf_spin_lock at a time, since two or more can
        cause deadlocks.
      - only one 'struct bpf_spin_lock' is allowed per map element.
        It drastically simplifies implementation yet allows bpf program to use
        any number of bpf_spin_locks.
      - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
      - bpf program must bpf_spin_unlock() before return.
      - bpf program can access 'struct bpf_spin_lock' only via
        bpf_spin_lock()/bpf_spin_unlock() helpers.
      - load/store into 'struct bpf_spin_lock lock;' field is not allowed.
      - to use bpf_spin_lock() helper the BTF description of map value must be
        a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
        Nested lock inside another struct is not allowed.
      - syscall map_lookup doesn't copy bpf_spin_lock field to user space.
      - syscall map_update and program map_update do not update bpf_spin_lock field.
      - bpf_spin_lock cannot be on the stack or inside networking packet.
        bpf_spin_lock can only be inside HASH or ARRAY map value.
      - bpf_spin_lock is available to root only and to all program types other
        than the tracing and socket filter progs noted below.
      - bpf_spin_lock is not allowed in inner maps of map-in-map.
      - ld_abs is not allowed inside spin_lock-ed region.
      - tracing progs and socket filter progs cannot use bpf_spin_lock due to
        insufficient preemption checks
      
      Implementation details:
      - cgroup-bpf class of programs can nest with xdp/tc programs.
        Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
        Other solutions to avoid nested bpf_spin_lock are possible.
        Like making sure that all networking progs run with softirq disabled.
        spin_lock_irqsave is the simplest and doesn't add overhead to the
        programs that don't use it.
      - arch_spinlock_t is used when it is implemented as queued_spin_lock
      - archs can force their own arch_spinlock_t
      - on architectures where queued_spin_lock is not available and
        sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
      - presence of bpf_spin_lock inside map value could have been indicated via
        extra flag during map_create, but specifying it via BTF is cleaner.
        It provides introspection for map key/value and reduces user mistakes.
      
      Next steps:
      - allow bpf_spin_lock in other map types (like cgroup local storage)
      - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
        to request kernel to grab bpf_spin_lock before rewriting the value.
        That will serialize access to map elements.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      d83525ca
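
      The rule that lookup and update never copy the lock field can be modelled
      in user-space C as a copy that skips one region of the value. Everything
      here is a toy: struct toy_spin_lock stands in for 'struct bpf_spin_lock'
      (assumed to be a single 32-bit word), and toy_copy_value_skip_lock() is a
      hypothetical name, not the kernel's implementation.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Toy stand-in for 'struct bpf_spin_lock': one 32-bit word. */
      struct toy_spin_lock {
          uint32_t val;
      };

      /* Same shape as the element in the commit's example. */
      struct hash_elem {
          int cnt;
          struct toy_spin_lock lock;
      };

      /*
       * Copy `size` bytes of a map value from src to dst, leaving the lock
       * field at byte offset `lock_off` in dst untouched.
       */
      void toy_copy_value_skip_lock(void *dst, const void *src, size_t size,
                                    size_t lock_off)
      {
          size_t after = lock_off + sizeof(struct toy_spin_lock);

          assert(after <= size);

          /* Bytes before the lock. */
          memcpy(dst, src, lock_off);
          /* Bytes after the lock. */
          memcpy((char *)dst + after, (const char *)src + after, size - after);
      }
      ```

      Because the lock word is skipped in both directions, a reader never sees
      the lock state and a writer cannot clobber a lock that is currently held.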
  2. 31 Jan, 2019 20 commits