1. 21 3月, 2015 1 次提交
    • D
      ebpf: add sched_act_type and map it to sk_filter's verifier ops · 94caee8c
      Daniel Borkmann 提交于
      In order to prepare eBPF support for tc action, we need to add
      sched_act_type, so that the eBPF verifier is aware of what helper
      function act_bpf may use, that it can load skb data and read out
      currently available skb fields.
      
      This is bascially analogous to 96be4325 ("ebpf: add sched_cls_type
      and map it to sk_filter's verifier ops").
      
      BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be
      separate since both will have a different set of functionality in
      future (classifier vs action), thus we won't run into ABI troubles
      when the point in time comes to diverge functionality from the
      classifier.
      
      The future plan for act_bpf would be that it will be able to write
      into skb->data and alter selected fields mirrored in struct __sk_buff.
      
      For an initial support, it's sufficient to map it to sk_filter_ops.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Reviewed-by: NJiri Pirko <jiri@resnulli.us>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94caee8c
  2. 18 3月, 2015 1 次提交
  3. 16 3月, 2015 3 次提交
  4. 02 3月, 2015 2 次提交
    • D
      ebpf: add sched_cls_type and map it to sk_filter's verifier ops · 96be4325
      Daniel Borkmann 提交于
      As discussed recently and at netconf/netdev01, we want to prevent making
      bpf_verifier_ops registration available for modules, but have them at a
      controlled place inside the kernel instead.
      
      The reason for this is, that out-of-tree modules can go crazy and define
      and register any verfifier ops they want, doing all sorts of crap, even
      bypassing available GPLed eBPF helper functions. We don't want to offer
      such a shiny playground, of course, but keep strict control to ourselves
      inside the core kernel.
      
      This also encourages us to design eBPF user helpers carefully and
      generically, so they can be shared among various subsystems using eBPF.
      
      For the eBPF traffic classifier (cls_bpf), it's a good start to share
      the same helper facilities as we currently do in eBPF for socket filters.
      
      That way, we have BPF_PROG_TYPE_SCHED_CLS look like it's own type, thus
      one day if there's a good reason to diverge the set of helper functions
      from the set available to socket filters, we keep ABI compatibility.
      
      In future, we could place all bpf_prog_type_list at a central place,
      perhaps.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96be4325
    • D
      ebpf: export BPF_PSEUDO_MAP_FD to uapi · f1a66f85
      Daniel Borkmann 提交于
      We need to export BPF_PSEUDO_MAP_FD to user space, as it's used in the
      ELF BPF loader where instructions are being loaded that need map fixups.
      
      An initial stage loads all maps into the kernel, and later on replaces
      related instructions in the eBPF blob with BPF_PSEUDO_MAP_FD as source
      register and the actual fd as immediate value.
      
      The kernel verifier recognizes this keyword and replaces the map fd with
      a real pointer internally.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1a66f85
  5. 06 12月, 2014 1 次提交
  6. 19 11月, 2014 4 次提交
    • A
      bpf: allow eBPF programs to use maps · d0003ec0
      Alexei Starovoitov 提交于
      expose bpf_map_lookup_elem(), bpf_map_update_elem(), bpf_map_delete_elem()
      map accessors to eBPF programs
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0003ec0
    • A
      bpf: add array type of eBPF maps · 28fbcfa0
      Alexei Starovoitov 提交于
      add new map type BPF_MAP_TYPE_ARRAY and its implementation
      
      - optimized for fastest possible lookup()
        . in the future verifier/JIT may recognize lookup() with constant key
          and optimize it into constant pointer. Can optimize non-constant
          key into direct pointer arithmetic as well, since pointers and
          value_size are constant for the life of the eBPF program.
          In other words array_map_lookup_elem() may be 'inlined' by verifier/JIT
          while preserving concurrent access to this map from user space
      
      - two main use cases for array type:
        . 'global' eBPF variables: array of 1 element with key=0 and value is a
          collection of 'global' variables which programs can use to keep the state
          between events
        . aggregation of tracing events into fixed set of buckets
      
      - all array elements pre-allocated and zero initialized at init time
      
      - key as an index in array and can only be 4 byte
      
      - map_delete_elem() returns EINVAL, since elements cannot be deleted
      
      - map_update_elem() replaces elements in an non-atomic way
        (for atomic updates hashtable type should be used instead)
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28fbcfa0
    • A
      bpf: add hashtable type of eBPF maps · 0f8e4bd8
      Alexei Starovoitov 提交于
      add new map type BPF_MAP_TYPE_HASH and its implementation
      
      - maps are created/destroyed by userspace. Both userspace and eBPF programs
        can lookup/update/delete elements from the map
      
      - eBPF programs can be called in_irq(), so use spin_lock_irqsave() mechanism
        for concurrent updates
      
      - key/value are opaque range of bytes (aligned to 8 bytes)
      
      - user space provides 3 configuration attributes via BPF syscall:
        key_size, value_size, max_entries
      
      - map takes care of allocating/freeing key/value pairs
      
      - map_update_elem() must fail to insert new element when max_entries
        limit is reached to make sure that eBPF programs cannot exhaust memory
      
      - map_update_elem() replaces elements in an atomic way
      
      - optimized for speed of lookup() which can be called multiple times from
        eBPF program which itself is triggered by high volume of events
        . in the future JIT compiler may recognize lookup() call and optimize it
          further, since key_size is constant for life of eBPF program
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f8e4bd8
    • A
      bpf: add 'flags' attribute to BPF_MAP_UPDATE_ELEM command · 3274f520
      Alexei Starovoitov 提交于
      the current meaning of BPF_MAP_UPDATE_ELEM syscall command is:
      either update existing map element or create a new one.
      Initially the plan was to add a new command to handle the case of
      'create new element if it didn't exist', but 'flags' style looks
      cleaner and overall diff is much smaller (more code reused), so add 'flags'
      attribute to BPF_MAP_UPDATE_ELEM command with the following meaning:
       #define BPF_ANY	0 /* create new element or update existing */
       #define BPF_NOEXIST	1 /* create new element if it didn't exist */
       #define BPF_EXIST	2 /* update existing element */
      
      bpf_update_elem(fd, key, value, BPF_NOEXIST) call can fail with EEXIST
      if element already exists.
      
      bpf_update_elem(fd, key, value, BPF_EXIST) can fail with ENOENT
      if element doesn't exist.
      
      Userspace will call it as:
      int bpf_update_elem(int fd, void *key, void *value, __u64 flags)
      {
          union bpf_attr attr = {
              .map_fd = fd,
              .key = ptr_to_u64(key),
              .value = ptr_to_u64(value),
              .flags = flags;
          };
      
          return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
      }
      
      First two bits of 'flags' are used to encode style of bpf_update_elem() command.
      Bits 2-63 are reserved for future use.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3274f520
  7. 15 10月, 2014 1 次提交
  8. 27 9月, 2014 4 次提交
    • A
      bpf: verifier (add ability to receive verification log) · cbd35700
      Alexei Starovoitov 提交于
      add optional attributes for BPF_PROG_LOAD syscall:
      union bpf_attr {
          struct {
      	...
      	__u32         log_level; /* verbosity level of eBPF verifier */
      	__u32         log_size;  /* size of user buffer */
      	__aligned_u64 log_buf;   /* user supplied 'char *buffer' */
          };
      };
      
      when log_level > 0 the verifier will return its verification log in the user
      supplied buffer 'log_buf' which can be used by program author to analyze why
      verifier rejected given program.
      
      'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt
      provides several examples of these messages, like the program:
      
        BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
        BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
        BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
        BPF_LD_MAP_FD(BPF_REG_1, 0),
        BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
        BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
        BPF_EXIT_INSN(),
      
      will be rejected with the following multi-line message in log_buf:
      
        0: (7a) *(u64 *)(r10 -8) = 0
        1: (bf) r2 = r10
        2: (07) r2 += -8
        3: (b7) r1 = 0
        4: (85) call 1
        5: (15) if r0 == 0x0 goto pc+1
         R0=map_ptr R10=fp
        6: (7a) *(u64 *)(r0 +4) = 0
        misaligned access off 4 size 8
      
      The format of the output can change at any time as verifier evolves.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbd35700
    • A
      bpf: expand BPF syscall with program load/unload · 09756af4
      Alexei Starovoitov 提交于
      eBPF programs are similar to kernel modules. They are loaded by the user
      process and automatically unloaded when process exits. Each eBPF program is
      a safe run-to-completion set of instructions. eBPF verifier statically
      determines that the program terminates and is safe to execute.
      
      The following syscall wrapper can be used to load the program:
      int bpf_prog_load(enum bpf_prog_type prog_type,
                        const struct bpf_insn *insns, int insn_cnt,
                        const char *license)
      {
          union bpf_attr attr = {
              .prog_type = prog_type,
              .insns = ptr_to_u64(insns),
              .insn_cnt = insn_cnt,
              .license = ptr_to_u64(license),
          };
      
          return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
      }
      where 'insns' is an array of eBPF instructions and 'license' is a string
      that must be GPL compatible to call helper functions marked gpl_only
      
      Upon succesful load the syscall returns prog_fd.
      Use close(prog_fd) to unload the program.
      
      User space tests and examples follow in the later patches
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09756af4
    • A
      bpf: add lookup/update/delete/iterate methods to BPF maps · db20fd2b
      Alexei Starovoitov 提交于
      'maps' is a generic storage of different types for sharing data between kernel
      and userspace.
      
      The maps are accessed from user space via BPF syscall, which has commands:
      
      - create a map with given type and attributes
        fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
        returns fd or negative error
      
      - lookup key in a given map referenced by fd
        err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key, attr->value
        returns zero and stores found elem into value or negative error
      
      - create or update key/value pair in a given map
        err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key, attr->value
        returns zero or negative error
      
      - find and delete element by key in a given map
        err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key
      
      - iterate map elements (based on input key return next_key)
        err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size)
        using attr->map_fd, attr->key, attr->next_key
      
      - close(fd) deletes the map
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db20fd2b
    • A
      bpf: introduce BPF syscall and maps · 99c55f7d
      Alexei Starovoitov 提交于
      BPF syscall is a multiplexor for a range of different operations on eBPF.
      This patch introduces syscall with single command to create a map.
      Next patch adds commands to access maps.
      
      'maps' is a generic storage of different types for sharing data between kernel
      and userspace.
      
      Userspace example:
      /* this syscall wrapper creates a map with given type and attributes
       * and returns map_fd on success.
       * use close(map_fd) to delete the map
       */
      int bpf_create_map(enum bpf_map_type map_type, int key_size,
                         int value_size, int max_entries)
      {
          union bpf_attr attr = {
              .map_type = map_type,
              .key_size = key_size,
              .value_size = value_size,
              .max_entries = max_entries
          };
      
          return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
      }
      
      'union bpf_attr' is backwards compatible with future extensions.
      
      More details in Documentation/networking/filter.txt and in manpage
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99c55f7d
  9. 10 9月, 2014 1 次提交