1. 16 6月, 2018 3 次提交
    • T
      xdp: Fix handling of devmap in generic XDP · 6d5fc195
      Toshiaki Makita 提交于
      Commit 67f29e07 ("bpf: devmap introduce dev_map_enqueue") changed
      the return value type of __devmap_lookup_elem() from struct net_device *
      to struct bpf_dtab_netdev * but forgot to modify generic XDP code
      accordingly.
      
      Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
      net_device is expected, then skb->dev was set to invalid value.
      
      v2:
      - Fix compiler warning without CONFIG_BPF_SYSCALL.
      
      Fixes: 67f29e07 ("bpf: devmap introduce dev_map_enqueue")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NYonghong Song <yhs@fb.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      6d5fc195
    • D
      bpf: reject any prog that failed read-only lock · 9facc336
      Daniel Borkmann 提交于
      We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
      as well as the BPF image as read-only through bpf_prog_lock_ro(). In
      the case any of these would fail we throw a WARN_ON_ONCE() in order to
      yell loudly to the log. Perhaps, to some extend, this may be comparable
      to an allocation where __GFP_NOWARN is explicitly not set.
      
      Added via 65869a47 ("bpf: improve read-only handling"), this behavior
      is slightly different compared to any of the other in-kernel set_memory_ro()
      users who do not check the return code of set_memory_ro() and friends /at
      all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
      in BPF this is mandatory hardening step, we want to know whether there
      are any issues that would leave both BPF data writable. So it happens
      that syzkaller enabled fault injection and it triggered memory allocation
      failure deep inside x86's change_page_attr_set_clr() which was triggered
      from set_memory_ro().
      
      Now, there are two options: i) leaving everything as is, and ii) reworking
      the image locking code in order to have a final checkpoint out of the
      central bpf_prog_select_runtime() which probes whether any of the calls
      during prog setup weren't successful, and then bailing out with an error.
      Option ii) is a better approach since this additional paranoia avoids
      altogether leaving any potential W+X pages from BPF side in the system.
      Therefore, lets be strict about it, and reject programs in such unlikely
      occasion. While testing I noticed also that one bpf_prog_lock_ro()
      call was missing on the outer dummy prog in case of calls, e.g. in the
      destructor we call bpf_prog_free_deferred() on the main prog where we
      try to bpf_prog_unlock_free() the program, and since we go via
      bpf_prog_select_runtime() do that as well.
      
      Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
      Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      9facc336
    • D
      bpf: fix panic in prog load calls cleanup · 7d1982b4
      Daniel Borkmann 提交于
      While testing I found that when hitting error path in bpf_prog_load()
      where we jump to free_used_maps and prog contained BPF to BPF calls
      that were JITed earlier, then we never clean up the bpf_prog_kallsyms_add()
      done under jit_subprogs(). Add proper API to make BPF kallsyms deletion
      more clear and fix that.
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      7d1982b4
  2. 13 6月, 2018 4 次提交
    • K
      treewide: Use array_size() in vzalloc() · fad953ce
      Kees Cook 提交于
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      fad953ce
    • K
      treewide: kvzalloc() -> kvcalloc() · 778e1cdd
      Kees Cook 提交于
      The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
      patch replaces cases of:
      
              kvzalloc(a * b, gfp)
      
      with:
              kvcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kvzalloc(a * b * c, gfp)
      
      with:
      
              kvzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kvcalloc(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kvzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kvzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kvzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kvzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kvzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kvzalloc
      + kvcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kvzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kvzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kvzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kvzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kvzalloc(C1 * C2 * C3, ...)
      |
        kvzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kvzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kvzalloc(sizeof(THING) * C2, ...)
      |
        kvzalloc(sizeof(TYPE) * C2, ...)
      |
        kvzalloc(C1 * C2 * C3, ...)
      |
        kvzalloc(C1 * C2, ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kvzalloc
      + kvcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      778e1cdd
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
    • K
      treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook 提交于
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6da2ec56
  3. 09 6月, 2018 1 次提交
    • D
      bpf: implement dummy fops for bpf objects · b1655857
      Daniel Borkmann 提交于
      syzkaller was able to trigger the following warning in
      do_dentry_open():
      
        WARNING: CPU: 1 PID: 4508 at fs/open.c:778 do_dentry_open+0x4ad/0xe40 fs/open.c:778
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 1 PID: 4508 Comm: syz-executor867 Not tainted 4.17.0+ #90
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
        [...]
         vfs_open+0x139/0x230 fs/open.c:908
         do_last fs/namei.c:3370 [inline]
         path_openat+0x1717/0x4dc0 fs/namei.c:3511
         do_filp_open+0x249/0x350 fs/namei.c:3545
         do_sys_open+0x56f/0x740 fs/open.c:1101
         __do_sys_openat fs/open.c:1128 [inline]
         __se_sys_openat fs/open.c:1122 [inline]
         __x64_sys_openat+0x9d/0x100 fs/open.c:1122
         do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Problem was that prog and map inodes in bpf fs did not
      implement a dummy file open operation that would return an
      error. The patch in do_dentry_open() checks whether f_ops
      are present and if not bails out with an error. While this
      may be fine, we really shouldn't be throwing a warning
      though. Thus follow the model similar to bad_file_ops and
      reject the request unconditionally with -EIO.
      
      Fixes: b2197755 ("bpf: add support for persistent maps/progs")
      Reported-by: syzbot+2e7fcab0f56fdbb330b8@syzkaller.appspotmail.com
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b1655857
  4. 08 6月, 2018 1 次提交
    • D
      bpf: reject passing modified ctx to helper functions · 58990d1f
      Daniel Borkmann 提交于
      As commit 28e33f9d ("bpf: disallow arithmetic operations on
      context pointer") already describes, f1174f77 ("bpf/verifier:
      rework value tracking") removed the specific white-listed cases
      we had previously where we would allow for pointer arithmetic in
      order to further generalize it, and allow e.g. context access via
      modified registers. While the dereferencing of modified context
      pointers had been forbidden through 28e33f9d, syzkaller did
      recently manage to trigger several KASAN splats for slab out of
      bounds access and use after frees by simply passing a modified
      context pointer to a helper function which would then do the bad
      access since verifier allowed it in adjust_ptr_min_max_vals().
      
      Rejecting arithmetic on ctx pointer in adjust_ptr_min_max_vals()
      generally could break existing programs as there's a valid use
      case in tracing in combination with passing the ctx to helpers as
      bpf_probe_read(), where the register then becomes unknown at
      verification time due to adding a non-constant offset to it. An
      access sequence may look like the following:
      
        offset = args->filename;  /* field __data_loc filename */
        bpf_probe_read(&dst, len, (char *)args + offset); // args is ctx
      
      There are two options: i) we could special case the ctx and as
      soon as we add a constant or bounded offset to it (hence ctx type
      wouldn't change) we could turn the ctx into an unknown scalar, or
      ii) we generalize the sanity test for ctx member access into a
      small helper and assert it on the ctx register that was passed
      as a function argument. Fwiw, latter is more obvious and less
      complex at the same time, and one case that may potentially be
      legitimate in future for ctx member access at least would be for
      ctx to carry a const offset. Therefore, fix follows approach
      from ii) and adds test cases to BPF kselftests.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: syzbot+3d0b2441dbb71751615e@syzkaller.appspotmail.com
      Reported-by: syzbot+c8504affd4fdd0c1b626@syzkaller.appspotmail.com
      Reported-by: syzbot+e5190cb881d8660fb1a3@syzkaller.appspotmail.com
      Reported-by: syzbot+efae31b384d5badbd620@syzkaller.appspotmail.com
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NYonghong Song <yhs@fb.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      58990d1f
  5. 04 6月, 2018 1 次提交
    • Y
      bpf: implement bpf_get_current_cgroup_id() helper · bf6fa2c8
      Yonghong Song 提交于
      bpf has been used extensively for tracing. For example, bcc
      contains an almost full set of bpf-based tools to trace kernel
      and user functions/events. Most tracing tools are currently
      either filtered based on pid or system-wide.
      
      Containers have been used quite extensively in industry and
      cgroup is often used together to provide resource isolation
      and protection. Several processes may run inside the same
      container. It is often desirable to get container-level tracing
      results as well, e.g. syscall count, function count, I/O
      activity, etc.
      
      This patch implements a new helper, bpf_get_current_cgroup_id(),
      which will return cgroup id based on the cgroup within which
      the current task is running.
      
      The later patch will provide an example to show that
      userspace can get the same cgroup id so it could
      configure a filter or policy in the bpf program based on
      task cgroup id.
      
      The helper is currently implemented for tracing. It can
      be added to other program types as well when needed.
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      bf6fa2c8
  6. 03 6月, 2018 8 次提交
    • J
      bpf/xdp: devmap can avoid calling ndo_xdp_flush · c1ece6b2
      Jesper Dangaard Brouer 提交于
      The XDP_REDIRECT map devmap can avoid using ndo_xdp_flush, by instead
      instructing ndo_xdp_xmit to flush via XDP_XMIT_FLUSH flag in
      appropriate places.
      
      Notice after this patch it is possible to remove ndo_xdp_flush
      completely, as this is the last user of ndo_xdp_flush. This is left
      for later patches, to keep driver changes separate.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      c1ece6b2
    • J
      xdp: add flags argument to ndo_xdp_xmit API · 42b33468
      Jesper Dangaard Brouer 提交于
      This patch only change the API and reject any use of flags. This is an
      intermediate step that allows us to implement the flush flag operation
      later, for each individual driver in a separate patch.
      
      The plan is to implement flush operation via XDP_XMIT_FLUSH flag
      and then remove XDP_XMIT_FLAGS_NONE when done.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      42b33468
    • D
      bpf: fix context access in tracing progs on 32 bit archs · bc23105c
      Daniel Borkmann 提交于
      Wang reported that all the testcases for BPF_PROG_TYPE_PERF_EVENT
      program type in test_verifier report the following errors on x86_32:
      
        172/p unpriv: spill/fill of different pointers ldx FAIL
        Unexpected error message!
        0: (bf) r6 = r10
        1: (07) r6 += -8
        2: (15) if r1 == 0x0 goto pc+3
        R1=ctx(id=0,off=0,imm=0) R6=fp-8,call_-1 R10=fp0,call_-1
        3: (bf) r2 = r10
        4: (07) r2 += -76
        5: (7b) *(u64 *)(r6 +0) = r2
        6: (55) if r1 != 0x0 goto pc+1
        R1=ctx(id=0,off=0,imm=0) R2=fp-76,call_-1 R6=fp-8,call_-1 R10=fp0,call_-1 fp-8=fp
        7: (7b) *(u64 *)(r6 +0) = r1
        8: (79) r1 = *(u64 *)(r6 +0)
        9: (79) r1 = *(u64 *)(r1 +68)
        invalid bpf_context access off=68 size=8
      
        378/p check bpf_perf_event_data->sample_period byte load permitted FAIL
        Failed to load prog 'Permission denied'!
        0: (b7) r0 = 0
        1: (71) r0 = *(u8 *)(r1 +68)
        invalid bpf_context access off=68 size=1
      
        379/p check bpf_perf_event_data->sample_period half load permitted FAIL
        Failed to load prog 'Permission denied'!
        0: (b7) r0 = 0
        1: (69) r0 = *(u16 *)(r1 +68)
        invalid bpf_context access off=68 size=2
      
        380/p check bpf_perf_event_data->sample_period word load permitted FAIL
        Failed to load prog 'Permission denied'!
        0: (b7) r0 = 0
        1: (61) r0 = *(u32 *)(r1 +68)
        invalid bpf_context access off=68 size=4
      
        381/p check bpf_perf_event_data->sample_period dword load permitted FAIL
        Failed to load prog 'Permission denied'!
        0: (b7) r0 = 0
        1: (79) r0 = *(u64 *)(r1 +68)
        invalid bpf_context access off=68 size=8
      
      Reason is that struct pt_regs on x86_32 doesn't fully align to 8 byte
      boundary due to its size of 68 bytes. Therefore, bpf_ctx_narrow_access_ok()
      will then bail out saying that off & (size_default - 1) which is 68 & 7
      doesn't cleanly align in the case of sample_period access from struct
      bpf_perf_event_data, hence verifier wrongly thinks we might be doing an
      unaligned access here though underlying arch can handle it just fine.
      Therefore adjust this down to machine size and check and rewrite the
      offset for narrow access on that basis. We also need to fix corresponding
      pe_prog_is_valid_access(), since we hit the check for off % size != 0
      (e.g. 68 % 8 -> 4) in the first and last test. With that in place, progs
      for tracing work on x86_32.
      Reported-by: NWang YanQing <udknight@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Tested-by: NWang YanQing <udknight@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      bc23105c
    • D
      bpf: avoid retpoline for lookup/update/delete calls on maps · 09772d92
      Daniel Borkmann 提交于
      While some of the BPF map lookup helpers provide a ->map_gen_lookup()
      callback for inlining the map lookup altogether it is not available
      for every map, so the remaining ones have to call bpf_map_lookup_elem()
      helper which does a dispatch to map->ops->map_lookup_elem(). In
      times of retpolines, this will control and trap speculative execution
      rather than letting it do its work for the indirect call and will
      therefore cause a slowdown. Likewise, bpf_map_update_elem() and
      bpf_map_delete_elem() do not have an inlined version and need to call
      into their map->ops->map_update_elem() resp. map->ops->map_delete_elem()
      handlers.
      
      Before:
      
        # bpftool prog dump xlated id 1
          0: (bf) r2 = r10
          1: (07) r2 += -8
          2: (7a) *(u64 *)(r2 +0) = 0
          3: (18) r1 = map[id:1]
          5: (85) call __htab_map_lookup_elem#232656
          6: (15) if r0 == 0x0 goto pc+4
          7: (71) r1 = *(u8 *)(r0 +35)
          8: (55) if r1 != 0x0 goto pc+1
          9: (72) *(u8 *)(r0 +35) = 1
         10: (07) r0 += 56
         11: (15) if r0 == 0x0 goto pc+4
         12: (bf) r2 = r0
         13: (18) r1 = map[id:1]
         15: (85) call bpf_map_delete_elem#215008  <-- indirect call via
         16: (95) exit                                 helper
      
      After:
      
        # bpftool prog dump xlated id 1
          0: (bf) r2 = r10
          1: (07) r2 += -8
          2: (7a) *(u64 *)(r2 +0) = 0
          3: (18) r1 = map[id:1]
          5: (85) call __htab_map_lookup_elem#233328
          6: (15) if r0 == 0x0 goto pc+4
          7: (71) r1 = *(u8 *)(r0 +35)
          8: (55) if r1 != 0x0 goto pc+1
          9: (72) *(u8 *)(r0 +35) = 1
         10: (07) r0 += 56
         11: (15) if r0 == 0x0 goto pc+4
         12: (bf) r2 = r0
         13: (18) r1 = map[id:1]
         15: (85) call htab_lru_map_delete_elem#238240  <-- direct call
         16: (95) exit
      
      In all three lookup/update/delete cases however we can use the actual
      address of the map callback directly if we find that there's only a
      single path with a map pointer leading to the helper call, meaning
      when the map pointer has not been poisoned from verifier side.
      Example code can be seen above for the delete case.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      09772d92
    • D
      bpf: show prog and map id in fdinfo · 4316b409
      Daniel Borkmann 提交于
      Its trivial and straight forward to expose it for scripts that can
      then use it along with bpftool in order to inspect an individual
      application's used maps and progs. Right now we dump some basic
      information in the fdinfo file but with the help of the map/prog
      id full introspection becomes possible now.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      4316b409
    • D
      bpf: fixup error message from gpl helpers on license mismatch · 3fe2867c
      Daniel Borkmann 提交于
      Stating 'proprietary program' in the error is just silly since it
      can also be a different open source license than that which is just
      not compatible.
      
      Reference: https://twitter.com/majek04/status/998531268039102465Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      3fe2867c
    • M
      bpf: btf: Ensure t->type == 0 for BTF_KIND_FWD · 8175383f
      Martin KaFai Lau 提交于
      The t->type in BTF_KIND_FWD is not used.  It must be 0.
      This patch ensures that and also adds a test case in test_btf.c
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      8175383f
    • M
      bpf: btf: Check array t->size · b9308ae6
      Martin KaFai Lau 提交于
      This patch ensures array's t->size is 0.
      
      The array size is decided by its individual elem's size and the
      number of elements.  Hence, t->size is not used and
      it must be 0.
      
      A test case is added to test_btf.c
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b9308ae6
  7. 31 5月, 2018 1 次提交
  8. 30 5月, 2018 2 次提交
  9. 28 5月, 2018 3 次提交
    • A
      bpf: Hooks for sys_sendmsg · 1cedee13
      Andrey Ignatov 提交于
      In addition to already existing BPF hooks for sys_bind and sys_connect,
      the patch provides new hooks for sys_sendmsg.
      
      It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
      that provides access to socket itlself (properties like family, type,
      protocol) and user-passed `struct sockaddr *` so that BPF program can
      override destination IP and port for system calls such as sendto(2) or
      sendmsg(2) and/or assign source IP to the socket.
      
      The hooks are implemented as two new attach types:
      `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
      UDPv6 correspondingly.
      
      UDPv4 and UDPv6 separate attach types for same reason as sys_bind and
      sys_connect hooks, i.e. to prevent reading from / writing to e.g.
      user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.
      
      The difference with already existing hooks is sys_sendmsg are
      implemented only for unconnected UDP.
      
      For TCP it doesn't make sense to change user-provided `struct sockaddr *`
      at sendto(2)/sendmsg(2) time since socket either was already connected
      and has source/destination set or wasn't connected and call to
      sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.
      
      Connected UDP is already handled by sys_connect hooks that can override
      source/destination at connect time and use fast-path later, i.e. these
      hooks don't affect UDP fast-path.
      
      Rewriting source IP is implemented differently than that in sys_connect
      hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
      just bind socket to desired local IP address since source IP can be set
      on per-packet basis by using ancillary data (cmsg(3)). So no matter if
      socket is bound or not, source IP has to be rewritten on every call to
      sys_sendmsg.
      
      To do so two new fields are added to UAPI `struct bpf_sock_addr`;
      * `msg_src_ip4` to set source IPv4 for UDPv4;
      * `msg_src_ip6` to set source IPv6 for UDPv6.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1cedee13
    • A
      bpf: avoid -Wmaybe-uninitialized warning · dc3b8ae9
      Arnd Bergmann 提交于
      The stack_map_get_build_id_offset() function is too long for gcc to track
      whether 'work' may or may not be initialized at the end of it, leading
      to a false-positive warning:
      
      kernel/bpf/stackmap.c: In function 'stack_map_get_build_id_offset':
      kernel/bpf/stackmap.c:334:13: error: 'work' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This removes the 'in_nmi_ctx' flag and uses the state of that variable
      itself to see if it got initialized.
      
      Fixes: bae77c5e ("bpf: enable stackmap with build_id in nmi context")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      dc3b8ae9
    • A
      bpf: btf: avoid -Wreturn-type warning · 53c8036c
      Arnd Bergmann 提交于
      gcc warns about a noreturn function possibly returning in
      some configurations:
      
      kernel/bpf/btf.c: In function 'env_type_is_resolve_sink':
      kernel/bpf/btf.c:729:1: error: control reaches end of non-void function [-Werror=return-type]
      
      Using BUG() instead of BUG_ON() avoids that warning and otherwise
      does the exact same thing.
      
      Fixes: eb3f595d ("bpf: btf: Validate type reference")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      53c8036c
  10. 25 5月, 2018 7 次提交
    • J
      xdp/trace: extend tracepoint in devmap with an err · e74de52e
      Jesper Dangaard Brouer 提交于
      Extending tracepoint xdp:xdp_devmap_xmit in devmap with an err code
      allow people to easier identify the reason behind the ndo_xdp_xmit
      call to a given driver is failing.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      e74de52e
    • J
      xdp: change ndo_xdp_xmit API to support bulking · 735fc405
      Jesper Dangaard Brouer 提交于
      This patch change the API for ndo_xdp_xmit to support bulking
      xdp_frames.
      
      When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
      Most of the slowdown is caused by DMA API indirect function calls, but
      also the net_device->ndo_xdp_xmit() call.
      
      Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
      single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
      performance improved:
       for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
       for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
      
      With frames avail as a bulk inside the driver ndo_xdp_xmit call,
      further optimizations are possible, like bulk DMA-mapping for TX.
      
      Testing without CONFIG_RETPOLINE show the same performance for
      physical NIC drivers.
      
      The virtual NIC driver tun sees a huge performance boost, as it can
      avoid doing per frame producer locking, but instead amortize the
      locking cost over the bulk.
      
      V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
      V4: Isolated ndo, driver changes and callers.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      735fc405
    • J
      xdp: introduce xdp_return_frame_rx_napi · 389ab7f0
      Jesper Dangaard Brouer 提交于
      When sending an xdp_frame through xdp_do_redirect call, then error
      cases can happen where the xdp_frame needs to be dropped, and
      returning an -errno code isn't sufficient/possible any-longer
      (e.g. for cpumap case). This is already fully supported, by simply
      calling xdp_return_frame.
      
      This patch is an optimization, which provides xdp_return_frame_rx_napi,
      which is a faster variant for these error cases.  It take advantage of
      the protection provided by XDP RX running under NAPI protection.
      
      This change is mostly relevant for drivers using the page_pool
      allocator as it can take advantage of this. (Tested with mlx5).
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      389ab7f0
    • J
      xdp: add tracepoint for devmap like cpumap have · 38edddb8
      Jesper Dangaard Brouer 提交于
      Notice how this allow us get XDP statistic without affecting the XDP
      performance, as tracepoint is no-longer activated on a per packet basis.
      
      V5: Spotted by John Fastabend.
       Fix 'sent' also counted 'drops' in this patch, a later patch corrected
       this, but it was a mistake in this intermediate step.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      38edddb8
    • J
      bpf: devmap prepare xdp frames for bulking · 5d053f9d
      Jesper Dangaard Brouer 提交于
      Like cpumap create queue for xdp frames that will be bulked.  For now,
      this patch simply invoke ndo_xdp_xmit foreach frame.  This happens,
      either when the map flush operation is envoked, or when the limit
      DEV_MAP_BULK_SIZE is reached.
      
      V5: Avoid memleak on error path in dev_map_update_elem()
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      5d053f9d
    • J
      bpf: devmap introduce dev_map_enqueue · 67f29e07
      Jesper Dangaard Brouer 提交于
      Functionality is the same, but the ndo_xdp_xmit call is now
      simply invoked from inside the devmap.c code.
      
      V2: Fix compile issue reported by kbuild test robot <lkp@intel.com>
      
      V5: Cleanups requested by Daniel
       - Newlines before func definition
       - Use BUILD_BUG_ON checks
       - Remove unnecessary use return value store in dev_map_enqueue
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      67f29e07
    • Y
      bpf: introduce bpf subcommand BPF_TASK_FD_QUERY · 41bdc4b4
      Yonghong Song 提交于
      Currently, suppose a userspace application has loaded a bpf program
      and attached it to a tracepoint/kprobe/uprobe, and a bpf
      introspection tool, e.g., bpftool, wants to show which bpf program
      is attached to which tracepoint/kprobe/uprobe. Such attachment
      information will be really useful to understand the overall bpf
      deployment in the system.
      
      There is a name field (16 bytes) for each program, which could
      be used to encode the attachment point. There are some drawbacks
      for this approaches. First, bpftool user (e.g., an admin) may not
      really understand the association between the name and the
      attachment point. Second, if one program is attached to multiple
      places, encoding a proper name which can imply all these
      attachments becomes difficult.
      
      This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
      Given a pid and fd, if the <pid, fd> is associated with a
      tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
         . prog_id
         . tracepoint name, or
         . k[ret]probe funcname + offset or kernel addr, or
         . u[ret]probe filename + offset
      to the userspace.
      The user can use "bpftool prog" to find more information about
      bpf program itself with prog_id.
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      41bdc4b4
  11. 24 5月, 2018 7 次提交
    • D
      bpf: properly enforce index mask to prevent out-of-bounds speculation · c93552c4
      Daniel Borkmann 提交于
      While reviewing the verifier code, I recently noticed that the
      following two program variants in relation to tail calls can be
      loaded.
      
      Variant 1:
      
        # bpftool p d x i 15
          0: (15) if r1 == 0x0 goto pc+3
          1: (18) r2 = map[id:5]
          3: (05) goto pc+2
          4: (18) r2 = map[id:6]
          6: (b7) r3 = 7
          7: (35) if r3 >= 0xa0 goto pc+2
          8: (54) (u32) r3 &= (u32) 255
          9: (85) call bpf_tail_call#12
         10: (b7) r0 = 1
         11: (95) exit
      
        # bpftool m s i 5
          5: prog_array  flags 0x0
              key 4B  value 4B  max_entries 4  memlock 4096B
        # bpftool m s i 6
          6: prog_array  flags 0x0
              key 4B  value 4B  max_entries 160  memlock 4096B
      
      Variant 2:
      
        # bpftool p d x i 20
          0: (15) if r1 == 0x0 goto pc+3
          1: (18) r2 = map[id:8]
          3: (05) goto pc+2
          4: (18) r2 = map[id:7]
          6: (b7) r3 = 7
          7: (35) if r3 >= 0x4 goto pc+2
          8: (54) (u32) r3 &= (u32) 3
          9: (85) call bpf_tail_call#12
         10: (b7) r0 = 1
         11: (95) exit
      
        # bpftool m s i 8
          8: prog_array  flags 0x0
              key 4B  value 4B  max_entries 160  memlock 4096B
        # bpftool m s i 7
          7: prog_array  flags 0x0
              key 4B  value 4B  max_entries 4  memlock 4096B
      
      In both cases the index masking inserted by the verifier in order
      to control out of bounds speculation from a CPU via b2157399
      ("bpf: prevent out-of-bounds speculation") seems to be incorrect
      in what it is enforcing. In the 1st variant, the mask is applied
      from the map with the significantly larger number of entries where
      we would allow to a certain degree out of bounds speculation for
      the smaller map, and in the 2nd variant where the mask is applied
      from the map with the smaller number of entries, we get buggy
      behavior since we truncate the index of the larger map.
      
      The original intent from commit b2157399 is to reject such
      occasions where two or more different tail call maps are used
      in the same tail call helper invocation. However, the check on
      the BPF_MAP_PTR_POISON is never hit since we never poisoned the
      saved pointer in the first place! We do this explicitly for map
      lookups but in case of tail calls we basically used the tail
      call map in insn_aux_data that was processed in the most recent
      path which the verifier walked. Thus any prior path that stored
      a pointer in insn_aux_data at the helper location was always
      overridden.
      
      Fix it by moving the map pointer poison logic into a small helper
      that covers both BPF helpers with the same logic. After that in
      fixup_bpf_calls() the poison check is then hit for tail calls
      and the program rejected. Latter only happens in unprivileged
      case since this is the *only* occasion where a rewrite needs to
      happen, and where such rewrite is specific to the map (max_entries,
      index_mask). In the privileged case the rewrite is generic for
      the insn->imm / insn->code update so multiple maps from different
      paths can be handled just fine since all the remaining logic
      happens in the instruction processing itself. This is similar
      to the case of map lookups: in case there is a collision of
      maps in fixup_bpf_calls() we must skip the inlined rewrite since
      this will turn the generic instruction sequence into a non-
      generic one. Thus the patch_call_imm will simply update the
      insn->imm location where the bpf_map_lookup_elem() will later
      take care of the dispatch. Given we need this 'poison' state
      as a check, the information of whether a map is an unpriv_array
      gets lost, so enforcing it prior to that needs an additional
      state. In general this check is needed since there are some
      complex and tail call intensive BPF programs out there where
      LLVM tends to generate such code occasionally. We therefore
      convert the map_ptr rather into map_state to store all this
      w/o extra memory overhead, and the bit whether one of the maps
      involved in the collision was from an unpriv_array thus needs
      to be retained as well there.
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      c93552c4
    • M
      ipv6: sr: Add seg6local action End.BPF · 004d4b27
      Mathieu Xhonneux 提交于
      This patch adds the End.BPF action to the LWT seg6local infrastructure.
      This action works like any other seg6local End action, meaning that an IPv6
      header with SRH is needed, whose DA has to be equal to the SID of the
      action. It will also advance the SRH to the next segment, the BPF program
      does not have to take care of this.
      
      Since the BPF program may not be a source of instability in the kernel, it
      is important to ensure that the integrity of the packet is maintained
      before yielding it back to the IPv6 layer. The hook hence keeps track if
      the SRH has been altered through the helpers, and re-validates its
      content if needed with seg6_validate_srh. The state kept for validation is
      stored in a per-CPU buffer. The BPF program is not allowed to directly
      write into the packet, and only some fields of the SRH can be altered
      through the helper bpf_lwt_seg6_store_bytes.
      
      Performances profiling has shown that the SRH re-validation does not induce
      a significant overhead. If the altered SRH is deemed as invalid, the packet
      is dropped.
      
      This validation is also done before executing any action through
      bpf_lwt_seg6_action, and will not be performed again if the SRH is not
      modified after calling the action.
      
      The BPF program may return 3 types of return codes:
          - BPF_OK: the End.BPF action will look up the next destination through
                   seg6_lookup_nexthop.
          - BPF_REDIRECT: if an action has been executed through the
                bpf_lwt_seg6_action helper, the BPF program should return this
                value, as the skb's destination is already set and the default
                lookup should not be performed.
          - BPF_DROP : the packet will be dropped.
      Signed-off-by: NMathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: NDavid Lebrun <dlebrun@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      004d4b27
    • S
      bpf: get JITed image lengths of functions via syscall · 815581c1
      Sandipan Das 提交于
      This adds new two new fields to struct bpf_prog_info. For
      multi-function programs, these fields can be used to pass
      a list of the JITed image lengths of each function for a
      given program to userspace using the bpf system call with
      the BPF_OBJ_GET_INFO_BY_FD command.
      
      This can be used by userspace applications like bpftool
      to split up the contiguous JITed dump, also obtained via
      the system call, into more relatable chunks corresponding
      to each function.
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      815581c1
    • S
      bpf: fix multi-function JITed dump obtained via syscall · 4d56a76e
      Sandipan Das 提交于
      Currently, for multi-function programs, we cannot get the JITed
      instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
      command. Because of this, userspace tools such as bpftool fail
      to identify a multi-function program as being JITed or not.
      
      With the JIT enabled and the test program running, this can be
      verified as follows:
      
        # cat /proc/sys/net/core/bpf_jit_enable
        1
      
      Before applying this patch:
      
        # bpftool prog list
        1: kprobe  name foo  tag b811aab41a39ad3d  gpl
                loaded_at 2018-05-16T11:43:38+0530  uid 0
                xlated 216B  not jited  memlock 65536B
        ...
      
        # bpftool prog dump jited id 1
        no instructions returned
      
      After applying this patch:
      
        # bpftool prog list
        1: kprobe  name foo  tag b811aab41a39ad3d  gpl
                loaded_at 2018-05-16T12:13:01+0530  uid 0
                xlated 216B  jited 308B  memlock 65536B
        ...
      
        # bpftool prog dump jited id 1
           0:   nop
           4:   nop
           8:   mflr    r0
           c:   std     r0,16(r1)
          10:   stdu    r1,-112(r1)
          14:   std     r31,104(r1)
          18:   addi    r31,r1,48
          1c:   li      r3,10
        ...
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      4d56a76e
    • S
      bpf: get kernel symbol addresses via syscall · dbecd738
      Sandipan Das 提交于
      This adds new two new fields to struct bpf_prog_info. For
      multi-function programs, these fields can be used to pass
      a list of kernel symbol addresses for all functions in a
      given program to userspace using the bpf system call with
      the BPF_OBJ_GET_INFO_BY_FD command.
      
      When bpf_jit_kallsyms is enabled, we can get the address
      of the corresponding kernel symbol for a callee function
      and resolve the symbol's name. The address is determined
      by adding the value of the call instruction's imm field
      to __bpf_call_base. This offset gets assigned to the imm
      field by the verifier.
      
      For some architectures, such as powerpc64, the imm field
      is not large enough to hold this offset.
      
      We resolve this by:
      
      [1] Assigning the subprog id to the imm field of a call
          instruction in the verifier instead of the offset of
          the callee's symbol's address from __bpf_call_base.
      
      [2] Determining the address of a callee's corresponding
          symbol by using the imm field as an index for the
          list of kernel symbol addresses now available from
          the program info.
      Suggested-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      dbecd738
    • S
      bpf: support 64-bit offsets for bpf function calls · 2162fed4
      Sandipan Das 提交于
      The imm field of a bpf instruction is a signed 32-bit integer.
      For JITed bpf-to-bpf function calls, it holds the offset of the
      start address of the callee's JITed image from __bpf_call_base.
      
      For some architectures, such as powerpc64, this offset may be
      as large as 64 bits and cannot be accomodated in the imm field
      without truncation.
      
      We resolve this by:
      
      [1] Additionally using the auxiliary data of each function to
          keep a list of start addresses of the JITed images for all
          functions determined by the verifier.
      
      [2] Retaining the subprog id inside the off field of the call
          instructions and using it to index into the list mentioned
          above and lookup the callee's address.
      
      To make sure that the existing JIT compilers continue to work
      without requiring changes, we keep the imm field as it is.
      Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      2162fed4
    • M
      bpf: btf: Avoid variable length array · a2889a4c
      Martin KaFai Lau 提交于
      Sparse warning:
      kernel/bpf/btf.c:1985:34: warning: Variable length array is used.
      
      This patch directly uses ARRAY_SIZE().
      
      Fixes: f80442a4 ("bpf: btf: Change how section is supported in btf_header")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      a2889a4c
  12. 23 5月, 2018 2 次提交
    • M
      bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info · 9b2cf328
      Martin KaFai Lau 提交于
      In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id"
      could cause confusion because the "id" of "btf_id" means the BPF obj id
      given to the BTF object while
      "btf_key_id" and "btf_value_id" means the BTF type id within
      that BTF object.
      
      To make it clear, btf_key_id and btf_value_id are
      renamed to btf_key_type_id and btf_value_type_id.
      Suggested-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      9b2cf328
    • M
      bpf: btf: Remove unused bits from uapi/linux/btf.h · aea2f7b8
      Martin KaFai Lau 提交于
      This patch does the followings:
      1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k.  We can
         raise it later.
      
      2. Remove the BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID.  They are
         currently encoded at the highest bit of a u32.
         It is because the current use case does not require supporting
         parent type (i.e type_id referring to a type in another BTF file).
         It also does not support referring to a string in ELF.
      
         The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced
         by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK which are
         defined in btf.c instead of uapi/linux/btf.h.
      
      3. Limit the BTF_INFO_KIND from 5 bits to 4 bits which is enough.
         There is unused bits headroom if we ever needed it later.
      
      4. The root bit in BTF_INFO is also removed because it is not
         used in the current use case.
      
      5. Remove BTF_INT_VARARGS since func type is not supported now.
         The BTF_INT_ENCODING is limited to 4 bits instead of 8 bits.
      
      The above can be added back later because the verifier
      ensures the unused bits are zeros.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      aea2f7b8