1. 18 May, 2018 26 commits
  2. 17 May, 2018 14 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · b9f672af
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-05-17
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Provide a new BPF helper for doing a FIB and neighbor lookup
         in the kernel tables from an XDP or tc BPF program. The helper
         provides a fast-path for forwarding packets. The API supports
         IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
         implemented in this initial work, from David (Ahern).
      
      2) Just a tiny diff but huge feature enabled for nfp driver by
         extending the BPF offload beyond a pure host processing offload.
         Offloaded XDP programs are allowed to set the RX queue index,
         thus opening the door for defining a fully programmable RSS/n-tuple
         filter replacement. Once BPF has decided on a queue, the device
         data-path will skip the conventional RSS processing completely,
         from Jakub.
      
      3) The original sockmap implementation was array based similar to
         devmap. However unlike devmap where an ifindex has a 1:1 mapping
         into the map there are use cases with sockets that need to be
         referenced using longer keys. Hence, sockhash map is added reusing
         as much of the sockmap code as possible, from John.
      
      4) Introduce BTF ID. The ID is allocated through an IDR, similar to
         BPF maps and progs. It also makes BTF accessible to user
         space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
         through BPF_OBJ_GET_INFO_BY_FD, from Martin.
      
      5) Enable BPF stackmap with build_id also in NMI context. Due to the
         up_read() of current->mm->mmap_sem build_id cannot be parsed.
         This work defers the up_read() via a per-cpu irq_work so that
         at least limited support can be enabled, from Song.
      
      6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
         JIT conversion as well as implementation of an optimized 32/64 bit
         immediate load in the arm64 JIT that reduces the number of
         emitted instructions; tested real-world programs shrank by
         three percent, from Daniel.
      
      7) Add ifindex parameter to the libbpf loader in order to enable
         BPF offload support. Right now only iproute2 can load offloaded
         BPF and this will also enable libbpf for direct integration into
         other applications, from David (Beckett).
      
      8) Convert the plain text documentation under Documentation/bpf/ into
         RST format since this is the appropriate standard the kernel is
         moving to for all documentation. Also add an overview README.rst,
         from Jesper.
      
      9) Add __printf verification attribute to the bpf_verifier_vlog()
         helper. Though it uses va_list we can still allow gcc to check
         the format string, from Mathieu.
      
      10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
          is a bash 4.0+ feature which is not guaranteed to be available
          when calling out to shell, therefore use a more portable variant,
          from Joe.
      
      11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
          instead of relying on the gcc built-in, from Björn.
      
      12) Fix a sock hashmap kmalloc warning reported by syzbot when an
          overly large key size is used in hashmap then causing overflows
          in htab->elem_size. Reject bogus attr->key_size early in the
          sock_hash_alloc(), from Yonghong.
      
      13) Ensure in BPF selftests when urandom_read is being linked that
          --build-id is always enabled so that test_stacktrace_build_id[_nmi]
          won't be failing, from Alexei.
      
      14) Add bitsperlong.h as well as errno.h uapi headers into the tools
          header infrastructure which point to one of the arch specific
          uapi headers. This was needed in order to fix a build error on
          some systems for the BPF selftests, from Sirio.
      
      15) Allow for short options to be used in the xdp_monitor BPF sample
          code. And also a bpf.h tools uapi header sync in order to fix a
          selftest build failure. Both from Prashant.
      
      16) More formally clarify the meaning of ID in the direct packet access
          section of the BPF documentation, from Wang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: sockmap, on update propagate errors back to userspace · e23afe5e
      John Fastabend committed
      When an error happens in the sockmap element update logic, also pass
      the err up to the user.
      
      Fixes: e5cd3abc ("bpf: sockmap, refactor sockmap routines to work with hashmap")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix sock hashmap kmalloc warning · 683d2ac3
      Yonghong Song committed
      syzbot reported a kernel warning below:
        WARNING: CPU: 0 PID: 4499 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 0 PID: 4499 Comm: syz-executor050 Not tainted 4.17.0-rc3+ #9
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x1b9/0x294 lib/dump_stack.c:113
         panic+0x22f/0x4de kernel/panic.c:184
         __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
         report_bug+0x252/0x2d0 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
         do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
        RIP: 0010:kmalloc_slab+0x56/0x70 mm/slab_common.c:996
        RSP: 0018:ffff8801d907fc58 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff8801aeecb280 RCX: ffffffff8185ebd7
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffe1
        RBP: ffff8801d907fc58 R08: ffff8801adb5e1c0 R09: ffffed0035a84700
        R10: ffffed0035a84700 R11: ffff8801ad423803 R12: ffff8801aeecb280
        R13: 00000000fffffff4 R14: ffff8801ad891a00 R15: 00000000014200c0
         __do_kmalloc mm/slab.c:3713 [inline]
         __kmalloc+0x25/0x760 mm/slab.c:3727
         kmalloc include/linux/slab.h:517 [inline]
         map_get_next_key+0x24a/0x640 kernel/bpf/syscall.c:858
         __do_sys_bpf kernel/bpf/syscall.c:2131 [inline]
         __se_sys_bpf kernel/bpf/syscall.c:2096 [inline]
         __x64_sys_bpf+0x354/0x4f0 kernel/bpf/syscall.c:2096
         do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The test case is against a sock hashmap with a key size of 0xffffffe1.
      Such a large key size causes the following computation in
      sock_hash_alloc() to overflow and produce a smaller elem_size,
      so map creation succeeds:
          htab->elem_size = sizeof(struct htab_elem) +
                            round_up(htab->map.key_size, 8);
      
      Later, when map_get_next_key is called, the kernel tries to
      allocate the key, the allocation fails, and the above warning
      is issued.
      
      Similar to hashtab, ensure the key size is at most
      MAX_BPF_STACK for a successful map creation.
      
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Reported-by: syzbot+e4566d29080e7f3460ff@syzkaller.appspotmail.com
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
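The overflow described in this commit can be reproduced in userspace with 32-bit arithmetic. The sketch below is only an illustration: `HTAB_ELEM_HDR_SIZE` is a hypothetical stand-in for `sizeof(struct htab_elem)` (the real value depends on the kernel build), and `elem_size_unchecked()` and `elem_size_checked()` are made-up names, not kernel functions. The bound of 512 is the kernel's value of `MAX_BPF_STACK`.

```c
#include <stdint.h>

/* Hypothetical stand-in for sizeof(struct htab_elem); assumed, not the real size. */
#define HTAB_ELEM_HDR_SIZE 48u
/* The kernel defines MAX_BPF_STACK as 512. */
#define MAX_BPF_STACK 512u

/* Round x up to the next multiple of 8, in 32-bit arithmetic. */
static uint32_t round_up_8(uint32_t x)
{
    return (x + 7u) & ~7u;
}

/* Mirrors the shape of the unchecked computation: the sum wraps modulo 2^32. */
static uint32_t elem_size_unchecked(uint32_t key_size)
{
    return HTAB_ELEM_HDR_SIZE + round_up_8(key_size);
}

/* Rejects oversized keys before computing the size.
 * Returns 0 on success and -1 when the key size is rejected. */
static int elem_size_checked(uint32_t key_size, uint32_t *elem_size)
{
    if (key_size > MAX_BPF_STACK)
        return -1;
    *elem_size = elem_size_unchecked(key_size);
    return 0;
}
```

With the assumed 48-byte header, `elem_size_unchecked(0xffffffe1)` wraps around to 24, which is why map creation appeared to succeed, while `elem_size_checked()` rejects the same key size up front.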
    • libbpf: add ifindex to enable offload support · f0307a7e
      David Beckett committed
      BPF programs currently can only be offloaded using iproute2. This
      patch will allow programs to be offloaded using libbpf calls.
      Signed-off-by: David Beckett <david.beckett@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add __printf verification to bpf_verifier_vlog · be2d04d1
      Mathieu Malaterre committed
      __printf is useful to verify format and arguments. The ‘bpf_verifier_vlog’
      function is used twice in verifier.c, and in both cases the caller
      already uses the __printf gcc attribute.
      
      Remove the following warning, triggered with W=1:
      
        kernel/bpf/verifier.c:176:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
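As a rough illustration of the technique, GCC's `format` attribute can also be applied to a function that takes a `va_list`, by passing 0 as the first-to-check index. The compiler then validates the format string itself, even though the arguments are not visible at that point. In the sketch below, `PRINTF_ATTR`, `struct demo_log`, `demo_vlog()` and `demo_log_write()` are hypothetical userspace names chosen for this example; they are not the kernel's verifier code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Local stand-in for a __printf(a, b)-style macro built on the GCC attribute. */
#define PRINTF_ATTR(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))

struct demo_log {
    char buf[128];
    size_t len;
};

/* va_list variant: the third attribute argument is 0 because the variadic
 * arguments cannot be checked here, only the format string. */
static PRINTF_ATTR(2, 0) void demo_vlog(struct demo_log *log,
                                        const char *fmt, va_list args)
{
    size_t room = sizeof(log->buf) - log->len - 1;
    int n = vsnprintf(log->buf + log->len, sizeof(log->buf) - log->len,
                      fmt, args);

    /* vsnprintf reports the untruncated length, so clamp to what fit. */
    if (n > 0)
        log->len += (size_t)n < room ? (size_t)n : room;
}

/* Variadic wrapper: format string is argument 2, checked arguments start at 3. */
static PRINTF_ATTR(2, 3) void demo_log_write(struct demo_log *log,
                                             const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    demo_vlog(log, fmt, args);
    va_end(args);
}
```

A call such as `demo_log_write(&log, "insn %d: %s", 7, "ok")` is checked at compile time; passing a string where `%d` is expected would produce a `-Wformat` warning.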
    • samples/bpf: Decrement ttl in fib forwarding example · 44edef77
      David Ahern committed
      Only consider forwarding packets if ttl in received packet is > 1 and
      decrement ttl before handing off to bpf_redirect_map.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
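The rule in this commit (forward only when the received TTL is greater than 1, then decrement) can be sketched in plain userspace C. The names `forward_decrement_ttl()` and `ipv4_header_checksum()` and the raw byte-array view of the header are assumptions made for this illustration, not the sample's code. The checksum is patched incrementally in the style of RFC 1624 rather than recomputed.

```c
#include <stddef.h>
#include <stdint.h>

/* Full recomputation of the IPv4 header checksum, skipping the checksum
 * field itself (bytes 10-11). Words are read in network (big-endian) order. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        if (i == 10)
            continue;
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    }
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 1 and updates TTL and checksum if the packet may be forwarded,
 * or 0 (header untouched) if the received TTL is <= 1. */
static int forward_decrement_ttl(uint8_t *hdr)
{
    uint16_t old_word, new_word, old_check, new_check;
    uint32_t sum;

    if (hdr[8] <= 1)
        return 0;

    /* Byte 8 is the TTL, byte 9 the protocol; together one 16-bit word. */
    old_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    hdr[8]--;
    new_word = (uint16_t)((hdr[8] << 8) | hdr[9]);
    old_check = (uint16_t)((hdr[10] << 8) | hdr[11]);

    /* Incremental update: HC' = ~(~HC + ~m + m') in one's-complement math. */
    sum = (uint32_t)(uint16_t)~old_check + (uint32_t)(uint16_t)~old_word +
          new_word;
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    new_check = (uint16_t)~sum;

    hdr[10] = (uint8_t)(new_check >> 8);
    hdr[11] = (uint8_t)(new_check & 0xff);
    return 1;
}
```

Updating the checksum incrementally touches only the changed word, which suits a per-packet fast path.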
    • Merge branch 'bpf-sock-hashmap' · 5b26ace6
      Daniel Borkmann committed
      John Fastabend says:
      
      ====================
      In the original sockmap implementation we got away with using an
      array similar to devmap. However, unlike devmap where an ifindex
      has a nice 1:1 function into the map we have found some use cases
      with sockets that need to be referenced using longer keys.
      
      This series adds support for a sockhash map reusing as much of
      the sockmap code as possible. I made the decision to add sockhash
      specific helpers vs trying to generalize the existing helpers
      because (a) they have sockmap in the name and (b) the keys are
      different types. I prefer to be explicit here rather than play
      type games or do something else tricky.
      
      To test this we duplicate all the sockmap testing except swap out
      the sockmap with a sockhash.
      
      v2: fix file stats and add v2 tag
      v3: move tool updates into test patch, move bpftool updates into
          its own patch, and fixup the test patch stats to catch the
          renamed file and provide only diffs ± on that.
      v4: Add documentation to UAPI bpf.h
      v5: Add documentation to tools UAPI bpf.h
      v6: 'git add' test_sockhash_kern.c which was previously missing
          but was not causing issues because of typo in test script,
          noticed by Daniel. After this the git format-patch -M option
          no longer tracks the rename of the test_sockmap_kern files for
          some reason. I guess the diff has exceeded some threshold.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
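The difference between an array-indexed map and a hash map with longer keys can be sketched in userspace C. Everything below is a hypothetical illustration: `struct flow_key`, the open-addressing table and the function names are assumptions made for this example and do not reflect how the kernel implements sockmap or sockhash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS 16u

/* Array-style map: the key is just a small integer index. */
struct array_map {
    int socks[SLOTS];
};

/* Hash-style map: the key can be a longer structure (assumed 4-tuple). */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct hash_slot {
    int used;
    struct flow_key key;
    int sock;
};

struct hash_map {
    struct hash_slot slots[SLOTS];
};

/* FNV-1a over the raw key bytes. */
static uint32_t fnv1a(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Insert or overwrite; returns 0 on success, -1 if the table is full. */
static int hash_update(struct hash_map *m, const struct flow_key *k, int sock)
{
    uint32_t h = fnv1a(k, sizeof(*k));

    for (uint32_t i = 0; i < SLOTS; i++) {
        struct hash_slot *s = &m->slots[(h + i) % SLOTS];

        if (!s->used || memcmp(&s->key, k, sizeof(*k)) == 0) {
            s->used = 1;
            s->key = *k;
            s->sock = sock;
            return 0;
        }
    }
    return -1;
}

/* Returns the stored socket value, or -1 if the key is absent. */
static int hash_lookup(const struct hash_map *m, const struct flow_key *k)
{
    uint32_t h = fnv1a(k, sizeof(*k));

    for (uint32_t i = 0; i < SLOTS; i++) {
        const struct hash_slot *s = &m->slots[(h + i) % SLOTS];

        if (!s->used)
            return -1;
        if (memcmp(&s->key, k, sizeof(*k)) == 0)
            return s->sock;
    }
    return -1;
}
```

An `array_map` lookup is simply `map.socks[index]`, which works when a small integer identifies the entry; the hash variant accepts any fixed-size key at the cost of hashing and probing.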
    • bpf: bpftool, support for sockhash · 62c52d1f
      John Fastabend committed
      This adds the SOCKHASH map type to bpftool so that we get correct
      pretty printing.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: selftest additions for SOCKHASH · b8b394fa
      John Fastabend committed
      This runs existing SOCKMAP tests with SOCKHASH map type. To do this
      we push programs into an include file and build two BPF programs, one
      for SOCKHASH and one for SOCKMAP.
      
      We then run the entire test suite with each type.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • cxgb4: update LE-TCAM collection for T6 · 8e725f7c
      Rahul Lakkireddy committed
      For T6, the clip table is separated from the main TCAM. So, update the
      LE-TCAM collection logic to collect the clip table TCAM as well. IPv6
      takes 4 entries in the clip table TCAM compared to 2 entries in the
      main TCAM.
      
      Also, in case of errors, keep the LE-TCAM collected so far and set the
      status to partial dump.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qed-LL2-fixes' · 7e360d9d
      David S. Miller committed
      Michal Kalderon says:
      
      ====================
      qed: LL2 fixes
      
      This series fixes some issues in ll2 related to synchronization
      and resource freeing.
      ====================
      Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
    • qed: Fix LL2 race during connection terminate · fc16f56b
      Michal Kalderon committed
      Stress testing of qedi/qedr load/unload led to list_del corruption.
      This is due to ll2 connection terminate freeing resources without
      verifying that no more ll2 processing will occur.
      
      This patch unregisters the ll2 status block before terminating
      the connection to ensure this race does not occur.
      
      Fixes: 1d6cff4f ("qed: Add iSCSI out of order packet handling")
      Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix possibility of list corruption during rmmod flows · 6291c608
      Michal Kalderon committed
      The ll2 flows of flushing the txq/rxq need to be synchronized with the
      regular fp processing. The missing synchronization caused list
      corruption during load/unload stress tests.
      
      Fixes: 0a7fb11c ("qed: Add Light L2 support")
      Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: LL2 flush isles when connection is closed · 974f6c04
      Michal Kalderon committed
      The driver should free all pending isles once it gets a FLUSH cqe from
      FW. This is part of the iSCSI out of order flow.
      
      Fixes: 1d6cff4f ("qed: Add iSCSI out of order packet handling")
      Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>