1. 23 Jan, 2018 1 commit
  2. 20 Jan, 2018 2 commits
  3. 19 Jan, 2018 2 commits
  4. 18 Jan, 2018 1 commit
  5. 17 Jan, 2018 2 commits
  6. 15 Jan, 2018 8 commits
  7. 10 Jan, 2018 11 commits
  8. 31 Dec, 2017 1 commit
  9. 28 Dec, 2017 1 commit
  10. 21 Dec, 2017 2 commits
    • nfp: bpf: keep track of the offloaded program · d3f89b98
      Committed by Jakub Kicinski
      After TC offloads were converted to callbacks we have no choice
      but to keep track of the offloaded filter in the driver.

      The check for nn->dp.bpf_offload_xdp was a stop-gap solution
      to make sure a failed TC offload won't disable XDP; it's no longer
      necessary.  nfp_net_bpf_offload() will return -EBUSY on
      TC vs XDP conflicts.
      
      Fixes: 3f7889c4 ("net: sched: cls_bpf: call block callbacks for offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
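      The fix described above boils down to the driver remembering what it
      has offloaded and refusing conflicting requests with -EBUSY. Below is
      a minimal, self-contained C sketch of that idea; the type and function
      names (fake_nn, fake_tc_offload) are illustrative stand-ins, not the
      real nfp driver structures.

      #include <errno.h>

      struct bpf_prog;                        /* opaque here */

      /* Simplified stand-in for the driver's per-netdev state. */
      struct fake_nn {
              struct bpf_prog *offloaded_tc_prog;   /* program offloaded via TC */
              struct bpf_prog *offloaded_xdp_prog;  /* program offloaded via XDP */
      };

      /* Offload a TC program: refuse if XDP already owns the hardware. */
      static int fake_tc_offload(struct fake_nn *nn, struct bpf_prog *prog)
      {
              if (nn->offloaded_xdp_prog)
                      return -EBUSY;          /* TC vs XDP conflict */
              nn->offloaded_tc_prog = prog;   /* remember what was offloaded */
              return 0;
      }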
    • cls_bpf: fix offload assumptions after callback conversion · 102740bd
      Committed by Jakub Kicinski
      cls_bpf used to take care of tracking what offload state a filter
      is in, i.e. it would track whether the offload request succeeded or
      not.  This information would then be used to issue correct requests
      to the driver, e.g. requesting statistics only for offloaded filters,
      removing only filters which were offloaded, or using add instead of
      replace if the previous filter was not added, etc.
      
      This tracking of offload state no longer functions with the new
      callback infrastructure.  There could be multiple entities trying
      to offload the same filter.
      
      Throw out all the tracking and corresponding commands and simply
      pass both the old and the new bpf program to the drivers.  Drivers
      will have to deal with offload state tracking by themselves.
      
      Fixes: 3f7889c4 ("net: sched: cls_bpf: call block callbacks for offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
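      A hedged sketch of the driver-side pattern this commit describes:
      with cls_bpf no longer tracking offload state, the callback sees both
      the previous and the new program and decides between add, replace and
      destroy itself.  The struct and helpers below are illustrative only,
      not the actual kernel callback API.

      struct bpf_prog;                        /* opaque here */

      /* Illustrative offload request carrying both old and new program. */
      struct fake_offload_req {
              struct bpf_prog *oldprog;       /* previously offloaded, or NULL */
              struct bpf_prog *prog;          /* new program, or NULL on removal */
      };

      static int hw_add(struct bpf_prog *p) { (void)p; return 0; }
      static int hw_replace(struct bpf_prog *o, struct bpf_prog *n)
      { (void)o; (void)n; return 0; }
      static int hw_destroy(struct bpf_prog *p) { (void)p; return 0; }

      /* The driver derives the operation from the old/new pair on its own. */
      static int fake_driver_offload(struct fake_offload_req *req)
      {
              if (req->oldprog && req->prog)
                      return hw_replace(req->oldprog, req->prog);
              if (req->prog)
                      return hw_add(req->prog);
              if (req->oldprog)
                      return hw_destroy(req->oldprog);
              return 0;                       /* nothing offloaded, nothing to do */
      }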
  11. 16 12月, 2017 1 次提交
  12. 15 12月, 2017 4 次提交
  13. 03 12月, 2017 1 次提交
  14. 02 12月, 2017 3 次提交
    • nfp: bpf: detect load/store sequences lowered from memory copy · 6bc7103c
      Committed by Jiong Wang
      This patch adds the optimization frontend by adding a new eBPF IR
      scan pass, "nfp_bpf_opt_ldst_gather".

      The pass traverses the IR to recognize the load/store pair sequences
      that come from the lowering of memory copy builtins.
      
      The gathered memory copy information will be kept in the meta info
      structure of the first load instruction in the sequence and will be
      consumed by the optimization backend added in the previous patches.
      
      NOTE: a sequence with cross memory access doesn't qualify for this
      optimization, i.e. if one load in the sequence loads from a place
      that has been written by a previous store.  This is because when we
      turn the sequence into a single CPP operation, we read all contents
      at once into NFP transfer registers, then write them out as a whole,
      which is not identical to what the original load/store sequence does.

      Detecting cross memory access for two arbitrary pointers would be
      difficult; fortunately, under XDP/eBPF's restricted runtime
      environment the copy normally happens among map, packet data and
      stack, which do not overlap with each other.

      For the cases supported by NFP, cross memory access will only happen
      on PTR_TO_PACKET.  Fortunately, there is ID information with which
      we can do an accurate memory alias check.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
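      As a rough illustration of what such a scan pass has to recognize,
      the self-contained C sketch below checks whether a load/store pair
      continues a lowered memory-copy sequence whose offsets advance
      contiguously.  The struct and helper are simplified stand-ins, not
      the real nfp_bpf_opt_ldst_gather implementation.

      #include <stdbool.h>
      #include <stdint.h>

      /* Simplified stand-in for one eBPF memory access in the IR. */
      struct fake_insn {
              bool    is_load;
              bool    is_store;
              int16_t off;        /* memory offset of the access */
              uint8_t size;       /* access size in bytes: 1, 2, 4 or 8 */
      };

      /* Does the load `ld`, paired with the store `st`, continue an
       * ascending copy sequence whose next expected offset is
       * `expected_off`?  A real pass also handles descending order and
       * must rule out overlap between source and destination. */
      static bool continues_copy_sequence(const struct fake_insn *ld,
                                          const struct fake_insn *st,
                                          int16_t expected_off)
      {
              if (!ld->is_load || !st->is_store)
                      return false;
              if (ld->size != st->size)
                      return false;   /* a lowered copy uses same-size pairs */
              return ld->off == expected_off;
      }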
    • nfp: bpf: implement memory bulk copy for length bigger than 32-bytes · 8c900538
      Committed by Jiong Wang
      When the gathered copy length is bigger than 32 bytes and within 128
      bytes (the maximum length a single CPP Pull/Push request can finish),
      the read/write strategy is changed to the following (see the code
      sketch after this commit message):
      
        * Read.
            - use direct reference mode when length is within 32-bytes.
            - use indirect mode when length is bigger than 32-bytes.
      
        * Write.
            - length <= 8-bytes
              use write8 (direct_ref).
            - length <= 32-byte and 4-bytes aligned
              use write32 (direct_ref).
            - length <= 32-bytes but not 4-bytes aligned
              use write8 (indirect_ref).
            - length > 32-bytes and 4-bytes aligned
              use write32 (indirect_ref).
            - length > 32-bytes and not 4-bytes aligned and <= 40-bytes
              use write32 (direct_ref) to finish the first 32-bytes.
              use write8 (direct_ref) to finish all remaining hanging part.
            - length > 32-bytes and not 4-bytes aligned
              use write32 (indirect_ref) to finish those 4-byte aligned parts.
              use write8 (direct_ref) to finish all remaining hanging part.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
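      The selection table above can be condensed into a small helper; the
      self-contained C sketch below picks the primary write operation for a
      gathered copy, following the strategy listed in this commit.  The
      enum names are made up for illustration; the real code emits NFP CPP
      write8/write32 operations in direct or indirect reference mode.

      #include <stdbool.h>

      enum fake_write_mode {
              WRITE8_DIRECT,
              WRITE32_DIRECT,
              WRITE8_INDIRECT,
              WRITE32_INDIRECT,
      };

      /* Pick the primary write for a gathered copy of `len` bytes.  For the
       * unaligned cases above 32 bytes, the trailing non-4-byte-aligned part
       * is still finished with write8 (direct_ref). */
      static enum fake_write_mode pick_write_mode(unsigned int len, bool aligned4)
      {
              if (len <= 8)
                      return WRITE8_DIRECT;
              if (len <= 32)
                      return aligned4 ? WRITE32_DIRECT : WRITE8_INDIRECT;
              if (aligned4)
                      return WRITE32_INDIRECT;
              if (len <= 40)
                      return WRITE32_DIRECT;   /* covers the first 32 bytes */
              return WRITE32_INDIRECT;         /* covers the 4-byte aligned part */
      }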
    • nfp: bpf: implement memory bulk copy for length within 32-bytes · 9879a381
      Committed by Jiong Wang
      For NFP, we want to re-group a sequence of load/store pairs lowered
      from memcpy/memmove into a single memory bulk operation which can
      then be accelerated using the NFP CPP bus.
      
      This patch extends the existing load/store auxiliary information by adding
      two new fields:
      
      	struct bpf_insn *paired_st;
      	s16 ldst_gather_len;
      
      Both fields are supposed to be carried by the load instruction at the
      head of the sequence.  "paired_st" is the corresponding store
      instruction at the head and "ldst_gather_len" is the gathered length.

      If "ldst_gather_len" is negative, the sequence is doing the memory
      load/store in descending order, otherwise in ascending order.  We
      need this information to detect overlapping memory access.
      
      This patch then optimizes memory bulk copy when the copy length is
      within 32 bytes.
      
      The strategy of read/write used is:
      
        * Read.
          Use read32 (direct_ref), always.
      
        * Write.
          - length <= 8-bytes
            write8 (direct_ref).
          - length <= 32-bytes and is 4-byte aligned
            write32 (direct_ref).
          - length <= 32-bytes but is not 4-byte aligned
            write8 (indirect_ref).
      
      NOTE: the optimization should not change program semantics. The destination
      register of the last load instruction should contain the same value before
      and after this optimization.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
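      A hedged sketch of how the two new fields quoted in this commit could
      sit in per-instruction metadata.  Only "paired_st" and
      "ldst_gather_len" come from the commit; the surrounding struct and
      helper are simplified, hypothetical stand-ins.

      #include <stdbool.h>
      #include <stdint.h>

      struct bpf_insn;                /* eBPF instruction, opaque here */

      /* Simplified per-instruction metadata for the head load of a
       * recognized load/store sequence. */
      struct fake_insn_meta {
              struct bpf_insn *paired_st;       /* store paired with this load */
              int16_t          ldst_gather_len; /* gathered length; negative
                                                 * means descending order */
      };

      /* A head load qualifies for fusion into one bulk CPP copy (read with
       * read32 direct_ref, then written out per the strategy above) when it
       * has a paired store and a non-zero gathered length. */
      static bool fake_is_fusable_head(const struct fake_insn_meta *meta)
      {
              return meta->paired_st && meta->ldst_gather_len != 0;
      }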