1. 03 Dec, 2017 3 commits
  2. 02 Dec, 2017 14 commits
    • Merge branch 'bpf-nfp-jmp-memcpy-improvements' · 44851665
      Daniel Borkmann committed
      Jiong Wang says:
      
      ====================
      Currently, the compiler lowers a memcpy function call in an XDP/eBPF C
      program into a sequence of eBPF load/store pairs in some scenarios.
      
      The compiler considers this "inline" optimization beneficial, as it
      avoids a function call and also increases code locality.
      
      However, the Netronome NPU is not a traditional load/store architecture,
      so performing a sequence of individual load/store operations is not
      efficient.
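      
      As an illustration (a sketch, not taken from the patch set; the exact
      lowering depends on compiler version and alignment), a small copy such
      as:
      
        memcpy(dst, src, 16);
      
      may be lowered into load/store pairs like (pseudo eBPF asm):
      
        r4 = *(u64 *)(src + 0);
        *(u64 *)(dst + 0) = r4;
        r4 = *(u64 *)(src + 8);
        *(u64 *)(dst + 8) = r4;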
      
      This patch set tries to identify the load/store sequences composed of
      load/store pairs that come from memcpy lowering, then accelerates them
      through the NPU's Command Push Pull (CPP) instruction.
      
      This patch set registers a new optimization pass before doing the actual
      JIT work. It traverses the eBPF IR and, once a candidate sequence is
      found, records the memory copy source, destination and length
      information in the first load instruction starting the sequence and
      marks all remaining instructions in the sequence as skippable. Later,
      when JITing the first load instruction, optimal instructions are
      generated using the recorded information.
      
      For the safety of this transformation:
      
        - jump into the middle of the sequence will cancel the optimization.
      
        - overlapped memory access will cancel the optimization.
      
        - the load destination register still contains the same value as before
          the transformation.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: detect load/store sequences lowered from memory copy · 6bc7103c
      Jiong Wang committed
      This patch adds the optimization frontend by adding a new eBPF IR scan
      pass, "nfp_bpf_opt_ldst_gather".
      
      The pass traverses the IR to recognize the load/store pair sequences
      that come from the lowering of memory copy builtins.
      
      The gathered memory copy information will be kept in the meta info
      structure of the first load instruction in the sequence and will be
      consumed by the optimization backend added in the previous patches.
      
      NOTE: a sequence with cross memory access doesn't qualify for this
      optimization, i.e. one where a load in the sequence reads from a place
      that has been written by a previous store. This is because when we turn
      the sequence into a single CPP operation, we read all contents at once
      into NFP transfer registers, then write them out as a whole, which is
      not identical to what the original load/store sequence does.
      
      Detecting cross memory access for two arbitrary pointers is difficult.
      Fortunately, under XDP/eBPF's restricted runtime environment the copy
      normally happens among the map, packet data and stack, which do not
      overlap with each other.
      
      And for the cases supported by NFP, cross memory access will only happen
      on PTR_TO_PACKET. Fortunately, for this there is ID information with
      which we can do an accurate memory alias check.
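      
      As a rough sketch (illustrative only; meta/list layout loosely follows
      the NFP JIT, and record_pair() is a hypothetical placeholder for the
      real bookkeeping), the recognition amounts to pairing each load with
      the store that immediately consumes its destination register:
      
        /* Pair loads with the stores consuming their destination register. */
        list_for_each_entry(meta, &nfp_prog->insns, l) {
                struct nfp_insn_meta *next;
        
                if (BPF_CLASS(meta->insn.code) != BPF_LDX ||
                    list_is_last(&meta->l, &nfp_prog->insns))
                        continue;
                next = list_next_entry(meta, l);
                if (BPF_CLASS(next->insn.code) == BPF_STX &&
                    next->insn.src_reg == meta->insn.dst_reg)
                        record_pair(meta, next);        /* candidate pair */
        }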
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: implement memory bulk copy for length bigger than 32-bytes · 8c900538
      Jiong Wang committed
      When the gathered copy length is bigger than 32 bytes and within 128
      bytes (the maximum length a single CPP Pull/Push request can finish),
      the read/write strategy changes to the following (see the sketch after
      this list):
      
        * Read.
            - use direct reference mode when length is within 32-bytes.
            - use indirect mode when length is bigger than 32-bytes.
      
        * Write.
            - length <= 8-bytes
              use write8 (direct_ref).
            - length <= 32-bytes and 4-bytes aligned
              use write32 (direct_ref).
            - length <= 32-bytes but not 4-bytes aligned
              use write8 (indirect_ref).
            - length > 32-bytes and 4-bytes aligned
              use write32 (indirect_ref).
            - length > 32-bytes and not 4-bytes aligned and <= 40-bytes
              use write32 (direct_ref) to finish the first 32-bytes.
              use write8 (direct_ref) to finish the remaining hanging part.
            - length > 40-bytes and not 4-bytes aligned
              use write32 (indirect_ref) to finish the 4-bytes aligned parts.
              use write8 (direct_ref) to finish the remaining hanging part.
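      
      A compact way to express the write half of this decision tree (a sketch
      under the assumptions above; the enum and function are illustrative,
      not the driver's API):
      
        enum wr_form { WR8_DIRECT, WR32_DIRECT, WR8_INDIRECT, WR32_INDIRECT };
        
        /* len in bytes; aligned4 is true when the access is 4-byte aligned.
         * Cases above 32 bytes also emit a trailing write8 for the hanging
         * part, which this sketch only notes in comments.
         */
        static enum wr_form pick_write_form(unsigned int len, bool aligned4)
        {
                if (len <= 8)
                        return WR8_DIRECT;
                if (len <= 32)
                        return aligned4 ? WR32_DIRECT : WR8_INDIRECT;
                if (aligned4)
                        return WR32_INDIRECT;
                /* not aligned: write32 for the aligned part (direct_ref when
                 * len <= 40, indirect_ref above), then write8 for the rest */
                return len <= 40 ? WR32_DIRECT : WR32_INDIRECT;
        }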
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: implement memory bulk copy for length within 32-bytes · 9879a381
      Jiong Wang committed
      For the NFP, we want to re-group a sequence of load/store pairs lowered
      from memcpy/memmove into a single memory bulk operation, which can then
      be accelerated using the NFP CPP bus.
      
      This patch extends the existing load/store auxiliary information by adding
      two new fields:
      
      	struct bpf_insn *paired_st;
      	s16 ldst_gather_len;
      
      Both fields are supposed to be carried by the load instruction at the
      head of the sequence. "paired_st" is the corresponding store instruction
      at the head and "ldst_gather_len" is the gathered length.
      
      If "ldst_gather_len" is negative, then the sequence is doing memory
      load/store in descending order, otherwise it is in ascending order. We need
      this information to detect overlapped memory access.
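      
      For example (illustrative): a 16-byte copy whose head load reads
      src + 8 and whose following pairs walk down to src + 0 would be
      recorded with ldst_gather_len = -16, while the same copy walking up
      from src + 0 would be recorded with ldst_gather_len = 16.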
      
      This patch then optimizes the memory bulk copy when the copy length is
      within 32 bytes.
      
      The read/write strategy used is:
      
        * Read.
          Use read32 (direct_ref), always.
      
        * Write.
          - length <= 8-bytes
            write8 (direct_ref).
          - length <= 32-bytes and is 4-byte aligned
            write32 (direct_ref).
          - length <= 32-bytes but is not 4-byte aligned
            write8 (indirect_ref).
      
      NOTE: the optimization should not change program semantics. The destination
      register of the last load instruction should contain the same value before
      and after this optimization.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: factor out is_mbpf_load & is_mbpf_store · 5e4d6d20
      Jiong Wang committed
      We often need to check whether a BPF insn loads or stores data from/to
      memory.
      
      Therefore, it makes sense to factor out the related code into common
      helper functions.
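      
      Such helpers can be as simple as checking the insn class with the size
      bits masked off (a sketch consistent with the description; field and
      macro names follow the NFP JIT):
      
        /* BPF_SIZE_MASK masks the access-width bits out of insn.code so
         * that byte/half/word/double accesses all match.
         */
        static inline bool is_mbpf_load(const struct nfp_insn_meta *meta)
        {
                return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_LDX | BPF_MEM);
        }
        
        static inline bool is_mbpf_store(const struct nfp_insn_meta *meta)
        {
                return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_MEM);
        }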
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: encode indirect commands · 5468a8b9
      Jakub Kicinski committed
      Add support for emitting commands with field overwrites.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: correct the encoding for No-Dest immed · 3239e7bb
      Jiong Wang committed
      When immed is used with No-Dest, the emitter should use reg.dst instead
      of reg.areg for the destination; using the latter will actually encode
      register zero.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: relax source operands check · 08859f15
      Jiong Wang committed
      The NFP normally requires the source operands to use different
      addressing modes, but we should exempt the very special NN_REG_NONE
      type.
      
      There are instructions that ignore both A/B operands, for example:
      
        local_csr_rd
      
      For these instructions, we might pass the same operand type, NN_REG_NONE,
      for both A/B operands.
      
      NOTE: in the current NFP ISA, only instructions with unrestricted
      operands can take no operands, but in case new, similar instructions
      appear in restricted form, they would follow similar rules, so
      swreg_to_restricted is updated as well.
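      
      Illustratively, the relaxed rule could read like this (a sketch; the
      operand encoders in the driver are swreg_to_unrestricted and
      swreg_to_restricted, but this exact condition is an assumption):
      
        /* Two source operands with the same addressing mode are normally an
         * encoding error, but NN_REG_NONE operands are ignored by the
         * instruction, so the conflict check should not apply to them.
         */
        if (lreg.type == rreg.type && lreg.type != NN_REG_NONE)
                return -EFAULT;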
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: don't do ld/shifts combination if shifts are jump destination · 29fe46ef
      Jiong Wang committed
      If any of the shift insns in the ld/shift sequence is a jump
      destination, don't do the combination.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: don't do ld/mask combination if mask is jump destination · 1266f5d6
      Jiong Wang committed
      If the mask insn in the ld/mask pair is a jump destination, don't do
      the combination.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: flag jump destination to guide insn combine optimizations · a09d5c52
      Jiong Wang committed
      The NFP eBPF offload JIT engine performs some instruction-combine
      optimizations which, however, are not safe if the combined sequences
      cross basic block borders.
      
      Currently, there are post checks while fixing up jump destinations. If
      the jump destination is found to be an eBPF insn that has been combined
      into another one, the JIT engine raises an error and aborts.
      
      This is not optimal. The JIT engine ought to disable the optimization on
      such cross-bb-border sequences instead of aborting.
      
      As there is no control flow information in the eBPF infrastructure, we
      can't do basic-block-based optimizations. Instead, this patch extends
      the existing jump destination record pass to also flag the jump
      destinations; the instruction combine passes can then skip the
      optimization if any insn in the sequence is a jump target.
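      
      A minimal sketch of the flagging plus the check in a combine pass
      (illustrative; the flag name follows the patch description, the
      surrounding helpers are simplified):
      
        /* in the jump destination record pass */
        jmp_dst->flags |= FLAG_INSN_IS_JUMP_DST;
        
        /* in an instruction combine pass */
        if (next_meta->flags & FLAG_INSN_IS_JUMP_DST)
                continue;       /* sequence crosses a bb border, skip */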
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: record jump destination to simplify jump fixup · 5b674140
      Jiong Wang committed
      eBPF insns are internally organized as a doubly linked list inside the
      NFP offload JIT. Random access to an insn has to be done by either a
      forward or backward traversal along the list.
      
      One place we need to do such a traversal is nfp_fixup_branches, where
      one traversal is needed for each jump insn to find the destination. Such
      traversals can be avoided if jump destinations are collected through a
      single traversal in a pre-scan pass, and that information is also useful
      in other places where jump destination info is needed.
      
      This patch adds such jump destination collection in nfp_prog_prepare.
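      
      Conceptually, the pre-scan does something like this (a sketch; the
      meta lookup is simplified into a hypothetical find_meta() helper, and
      field names only loosely follow the NFP JIT):
      
        list_for_each_entry(meta, &nfp_prog->insns, l) {
                if (BPF_CLASS(meta->insn.code) != BPF_JMP)
                        continue;
                /* destination insn index = this index + offset + 1 */
                meta->jmp_dst = find_meta(nfp_prog, meta->n + meta->insn.off + 1);
        }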
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: bpf: support backward jump · 854dc87d
      Jiong Wang committed
      This patch adds support for backward jump on NFP.
      
        - restrictions on backward jump in various functions have been removed.
        - nfp_fixup_branches now supports backward jump.
      
      There is one thing to note: currently, an input eBPF JMP insn may
      generate several NFP insns, for example:
      
        NFP imm move insn A \
        NFP compare insn  B  --> 3 NFP insn jited from eBPF JMP insn M
        NFP branch insn   C /
        ---
        NFP insn X           --> 1 NFP insn jited from eBPF insn N
        ---
        ...
      
      therefore, we do a sanity check to make sure the last jited insn from
      an eBPF JMP is an NFP branch instruction.
      
      Once backward jumps are allowed, it is possible for an eBPF JMP insn to
      be at the end of the program. This, however, causes trouble for the
      sanity check: it requires the end index of the NFP insns jited from one
      eBPF insn, while before this patch only the start index was recorded,
      so the end index could only be derived as:
      
        start_index_of_the_next_eBPF_insn - 1
      
      or for the above example:
      
        start_index_of_eBPF_insn_N (which is the index of NFP insn X) - 1
      
      nfp_fixup_branches was using nfp_for_each_insn_walk2 to expose the
      *next* insn to each iteration of the traversal, from which the last
      index could be calculated. Now it needs some extra code to handle the
      last insn. Meanwhile, the use of walk2 is actually unnecessary: we can
      simply use a generic single-instruction walk, as the next insn can
      easily be obtained using list_next_entry.
      
      So, this patch migrates the jump fixup traversal method to
      *list_for_each_entry*, which simplifies the code logic a little bit.
      
      The other thing to note is that a new state variable, "last_bpf_off", is
      introduced to track the index of the last jited NFP insn. This is
      necessary because the NFP generates special-purpose epilogue sequences,
      so the index of the last jited NFP insn is *not* always
      nfp_prog->prog_len - 1.
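      
      A sketch of the resulting fixup walk (simplified from the description;
      error handling mostly omitted):
      
        list_for_each_entry(meta, &nfp_prog->insns, l) {
                unsigned int br_idx;
        
                if (BPF_CLASS(meta->insn.code) != BPF_JMP)
                        continue;
                /* index of the last NFP insn jited from this eBPF insn */
                if (list_is_last(&meta->l, &nfp_prog->insns))
                        br_idx = nfp_prog->last_bpf_off;
                else
                        br_idx = list_next_entry(meta, l)->off - 1;
                if (!nfp_is_br(nfp_prog->prog[br_idx]))
                        return -ELOOP;  /* sanity check failed */
        }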
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • nfp: fix old kdoc issues · a646c9b2
      Jakub Kicinski committed
      Since commit 3a025e1d ("Add optional check for bad kernel-doc
      comments"), builds with W=1 will complain about kdoc errors. Fix the
      kdoc issues we have. kdoc is still confused by defines in
      nfp_net_ctrl.h, but those are not really errors.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  3. 01 Dec, 2017 10 commits
  4. 30 Nov, 2017 13 commits
    • net/reuseport: drop legacy code · e94a62f5
      Paolo Abeni committed
      Since commit e32ea7e7 ("soreuseport: fast reuseport UDP socket
      selection") and commit c125e80b ("soreuseport: fast reuseport
      TCP socket selection") the relevant reuseport socket matching the current
      packet is selected by the reuseport_select_sock() call. The only
      exceptions are invalid BPF filters/filters returning out-of-range
      indices.
      In the latter case the code implicitly falls back to hash-based
      demultiplexing, but instead of selecting the socket inside the
      reuseport_select_sock() function, it relies on the hash selection
      logic introduced with the early soreuseport implementation.
      
      With this patch, in case of a BPF filter returning a bad socket
      index value, we fall back to hash-based selection inside the
      reuseport_select_sock() body, so that we can drop some duplicate
      code in the ipv4 and ipv6 stack.
      
      This also allows faster lookup in the above scenario and, in a later
      patch, will allow us to avoid computing the hash value for successful
      BPF-based demultiplexing.
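      
      The fallback then lives in one place; schematically (a simplified
      sketch of reuseport_select_sock(), not the verbatim kernel code):
      
        sk2 = NULL;
        if (prog)
                sk2 = run_bpf(reuse, socks, prog, skb, hdr_len);
        
        /* no BPF program, or it returned an invalid index: select by hash */
        if (!sk2)
                sk2 = reuse->socks[reciprocal_scale(hash, socks)];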
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Documentation: net: dsa: Cut set_addr() documentation · 0fc66ddf
      Linus Walleij committed
      This is not supported anymore; devices needing a MAC address just
      assign one at random, it's just a driver peculiarity.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-dst_entry-shrink' · 3d8068c5
      David S. Miller committed
      David Miller says:
      
      ====================
      net: Significantly shrink the size of routes.
      
      Through a combination of several things, our route structures are
      larger than they need to be.
      
      Mostly this stems from having members in dst_entry which are only used
      by one class of routes.  So the majority of the work in this series is
      about "un-commoning" these members and pushing them into the type
      specific structures.
      
      Unfortunately, IPSEC needed the most surgery.  The majority of the
      changes here had to do with bundle creation and management.
      
      The other issue is the refcount alignment in dst_entry.  Once we get
      rid of the not-so-common members, it really opens the door to removing
      that alignment entirely.
      
      I think the new layout looks really nice, so I'll reproduce it here:
      
      	struct net_device       *dev;
      	struct  dst_ops	        *ops;
      	unsigned long		_metrics;
      	unsigned long           expires;
      	struct xfrm_state	*xfrm;
      	int			(*input)(struct sk_buff *);
      	int			(*output)(struct net *net, struct sock *sk, struct sk_buff *skb);
      	unsigned short		flags;
      	short			obsolete;
      	unsigned short		header_len;
      	unsigned short		trailer_len;
      	atomic_t		__refcnt;
      	int			__use;
      	unsigned long		lastuse;
      	struct lwtunnel_state   *lwtstate;
      	struct rcu_head		rcu_head;
      	short			error;
      	short			__pad;
      	__u32			tclassid;
      
      (This is for 64-bit, on 32-bit the __refcnt comes at the very end)
      
      So, the good news:
      
      1) struct dst_entry shrinks from 160 to 112 bytes.
      
      2) struct rtable shrinks from 216 to 168 bytes.
      
      3) struct rt6_info shrinks from 384 to 320 bytes.
      
      Enjoy.
      
      v2:
      	Collapse some patches logically based upon feedback.
      	Fix the strange patch #7.
      
      v3:	xfrm_dst_path() needs inline keyword
      	Properly align __refcnt on 32-bit.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Remove dst->next · 7149f813
      David Miller committed
      There are no more users.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • xfrm: Stop using dst->next in bundle construction. · 5492093d
      David Miller committed
      While building ipsec bundles, blocks of xfrm dsts are linked together
      using dst->next from bottom to the top.
      
      The only thing this is used for is initializing the pmtu values of the
      xfrm stack, and for updating the mtu values at xfrm_bundle_ok() time.
      
      The bundle pmtu entries must be processed in this order so that pmtu
      values lower in the stack of routes can propagate up to the higher
      ones.
      
      Avoid using dst->next by simply maintaining an array of dst pointers
      as we already do for the xfrm_state objects when building the bundle.
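      
      Schematically (a sketch of the idea, not the exact xfrm code): collect
      the dsts into an array while building the bundle, then walk that array
      from the bottom up when initializing pmtu values:
      
        struct dst_entry *bundle[XFRM_MAX_DEPTH];
        int i, n = 0;
        
        for (dst = dst0; dst; dst = xfrm_dst_child(dst))
                bundle[n++] = dst;              /* top to bottom */
        
        for (i = n - 1; i >= 0; i--)            /* bottom to top */
                pmtu = propagate_pmtu(bundle[i], pmtu); /* hypothetical helper */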
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • net: Rearrange dst_entry layout to avoid useless padding. · 8b207e73
      David Miller committed
      We have padding to try and align the refcount on a separate cache
      line.  But after several simplifications the padding has increased
      substantially.
      
      So now it's easy to change the layout to get rid of the padding
      entirely.
      
      We group the write-heavy __refcnt and __use with less often used
      items such as the rcu_head and the error code.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • xfrm: Move dst->path into struct xfrm_dst · 0f6c480f
      David Miller committed
      The first member of an IPSEC route bundle chain sets its dst->path to
      the underlying ipv4/ipv6 route that carries the bundle.
      
      Stated another way, if one were to follow the xfrm_dst->child chain of
      the bundle, the final non-NULL pointer would be the path and point to
      either an ipv4 or an ipv6 route.
      
      This is largely used to make sure that PMTU events propagate down to
      the correct ipv4 or ipv6 route.
      
      When we don't have the top of an IPSEC bundle 'dst->path == dst'.
      
      Move it down into xfrm_dst and key off of dst->xfrm.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • ipv6: Move dst->from into struct rt6_info. · 3a2232e9
      David Miller committed
      The dst->from value is only used by ipv6 routes to track where
      a route "came from".
      
      Any time we clone or copy a core ipv6 route in the ipv6 routing
      tables, we have the copy/clone's ->from point to the base route.
      
      This is used to handle route expiration properly.
      
      Only ipv6 uses this mechanism, and only ipv6 code references
      it.  So it is safe to move it into rt6_info.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • xfrm: Move child route linkage into xfrm_dst. · b6ca8bd5
      David Miller committed
      XFRM bundle child chains look like this:
      
      	xdst1 --> xdst2 --> xdst3 --> path_dst
      
      All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
      The final child pointer in the chain, here called 'path_dst', is some
      other kind of route such as an ipv4 or ipv6 one.
      
      The xfrm output path pops routes, one at a time, via the child
      pointer, until we hit one which has a dst->xfrm pointer which
      is NULL.
      
      We can easily preserve the above mechanisms with child sitting
      only in the xfrm_dst structure.  All children in the chain
      before we break out of the xfrm_output() loop have dst->xfrm
      non-NULL and are therefore xfrm_dst objects.
      
      Since we break out of the loop when we find dst->xfrm NULL, we
      will not try to dereference 'dst' as if it were an xfrm_dst.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipsec: Create and use new helpers for dst child access. · 45b018be
      David Miller committed
      This will make a future change moving the dst->child pointer less
      invasive.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
    • net: Create and use new helper xfrm_dst_child(). · b92cf4aa
      David Miller committed
      Only IPSEC routes have a non-NULL dst->child pointer.  And IPSEC
      routes are identified by a non-NULL dst->xfrm pointer.
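      
      Given that invariant, the helper is straightforward (a sketch matching
      the description; once the later patch moves the child pointer into
      struct xfrm_dst, it reads the xfrm_dst member):
      
        static inline struct dst_entry *xfrm_dst_child(const struct dst_entry *dst)
        {
        #ifdef CONFIG_XFRM
                if (dst->xfrm) {
                        const struct xfrm_dst *xdst = (const struct xfrm_dst *)dst;
                        return xdst->child;
                }
        #endif
                return NULL;    /* not an IPSEC route: no child */
        }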
      Signed-off-by: David S. Miller <davem@davemloft.net>