1. 09 Dec 2016, 1 commit
  2. 06 Dec 2016, 1 commit
  3. 03 Dec 2016, 2 commits
  4. 02 Dec 2016, 1 commit
    • bpf: BPF for lightweight tunnel infrastructure · 3a0af8fd
      Submitted by Thomas Graf
      Registers new BPF program types which correspond to the LWT hooks:
        - BPF_PROG_TYPE_LWT_IN   => dst_input()
        - BPF_PROG_TYPE_LWT_OUT  => dst_output()
        - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()
      
      The separate program types are required to differentiate between the
      capabilities each LWT hook allows:
      
       * Programs attached to dst_input() or dst_output() are restricted and
         may only read the data of an skb. This prevents modification and
         possible invalidation of already validated packet headers on receive
         and the construction of illegal headers while the IP headers are
         still being assembled.
      
       * Programs attached to lwtunnel_xmit() are allowed to modify packet
         content as well as prepending an L2 header via a newly introduced
         helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is
         invoked after the IP header has been assembled completely.
      
      All BPF programs receive an skb with L3 headers attached and may return
      one of the following error codes:
      
       BPF_OK - Continue routing as per nexthop
       BPF_DROP - Drop skb and return EPERM
       BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                      (Only valid in lwtunnel_xmit() context)
      
      The return codes are binary compatible with their TC_ACT_
      relatives to ease compatibility.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a0af8fd
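      As a rough illustration of the restricted program types above, a minimal LWT-IN program could look like the sketch below. This is not part of the commit; the SEC() annotation and helper header are assumed from an iproute2/libbpf-style build, and the length policy is made up.

      #include <linux/bpf.h>
      #include <linux/ip.h>
      #include "bpf_helpers.h"    /* assumed: provides SEC() */

      /* Sketch of a read-only program for the dst_input() hook
       * (BPF_PROG_TYPE_LWT_IN): drop anything shorter than an IPv4 header,
       * otherwise continue routing as per nexthop. */
      SEC("lwt_in")
      int lwt_in_check(struct __sk_buff *skb)
      {
          if (skb->len < sizeof(struct iphdr))
              return BPF_DROP;    /* skb is dropped, EPERM is returned */

          return BPF_OK;          /* continue routing as per nexthop */
      }

      char _license[] SEC("license") = "GPL";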
  5. 28 Nov 2016, 1 commit
  6. 26 Nov 2016, 2 commits
  7. 13 Nov 2016, 1 commit
    • bpf: Fix bpf_redirect to an ipip/ip6tnl dev · 4e3264d2
      Submitted by Martin KaFai Lau
      If the bpf program calls bpf_redirect(dev, 0) and dev is
      an ipip/ip6tnl, it currently includes the mac header.
      e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
      of IP-IP.
      
      The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
      is not needed because the ethhdr should have been pulled once already
      and then got pushed back just before calling the bpf_prog.
      At egress, this patch calls skb_postpull_rcsum().
      
      If bpf_redirect(dev, BPF_F_INGRESS) is called,
      it currently also fails because it calls dev_forward_skb(), which
      eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
      will set skb->pkt_type = PACKET_OTHERHOST because the mac address
      does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
      will eventually cause ip_rcv() to error out.  To fix this,
      ____dev_forward_skb() is added.
      
      Joint work with Daniel Borkmann.
      
      Fixes: cfc7381b ("ip_tunnel: add collect_md mode to IPIP tunnel")
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e3264d2
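      For context, the kind of program affected is a tc/cls_bpf egress program redirecting to a tunnel device, along the lines of this hedged sketch; the ifindex constant and section name are assumptions, not from the commit.

      #include <linux/bpf.h>
      #include "bpf_helpers.h"    /* assumed: provides SEC() and bpf_redirect() */

      #define TUNNEL_IFINDEX 42   /* hypothetical ifindex of an ipip/ip6tnl dev */

      SEC("egress_redirect")
      int redirect_to_tunnel(struct __sk_buff *skb)
      {
          /* Egress redirect (flags == 0); with the fix the mac header is
           * pulled first, so the tunnel emits IP-IP instead of IP-EthHdr-IP. */
          return bpf_redirect(TUNNEL_IFINDEX, 0);
      }

      char _license[] SEC("license") = "GPL";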
  8. 23 Oct 2016, 1 commit
  9. 23 Sep 2016, 3 commits
  10. 21 Sep 2016, 1 commit
    • bpf: direct packet write and access for helpers for clsact progs · 36bbef52
      Submitted by Daniel Borkmann
      This work implements direct packet access for helpers and direct packet
      write in a similar fashion as already available for XDP types via commits
      4acf6c0b ("bpf: enable direct packet data write for xdp progs") and
      6841de8b ("bpf: allow helpers access the packet directly"), and as a
      complementary feature to the already available direct packet read for tc
      (cls/act) programs.
      
      For enabling this, we need to introduce two helpers, bpf_skb_pull_data()
      and bpf_csum_update(). The first is generally needed for both read and
      write, because they would otherwise be limited to the current linear
      skb head. Usually, when the data_end test fails, programs just bail out,
      or, in the direct read case, use bpf_skb_load_bytes() as an alternative
      to overcome this limitation. If such data sits in non-linear parts, we
      can just pull them in once with the new helper, retest and eventually
      access them.
      
      At the same time, this also makes sure the skb is uncloned, which is, of
      course, a necessary condition for direct write. As this needs to be an
      invariant for the write part only, the verifier detects writes and adds
      a prologue that is calling bpf_skb_pull_data() to effectively unclone the
      skb from the very beginning in case it is indeed cloned. The heuristic
      makes use of a similar trick that was done in 233577a2 ("net: filter:
      constify detection of pkt_type_offset"). This comes at zero cost for other
      programs that do not use the direct write feature. Should a program use
      this feature only sparsely and have read access for the most part with,
      for example, drop return codes, then such write actions can be delegated
      to a tail-called program, deferring the cost of a potential unclone to a
      later point in time where it would have been paid similarly with
      bpf_skb_store_bytes() anyway. The advantage of direct write is that the
      writes are inlined, whereas the helper cannot make any length assumptions
      and thus needs to generate a call to memcpy() even for small sizes; the
      cost of the helper call itself, including sanity checks, is also avoided. Plus, when
      direct read is already used, we don't need to cache or perform rechecks
      on the data boundaries (due to verifier invalidating previous checks for
      helpers that change skb->data), so more complex programs using rewrites
      can benefit from switching to direct read plus write.
      
      For direct packet access to helpers, we save the otherwise needed copy into
      a temp struct sitting on stack memory when use-case allows. Both facilities
      are enabled via may_access_direct_pkt_data() in verifier. For now, we limit
      this to map helpers and csum_diff, and can successively enable other helpers
      where we find it makes sense. Helpers that definitely cannot be allowed for
      this are those part of bpf_helper_changes_skb_data() since they can change
      underlying data, and those that write into memory as this could happen for
      packet typed args when still cloned. The bpf_csum_update() helper accommodates
      the fact that we need to fix up CHECKSUM_COMPLETE when using direct write
      instead of bpf_skb_store_bytes(), meaning programs can use available
      helpers like bpf_csum_diff(), and implement csum_add(), csum_sub(),
      csum_block_add(), csum_block_sub() equivalents in eBPF together with the
      new helper. A usage example will be provided for iproute2's examples/bpf/
      directory.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36bbef52
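      A hedged sketch of the pull/retest/write pattern described above follows; the header offsets, the rewritten field and the section name are illustrative assumptions.

      #include <linux/bpf.h>
      #include <linux/if_ether.h>
      #include <linux/ip.h>
      #include <linux/pkt_cls.h>
      #include "bpf_helpers.h"    /* assumed: provides SEC() and helper decls */

      SEC("cls_rewrite_tos")
      int rewrite_tos(struct __sk_buff *skb)
      {
          unsigned int need = sizeof(struct ethhdr) + sizeof(struct iphdr);
          void *data, *data_end;
          struct iphdr *iph;

          /* If the headers sit in non-linear parts, pull them in once,
           * then retest the data boundaries. */
          if (bpf_skb_pull_data(skb, need))
              return TC_ACT_OK;

          data = (void *)(long)skb->data;
          data_end = (void *)(long)skb->data_end;
          if (data + need > data_end)
              return TC_ACT_OK;

          /* Direct packet write; the verifier-added prologue has already
           * uncloned the skb if needed. */
          iph = data + sizeof(struct ethhdr);
          iph->tos = 0x10;
          /* A complete program would also fix up the IP checksum and
           * CHECKSUM_COMPLETE, f.e. via bpf_l3_csum_replace()/bpf_csum_update(). */
          return TC_ACT_OK;
      }

      char _license[] SEC("license") = "GPL";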
  11. 10 Sep 2016, 4 commits
    • bpf: add BPF_CALL_x macros for declaring helpers · f3694e00
      Submitted by Daniel Borkmann
      This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions
      to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros
      that are used today. Motivation for this is to hide all the register handling
      and all necessary casts from the user, so that it is done automatically in the
      background when adding a BPF_CALL_<n>() call.
      
      This makes current helpers easier to review, makes it easier to write future
      helpers, avoids getting the casting mess wrong, and allows for extending all
      helpers at once (f.e. build time checks, etc). It also makes it easier to
      verify in code reviews that unused registers are not accidentally instrumented
      in the code, which would break compatibility with existing programs.
      
      BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some
      fundamental differences, for example, for generating the actual helper function
      that carries all u64 regs, we need to fill unused regs, so that we always end up
      with 5 u64 regs as an argument.
      
      I reviewed several of the generated BPF_CALL_0() to BPF_CALL_5() variants in
      the .i results and they all look as expected. No sparse issue spotted. We let this also sit for a
      few days with Fengguang's kbuild test robot, and there were no issues seen. On
      s390, it barked on the "uses dynamic stack allocation" notice, which is an old
      one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion
      to the call wrapper, just telling that the perf raw record/frag sits on stack
      (gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests
      and they were fine as well. All eBPF helpers are now converted to use these
      macros, getting rid of a good chunk of all the raw castings.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3694e00
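      To illustrate the shape of such a definition, here is a hedged sketch of a hypothetical helper declared via BPF_CALL_2() as it might appear inside net/core/filter.c; the helper itself is made up and only shows how the macro hides the u64 register handling and casts.

      /* Hypothetical helper: return skb->len, declared via BPF_CALL_2() so that
       * the (ctx, flags) arguments arrive as properly typed parameters instead
       * of raw u64 registers. */
      BPF_CALL_2(bpf_skb_example_len, struct sk_buff *, skb, u64, flags)
      {
          if (unlikely(flags))
              return -EINVAL;

          return skb->len;
      }

      static const struct bpf_func_proto bpf_skb_example_len_proto = {
          .func       = bpf_skb_example_len,
          .gpl_only   = false,
          .ret_type   = RET_INTEGER,
          .arg1_type  = ARG_PTR_TO_CTX,
          .arg2_type  = ARG_ANYTHING,
      };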
    • bpf: add own ctx rewriter on ifindex for clsact progs · 374fb54e
      Submitted by Daniel Borkmann
      When fetching ifindex, we don't need to test dev for being NULL since
      we're always guaranteed to have a valid dev for clsact programs. Thus,
      avoid this test in the fast path.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      374fb54e
    • bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macros · f035a515
      Submitted by Daniel Borkmann
      Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit,
      which otherwise often results in overly long bytes_to_bpf_size(sizeof())
      and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro
      helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF())
      check in convert_bpf_extensions(), but we should rather make that generic
      as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF()
      users to detect any rewriter size issues at compile time. Note, there are
      currently none, but we want to assert that it stays this way.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f035a515
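      A hedged sketch of what such macros look like, reconstructed from the description above (bytes_to_bpf_size() plus a BUILD_BUG_ON()); the exact kernel definition may differ slightly.

      /* Map a C type/field size to a BPF access size (BPF_B/H/W/DW) and fail
       * the build if the size cannot be represented. */
      #define BPF_SIZEOF(type)                                        \
          ({                                                          \
              const int __size = bytes_to_bpf_size(sizeof(type));     \
              BUILD_BUG_ON(__size < 0);                               \
              __size;                                                 \
          })

      #define BPF_FIELD_SIZEOF(type, field)                           \
          ({                                                          \
              const int __size = bytes_to_bpf_size(FIELD_SIZEOF(type, field)); \
              BUILD_BUG_ON(__size < 0);                               \
              __size;                                                 \
          })

      /* Usage in a ctx rewriter, f.e.:
       *   BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct net_device, ifindex), dst, src, off);
       */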
    • bpf: minor cleanups in helpers · 6088b582
      Submitted by Daniel Borkmann
      Some minor misc cleanups, f.e. use sizeof(__u32) instead of hardcoding it,
      and in __bpf_skb_max_len() drop the unneeded test for dev, since I missed
      that we always have skb->dev valid anyway; a few more other misc bits are
      addressed here as well.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6088b582
  12. 19 Aug 2016, 4 commits
  13. 14 Aug 2016, 1 commit
    • bpf: fix write helpers with regards to non-linear parts · 0ed661d5
      Submitted by Daniel Borkmann
      Fix the bpf_try_make_writable() helper and all call sites we have in BPF;
      it is currently defective with regard to skbs when the write_len spans into
      non-linear parts, no matter if cloned or not.
      
      There are multiple issues at once. First, using skb_store_bits() is not
      correct since even if we have a cloned skb, page frags can still be shared.
      To really make them private, we need to pull them in via __pskb_pull_tail()
      first, which also gets us a private head via pskb_expand_head() implicitly.
      
      This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
      bpf_l4_csum_replace(). Really, the only thing reasonable and working here
      is to call skb_ensure_writable() before any write operation. Meaning, via
      pskb_may_pull() it makes sure that parts we want to access are pulled in and
      if not does so plus unclones the skb implicitly. If our write_len still fits
      the headlen and we're cloned and our header of the clone is not writable,
      then we need to make a private copy via pskb_expand_head(). skb_store_bits()
      is a bit misleading and only safe for storing into non-linear data in different
      contexts, such as in 357b40a1 ("[IPV6]: IPV6_CHECKSUM socket option can
      corrupt kernel memory").
      
      For above BPF helper functions, it means after fixed bpf_try_make_writable(),
      we've pulled in enough, so that we operate always based on skb->data. Thus,
      the call to skb_header_pointer() and skb_store_bits() becomes superfluous.
      In bpf_skb_store_bytes(), the len check is unnecessary too, since the maximum
      that can be passed in is the BPF stack size, so adding the offset is guaranteed
      to never overflow. Also the bpf_l3/4_csum_replace() helpers must test for proper
      offset alignment since they use __sum16 pointer for writing resulting csum.
      
      The remaining helpers that change skb data not discussed here yet are
      bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
      vlan helpers internally call either skb_ensure_writable() (pop case) or
      skb_cow_head() (push case, for head expansion), respectively. Similarly,
      bpf_skb_proto_xlat() takes care to not mangle page frags.
      
      Fixes: 608cd71a ("tc: bpf: generalize pedit action")
      Fixes: 91bc4822 ("tc: bpf: add checksum helpers")
      Fixes: 3697649f ("bpf: try harder on clones when writing into skb")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ed661d5
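      A hedged sketch of the fixed approach described above: route everything through skb_ensure_writable() and then operate on skb->data directly. The store function below is illustrative, not the actual helper.

      /* Pull in the bytes we want to write (uncloning/expanding the head if
       * necessary) so that the first write_len bytes are private and linear. */
      static int bpf_try_make_writable(struct sk_buff *skb, unsigned int write_len)
      {
          return skb_ensure_writable(skb, write_len);
      }

      /* Illustrative caller: after a successful call, writing via skb->data
       * is safe and skb_header_pointer()/skb_store_bits() are superfluous. */
      static int example_store_bytes(struct sk_buff *skb, u32 offset,
                                     const void *from, u32 len)
      {
          if (unlikely(bpf_try_make_writable(skb, offset + len)))
              return -EFAULT;

          memcpy(skb->data + offset, from, len);
          return 0;
      }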
  14. 13 Aug 2016, 1 commit
  15. 09 Aug 2016, 3 commits
    • bpf: fix checksum for vlan push/pop helper · 8065694e
      Submitted by Daniel Borkmann
      For skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't push the
      rcsum of the mac header back in before the BPF run and pull it back out again
      afterwards, as opposed to some other subsystems (ovs, for example).
      
      For cases like q-in-q, meaning when a vlan tag for offloading is already
      present and we're about to push another one, then skb_vlan_push() pushes the
      inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
      the 4 bytes vlan header diff. Likewise, for the reverse operation in
      skb_vlan_pop() for the case where vlan header needs to be pulled out of the
      skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
      rcsum of the vlan header that was removed.
      
      However, mangling the rcsum here will lead to hw csum failures for the BPF case,
      since we're pulling or pushing data that was not part of the current rcsum.
      Changing tc BPF programs in general to push/pull the rcsum around BPF_PROG_RUN()
      is also not really an option, since the current behaviour is ABI by now; apart
      from that it would also mean doing quite a bit of useless work, in the sense that
      usually 12 bytes would need to be rcsum pushed/pulled even when we don't need to
      touch this vlan related corner case. One way to fix it would be to push the
      necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
      anyway.
      
      Fixes: 4e10df9a ("bpf: introduce bpf_skb_vlan_push/pop() helpers")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8065694e
    • bpf: fix checksum fixups on bpf_skb_store_bytes · 479ffccc
      Submitted by Daniel Borkmann
      bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
      flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
      Where we ran into an issue with bpf_skb_store_bytes() is when we did a
      single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
      flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
      underlying issue has been tracked down to a buffer alignment issue.
      
      Meaning, the csum_partial() computations invoked via the skb_postpull_rcsum()
      and skb_postpush_rcsum() pair had a wrong result, since they operated on
      an odd address for the hoplimit, while other computations were done on an
      even address. This mix doesn't work as-is with the skb_postpull_rcsum(),
      skb_postpush_rcsum() pair, as they always expect at least half-word alignment
      of input buffers, which is normally the case. Thus, instead of these helpers
      using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
      csum_block_add(), respectively. For unaligned offsets, they rotate the sum
      to align it to a half-word boundary again, otherwise they work the same as
      csum_sub() and csum_add().
      
      Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
      offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
      csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
      use a 0 constant for offset so that the compiler optimizes the offset & 1
      test away and generates the same code as with csum_sub()/_add().
      
      Fixes: 608cd71a ("tc: bpf: generalize pedit action")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      479ffccc
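      A hedged sketch of the offset-aware variants described above; the names follow the commit text, the bodies are reconstructed from it and may differ in detail from the actual patch.

      /* Like skb_postpull_rcsum()/skb_postpush_rcsum(), but offset-aware:
       * csum_block_sub()/csum_block_add() rotate the partial sum for odd
       * offsets so unaligned buffers fold in correctly. */
      static inline void __skb_postpull_rcsum(struct sk_buff *skb, const void *start,
                                              unsigned int len, unsigned int off)
      {
          if (skb->ip_summed == CHECKSUM_COMPLETE)
              skb->csum = csum_block_sub(skb->csum,
                                         csum_partial(start, len, 0), off);
      }

      static inline void __skb_postpush_rcsum(struct sk_buff *skb, const void *start,
                                              unsigned int len, unsigned int off)
      {
          if (skb->ip_summed == CHECKSUM_COMPLETE)
              skb->csum = csum_block_add(skb->csum,
                                         csum_partial(start, len, 0), off);
      }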
    • bpf: also call skb_postpush_rcsum on xmit occasions · a2bfe6bf
      Submitted by Daniel Borkmann
      Follow-up to commit f8ffad69 ("bpf: add skb_postpush_rcsum and fix
      dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
      locations which need CHECKSUM_COMPLETE fixups on ingress.
      
      For the same reasons as described in f8ffad69 already, we of course
      also need this here, since dev_queue_xmit() on a veth device will let us
      end up in the dev_forward_skb() helper again to cross namespaces.
      
      The latter then calls into skb_postpull_rcsum() to pull out the L2 header, so
      that netif_rx_internal() sees CHECKSUM_COMPLETE as expected. That
      is, CHECKSUM_COMPLETE on ingress covering the L2 _payload_, not L2 headers.
      
      Also here we have to address bpf_redirect() and bpf_clone_redirect().
      
      Fixes: 3896d655 ("bpf: introduce bpf_clone_redirect() helper")
      Fixes: 27b29f63 ("bpf: add bpf_redirect() helper")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a2bfe6bf
  16. 26 Jul 2016, 1 commit
    • bpf, events: fix offset in skb copy handler · aa7145c1
      Submitted by Daniel Borkmann
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances not only the destination but also the source
      buffer by the written amount of bytes. For __output_custom(), this is actually
      wrong, since in that case the source buffer points to a non-linear object,
      in our case an skb, which the copy_func helper is supposed to walk.
      Therefore, since this is non-linear we thus need to pass the offset into
      the helper, so that copy_func can use it for extracting the data from
      the source object.
      
      Therefore, adjust the callback signatures properly and pass offset
      into the skb_header_pointer() invoked from bpf_skb_copy() callback. The
      __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate for two things:
      i) to pass in whether we should advance source buffer or not; this is
      a compile-time constant condition, ii) to pass in the offset for
      __output_custom(), which we do with help of __VA_ARGS__, so everything
      can stay inlined as is currently. Both changes allow for adapting the
      __output_* fast-path helpers w/o extra overhead.
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa7145c1
  17. 20 Jul 2016, 1 commit
    • bpf: add XDP prog type for early driver filter · 6a773a15
      Submitted by Brenden Blanco
      Add a new bpf prog type that is intended to run in early stages of the
      packet rx path. Only minimal packet metadata will be available, hence a
      new context type, struct xdp_md, is exposed to userspace. So far it only
      exposes the packet start and end pointers, and only in read mode.
      
      An XDP program must return one of the well known enum values, all other
      return codes are reserved for future use. Unfortunately, this
      restriction is hard to enforce at verification time, so take the
      approach of warning at runtime when such programs are encountered. Out
      of bounds return codes should alias to XDP_ABORTED.
      Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a773a15
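      A hedged minimal sketch of an XDP program against struct xdp_md; the section name and the drop policy are assumptions, not from the commit.

      #include <linux/bpf.h>
      #include "bpf_helpers.h"    /* assumed: provides SEC() */

      SEC("xdp_drop_runts")
      int xdp_drop_runts(struct xdp_md *ctx)
      {
          /* Only packet start/end are exposed so far, read-only. */
          void *data = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;

          if (data + 14 > data_end)     /* not even an Ethernet header */
              return XDP_DROP;

          return XDP_PASS;    /* out-of-range codes alias to XDP_ABORTED */
      }

      char _license[] SEC("license") = "GPL";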
  18. 16 Jul 2016, 1 commit
    • bpf: avoid stack copy and use skb ctx for event output · 555c8a86
      Submitted by Daniel Borkmann
      This work addresses a couple of issues the bpf_skb_event_output()
      helper currently has: i) We need two copies instead of just a
      single one for the skb data when it should be part of a sample.
      The data can be non-linear and thus needs to be extracted via
      bpf_skb_load_bytes() helper first, and then copied once again
      into the ring buffer slot. ii) Since bpf_skb_load_bytes()
      currently needs to be used first, the helper needs to see a
      constant size on the passed stack buffer to make sure BPF
      verifier can do sanity checks on it during verification time.
      Thus, just passing skb->len (or any other non-constant value)
      wouldn't work, but changing bpf_skb_load_bytes() is also not
      the proper solution, since the two copies are generally still
      needed. iii) bpf_skb_load_bytes() is just for rather small
      buffers like headers, since they need to sit on the limited
      BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
      this work improves the bpf_skb_event_output() helper to address
      all 3 at once.
      
      We can make use of the passed in skb context that we have in
      the helper anyway, and use some of the reserved flag bits as
      a length argument. The helper will use the new __output_custom()
      facility from perf side with bpf_skb_copy() as callback helper
      to walk and extract the data. It will pass the data for setup
      to bpf_event_output(), which generates and pushes the raw record
      with an additional frag part. The linear data used in the first
      frag of the record serves as programmatically defined meta data
      passed along with the appended sample.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      555c8a86
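      As a hedged usage sketch, a tc program could pack the number of skb bytes to append into the upper 32 bits of the flags argument, as below; the map definition style, section names and metadata layout are assumptions of an iproute2/samples-style build.

      #include <linux/bpf.h>
      #include <linux/pkt_cls.h>
      #include "bpf_helpers.h"    /* assumed: SEC(), map and helper declarations */

      struct bpf_map_def SEC("maps") event_map = {
          .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
          .key_size    = sizeof(int),
          .value_size  = sizeof(__u32),
          .max_entries = 64,      /* assumed: >= number of possible CPUs */
      };

      struct meta {               /* programmatically defined meta data */
          __u32 mark;
          __u32 ifindex;
      };

      SEC("cls_sample")
      int sample_skb(struct __sk_buff *skb)
      {
          struct meta md = {
              .mark    = skb->mark,
              .ifindex = skb->ifindex,
          };
          /* Upper 32 bits of flags carry how many skb bytes to append; the
           * helper walks non-linear skb data itself, no stack copy needed. */
          __u64 flags = BPF_F_CURRENT_CPU | ((__u64)skb->len << 32);

          bpf_perf_event_output(skb, &event_map, flags, &md, sizeof(md));
          return TC_ACT_OK;
      }

      char _license[] SEC("license") = "GPL";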
  19. 14 Jul 2016, 1 commit
    • rose: limit sk_filter trim to payload · f4979fce
      Submitted by Willem de Bruijn
      Sockets can have a filter program attached that drops or trims
      incoming packets based on the filter program return value.
      
      Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
      verifies this on arrival in rose_route_frame and unconditionally pulls
      the bytes in rose_recvmsg. The filter can trim packets to below this
      value in-between, causing the pull to fail and leaving a partial header at
      the time of skb_copy_datagram_msg.
      
      Place a lower bound on the size to which sk_filter may trim packets
      by introducing sk_filter_trim_cap and call this for rose packets.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4979fce
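      A hedged sketch of the resulting call-site pattern; the wrapper function and the ROSE_MIN_LEN value here are illustrative only.

      #include <linux/filter.h>
      #include <linux/skbuff.h>
      #include <net/sock.h>

      #define ROSE_MIN_LEN 3      /* assumed value, for illustration only */

      /* A socket filter may still drop the packet, but can no longer trim it
       * below ROSE_MIN_LEN before rose_recvmsg() pulls the header. */
      static int rose_run_filter(struct sock *sk, struct sk_buff *skb)
      {
          if (sk_filter_trim_cap(sk, skb, ROSE_MIN_LEN))
              return -EPERM;      /* filter rejected the packet */

          return 0;
      }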
  20. 05 Jul 2016, 1 commit
  21. 02 Jul 2016, 2 commits
    • cgroup: bpf: Add bpf_skb_in_cgroup_proto · 4a482f34
      Submitted by Martin KaFai Lau
      Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
      belongs to a descendant of a cgroup2.  It is similar to the
      feature added in netfilter:
      commit c38c4597 ("netfilter: implement xt_cgroup cgroup2 path match")
      
      The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
      which will be used by bpf_skb_in_cgroup().

      Modifications to the bpf verifier ensure that BPF_MAP_TYPE_CGROUP_ARRAY
      and bpf_skb_in_cgroup() are always used together.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a482f34
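      A hedged usage sketch: a cls_bpf program matching against a BPF_MAP_TYPE_CGROUP_ARRAY populated from user space; the map layout, section names and drop policy are assumptions.

      #include <linux/bpf.h>
      #include <linux/pkt_cls.h>
      #include "bpf_helpers.h"    /* assumed: SEC(), map and helper declarations */

      struct bpf_map_def SEC("maps") cgrp_map = {
          .type        = BPF_MAP_TYPE_CGROUP_ARRAY,
          .key_size    = sizeof(__u32),
          .value_size  = sizeof(__u32),
          .max_entries = 1,       /* user space stores a cgroup2 fd at slot 0 */
      };

      SEC("cls_cgroup_match")
      int match_cgroup(struct __sk_buff *skb)
      {
          /* Returns 1 if skb->sk belongs to a descendant of the cgroup2 at
           * index 0, 0 if not, negative on error. */
          if (bpf_skb_in_cgroup(skb, &cgrp_map, 0) == 1)
              return TC_ACT_SHOT; /* hypothetical policy: drop matches */

          return TC_ACT_OK;
      }

      char _license[] SEC("license") = "GPL";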
    • bpf: refactor bpf_prog_get and type check into helper · 113214be
      Submitted by Daniel Borkmann
      Since bpf_prog_get() and the program type check are used in a couple of places,
      refactor this into a small helper function that we can make use of. Since
      the non RO prog->aux part is not used in performance critical paths and a
      program destruction via RCU is rather unlikely when doing the put, we
      shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
      check, but actually not taking the ref at all (due to being in fdget() /
      fdput() section of the bpf fd) is even cleaner and makes the diff smaller
      as well, so just go for that. Callsites are changed to make use of the new
      helper where possible.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      113214be
  22. 30 Jun 2016, 3 commits
    • bpf: add bpf_skb_change_type helper · d2485c42
      Submitted by Daniel Borkmann
      This work adds a helper for changing skb->pkt_type in a controlled way.
      We only allow a subset of possible values and can extend that in future
      should other use cases come up. Doing this as a helper has the advantage
      that errors can be handled gracefully and thus the helper kept extensible.
      
      It's a write counterpart to the pkt_type member we can already read from
      struct __sk_buff context. Major use case is to change incoming skbs to
      PACKET_HOST in a programmatic way instead of having to recirculate via
      redirect(..., BPF_F_INGRESS), for example.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2485c42
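      A hedged sketch of the main use case named above, forcing incoming skbs to PACKET_HOST from a tc program; the section name is an assumption.

      #include <linux/bpf.h>
      #include <linux/if_packet.h>    /* PACKET_HOST */
      #include <linux/pkt_cls.h>
      #include "bpf_helpers.h"        /* assumed: SEC() and helper declarations */

      SEC("cls_to_host")
      int to_host(struct __sk_buff *skb)
      {
          /* Rewrite pkt_type programmatically instead of recirculating
           * via redirect(..., BPF_F_INGRESS). */
          bpf_skb_change_type(skb, PACKET_HOST);
          return TC_ACT_OK;
      }

      char _license[] SEC("license") = "GPL";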
    • bpf: add bpf_skb_change_proto helper · 6578171a
      Submitted by Daniel Borkmann
      This patch adds a minimal helper for doing the groundwork of changing
      the skb->protocol in a controlled way. Currently supported are v4 to
      v6 transitions and vice versa, which allows f.e. for a minimal, static
      nat64 implementation where applications in containers that still
      require IPv4 can be transparently operated in an IPv6-only environment.
      For example, host facing veth of the container can transparently do
      the transitions in a programmatic way with the help of clsact qdisc
      and cls_bpf.
      
      Idea is to separate concerns for keeping complexity of the helper
      lower, which means that the programs utilize bpf_skb_change_proto(),
      bpf_skb_store_bytes() and bpf_lX_csum_replace() to get the job done,
      instead of doing everything in a single helper (and thus partially
      duplicating helper functionality). Also, bpf_skb_change_proto()
      shouldn't need to deal with raw packet data as this is done by other
      helpers.
      
      bpf_skb_proto_6_to_4() and bpf_skb_proto_4_to_6() unclone the skb to
      operate on a private one, push or pop additionally required header
      space and migrate the gso/gro meta data from the shared info. We do
      mark the gso type as dodgy so that headers are checked and segs
      recalculated by the gso/gro engine. The gso_size target is adapted
      as well. The flags argument added is currently reserved and can be
      used for future extensions.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6578171a
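      A hedged skeleton of the described division of labour (protocol switch first, header rewrite and checksum fixups left to the other helpers); the byte-order macro, section name and the omitted rewrite steps are assumptions.

      #include <linux/bpf.h>
      #include <linux/if_ether.h>
      #include <linux/pkt_cls.h>
      #include <asm/byteorder.h>  /* assumed: provides __constant_htons() */
      #include "bpf_helpers.h"    /* assumed: SEC() and helper declarations */

      SEC("cls_nat64_skeleton")
      int skel_6_to_4(struct __sk_buff *skb)
      {
          /* Step 1: groundwork only - adjust header room, skb->protocol and
           * gso metadata; the flags argument is reserved and must be 0. */
          if (bpf_skb_change_proto(skb, __constant_htons(ETH_P_IP), 0))
              return TC_ACT_SHOT;

          /* Steps 2+3 (omitted): the program would now write the IPv4 header
           * via bpf_skb_store_bytes() and fix the checksums via
           * bpf_l3_csum_replace()/bpf_l4_csum_replace(). */
          return TC_ACT_OK;
      }

      char _license[] SEC("license") = "GPL";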
    • bpf: don't use raw processor id in generic helper · 80b48c44
      Submitted by Daniel Borkmann
      Use smp_processor_id() for the generic helper bpf_get_smp_processor_id()
      instead of the raw variant. This allows for preemption checks when we
      have DEBUG_PREEMPT, and otherwise uses the raw variant anyway. We only
      need to keep the raw variant for socket filters, but we can reuse the
      helper that is already there from cBPF side.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80b48c44
  23. 16 Jun 2016, 1 commit
  24. 11 Jun 2016, 2 commits