1. 26 Jan, 2018 14 commits
    • M
      bpf: Use the IS_FD_ARRAY() macro in map_update_elem() · 9c147b56
      Mickaël Salaün committed
      Make the code more readable.
      Signed-off-by: Mickaël Salaün <mic@digikod.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • A
      Merge branch 'bpf-more-sock_ops-callbacks' · 82f1e0f3
      Alexei Starovoitov committed
      Lawrence Brakmo says:
      
      ====================
      This patchset adds support for:
      
      - direct R or R/W access to many tcp_sock fields
      - passing up to 4 arguments to sock_ops BPF functions
      - tcp_sock field bpf_sock_ops_cb_flags for controlling callbacks
      - optionally calling sock_ops BPF program when RTO fires
      - optionally calling sock_ops BPF program when packet is retransmitted
      - optionally calling sock_ops BPF program when TCP state changes
      - access to tclass and sk_txhash
      - new selftest
      
      v2: Fixed commit message 0/11. The commit is to "bpf-next" but the patch
          below used "bpf" and Patchwork didn't work correctly.
      v3: Cleaned RTO callback as per Yuchung's comment
          Added BPF enum for TCP states as per Alexei's comment
      v4: Fixed compile warnings related to detecting changes between TCP
          internal states and the BPF defined states.
      v5: Fixed comment issues in some selftest files
          Fixed access issue with u64 fields in bpf_sock_ops struct
      v6: Made fixes based on comments from Eric Dumazet:
          The field bpf_sock_ops_cb_flags was added in a hole on 64bit kernels
          Field bpf_sock_ops_cb_flags is now set through a helper function
          which returns an error when a BPF program tries to set bits for
          callbacks that are not supported in the current kernel.
          Added a comment indicating that when adding fields to bpf_sock_ops_kern
          they should be added before the field named "temp" if they need to be
          cleared before calling the BPF function.
      v7: Enforced that fields "op" and "replylong[1] .. replylong[3]" are not
          writable, based on comments from Eric Dumazet and Alexei Starovoitov.
          Filled 32 bit hole in bpf_sock_ops struct with sk_txhash based on
          comments from Daniel Borkmann.
          Removed unused functions (tcp_call_bpf_1arg, tcp_call_bpf_4arg) based
          on comments from Daniel Borkmann.
      v8: Add commit message 00/12
          Add Acked-by as appropriate
      v9: Moved the bug fix to the front of the patchset
          Changed RETRANS_CB so it is always called (before it was only called if
          the retransmit succeeded). It is now called with an extra argument, the
          return value of tcp_transmit_skb (0 => success). Based on comments
          from Yuchung Cheng.
          Added support for reading 2 new fields, sacked_out and lost_out, based on
          comments from Yuchung Cheng.
      v10: Moved the callback flags from include/uapi/linux/tcp.h to
           include/uapi/linux/bpf.h
           Cleaned up the test in selftest. Added a timeout so it always completes,
           even if the client is not communicating with the server. Made it faster
           by removing the sleeps. Made sure it works even when called back-to-back
           20 times.
      
      Consists of the following patches:
      [PATCH bpf-next v10 01/12] bpf: Only reply field should be writeable
      [PATCH bpf-next v10 02/12] bpf: Make SOCK_OPS_GET_TCP size
      [PATCH bpf-next v10 03/12] bpf: Make SOCK_OPS_GET_TCP struct
      [PATCH bpf-next v10 04/12] bpf: Add write access to tcp_sock and sock
      [PATCH bpf-next v10 05/12] bpf: Support passing args to sock_ops bpf
      [PATCH bpf-next v10 06/12] bpf: Adds field bpf_sock_ops_cb_flags to
      [PATCH bpf-next v10 07/12] bpf: Add sock_ops RTO callback
      [PATCH bpf-next v10 08/12] bpf: Add support for reading sk_state and
      [PATCH bpf-next v10 09/12] bpf: Add sock_ops R/W access to tclass
      [PATCH bpf-next v10 10/12] bpf: Add BPF_SOCK_OPS_RETRANS_CB
      [PATCH bpf-next v10 11/12] bpf: Add BPF_SOCK_OPS_STATE_CB
      [PATCH bpf-next v10 12/12] bpf: add selftest for tcpbpf
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: add selftest for tcpbpf · d6d4f60c
      Lawrence Brakmo committed
      Added a selftest for tcpbpf (sock_ops) that checks that the appropriate
      callbacks occurred, that it can access tcp_sock fields, and that their
      values are correct.
      
      Run with command: ./test_tcpbpf_user
      Adding the flag "-d" will show why it did not pass.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Add BPF_SOCK_OPS_STATE_CB · d4487491
      Lawrence Brakmo committed
      Adds support for calling sock_ops BPF program when there is a TCP state
      change. Two arguments are used; one for the old state and another for
      the new state.
      
      There is a new enum in include/uapi/linux/bpf.h that exports the TCP
      states, prepending BPF_ to the current TCP state names. If it is ever
      necessary to change the internal TCP state values (other than adding
      more to the end), then it will become necessary to convert from the
      internal TCP state value to the BPF value before calling the BPF
      sock_ops function. A set of compile-time checks was added in tcp.c
      to detect if the internal and BPF values differ so we can make the
      necessary fixes.
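
The compile-time check idea described above can be sketched in plain C11. This is a user-space illustration under stated assumptions, not the code in tcp.c: both enums are local stand-ins covering only a few states, `tcp_state_to_bpf` is a hypothetical helper name, and `_Static_assert` is used here in place of whatever build-time assertion facility the kernel uses.

```c
/* Stand-in for the kernel-internal TCP state values (illustrative subset). */
enum tcp_state_internal {
    TCP_ESTABLISHED = 1,
    TCP_SYN_SENT,
    TCP_SYN_RECV,
    TCP_FIN_WAIT1,
};

/* Stand-in for the BPF-visible copies: same names with a BPF_ prefix. */
enum tcp_state_bpf {
    BPF_TCP_ESTABLISHED = 1,
    BPF_TCP_SYN_SENT,
    BPF_TCP_SYN_RECV,
    BPF_TCP_FIN_WAIT1,
};

/* The build breaks as soon as one enum changes without the other. */
_Static_assert((int)BPF_TCP_ESTABLISHED == (int)TCP_ESTABLISHED, "state mismatch");
_Static_assert((int)BPF_TCP_SYN_SENT == (int)TCP_SYN_SENT, "state mismatch");
_Static_assert((int)BPF_TCP_SYN_RECV == (int)TCP_SYN_RECV, "state mismatch");
_Static_assert((int)BPF_TCP_FIN_WAIT1 == (int)TCP_FIN_WAIT1, "state mismatch");

/* Hypothetical helper: while the checks hold, no conversion is needed. */
static int tcp_state_to_bpf(int internal_state)
{
    return internal_state;
}
```

If the internal values ever diverged, the assertions would fail at build time and the helper would be the natural place to add an explicit mapping.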
      
      New op: BPF_SOCK_OPS_STATE_CB.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Add BPF_SOCK_OPS_RETRANS_CB · a31ad29e
      Lawrence Brakmo committed
      Adds support for calling sock_ops BPF program when there is a
      retransmission. Three arguments are used; one for the sequence number,
      another for the number of segments retransmitted, and the last one for
      the return value of tcp_transmit_skb (0 => success).
      Does not include syn-ack retransmissions.
      
      New op: BPF_SOCK_OPS_RETRANS_CB.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Add sock_ops R/W access to tclass · 6f9bd3d7
      Lawrence Brakmo committed
      Adds direct write access to sk_txhash and access to tclass for ipv6
      flows through getsockopt and setsockopt. Sample usage for tclass:
      
        bpf_getsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v, sizeof(v))
      
      where skops is a pointer to the ctx (struct bpf_sock_ops).
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Add support for reading sk_state and more · 44f0e430
      Lawrence Brakmo committed
      Add support for reading many more tcp_sock fields:

        state          same as sk->sk_state
        rtt_min        same as sk->rtt_min.s[0].v (current rtt_min)
        snd_ssthresh
        rcv_nxt
        snd_nxt
        snd_una
        mss_cache
        ecn_flags
        rate_delivered
        rate_interval_us
        packets_out
        retrans_out
        total_retrans
        segs_in
        data_segs_in
        segs_out
        data_segs_out
        lost_out
        sacked_out
        sk_txhash
        bytes_received (__u64)
        bytes_acked    (__u64)
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Add sock_ops RTO callback · f89013f6
      Lawrence Brakmo committed
      Adds an optional call to the sock_ops BPF program based on whether the
      BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_cb_flags.
      The BPF program is passed 2 arguments: icsk_retransmits and whether the
      RTO has expired.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock · b13d8807
      Lawrence Brakmo committed
      Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
      use is to determine if there should be calls to sock_ops bpf program at
      various points in the TCP code. The field is initialized to zero,
      disabling the calls. A sock_ops BPF program can set it, per connection and
      as necessary, when the connection is established.
      
      It also adds support for reading and writing the field within a
      sock_ops BPF program. Reading is done by accessing the field directly.
      However, writing is done through the helper function
      bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
      is trying to set a callback that is not supported in the current kernel
      (i.e. running an older kernel). The helper function returns 0 if it was
      able to set all of the bits set in the argument, a positive number
      containing the bits that could not be set, or -EINVAL if the socket is
      not a full TCP socket.
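
The return contract of the helper described above can be modeled in user space as follows. This is a sketch under stated assumptions, not the kernel helper: `model_sock`, `model_cb_flags_set` and the `CB_FLAG_*` names and bit values are hypothetical, and storing the supported bits even when other requested bits are rejected is an assumption of this model.

```c
#include <errno.h>
#include <stdbool.h>

/* Callback flags known to this hypothetical "kernel" (values are assumptions). */
#define CB_FLAG_RTO      (1 << 0)
#define CB_FLAG_RETRANS  (1 << 1)
#define CB_FLAG_STATE    (1 << 2)
#define CB_FLAG_ALL      (CB_FLAG_RTO | CB_FLAG_RETRANS | CB_FLAG_STATE)

/* Hypothetical stand-in for the per-connection state. */
struct model_sock {
    bool is_full_tcp_sock;
    unsigned char cb_flags;   /* zero by default: all callbacks disabled */
};

/*
 * Returns 0 when every requested bit is supported, a positive mask of the
 * unsupported bits otherwise, or -EINVAL when the socket is not a full
 * TCP socket.
 */
static int model_cb_flags_set(struct model_sock *sk, int requested)
{
    if (!sk->is_full_tcp_sock)
        return -EINVAL;

    /* Keep only the bits this kernel understands. */
    sk->cb_flags = (unsigned char)(requested & CB_FLAG_ALL);

    /* Report back whatever could not be honored. */
    return requested & ~CB_FLAG_ALL;
}
```

A program written against a newer kernel can inspect the positive return value to learn which callbacks the running kernel does not offer, and degrade gracefully instead of failing.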
      
      Examples of where one could call the bpf program:
      
      1) When RTO fires
      2) When a packet is retransmitted
      3) When the connection terminates
      4) When a packet is sent
      5) When a packet is received
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Support passing args to sock_ops bpf function · de525be2
      Lawrence Brakmo committed
      Adds support for passing up to 4 arguments to sock_ops bpf functions. It
      reuses the reply union, so the bpf_sock_ops structures are not
      increased in size.
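
Why reusing the reply union keeps the structure size unchanged can be illustrated with a small layout sketch. This is an assumption-laden model rather than the real uapi definition: `model_sock_ops` and `model_set_args` are hypothetical names, and the real struct carries many more fields.

```c
#include <stdint.h>

/* Illustrative layout only: one op word followed by a shared 16-byte area. */
struct model_sock_ops {
    uint32_t op;
    union {
        uint32_t args[4];       /* inputs handed to the program */
        uint32_t reply;         /* value written back by the program */
        uint32_t replylong[4];  /* wider reply area */
    };
};

/* Adding args[] to the union does not grow the struct. */
_Static_assert(sizeof(struct model_sock_ops) == 5 * sizeof(uint32_t),
               "args must share storage with the reply fields");

/* Hypothetical helper: fill the first n argument slots and zero the rest. */
static void model_set_args(struct model_sock_ops *ctx, uint32_t op,
                           const uint32_t *vals, unsigned int n)
{
    ctx->op = op;
    for (unsigned int i = 0; i < 4; i++)
        ctx->args[i] = (i < n) ? vals[i] : 0;
}
```

Because the arguments and the reply share storage, a program that writes its reply overwrites the first argument, so arguments should be read before replying.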
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Add write access to tcp_sock and sock fields · b73042b8
      Lawrence Brakmo committed
      This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
      struct tcp_sock or struct sock fields. This required adding a new
      field "temp" to struct bpf_sock_ops_kern for temporary storage that
      is used by sock_ops_convert_ctx_access. It is used to store and recover
      the contents of a register, so the register can be used to store the
      address of the sk. Since we cannot overwrite the dst_reg because it
      contains the pointer to ctx, nor the src_reg since it contains the value
      we want to store, we need an extra register to contain the address
      of the sk.
      
      Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
      GET or SET macros depending on the value of the TYPE field.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Make SOCK_OPS_GET_TCP struct independent · 34d367c5
      Lawrence Brakmo committed
      Changed SOCK_OPS_GET_TCP to SOCK_OPS_GET_FIELD and added 2
      arguments so now it can also work with struct sock fields.
      The first argument is the name of the field in the bpf_sock_ops
      struct, the 2nd argument is the name of the field in the OBJ struct.
      
      Previous: SOCK_OPS_GET_TCP(FIELD_NAME)
      New:      SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)
      
      Where OBJ is either "struct tcp_sock" or "struct sock" (without
      quotation). BPF_FIELD is the name of the field in the bpf_sock_ops
      struct and OBJ_FIELD is the name of the field in the OBJ struct.
      
      Although the field names are currently the same, the kernel struct names
      could change in the future and this change makes it easier to support
      that.
      
      Note that adding access to tcp_sock fields in sock_ops programs does
      not preclude the tcp_sock fields from being removed as long as we are
      willing to do one of the following:
      
        1) Return a fixed value (e.g. 0 or 0xffffffff), or
        2) Make the verifier fail if that field is accessed (i.e. program
          fails to load) so the user will know that field is no longer
          supported.
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Make SOCK_OPS_GET_TCP size independent · a33de397
      Lawrence Brakmo committed
      Make the SOCK_OPS_GET_TCP helper macro size independent (previously it
      only worked with 4-byte fields).
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • L
      bpf: Only reply field should be writeable · 2585cd62
      Lawrence Brakmo committed
      Currently, a sock_ops BPF program can write the op field and all the
      reply fields (reply and replylong). This is a bug. The op field should
      not have been writeable and there is currently no way to use replylong
      field for indices >= 1. This patch enforces that only the reply field
      (which equals replylong[0]) is writeable.
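
The rule enforced here can be pictured as an offset check on writes into the context. The following is a user-space sketch only, assuming a simplified context layout: `model_ctx` and `model_write_allowed` are hypothetical names, and the real check lives in the verifier's access validation, which handles more cases than shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified context: op followed by the reply area. */
struct model_ctx {
    uint32_t op;
    union {
        uint32_t reply;
        uint32_t replylong[4];
    };
};

/*
 * A write is accepted only when it targets exactly the 4-byte reply field,
 * which occupies the same storage as replylong[0]. Writes to op or to
 * replylong[1..3] are rejected.
 */
static bool model_write_allowed(size_t off, size_t size)
{
    return off == offsetof(struct model_ctx, reply) &&
           size == sizeof(uint32_t);
}
```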
      
      Fixes: 40304b2a ("bpf: BPF support for sock_ops")
      Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 24 Jan, 2018 15 commits
  3. 23 Jan, 2018 11 commits