1. 26 1月, 2018 11 次提交
    • L
      bpf: Add BPF_SOCK_OPS_STATE_CB · d4487491
      Lawrence Brakmo 提交于
      Adds support for calling sock_ops BPF program when there is a TCP state
      change. Two arguments are used; one for the old state and another for
      the new state.
      
      There is a new enum in include/uapi/linux/bpf.h that exports the TCP
      states that prepends BPF_ to the current TCP state names. If it is ever
      necessary to change the internal TCP state values (other than adding
      more to the end), then it will become necessary to convert from the
      internal TCP state value to the BPF value before calling the BPF
      sock_ops function. There are a set of compile checks added in tcp.c
      to detect if the internal and BPF values differ so we can make the
      necessary fixes.
      
      New op: BPF_SOCK_OPS_STATE_CB.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d4487491
    • L
      bpf: Add BPF_SOCK_OPS_RETRANS_CB · a31ad29e
      Lawrence Brakmo 提交于
      Adds support for calling sock_ops BPF program when there is a
      retransmission. Three arguments are used; one for the sequence number,
      another for the number of segments retransmitted, and the last one for
      the return value of tcp_transmit_skb (0 => success).
      Does not include syn-ack retransmissions.
      
      New op: BPF_SOCK_OPS_RETRANS_CB.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a31ad29e
    • L
      bpf: Add sock_ops R/W access to tclass · 6f9bd3d7
      Lawrence Brakmo 提交于
      Adds direct write access to sk_txhash and access to tclass for ipv6
      flows through getsockopt and setsockopt. Sample usage for tclass:
      
        bpf_getsockopt(skops, SOL_IPV6, IPV6_TCLASS, &v, sizeof(v))
      
      where skops is a pointer to the ctx (struct bpf_sock_ops).
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      6f9bd3d7
    • L
      bpf: Add support for reading sk_state and more · 44f0e430
      Lawrence Brakmo 提交于
      Add support for reading many more tcp_sock fields
      
        state,	same as sk->sk_state
        rtt_min	same as sk->rtt_min.s[0].v (current rtt_min)
        snd_ssthresh
        rcv_nxt
        snd_nxt
        snd_una
        mss_cache
        ecn_flags
        rate_delivered
        rate_interval_us
        packets_out
        retrans_out
        total_retrans
        segs_in
        data_segs_in
        segs_out
        data_segs_out
        lost_out
        sacked_out
        sk_txhash
        bytes_received (__u64)
        bytes_acked    (__u64)
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      44f0e430
    • L
      bpf: Add sock_ops RTO callback · f89013f6
      Lawrence Brakmo 提交于
      Adds an optional call to sock_ops BPF program based on whether the
      BPF_SOCK_OPS_RTO_CB_FLAG is set in bpf_sock_ops_flags.
      The BPF program is passed 2 arguments: icsk_retransmits and whether the
      RTO has expired.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      f89013f6
    • L
      bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock · b13d8807
      Lawrence Brakmo 提交于
      Adds field bpf_sock_ops_cb_flags to tcp_sock and bpf_sock_ops. Its primary
      use is to determine if there should be calls to sock_ops bpf program at
      various points in the TCP code. The field is initialized to zero,
      disabling the calls. A sock_ops BPF program can set it, per connection and
      as necessary, when the connection is established.
      
      It also adds support for reading and writting the field within a
      sock_ops BPF program. Reading is done by accessing the field directly.
      However, writing is done through the helper function
      bpf_sock_ops_cb_flags_set, in order to return an error if a BPF program
      is trying to set a callback that is not supported in the current kernel
      (i.e. running an older kernel). The helper function returns 0 if it was
      able to set all of the bits set in the argument, a positive number
      containing the bits that could not be set, or -EINVAL if the socket is
      not a full TCP socket.
      
      Examples of where one could call the bpf program:
      
      1) When RTO fires
      2) When a packet is retransmitted
      3) When the connection terminates
      4) When a packet is sent
      5) When a packet is received
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b13d8807
    • L
      bpf: Support passing args to sock_ops bpf function · de525be2
      Lawrence Brakmo 提交于
      Adds support for passing up to 4 arguments to sock_ops bpf functions. It
      reusues the reply union, so the bpf_sock_ops structures are not
      increased in size.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      de525be2
    • L
      bpf: Add write access to tcp_sock and sock fields · b73042b8
      Lawrence Brakmo 提交于
      This patch adds a macro, SOCK_OPS_SET_FIELD, for writing to
      struct tcp_sock or struct sock fields. This required adding a new
      field "temp" to struct bpf_sock_ops_kern for temporary storage that
      is used by sock_ops_convert_ctx_access. It is used to store and recover
      the contents of a register, so the register can be used to store the
      address of the sk. Since we cannot overwrite the dst_reg because it
      contains the pointer to ctx, nor the src_reg since it contains the value
      we want to store, we need an extra register to contain the address
      of the sk.
      
      Also adds the macro SOCK_OPS_GET_OR_SET_FIELD that calls one of the
      GET or SET macros depending on the value of the TYPE field.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b73042b8
    • L
      bpf: Make SOCK_OPS_GET_TCP struct independent · 34d367c5
      Lawrence Brakmo 提交于
      Changed SOCK_OPS_GET_TCP to SOCK_OPS_GET_FIELD and added 2
      arguments so now it can also work with struct sock fields.
      The first argument is the name of the field in the bpf_sock_ops
      struct, the 2nd argument is the name of the field in the OBJ struct.
      
      Previous: SOCK_OPS_GET_TCP(FIELD_NAME)
      New:      SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)
      
      Where OBJ is either "struct tcp_sock" or "struct sock" (without
      quotation). BPF_FIELD is the name of the field in the bpf_sock_ops
      struct and OBJ_FIELD is the name of the field in the OBJ struct.
      
      Although the field names are currently the same, the kernel struct names
      could change in the future and this change makes it easier to support
      that.
      
      Note that adding access to tcp_sock fields in sock_ops programs does
      not preclude the tcp_sock fields from being removed as long as we are
      willing to do one of the following:
      
        1) Return a fixed value (e.x. 0 or 0xffffffff), or
        2) Make the verifier fail if that field is accessed (i.e. program
          fails to load) so the user will know that field is no longer
          supported.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      34d367c5
    • L
      bpf: Make SOCK_OPS_GET_TCP size independent · a33de397
      Lawrence Brakmo 提交于
      Make SOCK_OPS_GET_TCP helper macro size independent (before only worked
      with 4-byte fields.
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a33de397
    • L
      bpf: Only reply field should be writeable · 2585cd62
      Lawrence Brakmo 提交于
      Currently, a sock_ops BPF program can write the op field and all the
      reply fields (reply and replylong). This is a bug. The op field should
      not have been writeable and there is currently no way to use replylong
      field for indices >= 1. This patch enforces that only the reply field
      (which equals replylong[0]) is writeable.
      
      Fixes: 40304b2a ("bpf: BPF support for sock_ops")
      Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      2585cd62
  2. 24 1月, 2018 15 次提交
  3. 23 1月, 2018 14 次提交