1. 19 Feb, 2019 1 commit
  2. 18 Feb, 2019 3 commits
  3. 16 Feb, 2019 7 commits
    • B
      net/mlx5: E-Switch, Consider ECPF vport depends on eswitch ownership · 81cd229c
      Bodong Wang committed
      ECPF connects to the eswitch through vport 0xfffe. ECPF may or may
      not be the eswitch manager depending on firmware configuration.
      
      1. If ECPF is eswitch manager: ECPF will take over the eswitch manager
         responsibility. A rep of the host PF shall be created at the ECPF
         side for the eswitch manager to control.
      
      2. If ECPF is not eswitch manager: host PF will be the eswitch manager,
         and ECPF acts similarly to a VF of the host PF. Host PF will be aware
         of the ECPF vport presence and control its rep.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      81cd229c
    • B
      net/mlx5: E-Switch, Assign a different position for uplink rep and vport · 5ae51620
      Bodong Wang committed
      In offloads mode, the current implementation puts the uplink
      representor at index zero of the vport reps array. With the
      introduction of SmartNIC, index 0 is the natural place for the
      representor of vport 0, so the uplink representor should not occupy
      it. A separate patch will handle whether a rep is needed for vport 0
      (the PF vport).
      
      So, we want a different placeholder for the uplink vport and
      representor: it is placed at the end of the vport and rep arrays.
      Since the vport number can no longer act as an index into the vport or
      representor arrays, use functions to map vport numbers to indices
      when accessing the vport or representor arrays, and vice versa.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5ae51620
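
The vport-number-to-index mapping described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the driver's code: `UPLINK_VPORT`, `struct esw_layout`, `vport_to_index()` and `index_to_vport()` are hypothetical names, and the uplink vport number 0xffff is assumed for the example.

```c
#include <stdint.h>

/* Assumed uplink vport number; hypothetical for this sketch. */
#define UPLINK_VPORT 0xffff

/* Hypothetical layout: total_vports slots, the last one reserved for uplink. */
struct esw_layout {
	int total_vports;
};

/* Map a vport number to its array index, or -1 if it has no slot. */
static int vport_to_index(const struct esw_layout *l, uint16_t vport)
{
	if (vport == UPLINK_VPORT)
		return l->total_vports - 1;	/* uplink lives at the end */
	if (vport >= l->total_vports - 1)
		return -1;			/* beyond the regular vports */
	return vport;				/* vport 0 maps to index 0 */
}

/* Inverse mapping: array index back to vport number, or -1 if out of range. */
static int index_to_vport(const struct esw_layout *l, int index)
{
	if (index < 0 || index >= l->total_vports)
		return -1;
	if (index == l->total_vports - 1)
		return UPLINK_VPORT;
	return index;
}
```

With the uplink moved to the last slot, index 0 is free for the vport 0 representor, and callers never index the arrays with a raw vport number.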
    • B
      net/mlx5: E-Switch, Centralize representor reg/unreg to eswitch driver · f8e8fa02
      Bodong Wang committed
      Eswitch has two users: IB and ETH. They both register representors
      when the mlx5 interface is added, and unregister the representors when
      the mlx5 interface is removed. Ideally, each driver should only deal with
      the entities which are unique to itself. However, the current IB and ETH
      drivers have to perform the following eswitch operations:
      
      1. When registering, specify how many vports to register. This number
         is the same for both drivers: the total number of available
         vports.
      2. When unregistering, specify the number of registered vports to
         unregister. Also, unload the representors which are already loaded.
      
      It's unnecessary for the eswitch driver to hand control of the above
      operations to individual driver users, as they're not unique to each
      driver. Instead, such operations should be centralized in the eswitch
      driver. This consolidates the eswitch control flow and simplifies the IB
      and ETH drivers.
      
      This patch doesn't change any functionality.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      f8e8fa02
    • B
      net/mlx5: E-Switch, Add state to eswitch vport representors · f121e0ea
      Bodong Wang committed
      Currently the eswitch vport reps have a valid indicator, which is
      set on register and unset on unregister. However, a rep may or may
      not be loaded at unregister time. The current driver checks whether
      the vport of that rep is enabled and takes that to imply the rep is
      loaded. For ECPF this is not valid, as the host PF will enable the
      vports for its VFs instead.
      
      Add three states: {unregistered, registered, loaded}, with the
      following state changes across different operations:
      
      	create: (none)       -> unregistered
      	reg:    unregistered -> registered
      	load:   registered   -> loaded
      	unload: loaded       -> registered
      	unreg:  registered   -> unregistered
      
      Note that the state shall only be updated inside eswitch driver rather
      than individual drivers such as ETH or IB.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Suggested-by: Mark Bloch <markb@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      f121e0ea
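
The state transitions listed in this commit message can be sketched as a small state machine. This is only an illustration of the table above, not the driver's implementation: `enum rep_state`, `enum rep_op` and `rep_apply()` are hypothetical names, and rejecting invalid transitions with -1 is an assumption of the sketch.

```c
/* The three rep states named in the commit message. */
enum rep_state { REP_UNREGISTERED, REP_REGISTERED, REP_LOADED };

/* Operations that move a rep between states (hypothetical names). */
enum rep_op { REP_OP_REG, REP_OP_LOAD, REP_OP_UNLOAD, REP_OP_UNREG };

/*
 * Apply one operation. Returns 0 and updates *state on a valid
 * transition; returns -1 and leaves *state untouched otherwise.
 */
static int rep_apply(enum rep_state *state, enum rep_op op)
{
	switch (op) {
	case REP_OP_REG:	/* unregistered -> registered */
		if (*state != REP_UNREGISTERED)
			return -1;
		*state = REP_REGISTERED;
		return 0;
	case REP_OP_LOAD:	/* registered -> loaded */
		if (*state != REP_REGISTERED)
			return -1;
		*state = REP_LOADED;
		return 0;
	case REP_OP_UNLOAD:	/* loaded -> registered */
		if (*state != REP_LOADED)
			return -1;
		*state = REP_REGISTERED;
		return 0;
	case REP_OP_UNREG:	/* registered -> unregistered */
		if (*state != REP_REGISTERED)
			return -1;
		*state = REP_UNREGISTERED;
		return 0;
	}
	return -1;
}
```

Because the state is explicit, unregister no longer has to guess from the vport's enabled flag whether a rep is loaded.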
    • B
      net/mlx5: E-Switch, Split VF and special vports for offloads mode · c9b99abc
      Bodong Wang committed
      When the driver enters offloads mode, there are two major tasks to
      do: initialize flow steering and create representors. Flow steering
      should make sure enough flow table/group space is reserved for all
      reps. Representors are created as a group, all or none.
      
      With the introduction of ECPF, flow steering should still reserve the
      same space. But the representors are not always loaded/unloaded in a
      single piece. Once ECPF is in offloads mode, it will get an event
      from the host PF when the number of VFs changes. In such a scenario,
      only the VF reps should be loaded/unloaded, not the reps for special
      vports (such as the uplink vport).
      
      Thus, when entering offloads mode, the driver should specify the total
      number of reps and the number of VF reps separately. When leaving
      offloads mode, the cleanup should use the information already held
      in the eswitch, such as the number of VFs.
      
      This patch doesn't change any functionality.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      c9b99abc
    • B
      net/mlx5: E-Switch, Properly refer to host PF vport as other vport · cbc44e76
      Bodong Wang committed
      Commands referring to vports use the following scheme:
      
      1. When referring to my own vport, put 0 in vport and 0 in other_vport.
      2. When referring to another vport, put the vport number of the
         referred vport and put 1 in other_vport. The driver was assumed
         to be accessing another vport whenever the vport number is
         greater than 0.
      
      With the above scheme, the case where the ECPF eswitch manager tries
      to access the host PF vport falls under scheme 1, as the vport
      number is 0. This is clearly wrong, as the driver is trying to refer
      to another vport.
      
      As such usage can only happen in the eswitch context, change the
      relevant functions to take the other_vport input explicitly.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      cbc44e76
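
The difference between inferring other_vport from the vport number and passing it explicitly can be sketched as below. This is a simplified illustration, not the firmware command layout: `struct vport_cmd`, `vport_cmd_inferred()` and `vport_cmd_explicit()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the vport fields of a command. */
struct vport_cmd {
	uint16_t vport_number;
	uint8_t other_vport;
};

/* Old scheme: other_vport is inferred, so vport 0 always means "my own". */
static struct vport_cmd vport_cmd_inferred(uint16_t vport)
{
	struct vport_cmd c = {
		.vport_number = vport,
		.other_vport = vport > 0,
	};
	return c;
}

/* New scheme: the caller states whether it refers to another vport. */
static struct vport_cmd vport_cmd_explicit(uint16_t vport, bool other_vport)
{
	struct vport_cmd c = {
		.vport_number = other_vport ? vport : 0,
		.other_vport = other_vport,
	};
	return c;
}
```

With the inferred variant, an ECPF eswitch manager addressing the host PF (vport 0) would end up with other_vport set to 0. The explicit variant lets it set other_vport to 1 while keeping vport number 0.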
    • B
      net/mlx5: E-Switch, Properly refer to the esw manager vport · a1b3839a
      Bodong Wang committed
      In SmartNIC mode, the eswitch manager is not necessarily the PF
      (vport 0). Use a helper function to get the correct eswitch manager
      vport number and cache it on the eswitch instance for fast reference.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      a1b3839a
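
A helper of the kind this commit describes might look like the sketch below. All names (`VPORT_PF`, `VPORT_ECPF`, `struct esw_sketch`, `esw_manager_vport()`, `esw_sketch_init()`) are hypothetical. The sketch assumes that when ECPF is the eswitch manager, the manager vport is the ECPF vport 0xfffe mentioned in commit 81cd229c, and that it is the PF vport 0 otherwise.

```c
#include <stdbool.h>
#include <stdint.h>

#define VPORT_PF	0x0	/* PF vport, per the commit message */
#define VPORT_ECPF	0xfffe	/* ECPF vport, per commit 81cd229c */

/* Hypothetical eswitch instance holding the cached manager vport. */
struct esw_sketch {
	bool ecpf_is_manager;
	uint16_t manager_vport;
};

/* Pick the manager vport (assumed rule: ECPF vport if ECPF manages). */
static uint16_t esw_manager_vport(bool ecpf_is_manager)
{
	return ecpf_is_manager ? VPORT_ECPF : VPORT_PF;
}

/* Compute the manager vport once and cache it on the instance. */
static void esw_sketch_init(struct esw_sketch *esw, bool ecpf_is_manager)
{
	esw->ecpf_is_manager = ecpf_is_manager;
	esw->manager_vport = esw_manager_vport(ecpf_is_manager);
}
```

Later code paths can then read `manager_vport` from the instance instead of hard-coding vport 0.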
  4. 15 Feb, 2019 15 commits
  5. 14 Feb, 2019 4 commits
  6. 13 Feb, 2019 8 commits
  7. 11 Feb, 2019 2 commits
    • M
      bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock · 655a51e5
      Martin KaFai Lau committed
      This patch adds a helper function BPF_FUNC_tcp_sock and it
      is currently available for cg_skb and sched_(cls|act):
      
      struct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *sk);
      
      int cg_skb_foo(struct __sk_buff *skb) {
      	struct bpf_tcp_sock *tp;
      	struct bpf_sock *sk;
      	__u32 snd_cwnd;
      
      	sk = skb->sk;
      	if (!sk)
      		return 1;
      
      	tp = bpf_tcp_sock(sk);
      	if (!tp)
      		return 1;
      
      	snd_cwnd = tp->snd_cwnd;
      	/* ... */
      
      	return 1;
      }
      
      A 'struct bpf_tcp_sock' is also added to the uapi bpf.h to provide
      read-only access.  bpf_tcp_sock has all the existing tcp_sock fields
      that have already been exposed by bpf_sock_ops,
      i.e. no new tcp_sock fields are exposed in bpf.h.
      
      This helper returns a pointer to the tcp_sock.  If it is not a tcp_sock
      or it cannot be traced back to a tcp_sock by sk_to_full_sk(), it
      returns NULL.  Hence, the caller needs to check for NULL before
      accessing it.
      
      The current use case is to expose members from tcp_sock
      to allow a cg_skb_bpf_prog to provide per cgroup traffic
      policing/shaping.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      655a51e5
    • M
      bpf: Add a bpf_sock pointer to __sk_buff and a bpf_sk_fullsock helper · 46f8bc92
      Martin KaFai Lau committed
      In kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
      before accessing the fields in sock.  For example, in __netdev_pick_tx:
      
      static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
      			    struct net_device *sb_dev)
      {
      	/* ... */
      
      	struct sock *sk = skb->sk;
      
      		if (queue_index != new_index && sk &&
      		    sk_fullsock(sk) &&
      		    rcu_access_pointer(sk->sk_dst_cache))
      			sk_tx_queue_set(sk, new_index);
      
      	/* ... */
      
      	return queue_index;
      }
      
      This patch adds a "struct bpf_sock *sk" pointer to the "struct __sk_buff"
      where a few of the convert_ctx_access() in filter.c have already been
      accessing the skb->sk sock_common's fields,
      e.g. sock_ops_convert_ctx_access().
      
      "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
      Some of the fields in "bpf_sock" will not be directly
      accessible through the "__sk_buff->sk" pointer.  Access is limited
      by the new "bpf_sock_common_is_valid_access()".
      e.g. The existing "type", "protocol", "mark" and "priority" in bpf_sock
           are not allowed.
      
      The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
      can be used to get a sk with all accessible fields in "bpf_sock".
      This helper is added to both cg_skb and sched_(cls|act).
      
      int cg_skb_foo(struct __sk_buff *skb) {
      	struct bpf_sock *sk;
      
      	sk = skb->sk;
      	if (!sk)
      		return 1;
      
      	sk = bpf_sk_fullsock(sk);
      	if (!sk)
      		return 1;
      
      	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
      		return 1;
      
      	/* some_traffic_shaping(); */
      
      	return 1;
      }
      
      (1) The sk is read only
      
      (2) There is no new "struct bpf_sock_common" introduced.
      
      (3) Future kernel sock's members could be added to bpf_sock only
          instead of repeatedly adding at multiple places like currently
          in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
      
      (4) After "sk = skb->sk", the reg holding sk is of type
          PTR_TO_SOCK_COMMON_OR_NULL.
      
      (5) After bpf_sk_fullsock(), the return value is of type
          PTR_TO_SOCKET_OR_NULL, which is the same as the return type of
          bpf_sk_lookup_xxx().
      
          However, bpf_sk_fullsock() does not take a refcnt, while
          acquire_reference_state() currently depends only on the return
          type.  To avoid acquiring a reference state for it, a new
          is_acquire_function() is checked before calling
          acquire_reference_state().
      
      (6) The WARN_ON in "release_reference_state()" is no longer an
          internal verifier bug.
      
          When reg->id is not found in state->refs[], it means the
          bpf_prog does something wrong like
          "bpf_sk_release(bpf_sk_fullsock(skb->sk))" where reference has
          never been acquired by calling "bpf_sk_fullsock(skb->sk)".
      
          -EINVAL is returned and a verbose() message is logged instead of
          the WARN_ON.  A test is added to the test_verifier in a later patch.
      
          Since the WARN_ON in "release_reference_state()" is no longer
          needed, "__release_reference_state()" is also folded into
          "release_reference_state()".
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      46f8bc92