1. 07 Mar, 2022 1 commit
  2. 06 Mar, 2022 3 commits
  3. 05 Mar, 2022 3 commits
  4. 04 Mar, 2022 22 commits
  5. 03 Mar, 2022 11 commits
    • M
      bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_skb_delivery_time() · 8d21ec0e
      Committed by Martin KaFai Lau
      * __sk_buff->delivery_time_type:
      This patch adds __sk_buff->delivery_time_type.  It tells if the
      delivery_time is stored in __sk_buff->tstamp or not.
      
      It will be most useful for ingress to tell if the __sk_buff->tstamp
      has the (rcv) timestamp or delivery_time.  If delivery_time_type
      is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp.
      
      Two non-zero types are defined for the delivery_time_type:
      BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC.  UNSPEC
      can only happen at egress because only a mono delivery_time can be
      forwarded to ingress now.  The clock of an UNSPEC delivery_time
      can be deduced from skb->sk->sk_clockid, which is also how
      sch_etf does it.
      
      * Provide forwarded delivery_time to tc-bpf@ingress:
      With the help of the new delivery_time_type, the tc-bpf has a way
      to tell if the __sk_buff->tstamp has the (rcv) timestamp or
      the delivery_time.  During bpf load time, the verifier will learn if
      the bpf prog has accessed the new __sk_buff->delivery_time_type.
      If it has, the tc-bpf@ingress expects that
      skb->tstamp could hold the delivery_time.  The kernel will then
      read the skb->tstamp as-is during bpf insn rewrite without
      checking the skb->mono_delivery_time.  This is done by adding a
      new prog->delivery_time_access bit.  The same goes for
      writing skb->tstamp.
      
      * bpf_skb_set_delivery_time():
      The bpf_skb_set_delivery_time() helper is added to allow setting both
      delivery_time and the delivery_time_type at the same time.  If the
      tc-bpf does not need to change the delivery_time_type, it can directly
      write to the __sk_buff->tstamp as the existing tc-bpf has already been
      doing.  It will be most useful at ingress to change the
      __sk_buff->tstamp from the (rcv) timestamp to
      a mono delivery_time and then bpf_redirect_*().
      
      bpf only has a mono clock helper (bpf_ktime_get_ns), and
      the current known use case is the mono EDT for fq, and
      only mono delivery time can be kept during forward now,
      so bpf_skb_set_delivery_time() only supports setting
      BPF_SKB_DELIVERY_TIME_MONO.  It can be extended later when use cases
      come up and the forwarding path also supports other clock bases.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d21ec0e
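The delivery_time_type rules and the MONO-only setter described in this commit can be modeled in userspace. This is a minimal sketch, not the kernel code: `struct dt_skb`, the `DT_*` enum names and their non-zero values, and the specific error codes are all assumptions made for illustration. Only "0 means NONE", "UNSPEC is egress-only", and "only MONO can be set" come from the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* 0 is NONE per the commit text; the other numeric values are assumed. */
enum dt_type {
	DT_NONE = 0,
	DT_UNSPEC,
	DT_MONO,
};

/* Hypothetical stand-in for the few sk_buff fields discussed here. */
struct dt_skb {
	uint64_t tstamp;         /* (rcv) timestamp or delivery_time */
	bool mono_delivery_time; /* tstamp holds a mono delivery_time */
	bool tc_at_ingress;      /* the program runs at tc ingress */
};

/* What a read of delivery_time_type reports in this model. */
enum dt_type dt_get_type(const struct dt_skb *skb)
{
	if (skb->mono_delivery_time)
		return DT_MONO;
	/* UNSPEC is only visible at egress, and only if a time is set. */
	if (!skb->tc_at_ingress && skb->tstamp)
		return DT_UNSPEC;
	return DT_NONE;
}

/* Only the MONO type is accepted; the error codes are assumptions. */
int dt_set_delivery_time(struct dt_skb *skb, uint64_t dtime, enum dt_type type)
{
	if (type != DT_MONO)
		return -EOPNOTSUPP;
	if (!dtime)
		return -EINVAL;
	skb->tstamp = dtime;
	skb->mono_delivery_time = true;
	return 0;
}
```

In this model, an ingress program that wants to hand an EDT to fq would call `dt_set_delivery_time()` with `DT_MONO` before redirecting, which flips the skb from "(rcv) timestamp" to "mono delivery_time" in one step.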
    • M
      bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress · 7449197d
      Committed by Martin KaFai Lau
      The current tc-bpf@ingress reads and writes the __sk_buff->tstamp
      as a (rcv) timestamp which currently could either be 0 (not available)
      or ktime_get_real().  This patch stays backward compatible with the
      (rcv) timestamp expectation at ingress.  If the skb->tstamp has
      the delivery_time, the bpf insn rewrite will read 0 for tc-bpf
      running at ingress as it is not available.  When writing at ingress,
      it will also clear the skb->mono_delivery_time bit.
      
      /* BPF_READ: a = __sk_buff->tstamp */
      if (!skb->tc_at_ingress || !skb->mono_delivery_time)
      	a = skb->tstamp;
      else
      	a = 0;
      
      /* BPF_WRITE: __sk_buff->tstamp = a */
      if (skb->tc_at_ingress)
      	skb->mono_delivery_time = 0;
      skb->tstamp = a;
      
      [ A note on the BPF_CGROUP_INET_INGRESS which can also access
        skb->tstamp.  At that point, the skb is delivered locally
        and skb_clear_delivery_time() has already been done,
        so the skb->tstamp will only have the (rcv) timestamp. ]
      
      If the tc-bpf@egress writes 0 to skb->tstamp, the skb->mono_delivery_time
      has to be cleared also.  It could be done together during
      convert_ctx_access().  However, a later patch will also expose
      the skb->mono_delivery_time bit as __sk_buff->delivery_time_type.
      Changing the delivery_time_type in the background may surprise
      the user, e.g. the 2nd read on __sk_buff->delivery_time_type
      may need a READ_ONCE() to avoid compiler optimization.  Thus,
      in anticipation of the needs of that later patch, this patch
      checks !skb->tstamp after running the tc-bpf and clears the
      skb->mono_delivery_time bit if needed.  See the earlier discussion
      on v4 [0].
      
      The bpf insn rewrite requires the skb's mono_delivery_time bit and
      tc_at_ingress bit.  They are moved up in sk_buff so that bpf rewrite
      can be done at a fixed offset.  tc_skip_classify is moved together with
      tc_at_ingress.  To get one bit for mono_delivery_time, csum_not_inet
      (currently used by sctp) is moved down.
      
      [0]: https://lore.kernel.org/bpf/20220217015043.khqwqklx45c4m4se@kafai-mbp.dhcp.thefacebook.com/
      
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7449197d
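The BPF_READ and BPF_WRITE rules in this commit, plus the check done after the program has run, can be exercised as ordinary C. This is a sketch under stated assumptions: `struct rw_skb` and the `rw_*` function names are hypothetical, and the real logic is emitted as rewritten BPF instructions rather than called as functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff bits used by the rewrite. */
struct rw_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
	bool tc_at_ingress;
};

/* BPF_READ: a = __sk_buff->tstamp */
uint64_t rw_read_tstamp(const struct rw_skb *skb)
{
	if (!skb->tc_at_ingress || !skb->mono_delivery_time)
		return skb->tstamp;
	return 0; /* a delivery_time is reported as "not available" at ingress */
}

/* BPF_WRITE: __sk_buff->tstamp = a */
void rw_write_tstamp(struct rw_skb *skb, uint64_t a)
{
	if (skb->tc_at_ingress)
		skb->mono_delivery_time = false;
	skb->tstamp = a;
}

/* Check after the tc-bpf has run: a zero tstamp cannot be a delivery_time. */
void rw_post_run_fixup(struct rw_skb *skb)
{
	if (!skb->tstamp)
		skb->mono_delivery_time = false;
}
```

Keeping the zero check in a separate post-run step, instead of inside the write path, mirrors the commit's reasoning: the bit does not change behind the program's back while it is still running.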
    • M
      net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally · cd14e9b7
      Committed by Martin KaFai Lau
      The previous patches handled the delivery_time in the ingress path
      before the routing decision is made.  This patch can postpone clearing
      delivery_time in a skb until knowing it is delivered locally and also
      set the (rcv) timestamp if needed.  This patch moves the
      skb_clear_delivery_time() from dev.c to ip_local_deliver_finish()
      and ip6_input_finish().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd14e9b7
    • M
      net: Get rcv tstamp if needed in nfnetlink_{log, queue}.c · 80fcec67
      Committed by Martin KaFai Lau
      If skb has the (rcv) timestamp available, nfnetlink_{log, queue}.c
      logs/outputs it to the userspace.  When the locally generated skb is
      looping from egress to ingress over a virtual interface (e.g. veth,
      loopback...), skb->tstamp may have the delivery time before it is
      known that it will be delivered locally and received by another sk.  Like
      the delivery time handling in network tapping, use ktime_get_real() to
      get the (rcv) timestamp.  The earlier added helper skb_tstamp_cond() is
      used to do this.  false is passed to the second 'cond' arg such
      that doing ktime_get_real() or not depends only on the
      netstamp_needed_key static key.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80fcec67
    • M
      net: ipv6: Get rcv timestamp if needed when handling hop-by-hop IOAM option · b6561f84
      Committed by Martin KaFai Lau
      IOAM is a hop-by-hop option with a temporary IANA allocation (49).
      Since it is hop-by-hop, it is done before the input routing decision.
      One of the traced data fields is the (rcv) timestamp.
      
      When the locally generated skb is looping from egress to ingress over
      a virtual interface (e.g. veth, loopback...), skb->tstamp may have the
      delivery time before it is known that it will be delivered locally
      and received by another sk.
      
      Like handling the network tapping (tcpdump) in the earlier patch,
      this patch gets the timestamp if needed without over-writing the
      delivery_time in the skb->tstamp.  skb_tstamp_cond() is added to do the
      ktime_get_real() with an extra cond arg to check on top of the
      netstamp_needed_key static key.  skb_tstamp_cond() will also be used in
      a later patch and it needs the netstamp_needed_key check.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b6561f84
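The behaviour described for skb_tstamp_cond() can be sketched as follows. This is an illustrative model, not the kernel helper: `struct cond_skb`, the `cond_netstamp_needed` flag standing in for the netstamp_needed_key static key, and the fixed fake clock are assumptions, as is the detail that an already present (rcv) timestamp is returned unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct cond_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

/* Stand-ins for the netstamp_needed_key static key and ktime_get_real(). */
bool cond_netstamp_needed;
uint64_t cond_fake_real_now = 1000;

uint64_t cond_ktime_get_real(void)
{
	return cond_fake_real_now;
}

/* Returns a (rcv) timestamp without touching the skb (note the const). */
uint64_t cond_skb_tstamp(const struct cond_skb *skb, bool cond)
{
	/* Assumed: an existing (rcv) timestamp is returned as-is. */
	if (!skb->mono_delivery_time && skb->tstamp)
		return skb->tstamp;
	/* Otherwise take a fresh timestamp only when someone needs one. */
	if (cond_netstamp_needed || cond)
		return cond_ktime_get_real();
	return 0;
}
```

Because the skb is only read, a delivery_time stored in `tstamp` survives the call, which is the property the commit message asks for.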
    • M
      net: ipv6: Handle delivery_time in ipv6 defrag · 335c8cf3
      Committed by Martin KaFai Lau
      A later patch will postpone the delivery_time clearing until the stack
      knows the skb is being delivered locally (i.e. calling
      skb_clear_delivery_time() at ip_local_deliver_finish() for IPv4
      and at ip6_input_finish() for IPv6).  That will allow other kernel
      forwarding paths (e.g. ip[6]_forward) to keep the delivery_time also.
      
      Very similar IPv6 defrag code has been duplicated in
      multiple places: regular IPv6, nf_conntrack, and 6lowpan.
      
      Unlike the IPv4 defrag which is done before ip_local_deliver_finish(),
      the regular IPv6 defrag is done after ip6_input_finish().
      Thus, no change should be needed in the regular IPv6 defrag
      logic because skb_clear_delivery_time() should have been called.
      
      6lowpan also does not need special handling on delivery_time
      because it is a non-inet packet_type.
      
      However, nf_conntrack has a case in NF_INET_PRE_ROUTING that needs
      to do the IPv6 defrag earlier.  Thus, it needs to save the
      mono_delivery_time bit in the inet_frag_queue which is similar
      to how it is handled in the previous patch for the IPv4 defrag.
      
      This patch chooses to do it consistently and stores the mono_delivery_time
      in the inet_frag_queue for all cases such that it will be easier
      for the future refactoring effort on the IPv6 reasm code.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      335c8cf3
    • M
      net: ip: Handle delivery_time in ip defrag · 8672406e
      Committed by Martin KaFai Lau
      A later patch will postpone the delivery_time clearing until the stack
      knows the skb is being delivered locally.  That will allow other kernel
      forwarding paths (e.g. ip[6]_forward) to keep the delivery_time also.
      
      An earlier attempt was to do skb_clear_delivery_time() in
      ip_local_deliver() and ip6_input().  The discussion [0] requested
      to move it one step later into ip_local_deliver_finish()
      and ip6_input_finish() so that the delivery_time can be kept
      for the ip_vs forwarding path also.
      
      To do that, this patch also needs to take care of the (rcv) timestamp
      use case in ip_is_fragment().  It needs to expect delivery_time in
      the skb->tstamp, so it needs to save the mono_delivery_time bit in
      inet_frag_queue such that the delivery_time (if any) can be restored
      in the final defragmented skb.
      
      [Note that this only happens when a locally generated skb loops
       from egress to ingress over a virtual interface (e.g. veth, loopback...):
       skb->tstamp may then have the delivery time before it is known that it
       will be delivered locally and received by another sk.]
      
      [0]: https://lore.kernel.org/netdev/ca728d81-80e8-3767-d5e-d44f6ad96e43@ssi.bg/
      
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8672406e
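Saving the mono_delivery_time bit next to the timestamp in the fragment queue, and restoring both on the reassembled skb, can be sketched as below. The struct and function names (`frag_skb`, `frag_queue_model`, `frag_queue_record`, `frag_queue_finish`) are hypothetical; the real inet_frag_queue carries many more fields and the real reassembly path does much more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct frag_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

/* Hypothetical stand-in for the timestamp state kept per fragment queue. */
struct frag_queue_model {
	uint64_t stamp;
	bool mono_delivery_time;
};

/* For each arriving fragment: remember its time and what kind of time it is. */
void frag_queue_record(struct frag_queue_model *q, const struct frag_skb *frag)
{
	q->stamp = frag->tstamp;
	q->mono_delivery_time = frag->mono_delivery_time;
}

/* For the reassembled skb: restore the saved time and the bit together. */
void frag_queue_finish(const struct frag_queue_model *q, struct frag_skb *head)
{
	head->tstamp = q->stamp;
	head->mono_delivery_time = q->mono_delivery_time;
}
```

Storing the bit with the value matters because the value alone is ambiguous: the same 64-bit number could be a (rcv) timestamp or a mono delivery_time.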
    • M
      net: Set skb->mono_delivery_time and clear it after sch_handle_ingress() · d98d58a0
      Committed by Martin KaFai Lau
      The previous patches handled the delivery_time before sch_handle_ingress().
      
      This patch can now set skb->mono_delivery_time to flag that skb->tstamp
      is used as the mono delivery_time (EDT) instead of the (rcv) timestamp,
      and also clear it with skb_clear_delivery_time() after
      sch_handle_ingress().  This lets bpf_redirect_*() keep the mono
      delivery_time so that it can be used by a qdisc (fq) of
      the egress-ing interface.
      
      A later patch will postpone the skb_clear_delivery_time() until the
      stack learns that the skb is being delivered locally, which will
      make other kernel forwarding paths (ip[6]_forward) able to keep
      the delivery_time also.  Thus, like the previous patches on using
      the skb->mono_delivery_time bit, calling skb_clear_delivery_time()
      is not limited to CONFIG_NET_INGRESS, to avoid too much code
      churn in this set.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d98d58a0
    • M
      net: Clear mono_delivery_time bit in __skb_tstamp_tx() · d93376f5
      Committed by Martin KaFai Lau
      __skb_tstamp_tx() may clone the egress skb and queue the clone to
      the sk_error_queue.  The outgoing skb may have the mono delivery_time
      while the (rcv) timestamp is expected for the clone, so the
      skb->mono_delivery_time bit needs to be cleared from the clone.
      
      This patch adds the skb->mono_delivery_time clearing to the existing
      __net_timestamp() and uses it in __skb_tstamp_tx().
      The __net_timestamp() fast path usage in dev.c is changed to directly
      call ktime_get_real() since the mono_delivery_time bit is not set at
      that point.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d93376f5
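The clone handling in this commit can be pictured with a small model: the clone destined for the error queue gets a fresh (rcv) timestamp and has the mono bit cleared, while the outgoing skb keeps its delivery_time. The names `tx_skb`, `tx_net_timestamp` and `tx_clone_for_error_queue` are hypothetical, the clock is a fixed fake value, and hardware timestamps are ignored in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct tx_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

/* Fake wall clock standing in for ktime_get_real(). */
uint64_t tx_fake_real_now = 3000;

/* Model of __net_timestamp(): set the (rcv) timestamp and clear the bit. */
void tx_net_timestamp(struct tx_skb *skb)
{
	skb->tstamp = tx_fake_real_now;
	skb->mono_delivery_time = false;
}

/* Copy the egress skb and stamp only the copy; the original is untouched. */
struct tx_skb tx_clone_for_error_queue(const struct tx_skb *orig)
{
	struct tx_skb clone = *orig;

	tx_net_timestamp(&clone);
	return clone;
}
```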
    • M
      net: Handle delivery_time in skb->tstamp during network tapping with af_packet · 27942a15
      Committed by Martin KaFai Lau
      A later patch will set skb->mono_delivery_time to flag that skb->tstamp
      is used as the mono delivery_time (EDT) instead of the (rcv) timestamp.
      skb_clear_tstamp() will then keep this delivery_time during forwarding.
      
      This patch makes network tapping (with af_packet) handle
      the delivery_time stored in skb->tstamp.
      
      Regardless of tapping at ingress or egress, the tapped skb is
      received by the af_packet socket, so it is ingress to the af_packet
      socket and it expects the (rcv) timestamp.
      
      When tapping at egress, dev_queue_xmit_nit() is used.  It already
      expects that skb->tstamp may have the delivery_time, so it does
      skb_clone()+net_timestamp_set() to ensure the cloned skb has
      the (rcv) timestamp before passing it to the af_packet sk.
      This patch only adds clearing of the skb->mono_delivery_time
      bit in net_timestamp_set().
      
      When tapping at ingress, it currently expects skb->tstamp to be either 0
      or the (rcv) timestamp.  That is, the tapping at ingress path
      already expects that skb->tstamp could be 0 and gets
      the (rcv) timestamp by ktime_get_real() when needed.
      
      There are two cases for tapping at ingress:
      
      In one case, af_packet queues the skb to its sk_receive_queue.
      The skb is either not shared or a new clone is created.  The newly
      added skb_clear_delivery_time() is called to clear the
      delivery_time (if any) and set the (rcv) timestamp if
      needed before the skb is queued to the sk_receive_queue.
      
      In the other case, the ingress skb is directly copied to the rx_ring
      and tpacket_get_timestamp() is used to get the (rcv) timestamp.
      The newly added skb_tstamp() is used in tpacket_get_timestamp()
      to check the skb->mono_delivery_time bit before returning skb->tstamp.
      As mentioned earlier, the tapping@ingress has already expected
      the skb may not have the (rcv) timestamp (because no sk has asked
      for it) and has handled this case by directly calling ktime_get_real().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27942a15
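The two ingress tapping cases in this commit can be modeled side by side: one helper that rewrites the skb before it is queued, and one read-only helper for the rx_ring path. This is a sketch, not the kernel code: `struct tap_skb`, the `tap_*` names, the `tap_netstamp_needed` flag standing in for the netstamp_needed_key static key, and the fixed fake clock are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct tap_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

/* Stand-ins for the netstamp_needed_key static key and ktime_get_real(). */
bool tap_netstamp_needed;
uint64_t tap_fake_real_now = 2000;

/* Model of skb_tstamp(): never expose a delivery_time as a (rcv) timestamp. */
uint64_t tap_skb_tstamp(const struct tap_skb *skb)
{
	if (skb->mono_delivery_time)
		return 0;
	return skb->tstamp;
}

/* Model of skb_clear_delivery_time(), used before sk_receive_queue queueing. */
void tap_clear_delivery_time(struct tap_skb *skb)
{
	if (!skb->mono_delivery_time)
		return; /* already a (rcv) timestamp or 0: leave it alone */
	skb->mono_delivery_time = false;
	skb->tstamp = tap_netstamp_needed ? tap_fake_real_now : 0;
}

/* rx_ring path: fall back to the clock when no (rcv) timestamp is available. */
uint64_t tap_ring_timestamp(const struct tap_skb *skb)
{
	uint64_t ts = tap_skb_tstamp(skb);

	return ts ? ts : tap_fake_real_now;
}
```

The rx_ring helper takes a const skb because the packet is only copied there, whereas the queueing path owns the skb (unshared or freshly cloned) and may safely overwrite its timestamp.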
    • M
      net: Add skb_clear_tstamp() to keep the mono delivery_time · de799101
      Committed by Martin KaFai Lau
      Right now, skb->tstamp is reset to 0 whenever the skb is forwarded.
      
      If skb->tstamp has the mono delivery_time, clearing it can hurt
      performance when the skb is finally transmitted out to fq@phy-dev.
      
      The earlier patch added a skb->mono_delivery_time bit to
      flag the skb->tstamp carrying the mono delivery_time.
      
      This patch adds skb_clear_tstamp() helper which keeps
      the mono delivery_time and clears everything else.
      
      The delivery_time clearing will be postponed until the stack knows the
      skb will be delivered locally.  It will be done in a later patch.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de799101
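The rule "keep the mono delivery_time and clear everything else" is small enough to show directly. The following is a userspace sketch with a hypothetical `struct fwd_skb` and function name; it only models the two fields the commit message talks about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved. */
struct fwd_skb {
	uint64_t tstamp;
	bool mono_delivery_time;
};

/* Model of skb_clear_tstamp(): called on the forwarding path. */
void fwd_clear_tstamp(struct fwd_skb *skb)
{
	/* A mono delivery_time is still useful to fq on the egress device. */
	if (skb->mono_delivery_time)
		return;
	/* Anything else, e.g. a (rcv) timestamp, is reset as before. */
	skb->tstamp = 0;
}
```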