1. 21 5月, 2019 1 次提交
  2. 02 5月, 2019 1 次提交
    • S
      ipv4: ip_do_fragment: Preserve skb_iif during fragmentation · d2f0c961
      Shmulik Ladkani 提交于
      Previously, during fragmentation after forwarding, skb->skb_iif isn't
      preserved, i.e. 'ip_copy_metadata' does not copy skb_iif from given
      'from' skb.
      
      As a result, ip_do_fragment's creates fragments with zero skb_iif,
      leading to inconsistent behavior.
      
      Assume for example an eBPF program attached at tc egress (post
      forwarding) that examines __sk_buff->ingress_ifindex:
       - the correct iif is observed if forwarding path does not involve
         fragmentation/refragmentation
       - a bogus iif is observed if forwarding path involves
         fragmentation/refragmentatiom
      
      Fix, by preserving skb_iif during 'ip_copy_metadata'.
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2f0c961
  3. 09 4月, 2019 3 次提交
    • D
      ipv4: Add helpers for neigh lookup for nexthop · 5c9f7c1d
      David Ahern 提交于
      A common theme in the output path is looking up a neigh entry for a
      nexthop, either the gateway in an rtable or a fallback to the daddr
      in the skb:
      
              nexthop = (__force u32)rt_nexthop(rt, ip_hdr(skb)->daddr);
              neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
              if (unlikely(!neigh))
                      neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
      
      To allow the nexthop to be an IPv6 address we need to consider the
      family of the nexthop and then call __ipv{4,6}_neigh_lookup_noref based
      on it.
      
      To make this simpler, add a ip_neigh_gw4 helper similar to ip_neigh_gw6
      added in an earlier patch which handles:
      
              neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
              if (unlikely(!neigh))
                      neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
      
      And then add a second one, ip_neigh_for_gw, that calls either
      ip_neigh_gw4 or ip_neigh_gw6 based on the address family of the gateway.
      
      Update the output paths in the VRF driver and core v4 code to use
      ip_neigh_for_gw simplifying the family based lookup and making both
      ready for a v6 nexthop.
      
      ipv4_neigh_lookup has a different need - the potential to resolve a
      passed in address in addition to any gateway in the rtable or skb. Since
      this is a one-off, add ip_neigh_gw4 and ip_neigh_gw6 diectly. The
      difference between __neigh_create used by the helpers and neigh_create
      called by ipv4_neigh_lookup is taking a refcount, so add rcu_read_lock_bh
      and bump the refcnt on the neigh entry.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c9f7c1d
    • D
      neighbor: Add skip_cache argument to neigh_output · 0353f282
      David Ahern 提交于
      A later patch allows an IPv6 gateway with an IPv4 route. The neighbor
      entry will exist in the v6 ndisc table and the cached header will contain
      the ipv6 protocol which is wrong for an IPv4 packet. For an IPv4 packet to
      use the v6 neighbor entry, neigh_output needs to skip the cached header
      and just use the output callback for the neigh entry.
      
      A future patchset can look at expanding the hh_cache to handle 2
      protocols. For now, IPv6 gateways with an IPv4 route will take the
      extra overhead of generating the header.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0353f282
    • D
      ipv4: Prepare rtable for IPv6 gateway · 1550c171
      David Ahern 提交于
      To allow the gateway to be either an IPv4 or IPv6 address, remove
      rt_uses_gateway from rtable and replace with rt_gw_family. If
      rt_gw_family is set it implies rt_uses_gateway. Rename rt_gateway
      to rt_gw4 to represent the IPv4 version.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1550c171
  4. 05 4月, 2019 1 次提交
  5. 20 12月, 2018 1 次提交
    • F
      sk_buff: add skb extension infrastructure · df5042f4
      Florian Westphal 提交于
      This adds an optional extension infrastructure, with ispec (xfrm) and
      bridge netfilter as first users.
      objdiff shows no changes if kernel is built without xfrm and br_netfilter
      support.
      
      The third (planned future) user is Multipath TCP which is still
      out-of-tree.
      MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
      numbers used by individual subflows.
      
      This DSS mapping is read/written from tcp option space on receive and
      written to tcp option space on transmitted tcp packets that are part of
      and MPTCP connection.
      
      Extending skb_shared_info or adding a private data field to skb fclones
      doesn't work for incoming skb, so a different DSS propagation method would
      be required for the receive side.
      
      mptcp has same requirements as secpath/bridge netfilter:
      
      1. extension memory is released when the sk_buff is free'd.
      2. data is shared after cloning an skb (clone inherits extension)
      3. adding extension to an skb will COW the extension buffer if needed.
      
      The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
      mapping for tx and rx processing.
      
      Two new members are added to sk_buff:
      1. 'active_extensions' byte (filling a hole), telling which extensions
         are available for this skb.
         This has two purposes.
         a) avoids the need to initialize the pointer.
         b) allows to "delete" an extension by clearing its bit
         value in ->active_extensions.
      
         While it would be possible to store the active_extensions byte
         in the extension struct instead of sk_buff, there is one problem
         with this:
          When an extension has to be disabled, we can always clear the
          bit in skb->active_extensions.  But in case it would be stored in the
          extension buffer itself, we might have to COW it first, if
          we are dealing with a cloned skb.  On kmalloc failure we would
          be unable to turn an extension off.
      
      2. extension pointer, located at the end of the sk_buff.
         If the active_extensions byte is 0, the pointer is undefined,
         it is not initialized on skb allocation.
      
      This adds extra code to skb clone and free paths (to deal with
      refcount/free of extension area) but this replaces similar code that
      manages skb->nf_bridge and skb->sp structs in the followup patches of
      the series.
      
      It is possible to add support for extensions that are not preseved on
      clones/copies.
      
      To do this, it would be needed to define a bitmask of all extensions that
      need copy/cow semantics, and change __skb_ext_copy() to check
      ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
      ->active_extensions to 0 on the new clone.
      
      This isn't done here because all extensions that get added here
      need the copy/cow semantics.
      
      v2:
      Allocate entire extension space using kmem_cache.
      Upside is that this allows better tracking of used memory,
      downside is that we will allocate more space than strictly needed in
      most cases (its unlikely that all extensions are active/needed at same
      time for same skb).
      The allocated memory (except the small extension header) is not cleared,
      so no additonal overhead aside from memory usage.
      
      Avoid atomic_dec_and_test operation on skb_ext_put()
      by using similar trick as kfree_skbmem() does with fclone_ref:
      If recount is 1, there is no concurrent user and we can free right away.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df5042f4
  6. 09 12月, 2018 1 次提交
  7. 04 12月, 2018 2 次提交
    • W
      udp: elide zerocopy operation in hot path · 52900d22
      Willem de Bruijn 提交于
      With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
      Release of its last reference triggers a completion notification.
      
      The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
      the skbs, because it can build, send and free skbs within its loop,
      possibly reaching refcount zero and freeing the ubuf_info too soon.
      
      The UDP stack currently also takes this extra ref, but does not need
      it as all skbs are sent after return from __ip(6)_append_data.
      
      Avoid the extra refcount_inc and refcount_dec_and_test, and generally
      the sock_zerocopy_put in the common path, by passing the initial
      reference to the first skb.
      
      This approach is taken instead of initializing the refcount to 0, as
      that would generate error "refcount_t: increment on 0" on the
      next skb_zcopy_set.
      
      Changes
        v3 -> v4
          - Move skb_zcopy_set below the only kfree_skb that might cause
            a premature uarg destroy before skb_zerocopy_put_abort
            - Move the entire skb_shinfo assignment block, to keep that
              cacheline access in one place
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52900d22
    • W
      udp: msg_zerocopy · b5947e5d
      Willem de Bruijn 提交于
      Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
      interpret flag MSG_ZEROCOPY.
      
      This patch was previously part of the zerocopy RFC patchsets. Zerocopy
      is not effective at small MTU. With segmentation offload building
      larger datagrams, the benefit of page flipping outweights the cost of
      generating a completion notification.
      
      tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
      test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
      
          ipv4 udp -t 1
          tx=191312 (11938 MB) txc=0 zc=n
          rx=191312 (11938 MB)
          ipv4 udp -z -t 1
          tx=304507 (19002 MB) txc=304507 zc=y
          rx=304507 (19002 MB)
          ok
          ipv6 udp -t 1
          tx=174485 (10888 MB) txc=0 zc=n
          rx=174485 (10888 MB)
          ipv6 udp -z -t 1
          tx=294801 (18396 MB) txc=294801 zc=y
          rx=294801 (18396 MB)
          ok
      
      Changes
        v1 -> v2
          - Fixup reverse christmas tree violation
        v2 -> v3
          - Split refcount avoidance optimization into separate patch
            - Fix refcount leak on error in fragmented case
              (thanks to Paolo Abeni for pointing this one out!)
            - Fix refcount inc on zero
            - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
              This is needed since commit 5cf4a853 ("tcp: really ignore
      	MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5947e5d
  8. 25 11月, 2018 1 次提交
    • W
      net: always initialize pagedlen · aba36930
      Willem de Bruijn 提交于
      In ip packet generation, pagedlen is initialized for each skb at the
      start of the loop in __ip(6)_append_data, before label alloc_new_skb.
      
      Depending on compiler options, code can be generated that jumps to
      this label, triggering use of an an uninitialized variable.
      
      In practice, at -O2, the generated code moves the initialization below
      the label. But the code should not rely on that for correctness.
      
      Fixes: 15e36f5b ("udp: paged allocation with gso")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aba36930
  9. 11 9月, 2018 1 次提交
  10. 24 7月, 2018 1 次提交
    • P
      ip: hash fragments consistently · 3dd1c9a1
      Paolo Abeni 提交于
      The skb hash for locally generated ip[v6] fragments belonging
      to the same datagram can vary in several circumstances:
      * for connected UDP[v6] sockets, the first fragment get its hash
        via set_owner_w()/skb_set_hash_from_sk()
      * for unconnected IPv6 UDPv6 sockets, the first fragment can get
        its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
        auto_flowlabel is enabled
      
      For the following frags the hash is usually computed via
      skb_get_hash().
      The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
      scenario the egress tx queue can be selected on a per packet basis
      via the skb hash.
      It may also fool flow-oriented schedulers to place fragments belonging
      to the same datagram in different flows.
      
      Fix the issue by copying the skb hash from the head frag into
      the others at fragmentation time.
      
      Before this commit:
      perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
      netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
      perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
      perf script
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
      
      After this commit:
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      
      Fixes: b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
      Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dd1c9a1
  11. 07 7月, 2018 3 次提交
  12. 04 7月, 2018 2 次提交
  13. 20 6月, 2018 1 次提交
  14. 18 5月, 2018 1 次提交
  15. 11 5月, 2018 1 次提交
    • J
      tcp: Add mark for TIMEWAIT sockets · 00483690
      Jon Maxwell 提交于
      This version has some suggestions by Eric Dumazet:
      
      - Use a local variable for the mark in IPv6 instead of ctl_sk to avoid SMP
      races.
      - Use the more elegant "IP4_REPLY_MARK(net, skb->mark) ?: sk->sk_mark"
      statement.
      - Factorize code as sk_fullsock() check is not necessary.
      
      Aidan McGurn from Openwave Mobility systems reported the following bug:
      
      "Marked routing is broken on customer deployment. Its effects are large
      increase in Uplink retransmissions caused by the client never receiving
      the final ACK to their FINACK - this ACK misses the mark and routes out
      of the incorrect route."
      
      Currently marks are added to sk_buffs for replies when the "fwmark_reflect"
      sysctl is enabled. But not for TW sockets that had sk->sk_mark set via
      setsockopt(SO_MARK..).
      
      Fix this in IPv4/v6 by adding tw->tw_mark for TIME_WAIT sockets. Copy the the
      original sk->sk_mark in __inet_twsk_hashdance() to the new tw->tw_mark location.
      Then progate this so that the skb gets sent with the correct mark. Do the same
      for resets. Give the "fwmark_reflect" sysctl precedence over sk->sk_mark so that
      netfilter rules are still honored.
      Signed-off-by: NJon Maxwell <jmaxwell37@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00483690
  16. 27 4月, 2018 3 次提交
    • W
      udp: paged allocation with gso · 15e36f5b
      Willem de Bruijn 提交于
      When sending large datagrams that are later segmented, store data in
      page frags to avoid copying from linear in skb_segment.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15e36f5b
    • W
      udp: generate gso with UDP_SEGMENT · bec1f6f6
      Willem de Bruijn 提交于
      Support generic segmentation offload for udp datagrams. Callers can
      concatenate and send at once the payload of multiple datagrams with
      the same destination.
      
      To set segment size, the caller sets socket option UDP_SEGMENT to the
      length of each discrete payload. This value must be smaller than or
      equal to the relevant MTU.
      
      A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
      per send call basis.
      
      Total byte length may then exceed MTU. If not an exact multiple of
      segment size, the last segment will be shorter.
      
      The implementation adds a gso_size field to the udp socket, ip(v6)
      cmsg cookie and inet_cork structure to be able to set the value at
      setsockopt or cmsg time and to work with both lockless and corked
      paths.
      
      Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
      
          tcp tso
           3197 MB/s 54232 msg/s 54232 calls/s
               6,457,754,262      cycles
      
          tcp gso
           1765 MB/s 29939 msg/s 29939 calls/s
              11,203,021,806      cycles
      
          tcp without tso/gso *
            739 MB/s 12548 msg/s 12548 calls/s
              11,205,483,630      cycles
      
          udp
            876 MB/s 14873 msg/s 624666 calls/s
              11,205,777,429      cycles
      
          udp gso
           2139 MB/s 36282 msg/s 36282 calls/s
              11,204,374,561      cycles
      
         [*] after reverting commit 0a6b2a1d
             ("tcp: switch to GSO being always on")
      
      Measured total system cycles ('-a') for one core while pinning both
      the network receive path and benchmark process to that core:
      
        perf stat -a -C 12 -e cycles \
          ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
      
      Note the reduction in calls/s with GSO. Bytes per syscall drops
      increases from 1470 to 61818.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bec1f6f6
    • W
      udp: expose inet cork to udp · 1cd7884d
      Willem de Bruijn 提交于
      UDP segmentation offload needs access to inet_cork in the udp layer.
      Pass the struct to ip(6)_make_skb instead of allocating it on the
      stack in that function itself.
      
      This patch is a noop otherwise.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cd7884d
  17. 18 4月, 2018 1 次提交
  18. 17 4月, 2018 1 次提交
  19. 04 4月, 2018 1 次提交
    • P
      net: avoid unneeded atomic operation in ip*_append_data() · 9e8445a5
      Paolo Abeni 提交于
      After commit 694aba69 ("ipv4: factorize sk_wmem_alloc updates
      done by __ip_append_data()") and commit 1f4c6eb2 ("ipv6:
      factorize sk_wmem_alloc updates done by __ip6_append_data()"),
      when transmitting sub MTU datagram, an addtional, unneeded atomic
      operation is performed in ip*_append_data() to update wmem_alloc:
      in the above condition the delta is 0.
      
      The above cause small but measurable performance regression in UDP
      xmit tput test with packet size below MTU.
      
      This change avoids such overhead updating wmem_alloc only if
      wmem_alloc_delta is non zero.
      
      The error path is left intentionally unmodified: it's a slow path
      and simplicity is preferred to performances.
      
      Fixes: 694aba69 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
      Fixes: 1f4c6eb2 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e8445a5
  20. 02 4月, 2018 1 次提交
  21. 05 3月, 2018 1 次提交
  22. 23 8月, 2017 1 次提交
  23. 11 8月, 2017 1 次提交
  24. 07 8月, 2017 1 次提交
  25. 18 7月, 2017 1 次提交
  26. 16 7月, 2017 1 次提交
    • V
      ipv4: ip_do_fragment: fix headroom tests · 254d900b
      Vasily Averin 提交于
      Some time ago David Woodhouse reported skb_under_panic
      when we try to push ethernet header to fragmented ipv6 skbs.
      It was fixed for ipv6 by Florian Westphal in
      commit 1d325d21 ("ipv6: ip6_fragment: fix headroom tests and skb leak")
      
      However similar problem still exist in ipv4.
      
      It does not trigger skb_under_panic due paranoid check
      in ip_finish_output2, however according to Alexey Kuznetsov
      current state is abnormal and ip_fragment should be fixed too.
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      254d900b
  27. 04 7月, 2017 1 次提交
  28. 01 7月, 2017 1 次提交
  29. 24 6月, 2017 1 次提交
    • M
      net: account for current skb length when deciding about UFO · a5cb659b
      Michal Kubeček 提交于
      Our customer encountered stuck NFS writes for blocks starting at specific
      offsets w.r.t. page boundary caused by networking stack sending packets via
      UFO enabled device with wrong checksum. The problem can be reproduced by
      composing a long UDP datagram from multiple parts using MSG_MORE flag:
      
        sendto(sd, buff, 1000, MSG_MORE, ...);
        sendto(sd, buff, 1000, MSG_MORE, ...);
        sendto(sd, buff, 3000, 0, ...);
      
      Assume this packet is to be routed via a device with MTU 1500 and
      NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
      this condition is tested (among others) to decide whether to call
      ip_ufo_append_data():
      
        ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))
      
      At the moment, we already have skb with 1028 bytes of data which is not
      marked for GSO so that the test is false (fragheaderlen is usually 20).
      Thus we append second 1000 bytes to this skb without invoking UFO. Third
      sendto(), however, has sufficient length to trigger the UFO path so that we
      end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
      uses udp_csum() to calculate the checksum but that assumes all fragments
      have correct checksum in skb->csum which is not true for UFO fragments.
      
      When checking against MTU, we need to add skb->len to length of new segment
      if we already have a partially filled skb and fragheaderlen only if there
      isn't one.
      
      In the IPv6 case, skb can only be null if this is the first segment so that
      we have to use headersize (length of the first IPv6 header) rather than
      fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Fixes: e4c5e13a ("ipv6: Should use consistent conditional judgement for
      	ip6 fragment between __ip6_append_data and ip6_finish_output")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5cb659b
  30. 10 3月, 2017 1 次提交
    • A
      udp: avoid ufo handling on IP payload compression packets · 4b3b45ed
      Alexey Kodanev 提交于
      commit c146066a ("ipv4: Don't use ufo handling on later transformed
      packets") and commit f89c56ce ("ipv6: Don't use ufo handling on
      later transformed packets") added a check that 'rt->dst.header_len' isn't
      zero in order to skip UFO, but it doesn't include IPcomp in transport mode
      where it equals zero.
      
      Packets, after payload compression, may not require further fragmentation,
      and if original length exceeds MTU, later compressed packets will be
      transmitted incorrectly. This can be reproduced with LTP udp_ipsec.sh test
      on veth device with enabled UFO, MTU is 1500 and UDP payload is 2000:
      
      * IPv4 case, offset is wrong + unnecessary fragmentation
          udp_ipsec.sh -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP (tos 0x0, ttl 64, id 45203, offset 0, flags [+],
            proto Compressed IP (108), length 49)
            10.0.0.2 > 10.0.0.1: IPComp(cpi=0x1000)
          IP (tos 0x0, ttl 64, id 45203, offset 1480, flags [none],
            proto UDP (17), length 21) 10.0.0.2 > 10.0.0.1: ip-proto-17
      
      * IPv6 case, sending small fragments
          udp_ipsec.sh -6 -p comp -m transport -s 2000 &
          tcpdump -ni ltp_ns_veth2
          ...
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 37) fd00::2 > fd00::1: IPComp(cpi=0x1000)
          IP6 (flowlabel 0x6b9ba, hlim 64, next-header Compressed IP (108)
            payload length: 21) fd00::2 > fd00::1: IPComp(cpi=0x1000)
      
      Fix it by checking 'rt->dst.xfrm' pointer to 'xfrm_state' struct, skip UFO
      if xfrm is set. So the new check will include both cases: IPcomp and IPsec.
      
      Fixes: c146066a ("ipv4: Don't use ufo handling on later transformed packets")
      Fixes: f89c56ce ("ipv6: Don't use ufo handling on later transformed packets")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b3b45ed
  31. 12 2月, 2017 1 次提交
  32. 08 2月, 2017 1 次提交