1. 16 Sep, 2016 3 commits
  2. 15 Sep, 2016 10 commits
  3. 14 Sep, 2016 6 commits
  4. 13 Sep, 2016 3 commits
  5. 12 Sep, 2016 10 commits
  6. 11 Sep, 2016 8 commits
    • D
      Merge branch 'vrf-tx-hook' · 8fee3156
      David S. Miller committed
      David Ahern says:
      
      ====================
      net: Convert vrf to tx hook
      
      The motivation for this series is that ICMP Unreachable - Fragmentation
      Needed packets are not handled properly for VRFs. Specifically, the
      FIB lookup in __ip_rt_update_pmtu fails so no nexthop exception is
      created with the reduced MTU. As a result connections stall if packets
      larger than the smallest MTU in the path are generated.
      
      While investigating that problem I also noticed that the MSS for all
      connections in a VRF is based on the VRF device's MTU and not the
      route the packets ultimately go through. VRF currently uses a dst
      to direct packets to the device. The first FIB lookup returns this dst
      and then the lookup in the VRF driver gets the actual output route. A
      side effect of this design is that the VRF dst is cached on sockets
      and then used for calculations like the MSS.
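
To make the MSS side effect concrete, here is a minimal sketch, not kernel code. It assumes IPv4 and TCP headers without options (20 bytes each); the function name `mss_from_mtu` and both MTU values are hypothetical and chosen only for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_from_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of size `mtu`."""
    return mtu - IPV4_HEADER - TCP_HEADER


# Hypothetical values for illustration only.
vrf_device_mtu = 9000  # MTU configured on the VRF device
real_route_mtu = 1500  # MTU of the route the packets actually take

# MSS derived from the cached VRF dst versus the real output route.
mss_from_vrf_dst = mss_from_mtu(vrf_device_mtu)
mss_from_real_dst = mss_from_mtu(real_route_mtu)
print(mss_from_vrf_dst, mss_from_real_dst)
```

Under these assumed values, a socket that caches the VRF dst would advertise a segment size that does not fit the real path.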
      
      This series fixes this problem by removing the hook in the FIB lookups
      that returns the dst pointing to the VRF device and always doing the
      actual FIB lookup. This allows the real dst to be used
      throughout the stack (for example the MSS). Packets are diverted to
      the VRF device on Tx using an l3mdev hook in the output path similar to
      what is done for Rx. The end result is a simpler implementation for
      VRF with fewer intrusions into the network stack and symmetrical packet
      handling for Rx and Tx paths.
      
      Comparison of netperf performance for a build without l3mdev (best case
      performance), the old vrf driver and the VRF driver from this series.
      Data are collected using VMs with virtio + vhost. The netperf client
      runs in the VM and netserver runs in the host. 1-byte RR tests are done
      as these packets exaggerate the performance hit due to the extra lookups
      done for l3mdev and VRF.
      
      Command: netperf -cC -H ${ip} -l 60 -t {TCP,UDP}_RR [-J red]
      
                            TCP_RR              UDP_RR
                         IPv4     IPv6       IPv4     IPv6
      no l3mdev        29,996   30,601     31,638   24,336
      vrf old          27,417   27,626     29,159   24,801
      vrf new          28,036   28,372     30,110   24,857
      l3mdev, no vrf   29,534   30,465     30,670   24,346
      
       * Transactions per second as reported by netperf
       * netperf modified to take a bind-to-device argument -- the -J red option
      
      1. 'no l3mdev'      == NET_L3_MASTER_DEV is unset so code is compiled out
      2. 'vrf old'        == data for existing implementation
      3. 'vrf new'        == data with this series
      4. 'l3mdev, no vrf' == NET_L3_MASTER_DEV is enabled but traffic is not
                             going through a VRF
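
The relative cost of each configuration can be read off the table with a little arithmetic. The sketch below is only an illustration: the numbers are copied from the table above, while the `RESULTS` layout and the `drop_percent` helper are assumptions introduced here.

```python
# Transactions per second, copied from the netperf table above.
RESULTS = {
    "no l3mdev":      {"TCP_RR IPv4": 29996, "TCP_RR IPv6": 30601,
                       "UDP_RR IPv4": 31638, "UDP_RR IPv6": 24336},
    "vrf old":        {"TCP_RR IPv4": 27417, "TCP_RR IPv6": 27626,
                       "UDP_RR IPv4": 29159, "UDP_RR IPv6": 24801},
    "vrf new":        {"TCP_RR IPv4": 28036, "TCP_RR IPv6": 28372,
                       "UDP_RR IPv4": 30110, "UDP_RR IPv6": 24857},
    "l3mdev, no vrf": {"TCP_RR IPv4": 29534, "TCP_RR IPv6": 30465,
                       "UDP_RR IPv4": 30670, "UDP_RR IPv6": 24346},
}


def drop_percent(config: str, test: str, baseline: str = "no l3mdev") -> float:
    """Percentage of transactions/s lost relative to the baseline build."""
    base = RESULTS[baseline][test]
    return 100.0 * (base - RESULTS[config][test]) / base


for config in RESULTS:
    row = ", ".join(
        f"{test}: {drop_percent(config, test):+.1f}%" for test in RESULTS[config]
    )
    print(f"{config:15s} {row}")
```

A negative value means the configuration measured faster than the baseline in that test, which for the UDP_RR IPv6 column is probably within run-to-run noise.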
      
      About the series
      - patch 1 adds the flow update (changing oif or iif to L3 master device
        and setting the flag to skip the oif check) to ipv4 and ipv6 paths just
        before hitting the rules. This catches all code paths in a single spot.
      
      - patch 2 adds the Tx hook to push the packet to the l3mdev if relevant
      
      - patch 3 adds some checks so the vrf device can act as a vrf-local
        loopback. These changes were not needed before since the vrf dst was
        returned from the lookup.
      
      - patches 4 and 5 flip the ipv4 and ipv6 stacks to the tx hook leaving
        the route lookup to be the real one. The dst flip happens at the
        beginning of the L3 output path so the VRFs can have device based
        features such as netfilter, tc and tcpdump.
      
      - patches 6-11 remove no longer needed l3mdev code
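
The flow update described for patch 1 can be pictured with a toy model. This is a sketch under stated assumptions, not the kernel implementation: the `Flow` class, the `update_flow` function, the `master_of` mapping and the `SKIP_OIF_CHECK` value are all hypothetical stand-ins for the kernel's flow struct, its l3mdev lookup and its flag.

```python
from dataclasses import dataclass

# Stand-in for the kernel flag that skips the oif check; the value is arbitrary.
SKIP_OIF_CHECK = 0x1


@dataclass
class Flow:
    oif: int = 0    # output interface index, 0 means unset
    iif: int = 0    # input interface index, 0 means unset
    flags: int = 0


def update_flow(flow: Flow, master_of: dict) -> Flow:
    """Point the flow at the L3 master device before the rules are consulted.

    `master_of` maps an enslaved interface index to its L3 master index.
    """
    if flow.oif in master_of:
        flow.oif = master_of[flow.oif]
        flow.flags |= SKIP_OIF_CHECK
    elif flow.iif in master_of:
        flow.iif = master_of[flow.iif]
        flow.flags |= SKIP_OIF_CHECK
    return flow


# Hypothetical topology: interface 3 is enslaved to the VRF with ifindex 10.
print(update_flow(Flow(oif=3), {3: 10}))
```

Doing the rewrite in one place, just before the rules are evaluated, is what lets the later patches drop the per-call-site hooks.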
      
      v2
      - properly handle IPv6 link scope addresses
      
      - keep the device xmit path and associated dst which is switched in by
        the l3_out hook. Packets still need to go through the xmit path in
        case the user puts a qdisc on the vrf device and to allow tc rules.
        Version 1 short-circuited the tx handling and only covered netfilter
        and tcpdump.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8fee3156
    • D
      net: flow: Remove FLOWI_FLAG_L3MDEV_SRC flag · c71ad3d4
      David Ahern committed
      No longer used
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c71ad3d4
    • D
      net: l3mdev: remove get_rtable method · afb460fe
      David Ahern committed
      No longer used
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      afb460fe
    • D
      net: l3mdev: Remove l3mdev_fib_oif · ca28b8f2
      David Ahern committed
      No longer used
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca28b8f2
    • D
      net: ipv6: Remove l3mdev_get_saddr6 · 8a966fc0
      David Ahern committed
      No longer needed
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a966fc0
    • D
      net: ipv4: Remove l3mdev_get_saddr · d66f6c0a
      David Ahern committed
      No longer needed
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d66f6c0a
    • D
      net: l3mdev: remove redundant calls · e0d56fdd
      David Ahern committed
      A previous patch added l3mdev flow update making these hooks
      redundant. Remove them.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e0d56fdd
    • D
      net: vrf: Flip IPv6 output path from FIB lookup hook to out hook · 4c1feac5
      David Ahern committed
      Flip the IPv6 output path to use the l3mdev tx out hook. The VRF dst
      is not returned on the first FIB lookup. Instead, the dst on the
      skb is switched at the beginning of the IPv6 output processing to
      send the packet to the VRF driver on xmit.
      
      Link scope addresses (link-local and multicast) need special handling:
      specifically, the oif in the flow struct cannot be changed because we
      want the lookup tied to the enslaved interface, i.e., the source address
      and the returned route MUST point to the interface scope passed in.
      Convert the existing vrf_get_rt6_dst to handle only link scope addresses.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c1feac5
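
The link scope distinction in the commit above can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch, not the kernel's check: the helper name `is_link_scope` is hypothetical, and it covers only the two address classes the commit message names (link-local and multicast).

```python
import ipaddress


def is_link_scope(addr: str) -> bool:
    """True if an IPv6 address is link-local or multicast.

    For such destinations the lookup has to stay tied to the enslaved
    interface, so the oif in the flow must not be rewritten.
    """
    ip = ipaddress.IPv6Address(addr)
    return ip.is_link_local or ip.is_multicast


for addr in ("fe80::1", "ff02::1", "2001:db8::1"):
    print(addr, is_link_scope(addr))
```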