1. 23 Mar 2017, 13 commits
    • openvswitch: Optimize sample action for the clone use cases · 798c1661
      Authored by andy zhou
      With the introduction of open flow 'clone' action, the OVS user space
      can now translate the 'clone' action into kernel datapath 'sample'
      action, with 100% probability, to ensure that the clone semantics,
      which is that the packet seen by the clone action is the same as the
      packet seen by the action after clone, is faithfully carried out
      in the datapath.
      
      While the sample action in the datapath has the matching semantics,
      its implementation is only optimized for its original use.
      Specifically, there are two limitations: First, there is a
      three-level nesting restriction, enforced at flow download time.
      This limit turns out to be too restrictive for the 'clone' use case.
      Second, the implementation avoids a recursive call only if the
      sample action list has a single userspace action.
      
      The main optimization implemented in this series removes the static
      nesting limit check and instead implements a run-time recursion
      limit check, with recursion avoidance similar to that of the
      'recirc' action. This optimization solves both issues above.
      
      One related optimization attempts to avoid copying the flow key as
      long as the enclosed actions do not change it. The detection is
      performed only once, at flow download time.
      
      Another related optimization rewrites the action list at flow
      download time, saving the fast path from repeatedly parsing the
      sample action list in its original form.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
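      The run-time recursion limit described above can be sketched as a
      small userspace toy model (the depth limit and all names here are
      illustrative, not the kernel's actual values or identifiers):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Toy model of a run-time recursion limit replacing a static
       * nesting check: every nested action list bumps a depth counter
       * and bails out once the limit is exceeded, instead of rejecting
       * deep nesting at flow-download time. */
      #define TOY_MAX_DEPTH 4

      static int toy_depth;

      static int toy_exec_nested_actions(int nesting)
      {
          int ret = 0;

          if (++toy_depth > TOY_MAX_DEPTH) {
              toy_depth--;
              return -1;              /* too deep: give up at run time */
          }
          if (nesting > 0)
              ret = toy_exec_nested_actions(nesting - 1);
          toy_depth--;
          return ret;
      }

      int main(void)
      {
          assert(toy_exec_nested_actions(3) == 0);   /* within the limit */
          assert(toy_exec_nested_actions(10) == -1); /* rejected at run time */
          printf("ok\n");
          return 0;
      }
      ```

      The point of the run-time check is visible here: arbitrarily deep
      nesting is accepted at "download" time and only cut off if actual
      execution exceeds the budget.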
    • openvswitch: Refactor recirc key allocation. · 4572ef52
      Authored by andy zhou
      The logic of allocating and copying a key for each
      'exec_actions_level' was specific to execute_recirc(). However,
      future patches will reuse it as well. Refactor the logic into its
      own function, clone_key().
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Deferred fifo API change. · 47c697aa
      Authored by andy zhou
      The add_deferred_actions() API currently requires actions to be
      passed in as a fully encoded netlink message. So far both 'sample'
      and 'recirc' actions happen to carry actions as fully encoded
      netlink messages. However, this requirement is more restrictive
      than necessary; a future patch will need to pass in action lists
      that are not fully encoded by themselves.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vrf-perf' · 29dd5ec0
      Authored by David S. Miller
      David Ahern says:
      
      ====================
      net: vrf: performance improvements
      
      Device based features for VRF such as qdisc, netfilter and packet
      captures are implemented by switching the dst on skbuffs to its per-VRF
      dst. This has the effect of controlling the output function, which
      points to a function in the VRF driver. [1] The skb proceeds down the
      stack with dst->dev pointing to the VRF device. Netfilter, qdisc and tc
      rules and network taps are evaluated based on this device. Finally, the
      skb makes it to the vrf_xmit function, which resets the dst based on a
      FIB lookup.
      
      The feature comes at a cost - between 5 and 10% depending on test (TCP
      vs UDP, stream vs RR and IPv4 vs IPv6). The main cost is requiring a
      FIB lookup in the VRF driver for each packet sent through it. The FIB
      lookup is required because the real dst gets dropped so that the skb
      can traverse the stack with dst->dev set to the VRF device.
      
      All of that is really driven by the qdisc, and by not replicating the
      processing of __dev_queue_xmit when a qdisc is set up on the device.
      But VRF devices by default do not have a qdisc and really have no need
      for multiple Tx queues. This means the performance overhead is
      inflicted upon all users for the potential use case of a qdisc being
      configured.
      
      The overhead can be avoided by checking if the default configuration
      applies to a specific VRF device before switching the dst. If a device
      does not have a qdisc, the pass through netfilter hooks and packet taps
      can be done inline without dropping the dst and thus avoiding the
      performance penalty. With this change the performance overhead of VRF
      drops to negligible (within run-over-run variance) to 3%, depending on
      test type.
      
      netperf performance comparison for 3 cases:
      1. L3_MASTER_DEVICE compiled out
      2. VRF with this patch set
      3. current VRF code
      
      IPv4
      ----
                 no-l3mdev     new-vrf     old-vrf
      TCP_RR       28778        28938*       27169
      TCP_CRR      10706        10490         9770
      UDP_RR       30750        29813        29256
      
      * Although higher in the final run used for submitting this patch set, I
        think what this really represents is a negligible performance overhead
        for VRF with this change (i.e., within the +-1% variance of runs). Most
        notably the FIB lookups in the Tx path are avoided for TCP_RR.
      
      IPv6
      ----
                 no-l3mdev     new-vrf     old-vrf
      TCP_RR       29495        29432       27794
      TCP_CRR      10520        10338        9870
      UDP_RR       26137        27019*      26511
      
      * UDP is consistently better with VRF for two reasons:
        1. Source address selection with L3 domains considers fewer
           addresses, since only addresses on interfaces in the domain are
           considered for the selection. Specifically, perf-top shows
           ipv6_get_saddr_eval, ipv6_dev_get_saddr and __ipv6_dev_get_saddr
           running much lower with vrf than without.
      
        2. The VRF table contains all routes (i.e., there are no separate
           local and main tables per VRF). That means ip6_pol_route_output
           only has 1 lookup for VRF where it does 2 without it (1 in the
           local table and 1 in the main table).
      
      [1] http://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vrf: performance improvements for IPv6 · a9ec54d1
      Authored by David Ahern
      The VRF driver allows users to implement device based features for an
      entire domain. For example, a qdisc or netfilter rules can be attached
      to a VRF device or tcpdump can be used to view packets for all devices
      in the L3 domain.
      
      The device-based features come with a performance penalty, most
      notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
      to switch the dst on an skb to its private dst. This allows the skb
      to traverse the xmit stack with the device set to the VRF device
      which in turn enables the netfilter and qdisc features. The VRF
      driver then performs the FIB lookup again and reinserts the packet.
      
      This patch avoids the redirect for IPv6 packets if a qdisc has not
      been attached to a VRF device which is the default config. In this
      case the netfilter hooks and network taps are directly traversed in
      the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
      then the redirect using the vrf dst is done.
      
      Additional overhead is removed by only checking packet taps if a
      socket is open on the device (vrf_dev->ptype_all list is not empty).
      Packet sockets bound to any device will still get a copy of the
      packet via the real ingress or egress interface.
      
      The end result of this change is a decrease in the overhead of VRF
      for the default, baseline case (i.e., no netfilter rules, no packet
      sockets, no qdisc): from a +3% improvement for UDP, which has a
      lookup per packet (VRF being better than no l3mdev), to a ~2% loss
      for TCP_CRR, which connects a socket for each request-response.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vrf: performance improvements for IPv4 · dcdd43c4
      Authored by David Ahern
      The VRF driver allows users to implement device based features for an
      entire domain. For example, a qdisc or netfilter rules can be attached
      to a VRF device or tcpdump can be used to view packets for all devices
      in the L3 domain.
      
      The device-based features come with a performance penalty, most
      notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
      to switch the dst on an skb to its private dst. This allows the skb
      to traverse the xmit stack with the device set to the VRF device
      which in turn enables the netfilter and qdisc features. The VRF
      driver then performs the FIB lookup again and reinserts the packet.
      
      This patch avoids the redirect for IPv4 packets if a qdisc has not
      been attached to a VRF device which is the default config. In this
      case the netfilter hooks and network taps are directly traversed in
      the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
      then the redirect using the vrf dst is done.
      
      Additional overhead is removed by only checking packet taps if a
      socket is open on the device (vrf_dev->ptype_all list is not empty).
      Packet sockets bound to any device will still get a copy of the
      packet via the real ingress or egress interface.
      
      The end result of this change is a decrease in the overhead of VRF
      for the default, baseline case (i.e., no netfilter rules, no packet
      sockets, no qdisc) to ~3% for UDP, which has a lookup per packet, and
      < 1% overhead for connected sockets that leverage early demux and
      avoid FIB lookups.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
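      The Tx fast-path decision these two patches describe can be modeled
      as a toy predicate (userspace sketch; the struct and function names
      are illustrative, and the kernel inspects the device's real qdisc
      and ptype_all list rather than booleans):

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Toy model: with no qdisc attached (the default VRF config) the
       * netfilter hooks can run inline in l3mdev_l3_out and the costly
       * dst switch plus second FIB lookup is skipped; taps are only
       * consulted when a packet socket is bound to the device. */
      struct toy_vrf {
          bool has_qdisc;       /* qdisc attached to the VRF device */
          bool has_ptype_all;   /* packet socket open on the device */
      };

      static bool toy_needs_dst_switch(const struct toy_vrf *v)
      {
          return v->has_qdisc;  /* only the qdisc case keeps the redirect */
      }

      static bool toy_needs_tap_delivery(const struct toy_vrf *v)
      {
          return v->has_ptype_all;
      }

      int main(void)
      {
          struct toy_vrf dflt = { false, false };
          struct toy_vrf shaped = { true, false };

          assert(!toy_needs_dst_switch(&dflt));  /* fast path, no FIB redo */
          assert(toy_needs_dst_switch(&shaped)); /* old redirect preserved */
          assert(!toy_needs_tap_delivery(&dflt));
          printf("ok\n");
          return 0;
      }
      ```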
    • sock: introduce SO_MEMINFO getsockopt · a2d133b1
      Authored by Josh Hunt
      Allows reading of SK_MEMINFO_VARS via a socket option. This way an
      application can get all meminfo-related information in a single
      socket option call instead of multiple calls.
      
      Adds a helper function, sk_get_meminfo(), and uses it for both
      getsockopt and sock_diag_put_meminfo().
      
      Suggested by Eric Dumazet.
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Reviewed-by: Jason Baron <jbaron@akamai.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
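      A minimal userspace sketch of using the new option (SO_MEMINFO is 55
      on asm-generic architectures; on kernels without this patch the
      getsockopt() call simply fails, which the code tolerates):

      ```c
      #include <stdio.h>
      #include <stdint.h>
      #include <sys/socket.h>
      #include <unistd.h>

      #ifndef SO_MEMINFO
      #define SO_MEMINFO 55       /* asm-generic socket option value */
      #endif
      #define SK_MEMINFO_VARS 9   /* number of u32 counters exported */

      int main(void)
      {
          uint32_t mi[SK_MEMINFO_VARS] = {0};
          socklen_t len = sizeof(mi);
          int fd = socket(AF_INET, SOCK_DGRAM, 0);

          if (fd < 0)
              return 1;
          /* One getsockopt() call returns the same counters sock_diag
           * exposes as SK_MEMINFO_*, avoiding a netlink round trip. */
          if (getsockopt(fd, SOL_SOCKET, SO_MEMINFO, mi, &len) == 0)
              printf("rcvbuf=%u sndbuf=%u drops=%u\n",
                     mi[1], mi[3], mi[8]);
          close(fd);
          return 0;
      }
      ```

      The indices used above follow the SK_MEMINFO_* ordering already
      exported via sock_diag (RCVBUF at 1, SNDBUF at 3, DROPS at 8).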
    • mlxsw: spectrum: fix swapped order of arguments packets and bytes · c7cd4c9b
      Authored by Colin Ian King
      The arguments 'packets' and 'bytes' in the call to
      mlxsw_sp_acl_rule_get_stats are in the wrong order. Fix this by
      swapping them.
      
      Detected by CoverityScan, CID#1419705 ("Arguments in wrong order")
      
      Fixes: 7c1b8eb1 ("mlxsw: spectrum: Add support for TC flower offload statistics")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Update IngPad and IngPack values · bb58d079
      Authored by Arjun Vynipadath
      We are using the smallest padding boundary (8 bytes), which isn't
      smaller than the Memory Controller Read/Write Size.

      We get the best performance in 100G when the Packing Boundary is a
      multiple of the Maximum Payload Size. This is related to inefficient
      chopping of DMA packets by PCIe, which causes more overhead on the
      bus, so the driver helps by aligning the starting address to the MPS
      size.

      We will try to determine the PCIe MaxPayloadSize capability and set
      IngPackBoundary based on this value. If the cache line size is greater
      than the MPS, or determining the MPS fails, we will use the cache line
      size to determine IngPackBoundary (as before).
      Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
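      The fallback logic in the commit message can be sketched as a small
      helper (the function name is hypothetical; the real driver reads the
      MPS from PCIe config space rather than taking it as a parameter):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Toy version of the selection described above: prefer the PCIe
       * MaxPayloadSize for the packing boundary, falling back to the
       * cache line size when MPS detection fails (mps <= 0) or the
       * cache line is larger than the MPS. */
      static unsigned int toy_pick_ingpack_boundary(int mps,
                                                    unsigned int cacheline)
      {
          if (mps <= 0 || cacheline > (unsigned int)mps)
              return cacheline;       /* pre-patch behaviour */
          return (unsigned int)mps;   /* align bursts to the payload size */
      }

      int main(void)
      {
          assert(toy_pick_ingpack_boundary(256, 64) == 256); /* MPS wins */
          assert(toy_pick_ingpack_boundary(-1, 64) == 64);   /* detect fail */
          assert(toy_pick_ingpack_boundary(32, 64) == 64);   /* line > MPS */
          printf("ok\n");
          return 0;
      }
      ```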
    • net: dwc-xlgmac: add module license · 3588f29e
      Authored by Arnd Bergmann
      When building the driver as a module, we get a warning about the
      lack of a license:
      
      WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/synopsys/dwc-xlgmac.o
      see include/linux/module.h for more information
      
      Curiously, the text in the .c files only mentions GPLv2+, while the
      license tag in the PCI driver contains both GPL and BSD. I picked the
      license text as the more definitive reference here and put a GPL tag
      in there.
      
      Fixes: 65e0ace2 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dwc-xlgmac: include dcbnl.h · 424fa00e
      Authored by Arnd Bergmann
      Without this header, we can run into a build error:
      
      drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: In function 'xlgmac_config_queue_mapping':
      drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:1548:36: error: 'IEEE_8021QAZ_MAX_TCS' undeclared (first use in this function)
        prio_queues = min_t(unsigned int, IEEE_8021QAZ_MAX_TCS,
      
      Fixes: 65e0ace2 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Jie Deng <jiedeng@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbour: fix nlmsg_pid in notifications · 7b8f7a40
      Authored by Roopa Prabhu
      Neigh notifications today carry pid 0 for nlmsg_pid in all cases.
      This patch fixes it to carry the calling process's pid when
      available. Applications (e.g. quagga) rely on nlmsg_pid to ignore
      notifications generated by their own netlink operations. This patch
      follows the routing subsystem, which already sets this correctly.
      Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
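      The way a listener uses nlmsg_pid to ignore the echo of its own
      operations can be sketched as (userspace; the helper name is
      illustrative, the nlmsghdr field is the real netlink header):

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <linux/netlink.h>

      /* With this fix, neigh notifications triggered by a process's own
       * netlink requests carry that socket's portid in nlmsg_pid, so a
       * daemon can skip events it caused itself (previously nlmsg_pid
       * was always 0, making this filter useless for neigh events). */
      static int toy_is_own_notification(const struct nlmsghdr *nlh,
                                         uint32_t own_portid)
      {
          return nlh->nlmsg_pid == own_portid;
      }

      int main(void)
      {
          struct nlmsghdr nlh = { .nlmsg_pid = 1234 };

          assert(toy_is_own_notification(&nlh, 1234));  /* ours: ignore */
          assert(!toy_is_own_notification(&nlh, 5678)); /* someone else's */
          printf("ok\n");
          return 0;
      }
      ```

      In a real daemon, own_portid would come from getsockname() on the
      netlink socket after bind().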
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 7ada7ca5
      Authored by David S. Miller
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2017-03-21
      
      This series contains updates to e1000, e1000e, igb, igbvf and ixgb.
      
      This finishes up the work Philippe Reynes did to update the Intel drivers
      to the new API for ethtool (get|set)_link_ksettings.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Mar 2017, 27 commits