1. 23 Mar, 2017 20 commits
    • net: stmmac: Always enable MAC RX queues · f3976874
      Thierry Reding committed
      The MAC RX queues always need to be enabled in order to receive network
      packets. Remove the condition that this only needs to be done for
      multi-queue configurations.
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: convert sk_filter.refcnt from atomic_t to refcount_t · 4c355cdf
      Reshetova, Elena committed
      The refcount_t type and its corresponding API should be
      used instead of atomic_t when the variable is used as
      a reference counter. This helps avoid accidental
      refcounter overflows that might lead to use-after-free
      situations.
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David Windsor <dwindsor@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
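
      The overflow protection described above can be illustrated with a small userspace sketch. It is not the kernel's refcount_t implementation: the type and function names (sketch_refcount_*) are hypothetical, and saturating at UINT_MAX is an assumption made here for illustration.

      ```c
      #include <limits.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Hypothetical userspace stand-in for a saturating reference counter. */
      typedef struct { atomic_uint refs; } sketch_refcount_t;

      static void sketch_refcount_set(sketch_refcount_t *r, unsigned int n)
      {
          atomic_store(&r->refs, n);
      }

      /* Increment, but stick at UINT_MAX instead of wrapping around to 0. */
      static void sketch_refcount_inc(sketch_refcount_t *r)
      {
          unsigned int old = atomic_load(&r->refs);

          do {
              if (old == UINT_MAX)
                  return; /* saturated: leak the object rather than free it early */
          } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
      }

      /* Decrement; returns true only when the caller dropped the last reference. */
      static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
      {
          unsigned int old = atomic_load(&r->refs);

          do {
              if (old == UINT_MAX || old == 0)
                  return false; /* saturated or already zero: never report "free" */
          } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));

          return old == 1;
      }
      ```

      A plain atomic counter would wrap from UINT_MAX to 0 on one more increment, after which a single decrement elsewhere could free an object that is still referenced. The saturating variant trades that use-after-free for a memory leak.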
    • net: greth: Utilize of_get_mac_address() · 726bceca
      Tobias Klauser committed
      Do not open code getting the MAC address exclusively from the
      "local-mac-address" property, but instead use of_get_mac_address() which
      looks up the MAC address using the 3 typical property names.
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
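
      As a rough illustration of looking up a MAC address under several property names in order, here is a userspace sketch. The property table (struct sketch_prop) and helper names are hypothetical, and the three names and their order ("mac-address", "local-mac-address", "address") are an assumption about what the commit calls the typical property names; the commit message itself does not list them.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical flattened device-tree node: name/value property pairs. */
      struct sketch_prop {
          const char *name;
          const unsigned char *value;
          size_t len;
      };

      /* A usable address is 6 bytes, not all-zero and not multicast. */
      static int sketch_is_valid_mac(const unsigned char *a, size_t len)
      {
          static const unsigned char zero[6] = { 0 };

          return len == 6 && memcmp(a, zero, 6) != 0 && !(a[0] & 1);
      }

      /* Try each candidate property name in order; return the first valid address. */
      static const unsigned char *
      sketch_get_mac(const struct sketch_prop *props, size_t nprops)
      {
          /* Assumed lookup order, see the note above. */
          static const char *const order[] = {
              "mac-address", "local-mac-address", "address"
          };

          for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++)
              for (size_t j = 0; j < nprops; j++)
                  if (strcmp(props[j].name, order[i]) == 0 &&
                      sketch_is_valid_mac(props[j].value, props[j].len))
                      return props[j].value;

          return NULL; /* no usable address found */
      }
      ```

      The point of the fallback chain is that a driver reading only one property misses boards whose firmware stores the address under one of the other names.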
    • liquidio: fix Coverity scan errors · 58ad3198
      Felix Manlunas committed
      Fix Coverity scan errors by not dereferencing lio->glists_dma_base pointer
      if it's NULL.

      See http://marc.info/?l=linux-netdev&m=149002294305614&w=2

      Reported-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: VSR Burru <veerasenareddy.burru@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp: Permit user set TCP_MAXSEG to default value · cfc62d87
      Gao Feng committed
      When user_mss is zero, it means use the default value. But the current
      code doesn't permit the user to set TCP_MAXSEG to the default value;
      it returns -EINVAL when val is zero.
      Signed-off-by: Gao Feng <fgao@ikuai8.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
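
      The behavior change can be sketched as two versions of a range check. This is an illustration, not the kernel source: the function names are hypothetical, and the bounds 88 and 32767 are assumed values for the minimum MSS and maximum window used here only to make the example concrete.

      ```c
      #include <errno.h>

      #define SKETCH_TCP_MIN_MSS    88U     /* assumed lower bound */
      #define SKETCH_MAX_TCP_WINDOW 32767U  /* assumed upper bound */

      /* Before: zero falls below the lower bound and is rejected. */
      static int sketch_check_user_mss_old(unsigned int val)
      {
          if (val < SKETCH_TCP_MIN_MSS || val > SKETCH_MAX_TCP_WINDOW)
              return -EINVAL;
          return 0;
      }

      /* After: zero means "use the default" and skips the range check. */
      static int sketch_check_user_mss_new(unsigned int val)
      {
          if (val && (val < SKETCH_TCP_MIN_MSS || val > SKETCH_MAX_TCP_WINDOW))
              return -EINVAL;
          return 0;
      }
      ```

      With the second form an application that previously set a custom MSS can reset the socket to the default by passing 0, while other out-of-range values are still rejected.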
    • Merge branch 'ovs-sample-action-optimization' · b2a1674a
      David S. Miller committed
      Andy Zhou says:
      
      ====================
      net-next sample action optimization v4
      
      The sample action can be used for translating Openflow 'clone' action.
      However its implementation has not been sufficiently optimized for this
      use case. This series attempts to close the gap.
      
      Patch 3 commit message has more details on the specific optimizations
      implemented.
      
      ---
      v3->v4: Enhance patch 4.
              Fix two bugs pointed out by Pravin,
              Remove 'is_sample' variable.
      
      v2->v3: Enhance patch 4, Refactor to move more common logic to clone_execute().
      
      v1->v2: Address Pravin's comment, Refactor recirc and sample
              to share more common code
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Openvswitch: Refactor sample and recirc actions implementation · bef7f756
      andy zhou committed
      Added clone_execute() that both the sample and the recirc
      action implementation can use.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Optimize sample action for the clone use cases · 798c1661
      andy zhou committed
      With the introduction of the OpenFlow 'clone' action, the OVS user space
      can now translate the 'clone' action into the kernel datapath 'sample'
      action, with 100% probability, to ensure that the clone semantics,
      which is that the packet seen by the clone action is the same as the
      packet seen by the action after clone, is faithfully carried out
      in the datapath.

      While the sample action in the datapath has the matching semantics,
      its implementation is only optimized for its original use.
      Specifically, there are two limitations: First, there is a 3-level
      nesting restriction, enforced at flow downloading time. This
      limit turns out to be too restrictive for the 'clone' use case.
      Second, the implementation avoids a recursive call only if the sample
      action list has a single userspace action.

      The main optimization implemented in this series removes the static
      nesting limit check and instead implements a run-time recursion limit
      check and recursion avoidance similar to that of the 'recirc' action.
      This optimization solves both issues #1 and #2 above.

      One related optimization attempts to avoid copying the flow key as
      long as the enclosed actions do not change the flow key. The
      detection is performed only once, at flow downloading time.

      Another related optimization is to rewrite the action list
      at flow downloading time in order to save the fast path from parsing
      the sample action list in its original form repeatedly.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
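
      The idea of replacing a static nesting limit with a run-time recursion limit plus deferral can be modeled in a few lines of userspace C. This is a toy model, not the datapath code: every name is hypothetical, the limits (3 inline levels, 10 FIFO slots) are made-up values, and an action list is reduced to a single integer saying how many further clones are nested inside it.

      ```c
      #include <stddef.h>

      #define SKETCH_INLINE_LIMIT 3   /* hypothetical max nesting executed inline */
      #define SKETCH_FIFO_SIZE    10  /* hypothetical deferred-action FIFO capacity */

      struct sketch_ctx {
          int level;                   /* current nesting depth */
          int max_level;               /* deepest nesting actually reached */
          int lists_run;               /* action lists executed so far */
          int dropped;                 /* clones dropped because the FIFO was full */
          int fifo[SKETCH_FIFO_SIZE];  /* deferred lists, stored as remaining nesting */
          size_t head, tail;
      };

      static void sketch_run_list(struct sketch_ctx *c, int nested);

      /* Execute a nested list inline while shallow; otherwise defer it. */
      static void sketch_clone_execute(struct sketch_ctx *c, int nested)
      {
          if (c->level < SKETCH_INLINE_LIMIT) {
              c->level++;
              sketch_run_list(c, nested);
              c->level--;
          } else if (c->tail - c->head < SKETCH_FIFO_SIZE) {
              c->fifo[c->tail++ % SKETCH_FIFO_SIZE] = nested;
          } else {
              c->dropped++;
          }
      }

      /* Run one action list; a list with nested > 0 contains one more clone. */
      static void sketch_run_list(struct sketch_ctx *c, int nested)
      {
          c->lists_run++;
          if (c->level > c->max_level)
              c->max_level = c->level;
          if (nested > 0)
              sketch_clone_execute(c, nested - 1);
      }

      /* Top-level entry: run the list, then drain deferred lists at depth 0. */
      static void sketch_execute(struct sketch_ctx *c, int nested)
      {
          sketch_run_list(c, nested);
          while (c->head != c->tail)
              sketch_run_list(c, c->fifo[c->head++ % SKETCH_FIFO_SIZE]);
      }
      ```

      In this model, calling sketch_execute() on a zero-initialized context with a nesting of 10 runs all 11 lists while the call depth never exceeds 3, which is the property a static 3-level limit could not offer.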
    • openvswitch: Refactor recirc key allocation. · 4572ef52
      andy zhou committed
      The logic of allocating and copying the key for each 'exec_actions_level'
      was specific to execute_recirc(). However, future patches will reuse
      it as well. Refactor the logic into its own function, clone_key().
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Deferred fifo API change. · 47c697aa
      andy zhou committed
      The add_deferred_actions() API currently requires actions to be passed in
      as a fully encoded netlink message. So far both the 'sample' and 'recirc'
      actions happen to carry actions as fully encoded netlink messages.
      However, this requirement is more restrictive than necessary; a future
      patch will need to pass in action lists that are not fully encoded
      by themselves.
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vrf-perf' · 29dd5ec0
      David S. Miller committed
      David Ahern says:
      
      ====================
      net: vrf: performance improvements
      
      Device based features for VRF such as qdisc, netfilter and packet
      captures are implemented by switching the dst on skbuffs to its per-VRF
      dst. This has the effect of controlling the output function which points
      to a function in the VRF driver. [1] The skb proceeds down the stack with
      dst->dev pointing to the VRF device. Netfilter, qdisc and tc rules and
      network taps are evaluated based on this device. Finally, the skb makes
      it to the vrf_xmit function which resets the dst based on a FIB lookup.
      
      The feature comes at a cost - between 5 and 10% depending on test (TCP vs
      UDP, stream vs RR and IPv4 vs IPv6). The main cost is requiring a FIB
      lookup in the VRF driver for each packet sent through it. The FIB lookup
      is required because the real dst gets dropped so that the skb can
      traverse the stack with dst->dev set to the VRF device.
      
      All of that is really driven by the qdisc and not replicating the
      processing of __dev_queue_xmit if a qdisc is set up on the device. But,
      VRF devices by default do not have a qdisc and really have no need for
      multiple Tx queues. This means the performance overhead is inflicted upon
      all users for the potential use case of a qdisc being configured.
      
      The overhead can be avoided by checking if the default configuration
      applies to a specific VRF device before switching the dst. If a device
      does not have a qdisc, the pass through netfilter hooks and packet taps
      can be done inline without dropping the dst and thus avoiding the
      performance penalty. With this change the performance overhead of VRF
      drops to between negligible (within run-over-run variance) and 3%,
      depending on test type.
      
      netperf performance comparison for 3 cases:
      1. L3_MASTER_DEVICE compiled out
      2. VRF with this patch set
      3. current VRF code
      
      IPv4
      ----
                 no-l3mdev     new-vrf     old-vrf
      TCP_RR       28778        28938*       27169
      TCP_CRR      10706        10490         9770
      UDP_RR       30750        29813        29256
      
      * Although higher in the final run used for submitting this patch set, I
        think what this really represents is a negligible performance overhead for
        VRF with this change (i.e., within the +-1% variance of runs). Most
        notably the FIB lookups in the Tx path are avoided for TCP_RR.
      
      IPv6
      ----
                 no-l3mdev     new-vrf     old-vrf
      TCP_RR       29495        29432       27794
      TCP_CRR      10520        10338        9870
      UDP_RR       26137        27019*      26511
      
      * UDP is consistently better with VRF for two reasons:
        1. Source address selection with L3 domains is considering fewer
           addresses since only addresses on interfaces in the domain are
           considered for the selection. Specifically, perf-top shows
           ipv6_get_saddr_eval, ipv6_dev_get_saddr and __ipv6_dev_get_saddr
           running much lower with vrf than without.
      
        2. The VRF table contains all routes (i.e, there are no separate local
           and main tables per VRF). That means ip6_pol_route_output only has 1
           lookup for VRF where it does 2 without it (1 in the local table and 1
           in the main table).
      
      [1] http://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
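
      The overhead percentages quoted in the cover letter can be recomputed from the netperf tables with a one-line helper. The function name is hypothetical; the formula (relative loss against the no-l3mdev baseline) is an assumption about how the percentages were derived, though it agrees with the quoted ranges.

      ```c
      /* Relative throughput loss of `measured` against `baseline`, in percent.
       * A negative result means `measured` was faster than the baseline. */
      static double sketch_overhead_pct(double baseline, double measured)
      {
          return (baseline - measured) / baseline * 100.0;
      }
      ```

      For the IPv4 TCP_RR row, sketch_overhead_pct(28778, 27169) gives about 5.6% for the old VRF code and sketch_overhead_pct(28778, 28938) gives about -0.6% for the new code. The IPv4 TCP_CRR row gives about 8.7% before and about 2.0% after.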
    • net: vrf: performance improvements for IPv6 · a9ec54d1
      David Ahern committed
      The VRF driver allows users to implement device based features for an
      entire domain. For example, a qdisc or netfilter rules can be attached
      to a VRF device or tcpdump can be used to view packets for all devices
      in the L3 domain.
      
      The device-based features come with a performance penalty, most
      notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
      to switch the dst on an skb to its private dst. This allows the skb
      to traverse the xmit stack with the device set to the VRF device
      which in turn enables the netfilter and qdisc features. The VRF
      driver then performs the FIB lookup again and reinserts the packet.
      
      This patch avoids the redirect for IPv6 packets if a qdisc has not
      been attached to a VRF device which is the default config. In this
      case the netfilter hooks and network taps are directly traversed in
      the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
      then the redirect using the vrf dst is done.
      
      Additional overhead is removed by only checking packet taps if a
      socket is open on the device (vrf_dev->ptype_all list is not empty).
      Packet sockets bound to any device will still get a copy of the
      packet via the real ingress or egress interface.
      
      The end result of this change is that the overhead of VRF for the
      default, baseline case (i.e., no netfilter rules, no packet sockets,
      no qdisc) now ranges from a +3% improvement for UDP, which has a lookup
      per packet (VRF being better than no l3mdev), to a ~2% loss for TCP_CRR,
      which connects a socket for each request-response.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
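
      The decision described above (redirect through the VRF dst only when a qdisc is attached, and check packet taps only when someone is listening) can be summarized in a small model. The struct, enum and function below are hypothetical and greatly simplified; they stand in for the driver's real data structures purely to show the branching.

      ```c
      #include <stdbool.h>

      /* Hypothetical, simplified view of a VRF device. */
      struct sketch_vrf_dev {
          bool has_qdisc;   /* a qdisc was attached (not the default config) */
          int  n_taps;      /* packet sockets bound to the VRF device */
      };

      enum sketch_path {
          SKETCH_DIRECT,    /* hooks and taps run inline, original dst kept */
          SKETCH_REDIRECT   /* switch to the VRF dst, FIB lookup repeated later */
      };

      /* Pick the Tx path; *run_taps_inline says whether taps are visited here. */
      static enum sketch_path
      sketch_l3_out(const struct sketch_vrf_dev *vrf, bool *run_taps_inline)
      {
          if (vrf->has_qdisc) {
              /* Slow path: the skb must traverse the stack via the VRF device. */
              *run_taps_inline = false;
              return SKETCH_REDIRECT;
          }

          /* Fast path: skip the tap walk entirely when nobody is listening. */
          *run_taps_inline = vrf->n_taps > 0;
          return SKETCH_DIRECT;
      }
      ```

      The default configuration (no qdisc, no taps) takes the direct path and does no extra work, which is where the measured savings come from.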
    • net: vrf: performance improvements for IPv4 · dcdd43c4
      David Ahern committed
      The VRF driver allows users to implement device based features for an
      entire domain. For example, a qdisc or netfilter rules can be attached
      to a VRF device or tcpdump can be used to view packets for all devices
      in the L3 domain.
      
      The device-based features come with a performance penalty, most
      notably in the Tx path. The VRF driver uses the l3mdev_l3_out hook
      to switch the dst on an skb to its private dst. This allows the skb
      to traverse the xmit stack with the device set to the VRF device
      which in turn enables the netfilter and qdisc features. The VRF
      driver then performs the FIB lookup again and reinserts the packet.
      
      This patch avoids the redirect for IPv4 packets if a qdisc has not
      been attached to a VRF device which is the default config. In this
      case the netfilter hooks and network taps are directly traversed in
      the l3mdev_l3_out handler. If a qdisc is attached to a VRF device,
      then the redirect using the vrf dst is done.
      
      Additional overhead is removed by only checking packet taps if a
      socket is open on the device (vrf_dev->ptype_all list is not empty).
      Packet sockets bound to any device will still get a copy of the
      packet via the real ingress or egress interface.
      
      The end result of this change is a decrease in the overhead of VRF
      for the default, baseline case (i.e., no netfilter rules, no packet
      sockets, no qdisc) to ~3% for UDP, which has a lookup per packet, and
      < 1% overhead for connected sockets that leverage early demux and
      avoid FIB lookups.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sock: introduce SO_MEMINFO getsockopt · a2d133b1
      Josh Hunt committed
      Allows reading of SK_MEMINFO_VARS via socket option. This way an
      application can get all meminfo related information in a single socket
      option call instead of multiple calls.

      Adds helper function, sk_get_meminfo(), and uses that for both
      getsockopt and sock_diag_put_meminfo().

      Suggested by Eric Dumazet.
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Reviewed-by: Jason Baron <jbaron@akamai.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
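
      From user space, reading the counters might look like the sketch below. The helper name is hypothetical. The fallback value 55 for SO_MEMINFO is an assumption (the asm-generic value) used only when the libc headers predate the option, and the helper treats the result as an opaque array of 32-bit counters rather than relying on named indices.

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <sys/socket.h>

      #ifndef SO_MEMINFO
      #define SO_MEMINFO 55  /* assumed asm-generic value for older headers */
      #endif

      /* Fill `vals` with up to `max` counters in one call.
       * Returns the number of counters written, or -1 with errno set
       * (for example on kernels that do not implement the option). */
      static int sketch_read_meminfo(int fd, uint32_t *vals, size_t max)
      {
          socklen_t len = (socklen_t)(max * sizeof(*vals));

          if (getsockopt(fd, SOL_SOCKET, SO_MEMINFO, vals, &len) < 0)
              return -1;

          /* The kernel reports how many bytes it actually copied. */
          return (int)(len / sizeof(*vals));
      }
      ```

      Before this option an application needed separate calls (for example SO_RCVBUF and SO_SNDBUF) or a sock_diag netlink query to collect the same numbers.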
    • mlxsw: spectrum: fix swapped order of arguments packets and bytes · c7cd4c9b
      Colin Ian King committed
      The arguments packets and bytes in the call to mlxsw_sp_acl_rule_get_stats()
      are in the wrong order. Fix this by swapping them.

      Detected by CoverityScan, CID#1419705 ("Arguments in wrong order")

      Fixes: 7c1b8eb1 ("mlxsw: spectrum: Add support for TC flower offload statistics")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Update IngPad and IngPack values · bb58d079
      Arjun Vynipadath committed
      We are using the smallest padding boundary (8 bytes), which isn't
      smaller than the Memory Controller Read/Write Size.

      We get best performance in 100G when the Packing Boundary is a multiple
      of the Maximum Payload Size. This is related to inefficient chopping of
      DMA packets by PCIe, which causes more overhead on the bus. So the driver
      helps by aligning the starting address to the MPS size.

      We will try to determine the PCIe MaxPayloadSize capability and set
      IngPackBoundary based on this value. If the cache line size is greater
      than MPS or determining MPS fails, we will use the cache line size to
      determine IngPackBoundary (as before).
      Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
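
      The selection rule in the last paragraph can be written as a tiny function. The name and the convention that a non-positive `mps` means "MaxPayloadSize could not be determined" are assumptions for this sketch; the real driver also has to encode the result into hardware register fields, which is left out here.

      ```c
      /* Choose the ingress packing boundary in bytes.
       * cache_line: CPU cache line size in bytes.
       * mps: PCIe MaxPayloadSize in bytes, or <= 0 if it could not be read. */
      static unsigned int
      sketch_pick_pack_boundary(unsigned int cache_line, int mps)
      {
          /* Fall back to the cache line size, as before the change. */
          if (mps <= 0 || cache_line > (unsigned int)mps)
              return cache_line;

          /* Otherwise align packing to the PCIe payload size. */
          return (unsigned int)mps;
      }
      ```

      For example, a 64-byte cache line with a 256-byte MaxPayloadSize yields 256, while an unreadable MaxPayloadSize yields 64.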
    • net: dwc-xlgmac: add module license · 3588f29e
      Arnd Bergmann committed
      When building the driver as a module, we get a warning about the
      lack of a license:

      WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/synopsys/dwc-xlgmac.o
      see include/linux/module.h for more information

      Curiously the text in the .c files only mentions GPLv2+, while the license
      tag in the PCI driver contains both GPL and BSD. I picked the license text
      as the more definite reference here and put a GPL tag in there.

      Fixes: 65e0ace2 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dwc-xlgmac: include dcbnl.h · 424fa00e
      Arnd Bergmann committed
      Without this header, we can run into a build error:

      drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: In function 'xlgmac_config_queue_mapping':
      drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c:1548:36: error: 'IEEE_8021QAZ_MAX_TCS' undeclared (first use in this function)
        prio_queues = min_t(unsigned int, IEEE_8021QAZ_MAX_TCS,

      Fixes: 65e0ace2 ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Jie Deng <jiedeng@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbour: fix nlmsg_pid in notifications · 7b8f7a40
      Roopa Prabhu committed
      neigh notifications today carry pid 0 for nlmsg_pid
      in all cases. This patch fixes it to carry the calling process
      pid when available. Applications (e.g. quagga) rely on
      nlmsg_pid to ignore notifications generated by their own
      netlink operations. This patch follows the routing subsystem
      which already sets this correctly.
      Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
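
      On the application side, the filtering that depends on nlmsg_pid might look like the following sketch. The function name is hypothetical, and `own_portid` is assumed to be the port ID the application obtained for its own netlink socket (for instance from nl_pid after getsockname()).

      ```c
      #include <linux/netlink.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Return true if this notification was triggered by our own request.
       * A pid of 0 means the kernel did not attribute the change to any
       * process, so it is never treated as our own. */
      static bool sketch_is_own_notification(const struct nlmsghdr *nlh,
                                             uint32_t own_portid)
      {
          return nlh->nlmsg_pid != 0 && nlh->nlmsg_pid == own_portid;
      }
      ```

      With neighbour notifications always carrying pid 0, a check like this could never match, so a routing daemon would reprocess every change it had just made itself.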
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 7ada7ca5
      David S. Miller committed
      Jeff Kirsher says:

      ====================
      Intel Wired LAN Driver Updates 2017-03-21

      This series contains updates to e1000, e1000e, igb, igbvf and ixgb.

      This finishes up the work Philippe Reynes did to update the Intel drivers
      to the new API for ethtool (get|set)_link_ksettings.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Mar, 2017 20 commits