1. 01 Mar, 2015 3 commits
    • tcp: tso: restore IW10 after TSO autosizing · 50c8339e
      Eric Dumazet committed
      With sysctl_tcp_min_tso_segs being 4, it is very possible
      that tcp_tso_should_defer() decides not to send the last 2 MSS
      of an initial window of 10 packets. This also applies if
      autosizing decides to send X MSS per GSO packet, and cwnd
      is not a multiple of X.
      
      This patch implements a heuristic based on the age of the first
      skb in the write queue: if it was sent very recently (less than half srtt),
      we can predict that no ACK packet will come in less than half an RTT,
      so deferring might cause underutilization of our window.
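
The age check described above can be sketched in isolation. This is a minimal illustration, not the kernel's code: the function name `head_sent_recently` and the microsecond parameters are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns true when the
 * oldest segment in the write queue was sent less than half a smoothed
 * RTT ago, in which case deferring the send is not worthwhile.
 */
static bool head_sent_recently(uint64_t now_us, uint64_t head_sent_us,
                               uint32_t srtt_us)
{
    uint64_t age_us = now_us - head_sent_us;

    /* No ACK is expected before about srtt/2 has elapsed, so waiting
     * would only leave part of the window unused. */
    return age_us < srtt_us / 2;
}
```

A caller would send immediately when this returns true and fall back to the usual deferral logic otherwise.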
      
      This is visible on initial send (IW10) on web servers,
      but more generally on some RPC, as the last part of the message
      might need an extra RTT to get delivered.
      
      Tested:
      
      Ran the following packetdrill test:
      // A simple server-side test that sends exactly an initial window (IW10)
      // worth of packets.
      
      `sysctl -e -q net.ipv4.tcp_min_tso_segs=4`
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0    bind(3, ..., ...) = 0
      +0    listen(3, 1) = 0
      
      +.1   < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.1   < . 1:1(0) ack 1 win 257
      +0    accept(3, ..., ...) = 4
      
      +0    write(4, ..., 14600) = 14600
      +0    > . 1:5841(5840) ack 1 win 457
      +0    > . 5841:11681(5840) ack 1 win 457
      // Following packet should be sent right now.
      +0    > P. 11681:14601(2920) ack 1 win 457
      
      +.1   < . 1:1(0) ack 14601 win 257
      
      +0    close(4) = 0
      +0    > F. 14601:14601(0) ack 1
      +.1   < F. 1:1(0) ack 14602 win 257
      +0    > . 14602:14602(0) ack 2
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: tso: remove tp->tso_deferred · 5f852eb5
      Eric Dumazet committed
      TSO relies on the ability to defer sending a small number of packets.
      The heuristic is to wait for future ACKs in the hope of sending more packets at once.
      The current algorithm uses a per-socket tso_deferred field as a pseudo timer.
      
      This pseudo timer relies on future ACK, but there is no guarantee
      we receive them in time.
      
      A fix would be to use a real timer, but such a timer is probably too
      expensive for typical cases.
      
      This patch changes the logic to test the time of last transmit,
      because we should not add bursts of more than 1ms for any given flow.
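
One way to read the "time of last transmit" test is as a bound on how stale the last send may be before deferral is refused. The sketch below assumes microsecond timestamps and a hypothetical `may_defer` helper; only the 1 ms bound comes from the commit message, and the kernel's own check may be expressed in different units.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 ms bound taken from the commit message; the name is illustrative. */
#define MAX_DEFER_US 1000u

/*
 * Hypothetical helper: deferring is allowed only while the last
 * transmit on this flow is less than MAX_DEFER_US old. Once that much
 * time has passed, the caller should send now rather than keep
 * waiting for an ACK that may not arrive in time.
 */
static bool may_defer(uint64_t now_us, uint64_t last_tx_us)
{
    return now_us - last_tx_us < MAX_DEFER_US;
}
```

Because the decision depends only on a stored timestamp, no per-socket timer has to be armed or cancelled.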
      
      We've used this patch for about two years at Google, before FQ/pacing
      as it would reduce a fair amount of bursts.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usbnet: Fix tx_packets stat for FLAG_MULTI_FRAME drivers · 6588af61
      Ben Hutchings committed
      Currently the usbnet core does not update the tx_packets statistic for
      drivers with FLAG_MULTI_PACKET and there is no hook in the TX
      completion path where they could do this.
      
      cdc_ncm and dependent drivers are bumping tx_packets stat on the
      transmit path while asix and sr9800 aren't updating it at all.
      
      Add a packet count in struct skb_data so these drivers can fill it
      in, initialise it to 1 for other drivers, and add the packet count
      to the tx_packets statistic on completion.
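
The bookkeeping described above can be sketched with simplified stand-in types. The names `tx_entry`, `tx_stats` and the three helpers are hypothetical and do not mirror the usbnet structures; the sketch only shows the default of 1, the driver override, and the accounting on completion.

```c
#include <stdint.h>

/* Stand-in for the per-transfer bookkeeping (illustrative only). */
struct tx_entry {
    unsigned int packets;   /* frames carried by this transfer */
};

/* Stand-in for the device statistics (illustrative only). */
struct tx_stats {
    uint64_t tx_packets;
};

/* Default for drivers that send one frame per transfer. */
static void tx_entry_init(struct tx_entry *e)
{
    e->packets = 1;
}

/* A driver that aggregates frames records the real count. */
static void tx_entry_set_packets(struct tx_entry *e, unsigned int n)
{
    e->packets = n;
}

/* On completion, the recorded count is added to the statistic. */
static void tx_complete(struct tx_stats *s, const struct tx_entry *e)
{
    s->tx_packets += e->packets;
}
```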
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Tested-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Feb, 2015 16 commits
  3. 27 Feb, 2015 5 commits
  4. 26 Feb, 2015 6 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 009f33ed
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-02-24
      
      This series contains updates to i40e and i40evf only, which bumps their
      versions to i40e 1.2.9 and i40evf 1.2.3.
      
      Paul fixes i40e_debug_aq() for big endian machines by adding the
      appropriate LExx_TO_CPU wrappers.
      
      Catherine adds a requested speed variable to the link_status to store the
      last speeds we requested from the firmware and use the advertised speed
      settings in get_settings in ethtool now that we have it.  Due to the
      new code addition, she also refactors get_settings to improve readability
      and to accommodate some of the longer lines of code by adding two
      functions i40e_get_settings_link_up() and i40e_get_settings_link_down().
      
      Carolyn adds a struct to the VSI struct to keep track of RXNFC settings
      done via ethtool.  Adds more information to the interrupt vector
      names, specifically to the VF misc vector name so that we can distinguish
      between all the interrupts.
      
      Ashish enables the i40evf driver to enable debug prints via ethtool.
      
      Mitch updates i40e to enable packet split only when IOMMU is in use,
      since it shows a distinct advantage over the single-buffer path
      because it minimizes DMA mapping and unmapping.  Also adds the receive
      routine in use to the features log message to be able to print the
      receive packet split status.
      
      Greg adds the ability to get, set and commit permanently the NPAR
      partition BW configuration through configfs.  Enables an application
      to query the i40e driver's private flags to get the status of NPAR
      enablement via ethtool.
      
      Neerav adds support for bridge offload ndo_ops getlink and setlink
      to enable bridge hardware mode as per the mode set via IFLA_BRIDGE_MODE.
      The support is only enabled in the case of a PF VSI and not available for
      any other VSI type.
      
      Kevin fixes i40e by ensuring the BUF and FLAG_RD flags are set for
      indirect admin queue command.
      
      Vasu updates the driver to setup FCoE netdev device type as "fcoe", so that
      it shows up in sysfs as FCoE device.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Introduce dsa_is_port_initialized · d79d2107
      Guenter Roeck committed
      To avoid race conditions when using the ds->ports[] array,
      we need to check if the accessed port has been initialized.
      Introduce and use helper function dsa_is_port_initialized
      for that purpose and use it where needed.
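
A check of this kind might look like the sketch below. The struct is a simplified stand-in, not the kernel's `struct dsa_switch`: the `ports[]` array is named in the commit message, while the `phys_port_mask` field is borrowed from the related commit's description, and `MAX_PORTS` and the function name are assumptions for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 12   /* illustrative size, not taken from the source */

/* Simplified stand-in for the switch state. */
struct sketch_switch {
    uint32_t phys_port_mask;   /* bit p set: port p exists */
    void *ports[MAX_PORTS];    /* NULL until port p has been set up */
};

/* True only if port p exists and its ports[] entry has been filled in. */
static bool port_is_initialized(const struct sketch_switch *ds, int p)
{
    if (p < 0 || p >= MAX_PORTS)
        return false;
    return (ds->phys_port_mask & (1u << p)) && ds->ports[p] != NULL;
}
```

Callers that walk the port array would test each index with this helper before dereferencing the entry.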
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sf2_hwbridge' · bb66be1c
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: dsa: integration with SWITCHDEV for HW bridging
      
      This patch set provides the DSA and SWITCHDEV integration bits together and
      modifies the bcm_sf2 driver accordingly such that it works properly with HW
      bridging.
      
      Changes in v3:
      
      - add back the null pointer check in dsa_slave_br_port_mask from Guenter
      - slightly rework patch 1 commit message not to mention the function name
        we add in patch 2
      
      Changes in v2:
      
      - avoid a race condition in how DSA network devices are created, patch from
        Guenter Roeck
      - provide a consistent and working STP state once a port leaves the bridge
      - retain a bridge device pointer to properly flag port/bridge membership
      - properly flush the ARL (Address Resolution Logic) in bcm_sf2.c
      - properly retain port membership when individually bringing devices up/down
        while they are members of a bridge
      
      We discussed on the mailing list the possibility of standardizing a "fdb_flush"
      operation for DSA switch drivers. Looking at the Marvell and Broadcom switches,
      I am not convinced this is practical or desirable, as the terminologies vary
      here, but there is nothing preventing us from doing it later.
      
      Many thanks to Guenter and Andrew for both testing and providing feedback.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: add HW bridging support · 12f460f2
      Florian Fainelli committed
      Implement the bridge join, leave and set_stp callbacks by making sure that
      we do the following:
      
      - when a port joins the bridge, all existing ports in the bridge get
        their VLAN control register updated with that joining port
      - the joining port includes all existing bridge ports in its own
        VLAN control register
      
      The leave operation is fairly similar; special care must be taken to
      make sure that the port leaving the bridge does not remove itself from its
      own VLAN control register.
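
The join and leave bookkeeping can be modelled with plain bitmasks. This is a sketch under stated assumptions: each port's VLAN control register is represented as a `uint32_t` mask of ports it may forward to, and `NUM_PORTS`, `bridge_join` and `bridge_leave` are hypothetical names, not the bcm_sf2 driver's functions or register layout.

```c
#include <stdint.h>

#define NUM_PORTS 8   /* illustrative port count */

/*
 * vlan_ctl[p] models port p's VLAN control register as a bitmask of
 * ports it may forward to. 'members' is the set of ports already in
 * the bridge.
 */
static void bridge_join(uint32_t vlan_ctl[NUM_PORTS], uint32_t members,
                        int port)
{
    for (int p = 0; p < NUM_PORTS; p++) {
        if (!(members & (1u << p)) || p == port)
            continue;
        vlan_ctl[p] |= 1u << port;   /* existing member gains the joiner */
        vlan_ctl[port] |= 1u << p;   /* joiner gains the existing member */
    }
}

static void bridge_leave(uint32_t vlan_ctl[NUM_PORTS], uint32_t members,
                         int port)
{
    for (int p = 0; p < NUM_PORTS; p++) {
        /* Skipping p == port keeps the leaving port's own bit intact. */
        if (!(members & (1u << p)) || p == port)
            continue;
        vlan_ctl[p] &= ~(1u << port);
        vlan_ctl[port] &= ~(1u << p);
    }
}
```

Both operations walk the same member set, which is why the leave path is nearly a mirror image of the join path.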
      
      Since the various BR_* states apply directly to our HW semantics, we
      just need to translate these constants into their corresponding HW
      settings, and voila!
      
      We make sure to trigger a fast-ageing process for ports that are
      joining/leaving the bridge and transition from incompatible states; this
      is equivalent to triggering an ARL flush for that port.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: integrate with SWITCHDEV for HW bridging · b73adef6
      Florian Fainelli committed
      In order to support bridging offloads in DSA switch drivers, select
      NET_SWITCHDEV to get access to the port_stp_update and parent_get_id
      NDOs that we are required to implement.
      
      To facilitate the integration at the DSA driver level, we implement 3
      types of operations:
      
      - port_join_bridge
      - port_leave_bridge
      - port_stp_update
      
      DSA will resolve which switch ports are currently bridge port
      members, as some switch hardware/drivers need to know about that to limit
      the register programming to just the relevant registers (especially for
      slow MDIO buses).
      
      We also take care of setting the correct STP state when slave network
      devices are brought up/down while being bridge members.
      
      Finally, when a port is leaving the bridge, we make sure we set it to the
      BR_STATE_FORWARDING state, otherwise the bridge layer would leave it
      disabled as a result of having left the bridge.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Ensure that port array elements are initialized before being used · d87d6f44
      Guenter Roeck committed
      A network device notifier can be called for one or more of the created
      slave devices before all slave devices have been registered. This can
      result in a mismatch between ds->phys_port_mask and the registered devices
      by the time the call is made, and it can result in a slave device being
      added to a bridge before its entry in ds->ports[] has been initialized.
      
      Rework the initialization code to initialize entries in ds->ports[] in
      dsa_slave_create. With this change, dsa_slave_create no longer needs
      to return slave_dev but can return an error code instead.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 25 Feb, 2015 10 commits