1. 12 7月, 2012 2 次提交
    • D
      ipv4: Rearrange arguments to ip_rt_redirect() · 94206125
      David S. Miller 提交于
      Pass in the SKB rather than just the IP addresses, so that policy
      and other aspects can reside in ip_rt_redirect() rather then
      icmp_redirect().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94206125
    • E
      tcp: TCP Small Queues · 46d3ceab
      Eric Dumazet 提交于
      This introduce TSQ (TCP Small Queues)
      
      TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
      device queues), to reduce RTT and cwnd bias, part of the bufferbloat
      problem.
      
      sk->sk_wmem_alloc not allowed to grow above a given limit,
      allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
      given time.
      
      TSO packets are sized/capped to half the limit, so that we have two
      TSO packets in flight, allowing better bandwidth use.
      
      As a side effect, setting the limit to 40000 automatically reduces the
      standard gso max limit (65536) to 40000/2 : It can help to reduce
      latencies of high prio packets, having smaller TSO packets.
      
      This means we divert sock_wfree() to a tcp_wfree() handler, to
      queue/send following frames when skb_orphan() [2] is called for the
      already queued skbs.
      
      Results on my dev machines (tg3/ixgbe nics) are really impressive,
      using standard pfifo_fast, and with or without TSO/GSO.
      
      Without reduction of nominal bandwidth, we have reduction of buffering
      per bulk sender :
      < 1ms on Gbit (instead of 50ms with TSO)
      < 8ms on 100Mbit (instead of 132 ms)
      
      I no longer have 4 MBytes backlogged in qdisc by a single netperf
      session, and both side socket autotuning no longer use 4 Mbytes.
      
      As skb destructor cannot restart xmit itself ( as qdisc lock might be
      taken at this point ), we delegate the work to a tasklet. We use one
      tasklest per cpu for performance reasons.
      
      If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
      This flag is tested in a new protocol method called from release_sock(),
      to eventually send new segments.
      
      [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
      [2] skb_orphan() is usually called at TX completion time,
        but some drivers call it in their start_xmit() handler.
        These drivers should at least use BQL, or else a single TCP
        session can still fill the whole NIC TX ring, since TSQ will
        have no effect.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Dave Taht <dave.taht@bufferbloat.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46d3ceab
  2. 11 7月, 2012 15 次提交
  3. 09 7月, 2012 2 次提交
  4. 08 7月, 2012 4 次提交
  5. 06 7月, 2012 1 次提交
  6. 05 7月, 2012 10 次提交
  7. 04 7月, 2012 1 次提交
  8. 01 7月, 2012 3 次提交
    • G
      phy: add the EEE support and the way to access to the MMD registers. · a59a4d19
      Giuseppe CAVALLARO 提交于
      This patch adds the support for the Energy-Efficient Ethernet (EEE)
      to the Physical Abstraction Layer.
      To support the EEE we have to access to the MMD registers 3.20 and
      7.60/61. So two new functions have been added to read/write the MMD
      registers (clause 45).
      
      An Ethernet driver (I tested the stmmac) can invoke the phy_init_eee to properly
      check if the EEE is supported by the PHYs and it can also set the clock
      stop enable bit in the 3.0 register.
      The phy_get_eee_err can be used for reporting the number of time where
      the PHY failed to complete its normal wake sequence.
      
      In the end, this patch also adds the EEE ethtool support implementing:
       o phy_ethtool_set_eee
       o phy_ethtool_get_eee
      
      v1: initial patch
      v2: fixed some errors especially on naming convention
      v3: renamed again the mmd read/write functions thank to Ben's feedback
      v4: moved file to phy.c and added the ethtool support.
      v5: fixed phy_adv_to_eee, phy_eee_to_supported, phy_eee_to_adv return
          values according to ethtool API (thanks to Ben's feedback).
          Renamed some macros to avoid too long names.
      v6: fixed kernel-doc comments to be properly parsed.
          Fixed the phy_init_eee function: we need to check which link mode
          was autonegotiated and then the corresponding bits in 7.60 and 7.61
          registers.
      v7: reviewed the way to get the negotiated settings.
      v8: fixed a problem in the phy_init_eee return value erroneously added
          when included the phy_read_status call.
      v9: do not remove the MDIO_AN_EEE_ADV_100TX and MDIO_AN_EEE_ADV_1000T
          and fixed the eee_{cap,lp,adv} declaration as "int" instead of u16.
      Signed-off-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Reviewed-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a59a4d19
    • N
      sctp: be more restrictive in transport selection on bundled sacks · 4244854d
      Neil Horman 提交于
      It was noticed recently that when we send data on a transport, its possible that
      we might bundle a sack that arrived on a different transport.  While this isn't
      a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
      2960:
      
       An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
         etc.) to the same destination transport address from which it
         received the DATA or control chunk to which it is replying.  This
         rule should also be followed if the endpoint is bundling DATA chunks
         together with the reply chunk.
      
      This patch seeks to correct that.  It restricts the bundling of sack operations
      to only those transports which have moved the ctsn of the association forward
      since the last sack.  By doing this we guarantee that we only bundle outbound
      saks on a transport that has received a chunk since the last sack.  This brings
      us into stricter compliance with the RFC.
      
      Vlad had initially suggested that we strictly allow only sack bundling on the
      transport that last moved the ctsn forward.  While this makes sense, I was
      concerned that doing so prevented us from bundling in the case where we had
      received chunks that moved the ctsn on multiple transports.  In those cases, the
      RFC allows us to select any of the transports having received chunks to bundle
      the sack on.  so I've modified the approach to allow for that, by adding a state
      variable to each transport that tracks weather it has moved the ctsn since the
      last sack.  This I think keeps our behavior (and performance), close enough to
      our current profile that I think we can do this without a sysctl knob to
      enable/disable it.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yaseivch <vyasevich@gmail.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Reported-by: NMichele Baldessari <michele@redhat.com>
      Reported-by: Nsorin serban <sserban@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4244854d
    • R
      linux/irq.h: fix kernel-doc warning · 87fac288
      Randy Dunlap 提交于
      Fix kernel-doc warning.  This struct member was removed in commit
      87568264 ("irq: Remove irq_chip->release()") so remove its
      associated kernel-doc entry also.
      
        Warning(include/linux/irq.h:338): Excess struct/union/enum/typedef member 'release' description in 'irq_chip'
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87fac288
  9. 30 6月, 2012 2 次提交