1. 12 Jul 2012, 2 commits
    • ipv4: Rearrange arguments to ip_rt_redirect() · 94206125
      David S. Miller authored
      Pass in the SKB rather than just the IP addresses, so that policy
      and other aspects can reside in ip_rt_redirect() rather than
      icmp_redirect().
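
      As a sketch of the shape of the change (signatures here are
      approximate, reconstructed from the description, not copied from the
      patch):

        /* before: individual addresses were passed in */
        /* void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
         *                     __be32 saddr, struct net_device *dev); */

        /* after: the SKB carries the full context, so policy and other
         * checks can live inside ip_rt_redirect() itself */
        void ip_rt_redirect(struct sk_buff *skb, __be32 new_gw);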
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: TCP Small Queues · 46d3ceab
      Eric Dumazet authored
      This introduces TSQ (TCP Small Queues).
      
      TSQ's goal is to reduce the number of TCP packets in xmit queues
      (qdisc & device queues), in order to reduce RTT and cwnd bias, part
      of the bufferbloat problem.
      
      sk->sk_wmem_alloc is not allowed to grow above a given limit,
      allowing no more than ~128KB [1] per TCP socket in the qdisc/dev
      layers at any given time.
      
      TSO packets are sized/capped to half the limit, so that we have two
      TSO packets in flight, allowing better bandwidth use.
      
      As a side effect, setting the limit to 40000 automatically reduces
      the standard GSO max size (65536) to 40000/2 = 20000: having smaller
      TSO packets can help reduce latencies of high-priority packets.
      
      This means we divert sock_wfree() to a tcp_wfree() handler, so that
      following frames can be queued/sent when skb_orphan() [2] is called
      for the already queued skbs.
      
      Results on my dev machines (tg3/ixgbe NICs) are really impressive,
      using standard pfifo_fast, with or without TSO/GSO.
      
      Without reducing nominal bandwidth, we get a reduction of buffering
      per bulk sender:
      < 1ms on Gbit (instead of 50ms with TSO)
      < 8ms on 100Mbit (instead of 132 ms)
      
      I no longer have 4 MBytes backlogged in the qdisc by a single netperf
      session, and socket autotuning on both sides no longer uses 4 MBytes.
      
      As the skb destructor cannot restart xmit itself (the qdisc lock
      might be taken at this point), we delegate the work to a tasklet.
      We use one tasklet per CPU for performance reasons.
      
      If the tasklet finds a socket owned by the user, it sets the
      TSQ_OWNED flag. This flag is tested in a new protocol method called
      from release_sock(), to eventually send new segments.
      
      [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
      [2] skb_orphan() is usually called at TX completion time,
        but some drivers call it in their start_xmit() handler.
        These drivers should at least use BQL, or else a single TCP
        session can still fill the whole NIC TX ring, since TSQ will
        have no effect.
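
      For illustration only, a minimal sketch of the limit check described
      above (the flag name and its exact placement in the xmit path are
      assumptions, not the patch verbatim):

        /* Stop queuing new segments once this socket already has
         * sysctl_tcp_limit_output_bytes [1] sitting in qdisc/dev queues.
         */
        static bool tcp_tsq_throttled(struct sock *sk, struct tcp_sock *tp)
        {
                if (atomic_read(&sk->sk_wmem_alloc) <
                    sysctl_tcp_limit_output_bytes)
                        return false;
                /* tcp_wfree(), the new skb destructor, will schedule the
                 * per-cpu tasklet to resume xmit once the already queued
                 * skbs are orphaned [2].
                 */
                set_bit(TSQ_THROTTLED, &tp->tsq_flags);
                return true;
        }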
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Dave Taht <dave.taht@bufferbloat.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 11 Jul 2012, 11 commits
  3. 09 Jul 2012, 1 commit
  4. 06 Jul 2012, 1 commit
  5. 05 Jul 2012, 8 commits
  6. 01 Jul 2012, 1 commit
    • sctp: be more restrictive in transport selection on bundled sacks · 4244854d
      Neil Horman authored
      It was noticed recently that when we send data on a transport, it's
      possible that we might bundle a sack that arrived on a different
      transport.  While this isn't a major problem, it does go against the
      SHOULD requirement in section 6.4 of RFC 2960:
      
       An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
         etc.) to the same destination transport address from which it
         received the DATA or control chunk to which it is replying.  This
         rule should also be followed if the endpoint is bundling DATA chunks
         together with the reply chunk.
      
      This patch seeks to correct that.  It restricts the bundling of sack
      operations to only those transports which have moved the ctsn of the
      association forward since the last sack.  By doing this we guarantee
      that we only bundle outbound sacks on a transport that has received a
      chunk since the last sack.  This brings us into stricter compliance
      with the RFC.
      
      Vlad had initially suggested that we strictly allow sack bundling
      only on the transport that last moved the ctsn forward.  While this
      makes sense, I was concerned that doing so prevented us from bundling
      in the case where we had received chunks that moved the ctsn on
      multiple transports.  In those cases, the RFC allows us to select any
      of the transports having received chunks to bundle the sack on.  So
      I've modified the approach to allow for that, by adding a state
      variable to each transport that tracks whether it has moved the ctsn
      since the last sack.  This, I think, keeps our behavior (and
      performance) close enough to our current profile that we can do this
      without a sysctl knob to enable/disable it.
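
      As a rough sketch of one way to keep that per-transport state (a
      generation counter; every name below is illustrative, not taken from
      the patch):

        struct assoc { u32 sack_generation; };   /* per association */
        struct xport { u32 sack_generation; };   /* per transport   */

        /* a chunk received on @t moved the ctsn: mark the transport */
        static void ctsn_moved_on(struct xport *t, const struct assoc *a)
        {
                t->sack_generation = a->sack_generation;
        }

        /* bundle sacks only on transports marked in this generation */
        static bool may_bundle_sack(const struct xport *t,
                                    const struct assoc *a)
        {
                return t->sack_generation == a->sack_generation;
        }

        /* a sack went out: invalidate all transport marks at once */
        static void sack_sent(struct assoc *a)
        {
                a->sack_generation++;
        }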
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Reported-by: Michele Baldessari <michele@redhat.com>
      Reported-by: sorin serban <sserban@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 29 Jun 2012, 5 commits
    • ipv4: Elide fib_validate_source() completely when possible. · 7a9bc9b8
      David S. Miller authored
      If rpfilter is off (or the SKB has an IPSEC path) and there are no
      tclassid users, we don't have to do anything at all when
      fib_validate_source() is invoked, besides setting the itag to zero.
      
      We monitor tclassid use with a counter (modified only under RTNL and
      marked __read_mostly), and we gate the real work of
      fib_validate_source() behind a test of this counter and of whether
      rpfilter is to be done.
      
      Having a way to know whether we need no tclassid processing or not
      also opens the door for future optimized rpfilter algorithms that do
      not perform full FIB lookups.
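
      Roughly, the fast path described above looks like this (a sketch;
      helper names approximate):

        static inline int fib_validate_source(struct sk_buff *skb, __be32 src,
                                              __be32 dst, u8 tos, int oif,
                                              struct net_device *dev,
                                              struct in_device *idev, u32 *itag)
        {
                /* rpfilter is skipped when an IPSEC path is attached */
                int r = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);

                /* fib_num_tclassid_users: the RTNL-protected,
                 * __read_mostly counter mentioned above */
                if (!r && !fib_num_tclassid_users) {
                        *itag = 0;
                        return 0;       /* nothing at all to do */
                }
                return __fib_validate_source(skb, src, dst, tos, oif, dev,
                                             r, idev, itag);
        }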
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6_tunnel: Allow receiving packets on the fallback tunnel if they pass sanity checks · d0087b29
      Ville Nuorvala authored
      At Facebook, we do Layer-3 DSR via IP-in-IP tunneling. Our load balancers wrap
      an extra IP header on incoming packets so they can be routed to the backend.
      In the v4 tunnel driver, when these packets fall on the default tunl0 device,
      the behavior is to decapsulate them and drop them back on the stack. So our
      setup is that tunl0 has the VIP and eth0 has (obviously) the backend's real
      address.
      
      In IPv6 we do the same thing, but the v6 tunnel driver didn't have
      this behavior: if you didn't have an explicit tunnel set up, it would
      drop the packet.
      
      This patch brings that v4 feature to the v6 driver.
      
      The same IPv6 address checks are performed as with any normal tunnel,
      but as the fallback tunnel endpoint addresses are unspecified, the checks
      must be performed on a per-packet basis, rather than at tunnel
      configuration time.
      
      [Patch description modified by phil@ipom.com]
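
      A sketch of the per-packet idea (helper name and checks assumed, not
      the patch verbatim):

        /* For the fallback tunnel the configured endpoints are
         * unspecified (::), so the endpoint sanity checks must run
         * against each received packet's own addresses instead.
         */
        static bool fallback_rcv_ok(const struct in6_addr *pkt_saddr,
                                    const struct in6_addr *pkt_daddr)
        {
                return !ipv6_addr_any(pkt_saddr) &&
                       !ipv6_addr_any(pkt_daddr) &&
                       !ipv6_addr_is_multicast(pkt_saddr);
        }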
      Signed-off-by: Ville Nuorvala <ville.nuorvala@gmail.com>
      Tested-by: Phil Dibowitz <phil@ipom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Adjust in_dev handling in fib_validate_source() · 9e56e380
      David S. Miller authored
      Checking for in_dev being NULL is pointless.
      
      In fact, all of our callers have in_dev precomputed already,
      so just pass it in and remove the NULL checking.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Use NLMSG_DEFAULT_SIZE in combination with nlmsg_new() · 58050fce
      Thomas Graf authored
      Using NLMSG_GOODSIZE results in multiple pages being used, as
      nlmsg_new() will automatically add the size of the netlink header to
      the payload, thus exceeding the page limit.
      
      NLMSG_DEFAULT_SIZE takes this into account.
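
      For reference, a sketch of why the two differ (nlmsg_new() adds the
      netlink header on top of the requested payload):

        /* nlmsg_new(payload, flags) allocates header + payload, so a
         * NLMSG_GOODSIZE payload overshoots one page, while
         * NLMSG_DEFAULT_SIZE (NLMSG_GOODSIZE minus NLMSG_HDRLEN) fits.
         */
        struct sk_buff *skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);

        if (!skb)
                return -ENOMEM;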
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Cc: Jiri Pirko <jpirko@redhat.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Sergey Lapin <slapin@ossfans.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
      Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Reviewed-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: pass fl6 to inet6_csk_route_req() · 3840a06e
      Neal Cardwell authored
      This commit changes inet6_csk_route_req() so that it uses a pointer
      to a struct flowi6, rather than allocating its own on the stack. This
      brings its behavior in line with its IPv4 cousin,
      inet_csk_route_req(), and allows a follow-on patch to fix a dst leak.
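
      The shape of the change, as a sketch (signature reconstructed from
      the description, not copied from the patch):

        /* The caller now owns the flowi6 and can reuse it after the
         * call, which is what lets the follow-on patch release the
         * dst and fix the leak.
         */
        struct dst_entry *inet6_csk_route_req(struct sock *sk,
                                              struct flowi6 *fl6,
                                              const struct request_sock *req);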
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 28 Jun 2012, 9 commits
  9. 27 Jun 2012, 1 commit
  10. 26 Jun 2012, 1 commit