1. 02 Oct, 2013 (4 commits)
  2. 01 Oct, 2013 (4 commits)
    • ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put · e2401654
      Salam Noureddine authored
      It is possible for the timer handlers to run after the call to
      ip_mc_down, so use in_dev_put instead of __in_dev_put in the handler
      function in order to do proper cleanup when the refcnt reaches 0.
      Otherwise, the refcnt can reach zero without the in_device being
      destroyed, and we end up leaking a reference to the net_device and
      seeing messages like the following:
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Tested on linux-3.4.43.
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
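      
      A minimal userspace sketch of the refcounting pattern at issue (illustrative
      only, not the kernel code; obj_put/__obj_put are hypothetical names that
      mirror in_dev_put/__in_dev_put). The point is that a path which may drop the
      last reference must use the variant that frees the object at zero:
      
      #include <stdio.h>
      #include <stdlib.h>
      
      struct obj {
              int refcnt;
      };
      
      static void __obj_put(struct obj *o)    /* decrement only */
      {
              o->refcnt--;
      }
      
      static void obj_put(struct obj *o)      /* decrement, free at zero */
      {
              if (--o->refcnt == 0) {
                      printf("last reference dropped, freeing\n");
                      free(o);
              }
      }
      
      int main(void)
      {
              struct obj *o = calloc(1, sizeof(*o));
      
              o->refcnt = 2;
              __obj_put(o);   /* fine: another reference still exists */
              obj_put(o);     /* the last holder must free, or the object leaks */
              return 0;
      }
      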
    • net ipv4: Convert ipv4.ip_local_port_range to be per netns v3 · 0bbf87d8
      Eric W. Biederman authored
      - Move sysctl_local_ports from a global variable into struct netns_ipv4.
      - Modify inet_get_local_port_range to take a struct net, and update all
        of the callers.
      - Move the initialization of sysctl_local_ports into
        sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c
      
      v2:
      - Ensured that indentation uses tabs
      - Fixed ip.h so it applies cleanly to today's net-next
      
      v3:
      - Compile fixes for strange callers of inet_get_local_port_range.
        This patch now successfully passes an allmodconfig build.
      - Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range
      Originally-by: Samya <samya@twitter.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
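      
      A small self-contained demonstration of the per-namespace behaviour described
      above (a sketch assuming a kernel with this patch; the range written here is
      arbitrary, it needs root for unshare(CLONE_NEWNET), and it only touches the
      freshly created namespace, leaving the init netns range unchanged):
      
      #define _GNU_SOURCE
      #include <sched.h>
      #include <stdio.h>
      
      int main(void)
      {
              const char *path = "/proc/sys/net/ipv4/ip_local_port_range";
              char buf[64];
              FILE *f;
      
              if (unshare(CLONE_NEWNET)) {            /* enter a fresh net namespace */
                      perror("unshare");
                      return 1;
              }
      
              f = fopen(path, "w");                   /* set a range local to this netns */
              if (!f) {
                      perror(path);
                      return 1;
              }
              fprintf(f, "40000 40999\n");
              fclose(f);
      
              f = fopen(path, "r");
              if (f && fgets(buf, sizeof(buf), f))
                      printf("range in new netns: %s", buf);  /* init netns is unaffected */
              if (f)
                      fclose(f);
              return 0;
      }
      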
    • tcp: TSQ can use a dynamic limit · c9eeec26
      Eric Dumazet authored
      When TCP Small Queues was added, we used a sysctl to limit the number of
      packets queued on Qdisc/device queues for a given TCP flow.
      
      The problem is that this limit is either too big for low rates or too
      small for high rates.
      
      Now that the TCP stack has rate estimation in sk->sk_pacing_rate and TSO
      auto-sizing, it can better control the number of packets in Qdisc/device
      queues.
      
      The new limit is two packets, or at least 1 to 2 ms worth of packets.
      
      Low-rate flows benefit from this patch by having an even smaller number
      of packets in queues, allowing for faster recovery and better RTT
      estimation.
      
      High-rate flows benefit from this patch by allowing more than 2 packets
      in flight, as we had reports that this was a limiting factor in reaching
      line rate [in particular if TX completion is delayed because of
      coalescing parameters].
      
      Example for a single flow on a 10Gbps link controlled by FQ/pacing:
      
      14 packets in flight instead of 2
      
      $ tc -s -d qd
      qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
      buckets 1024 quantum 3028 initial_quantum 15140
       Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
      requeues 6822476)
       rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
        2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
        2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit
      
      Note that sk_pacing_rate is currently set to twice the actual rate, but
      this might be refined in the future when a flow is in congestion
      avoidance.
      
      Additional change: skb->destructor should be set to tcp_wfree().
      
      A future patch (for Linux 3.13+) might remove tcp_limit_output_bytes.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
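      
      A conceptual sketch of the limit described above (tsq_limit is a hypothetical
      helper, not the exact kernel expression): cap outstanding bytes at the larger
      of two packets' worth of memory or roughly 1 ms of bytes at sk_pacing_rate,
      which corresponds to about 2 ms of real transmission time since
      sk_pacing_rate is set to twice the actual rate.
      
      #include <stdio.h>
      
      /* Conceptual model only; pacing_rate is in bytes per second, as
       * sk->sk_pacing_rate is.
       */
      static unsigned long tsq_limit(unsigned long pkt_truesize,
                                     unsigned long pacing_rate)
      {
              unsigned long two_packets = 2 * pkt_truesize;
              unsigned long one_ms      = pacing_rate >> 10;  /* ~1 ms of bytes */
      
              return two_packets > one_ms ? two_packets : one_ms;
      }
      
      int main(void)
      {
              /* low rate: the two-packet floor dominates */
              printf("56kbit/s : %lu bytes\n", tsq_limit(2048, 7000));
              /* high rate: ~1-2 ms of data may sit in the queues */
              printf("10Gbit/s : %lu bytes\n", tsq_limit(2048, 1250000000UL));
              return 0;
      }
      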
    • ip_tunnel: Do not use stale inner_iph pointer. · d4a71b15
      Pravin B Shelar authored
      While sending a packet, skb_cow_head() can change the skb header, which
      invalidates the inner_iph pointer into the skb header. The following
      patch avoids using it. Found by code inspection.
      
      This bug was introduced by commit 0e6fbc5b (ip_tunnels: extend
      iptunnel_xmit()).
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
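      
      A generic userspace illustration of the same bug class (illustrative only,
      not the patch itself): a pointer into a buffer becomes stale once the buffer
      may be reallocated, just as inner_iph becomes stale across skb_cow_head();
      either save the needed fields beforehand or re-derive the pointer afterwards.
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      int main(void)
      {
              char *buf = malloc(16);
              char *field;
      
              if (!buf)
                      return 1;
              strcpy(buf, "ttl=64");
              field = buf + 4;                /* pointer into the buffer: "64" */
      
              buf = realloc(buf, 4096);       /* may move the data elsewhere */
              if (!buf)
                      return 1;
      
              /* WRONG: printf("%s\n", field); -- field may now be dangling */
              field = buf + 4;                /* RIGHT: re-derive after realloc */
              printf("%s\n", field);
      
              free(buf);
              return 0;
      }
      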
  3. 30 Sep, 2013 (1 commit)
  4. 29 Sep, 2013 (4 commits)
    • net: introduce SO_MAX_PACING_RATE · 62748f32
      Eric Dumazet authored
      As mentioned in commit afe4fd06 ("pkt_sched: fq: Fair Queue packet
      scheduler"), this patch adds a new socket option.
      
      SO_MAX_PACING_RATE offers the application the ability to cap the rate
      computed by the transport layer. The value is in bytes per second.
      
      u32 val = 1000000;
      setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
      
      To be effectively paced, a flow must use the FQ packet scheduler.
      
      Note that a packet scheduler takes the headers into account in its
      computations. The effective payload rate depends on the MSS and on
      retransmits, if any.
      
      I chose to make this pacing rate a SOL_SOCKET option instead of a
      TCP one because this can be used by other protocols.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Steinar H. Gunderson <sesse@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
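      
      A fuller, self-contained version of the snippet above (a sketch assuming a
      kernel and libc that know SO_MAX_PACING_RATE; the fallback value 47 is the
      asm-generic definition, and the option can also be read back with
      getsockopt):
      
      #include <stdio.h>
      #include <sys/socket.h>
      
      #ifndef SO_MAX_PACING_RATE
      #define SO_MAX_PACING_RATE 47   /* value from asm-generic/socket.h */
      #endif
      
      int main(void)
      {
              int fd = socket(AF_INET, SOCK_STREAM, 0);
              unsigned int rate = 1000000;            /* bytes per second */
              socklen_t len = sizeof(rate);
      
              if (fd < 0) {
                      perror("socket");
                      return 1;
              }
              if (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE, &rate, sizeof(rate))) {
                      perror("setsockopt");
                      return 1;
              }
              rate = 0;
              if (getsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE, &rate, &len) == 0)
                      printf("pacing cap: %u bytes/sec\n", rate);
              return 0;
      }
      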
    • ipv4: processing ancillary IP_TOS or IP_TTL · aa661581
      Francesco Fusco authored
      If IP_TOS or IP_TTL is specified as ancillary data, then sendmsg() sends
      out packets with the specified TTL or TOS, overriding the socket values
      set with the traditional setsockopt().
      
      The struct inet_cork stores the values of TOS, TTL and priority that are
      passed through the struct ipcm_cookie. If there is a user-specified TOS
      (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
      used to override the per-socket values. In the case of TOS, the priority
      is also changed accordingly.
      
      Two helper functions, get_rttos and get_rtconn_flags, are defined to take
      into account the presence of a user-specified TOS value when computing
      RT_TOS and RT_CONN_FLAGS.
      Signed-off-by: Francesco Fusco <ffusco@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
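      
      A self-contained usage sketch of the feature (assumptions: a kernel with this
      patch, int-sized cmsg payloads for both IP_TTL and IP_TOS, and 192.0.2.1 as a
      placeholder documentation address; the TTL/TOS values are arbitrary):
      
      #include <stdio.h>
      #include <string.h>
      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <sys/socket.h>
      
      int main(void)
      {
              int fd = socket(AF_INET, SOCK_DGRAM, 0);
              struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(9) };
              char payload[] = "hello";
              struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
              union {
                      char buf[2 * CMSG_SPACE(sizeof(int))];
                      struct cmsghdr align;
              } u;
              struct msghdr msg = {
                      .msg_name = &dst, .msg_namelen = sizeof(dst),
                      .msg_iov = &iov, .msg_iovlen = 1,
                      .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
              };
              struct cmsghdr *cmsg;
              int ttl = 16, tos = 0x10;       /* per-packet TTL and TOS */
      
              memset(u.buf, 0, sizeof(u.buf));
              inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);
      
              cmsg = CMSG_FIRSTHDR(&msg);
              cmsg->cmsg_level = IPPROTO_IP;
              cmsg->cmsg_type  = IP_TTL;
              cmsg->cmsg_len   = CMSG_LEN(sizeof(ttl));
              memcpy(CMSG_DATA(cmsg), &ttl, sizeof(ttl));
      
              cmsg = CMSG_NXTHDR(&msg, cmsg);
              cmsg->cmsg_level = IPPROTO_IP;
              cmsg->cmsg_type  = IP_TOS;
              cmsg->cmsg_len   = CMSG_LEN(sizeof(tos));
              memcpy(CMSG_DATA(cmsg), &tos, sizeof(tos));
      
              if (fd < 0 || sendmsg(fd, &msg, 0) < 0) {
                      perror("sendmsg");
                      return 1;
              }
              return 0;
      }
      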
    • ipv4: IP_TOS and IP_TTL can be specified as ancillary data · f02db315
      Francesco Fusco authored
      This patch enables the IP_TTL and IP_TOS values passed from userspace to
      be stored in the ipcm_cookie struct. Three fields are added to the struct:
      
      - the TTL, expressed as a __u8.
        The allowed values are in the range [1, 255].
        A value of 0 means that the TTL is not specified.
      
      - the TOS, expressed as a __s16.
        The allowed values are in the range [0, 255].
        A value of -1 means that the TOS is not specified.
      
      - the priority, expressed as a char and computed when
        handling the ancillary data.
      Signed-off-by: Francesco Fusco <ffusco@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
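      
      A sketch of the three fields described above, as a standalone struct (the
      struct name and the typedefs are userspace stand-ins so the sketch compiles
      on its own; the real fields live inside the kernel's struct ipcm_cookie
      alongside its existing members):
      
      typedef unsigned char __u8;
      typedef signed short  __s16;
      
      struct ipcm_cookie_sketch {
              __u8  ttl;        /* 0 = not specified, otherwise 1..255 */
              __s16 tos;        /* -1 = not specified, otherwise 0..255 */
              char  priority;   /* computed while handling the ancillary data */
      };
      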
    • net: net_secret should not depend on TCP · 9a3bab6b
      Eric Dumazet authored
      A host might need net_secret[] and never open a single socket.
      
      The problem was added in commit aebda156
      ("net: defer net_secret[] initialization").
      
      Based on a prior patch from Hannes Frederic Sowa.
      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Sep, 2013 (5 commits)
  6. 20 Sep, 2013 (2 commits)
  7. 18 Sep, 2013 (1 commit)
  8. 13 Sep, 2013 (1 commit)
  9. 07 Sep, 2013 (2 commits)
    • tcp: properly increase rcv_ssthresh for ofo packets · 4e4f1fc2
      Eric Dumazet authored
      TCP receive window handling is multi-staged.
      
      A socket has a memory budget, static or dynamic, in sk_rcvbuf.
      
      Because we do not really know how this memory budget translates to
      a TCP window (payload), TCP announces a small initial window
      (about 20 MSS).
      
      When a packet is received, we increase TCP rcv_win depending on the
      payload/truesize ratio of this packet. Good-citizen packets give a hint
      that it's reasonable to have rcv_win = sk_rcvbuf/2.
      
      This heuristic takes place in tcp_grow_window().
      
      The problem is that we currently call tcp_grow_window() only for
      in-order packets.
      
      This means that reordering or packet losses stop proper growth of
      rcv_win, and senders are unable to benefit from fast recovery or proper
      reordering-level detection.
      
      Really, a packet stored in the OFO queue is not a bad citizen. It should
      be part of the game, just like in-order packets.
      
      In our traces, we very often see the sender limited by small Linux
      receive windows, even though Linux hosts use autotuning (DRS) and should
      allow rcv_win to grow to ~3MB.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
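      
      A toy model of the heuristic described above (illustrative only, not the
      kernel algorithm; toy_sock, toy_grow_window and the "half the memory is
      payload" test are made up for the sketch): the window starts small and grows
      toward rcvbuf/2 on good-citizen packets, and the point of the patch is that
      out-of-order packets should feed this heuristic too.
      
      #include <stdio.h>
      
      struct toy_sock {
              unsigned int rcv_win;
              unsigned int rcvbuf;
      };
      
      static void toy_grow_window(struct toy_sock *sk,
                                  unsigned int payload, unsigned int truesize)
      {
              /* crude good-citizen test: at least half of the memory is payload */
              if (payload * 2 >= truesize && sk->rcv_win < sk->rcvbuf / 2)
                      sk->rcv_win += payload;
      }
      
      int main(void)
      {
              struct toy_sock sk = { .rcv_win = 20 * 1460, .rcvbuf = 6 << 20 };
      
              /* before the patch only in-order packets reached this call;
               * after it, packets queued out of order are counted too
               */
              for (int i = 0; i < 100; i++)
                      toy_grow_window(&sk, 1460, 2048);
      
              printf("rcv_win after 100 good-citizen packets: %u\n", sk.rcv_win);
              return 0;
      }
      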
    • tcp: fix no cwnd growth after timeout · 16edfe7e
      Yuchung Cheng authored
      Commit 0f7cc9a3 ("tcp: increase throughput when reordering is high")
      only allows cwnd to increase in the Open state. This mistakenly disables
      slow start after a timeout (CA_Loss). Moreover, cwnd won't grow if the
      state moves from Disorder to Open later in tcp_fastretrans_alert().
      
      Therefore the correct logic should be to allow cwnd to grow as long as
      the data is received in order in the Open, Loss, or even Disorder state.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
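      
      A sketch of the corrected decision described above (illustrative only; the
      enum and cwnd_may_grow are hypothetical stand-ins, not the kernel code):
      
      #include <stdio.h>
      
      enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };
      
      static int cwnd_may_grow(enum ca_state state, int data_in_order)
      {
              switch (state) {
              case CA_OPEN:
              case CA_LOSS:           /* slow start after timeout works again */
              case CA_DISORDER:       /* ...and across Disorder -> Open moves */
                      return data_in_order;
              default:
                      return 0;
              }
              /* the buggy logic was effectively: return state == CA_OPEN; */
      }
      
      int main(void)
      {
              printf("Loss, in-order data: %d\n", cwnd_may_grow(CA_LOSS, 1));
              return 0;
      }
      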
  10. 06 Sep, 2013 (1 commit)
  11. 05 Sep, 2013 (1 commit)
  12. 04 Sep, 2013 (9 commits)
  13. 03 Sep, 2013 (1 commit)
  14. 01 Sep, 2013 (1 commit)
  15. 31 Aug, 2013 (3 commits)