1. 25 2月, 2014 2 次提交
  2. 21 2月, 2014 2 次提交
  3. 20 2月, 2014 3 次提交
    • F
      tcp: use zero-window when free_space is low · 86c1a045
      Florian Westphal 提交于
      Currently the kernel tries to announce a zero window when free_space
      is below the current receiver mss estimate.
      
      When a sender is transmitting small packets and reader consumes data
      slowly (or not at all), receiver might be unable to shrink the receive
      win because
      
      a) we cannot withdraw already-commited receive window, and,
      b) we have to round the current rwin up to a multiple of the wscale
         factor, else we would shrink the current window.
      
      This causes the receive buffer to fill up until the rmem limit is hit.
      When this happens, we start dropping packets.
      
      Moreover, tcp_clamp_window may continue to grow sk_rcvbuf towards rmem[2]
      even if socket is not being read from.
      
      As we cannot avoid the "current_win is rounded up to multiple of mss"
      issue [we would violate a) above] at least try to prevent the receive buf
      growth towards tcp_rmem[2] limit by attempting to move to zero-window
      announcement when free_space becomes less than 1/16 of the current
      allowed receive buffer maximum.  If tcp_rmem[2] is large, this will
      increase our chances to get a zero-window announcement out in time.
      
      Reproducer:
      On server:
      $ nc -l -p 12345
      <suspend it: CTRL-Z>
      
      Client:
      #!/usr/bin/env python
      import socket
      import time
      
      sock = socket.socket()
      sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
      sock.connect(("192.168.4.1", 12345));
      while True:
         sock.send('A' * 23)
         time.sleep(0.005)
      
      socket buffer on server-side will grow until tcp_rmem[2] is hit,
      at which point the client rexmits data until -EDTIMEOUT:
      
      tcp_data_queue invokes tcp_try_rmem_schedule which will call
      tcp_prune_queue which calls tcp_clamp_window().  And that function will
      grow sk->sk_rcvbuf up until it eventually hits tcp_rmem[2].
      
      Thanks to Eric Dumazet for running regression tests.
      
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86c1a045
    • E
      tipc: failed transmissions should return error · 63fa01c1
      Erik Hugne 提交于
      When a message could not be sent out because the destination node
      or link could not be found, the full message size is returned from
      sendmsg() as if it had been sent successfully. An application will
      then get a false indication that it's making forward progress. This
      problem has existed since the initial commit in 2.6.16.
      
      We change this to return -ENETUNREACH if the message cannot be
      delivered due to the destination node/link being unavailable. We
      also get rid of the redundant tipc_reject_msg call since freeing
      the buffer and doing a tipc_port_iovec_reject accomplishes exactly
      the same thing.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63fa01c1
    • H
      ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg · c8e6ad08
      Hannes Frederic Sowa 提交于
      In case we decide in udp6_sendmsg to send the packet down the ipv4
      udp_sendmsg path because the destination is either of family AF_INET or
      the destination is an ipv4 mapped ipv6 address, we don't honor the
      maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO.
      
      We simply can check for this option in ip_cmsg_send because no calls to
      ipv6 module functions are needed to do so.
      Reported-by: NGert Doering <gert@space.net>
      Cc: Tore Anderson <tore@fud.no>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8e6ad08
  4. 19 2月, 2014 8 次提交
  5. 18 2月, 2014 23 次提交
  6. 17 2月, 2014 2 次提交
    • N
      ipsec: add support of limited SA dump · d3623099
      Nicolas Dichtel 提交于
      The goal of this patch is to allow userland to dump only a part of SA by
      specifying a filter during the dump.
      The kernel is in charge to filter SA, this avoids to generate useless netlink
      traffic (it save also some cpu cycles). This is particularly useful when there
      is a big number of SA set on the system.
      
      Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
      struct netlink_callback->args is defined as a array of 6 long and the first long
      is used in xfrm code to flag the cb as initialized. Hence, we must have:
      sizeof(struct xfrm_state_walk) <= sizeof(long) * 5.
      With the union, it was false on arm (sizeof(struct xfrm_state_walk) was
      sizeof(long) * 7), due to the padding.
      In fact, whatever the arch is, this union seems useless, there will be always
      padding after it. Removing it will not increase the size of this struct (and
      reduce it on arm).
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      d3623099
    • D
      packet: check for ndo_select_queue during queue selection · 0fd5d57b
      Daniel Borkmann 提交于
      Mathias reported that on an AMD Geode LX embedded board (ALiX)
      with ath9k driver PACKET_QDISC_BYPASS, introduced in commit
      d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket
      option"), triggers a WARN_ON() coming from the driver itself
      via 066dae93 ("ath9k: rework tx queue selection and fix
      queue stopping/waking").
      
      The reason why this happened is that ndo_select_queue() call
      is not invoked from direct xmit path i.e. for ieee80211 subsystem
      that sets queue and TID (similar to 802.1d tag) which is being
      put into the frame through 802.11e (WMM, QoS). If that is not
      set, pending frame counter for e.g. ath9k can get messed up.
      
      So the WARN_ON() in ath9k is absolutely legitimate. Generally,
      the hw queue selection in ieee80211 depends on the type of
      traffic, and priorities are set according to ieee80211_ac_numbers
      mapping; working in a similar way as DiffServ only on a lower
      layer, so that the AP can favour frames that have "real-time"
      requirements like voice or video data frames.
      
      Therefore, check for presence of ndo_select_queue() in netdev
      ops and, if available, invoke it with a fallback handler to
      __packet_pick_tx_queue(), so that driver such as bnx2x, ixgbe,
      or mlx4 can still select a hw queue for transmission in
      relation to the current CPU while e.g. ieee80211 subsystem
      can make their own choices.
      Reported-by: NMathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fd5d57b