1. 01 Feb, 2012 34 commits
  2. 31 Jan, 2012 3 commits
    • af_unix: fix EPOLLET regression for stream sockets · 6f01fd6e
      Eric Dumazet committed
      Commit 0884d7aa (AF_UNIX: Fix poll blocking problem when reading from
      a stream socket) introduced a regression for epoll() in Edge Triggered
      mode (EPOLLET).
      
      The appropriate fix is to use skb_peek()/skb_unlink() instead of
      skb_dequeue(), and only call skb_unlink() when the skb is fully consumed.
      
      This removes the need to requeue a partial skb at the head of
      sk_receive_queue, along with the extra sk->sk_data_ready() calls that
      caused the regression.
      
      This is safe because once skb is given to sk_receive_queue, it is not
      modified by a writer, and readers are serialized by u->readlock mutex.
      
      This also reduces the number of spinlock acquisitions for small reads or
      MSG_PEEK users, so it should improve overall performance.
      Reported-by: Nick Mathewson <nickm@freehaven.net>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Alexey Moiseytsev <himeraster@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
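The peek-then-unlink pattern described in the af_unix commit can be illustrated with a minimal userspace sketch. This is not the kernel code: `struct buf`, `struct queue`, `queue_peek()`, `queue_unlink()`, `queue_tail()`, `stream_read()` and the `wakeups` counter are all hypothetical stand-ins for the skb, sk_receive_queue and sk->sk_data_ready() machinery, and locking is omitted. The point it shows is that a partially read buffer simply stays at the head of the queue, so nothing is requeued and no extra notification is raised.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a buffer with unread bytes. */
struct buf {
    struct buf *next;
    const char *data;   /* first unread byte */
    size_t len;         /* unread bytes remaining */
};

/* Hypothetical stand-in for sk_receive_queue. */
struct queue {
    struct buf *head;
    unsigned int wakeups;   /* counts "data ready" notifications */
};

/* Look at the head buffer without removing it. */
static struct buf *queue_peek(struct queue *q)
{
    return q->head;
}

/* Remove the head buffer; this sketch only supports unlinking the head. */
static void queue_unlink(struct queue *q, struct buf *b)
{
    if (q->head == b)
        q->head = b->next;
    b->next = NULL;
}

/* Writer side: append a buffer and notify readers exactly once. */
static void queue_tail(struct queue *q, struct buf *b)
{
    struct buf **pp = &q->head;

    while (*pp)
        pp = &(*pp)->next;
    b->next = NULL;
    *pp = b;
    q->wakeups++;
}

/* Reader side: copy up to `want` bytes. A buffer is unlinked only once it
 * is fully consumed; a partial read leaves it in place, so there is no
 * requeue and no additional wakeup. */
static size_t stream_read(struct queue *q, char *dst, size_t want)
{
    size_t copied = 0;

    while (copied < want) {
        struct buf *b = queue_peek(q);
        size_t chunk;

        if (!b)
            break;
        chunk = (want - copied < b->len) ? want - copied : b->len;
        memcpy(dst + copied, b->data, chunk);
        b->data += chunk;
        b->len -= chunk;
        copied += chunk;
        if (b->len == 0)
            queue_unlink(q, b);
    }
    return copied;
}
```

With a dequeue-based reader, a short read would have to push the remainder back onto the queue head; in this sketch the `wakeups` counter stays at one per written buffer regardless of how many reads it takes to drain it.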
    • tcp: fix tcp_trim_head() to adjust segment count with skb MSS · 5b35e1e6
      Neal Cardwell committed
      This commit fixes tcp_trim_head() to recalculate the number of
      segments in the skb with the skb's existing MSS, so trimming the head
      causes the skb segment count to be monotonically non-increasing - it
      should stay the same or go down, but not increase.
      
      Previously tcp_trim_head() used the current MSS of the connection. But
      if there was a decrease in MSS between original transmission and ACK
      (e.g. due to PMTUD), this could cause tcp_trim_head() to
      counter-intuitively increase the segment count when trimming bytes off
      the head of an skb. This violated assumptions in tcp_tso_acked() that
      tcp_trim_head() only decreases the packet count, so that packets_acked
      in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
      pass u32 pkts_acked values as large as 0xffffffff to
      ca_ops->pkts_acked().
      
      As an aside, if tcp_trim_head() had really wanted the skb to reflect
      the current MSS, it should have called tcp_set_skb_tso_segs()
      unconditionally, since a decrease in MSS would mean that a
      single-packet skb should now be sliced into multiple segments.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Nandita Dukkipati <nanditad@google.com>
      Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
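The arithmetic behind the tcp_trim_head() underflow can be worked through with a small sketch. The helpers `segs_for()` and `packets_acked()` are hypothetical, and the byte counts (a 2920-byte skb sent at MSS 1460, 1500 bytes trimmed, MSS later reduced to 536) are illustrative assumptions rather than values from the commit. The sketch only assumes that the segment count is the length divided by the MSS, rounded up, and that the acked count is an unsigned 32-bit difference.

```c
#include <stdint.h>

/* Number of segments needed for `len` bytes at a given MSS (round up). */
static uint32_t segs_for(uint32_t len, uint32_t mss)
{
    return (len + mss - 1) / mss;
}

/* Packets acked as the unsigned difference between the segment count
 * before and after trimming; wraps around if the count went up. */
static uint32_t packets_acked(uint32_t segs_before, uint32_t segs_after)
{
    return segs_before - segs_after;
}

/* Illustrative numbers (assumptions, not taken from the commit):
 *   skb length 2920 at MSS 1460      -> 2 segments
 *   trim 1500 bytes, 1420 remain
 *   recount with the skb's MSS 1460  -> 1 segment,  acked = 1
 *   recount with a reduced MSS 536   -> 3 segments, acked = 2 - 3,
 *                                       which wraps to 0xffffffff
 */
```

Recounting with the skb's own MSS keeps the count from growing, so the subtraction cannot wrap.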
    • net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL · 4acb4190
      Glauber Costa committed
      sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
      in commit 3dc43e3e, since it became a per-ns value.
      
      That code, however, will never run when CONFIG_SYSCTL is
      disabled, leading to bogus values on those fields - causing hung
      TCP sockets.
      
      This patch fixes it by keeping initialization code in
      tcp_init(). It will be overwritten by the first net namespace
      init if CONFIG_SYSCTL is compiled in, and does the right thing if
      it is compiled out.
      
      The function is also renamed to tcp_init_mem(), to properly signal
      its non-sysctl side effect on TCP limits.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: David S. Miller <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
      [ renamed the function, tidied up the changelog a bit ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 28 Jan, 2012 1 commit
  4. 27 Jan, 2012 2 commits
    • net: RTNETLINK adjusting values of min_ifinfo_dump_size · f18da145
      Stefan Gula committed
      Setting link parameters on a netdevice changes the value
      of if_nlmsg_size(), therefore it is necessary to recalculate
      min_ifinfo_dump_size.
      Signed-off-by: Stefan Gula <steweg@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Fix ip_gre lockless xmits. · f2b3ee9e
      Willem de Bruijn committed
      Tunnel devices set NETIF_F_LLTX to bypass HARD_TX_LOCK.  Sit and
      ipip set this unconditionally in ops->setup, but gre enables it
      conditionally after parameter passing in ops->newlink. This is
      not called during tunnel setup as below, however, so GRE tunnels are
      still taking the lock.
      
      modprobe ip_gre
      ip tunnel add test0 mode gre remote 10.5.1.1 dev lo
      ip link set test0 up
      ip addr add 10.6.0.1 dev test0
       # cat /sys/class/net/test0/features
       # $DIR/test_tunnel_xmit 10 10.5.2.1
      ip route add 10.5.2.0/24 dev test0
      ip tunnel del test0
      
      The newlink callback is only called in rtnl_newlink, and only if
      the device is new, as it calls register_netdevice internally. Gre
      tunnels are created at 'ip tunnel add' with ioctl SIOCADDTUNNEL,
      which calls ipgre_tunnel_locate, which calls register_netdev.
      rtnl_newlink is called at 'ip link set', but skips ops->newlink
      and the device is up with locking still enabled. The equivalent
      ipip tunnel works fine, btw (just substitute 'mode ipip' for
      'mode gre').
      
      On kernels before /sys/class/net/*/features was removed [1],
      the first commented out line returns 0x6000 with mode gre,
      which indicates that NETIF_F_LLTX (0x1000) is not set. With ipip,
      it reports 0x7000. This test cannot be used on recent kernels where
      the sysfs file is removed (and ETHTOOL_GFEATURES does not currently
      work for tunnel devices, because they lack dev->ethtool_ops).
      
      The second commented out line calls a simple transmission test [2]
      that sends on 24 cores at maximum rate. Results of a single run:
      
      ipip:			19,372,306
      gre before patch:	 4,839,753
      gre after patch:	19,133,873
      
      This patch replicates the condition check in ipgre_newlink to
      ipgre_tunnel_locate. It works for me, both with oseq on and off.
      This is the first time I looked at rtnetlink and iproute2 code,
      though, so someone more knowledgeable should probably check the
      patch. Thanks.
      
      The tail of both functions is now identical, by the way. To avoid
      code duplication, I'll be happy to rework this and merge the two.
      
      [1] http://patchwork.ozlabs.org/patch/104610/
      [2] http://kernel.googlecode.com/files/xmit_udp_parallel.c
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
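The features check described in the ip_gre commit message comes down to testing one bit in the hex value read from the sysfs file. A minimal sketch, assuming the file contents are a hex string such as "0x6000": `LLTX_BIT`, `parse_features()` and `has_lltx()` are hypothetical names, and the 0x1000 value is the one the commit message quotes for NETIF_F_LLTX on kernels of that era.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Value the commit message gives for NETIF_F_LLTX; defined locally here
 * rather than taken from kernel headers. */
#define LLTX_BIT 0x1000u

/* Parse a hex feature mask such as "0x6000" (hypothetical helper). */
static uint32_t parse_features(const char *text)
{
    return (uint32_t)strtoul(text, NULL, 16);
}

/* True if the lockless-transmit bit is set in the feature mask. */
static bool has_lltx(uint32_t features)
{
    return (features & LLTX_BIT) != 0;
}
```

Applied to the two values from the message, 0x6000 (gre before the patch) has the bit clear and 0x7000 (ipip) has it set.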