1. 06 Feb 2009 (2 commits)
  2. 03 Feb 2009 (1 commit)
  3. 30 Jan 2009 (1 commit)
  4. 27 Jan 2009 (2 commits)
    •
      tcp: Fix length tcp_splice_data_recv passes to skb_splice_bits. · 9fa5fdf2
      Committed by Dimitris Michailidis
      tcp_splice_data_recv has two lengths to consider: the len parameter it
      gets from tcp_read_sock, which specifies the amount of data in the skb,
      and rd_desc->count, which is the amount of data the splice caller still
      wants.  Currently it passes just the latter to skb_splice_bits, which then
      splices min(rd_desc->count, skb->len - offset) bytes.
      
      Most of the time this is fine, except when the skb contains urgent data.
      In that case len goes only up to the urgent byte and is less than
      skb->len - offset.  By ignoring len, tcp_splice_data_recv may (a) splice
      data that tcp_read_sock told it not to, or (b) return to tcp_read_sock a
      value greater than len.
      
      Now, tcp_read_sock doesn't handle used > len and leaves the socket in a
      bad state (both sk_receive_queue and copied_seq are bad at that point)
      resulting in duplicated data and corruption.
      
      Fix by passing min(rd_desc->count, len) to skb_splice_bits; a sketch of
      the fixed callback follows this entry.
      Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
      Acked-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fa5fdf2
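
      A minimal sketch of the fixed callback, reconstructed from the
      description above (tcp_splice_state and the helper names follow
      net/ipv4/tcp.c of that era; treat details as approximate):

          static int tcp_splice_data_recv(read_descriptor_t *rd_desc,
                                          struct sk_buff *skb,
                                          unsigned int offset, size_t len)
          {
                  struct tcp_splice_state *tss = rd_desc->arg.data;
                  int ret;

                  /* Splice no more than the skb data tcp_read_sock allows
                   * (len) and no more than the caller still wants
                   * (rd_desc->count). */
                  ret = skb_splice_bits(skb, offset, tss->pipe,
                                        min(rd_desc->count, len), tss->flags);
                  if (ret > 0)
                          rd_desc->count -= ret;
                  return ret;
          }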
    •
      udp: optimize bind(0) if many ports are in use · 98322f22
      Committed by Eric Dumazet
      commit 9088c560
      (udp: Improve port randomization) introduced a regression for the UDP
      bind() syscall to a null port (getting a random port) when a lot of
      ports are already in use.
      
      This is because we do about 28000 scans of very long chains (220 sockets per chain),
      with many spin_lock_bh()/spin_unlock_bh() calls.
      
      Fix this using a bitmap (64 bytes for the current value of
      UDP_HTABLE_SIZE) so that we scan each chain at most once; a sketch
      follows this entry.
      
      Instead of 250 ms per bind() call, after the patch we get a time of
      2.9 ms.
      
      Based on a report from Vitaly Mayatskikh
      Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      98322f22
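
      A hedged sketch of the mechanism (a fragment, not the patch verbatim:
      head, first and high are assumed to come from the surrounding bind
      logic, and the port is assumed to live in sk->sk_hash as UDP did at
      the time).  All ports hashing to one chain are congruent modulo
      UDP_HTABLE_SIZE, so one walk of the chain can record every busy port
      of that chain in a small bitmap:

          DECLARE_BITMAP(bitmap, 65536 / UDP_HTABLE_SIZE);
          struct sock *sk2;
          struct hlist_node *node;
          unsigned short snum;

          /* One pass over the chain marks every port in use on it... */
          bitmap_zero(bitmap, 65536 / UDP_HTABLE_SIZE);
          sk_for_each(sk2, node, head)
                  __set_bit(sk2->sk_hash / UDP_HTABLE_SIZE, bitmap);

          /* ...then each candidate costs one bit test instead of another
           * walk of a (possibly 220-socket) chain. */
          for (snum = first; snum <= high; snum += UDP_HTABLE_SIZE)
                  if (!test_bit(snum / UDP_HTABLE_SIZE, bitmap))
                          goto found;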
  5. 15 Jan 2009 (1 commit)
  6. 14 Jan 2009 (1 commit)
    •
      tcp: splice as many packets as possible at once · 33966dd0
      Committed by Willy Tarreau
      As spotted by Willy Tarreau, the current splice() from a TCP socket to
      a pipe is not optimal: it processes at most one segment per call.
      This results in low performance and very high syscall-rate overhead
      when splicing from interfaces which do not support LRO.
      
      Willy provided a patch inside tcp_splice_read(), but a better fix
      is to let tcp_read_sock() process as many segments as possible, so
      that tcp_rcv_space_adjust() and tcp_cleanup_rbuf() are called less
      often.
      
      With this change, splice() behaves like tcp_recvmsg(), being able
      to consume many skbs in one system call. With a typical 1460 bytes
      of payload per frame, that means splice(SPLICE_F_NONBLOCK) can return
      16*1460 = 23360 bytes; a sketch of the change follows this entry.
      Signed-off-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33966dd0
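
      A sketch of the core of the change, assuming the tcp_splice_state
      wrapper from net/ipv4/tcp.c: rd_desc->count now carries the total
      byte budget, so tcp_read_sock() keeps feeding skbs to the splice
      callback until the budget is exhausted instead of returning after a
      single segment (the per-skb clamp was tightened later in commit
      9fa5fdf2, entry 4 above):

          static int __tcp_splice_read(struct sock *sk,
                                       struct tcp_splice_state *tss)
          {
                  /* Hand tcp_read_sock the full requested length; the
                   * callback decrements rd_desc->count as it splices,
                   * so many queued skbs are consumed per call. */
                  read_descriptor_t rd_desc = {
                          .arg.data = tss,
                          .count    = tss->len,
                  };

                  return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
          }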
  7. 13 Jan 2009 (2 commits)
  8. 09 Jan 2009 (1 commit)
  9. 07 Jan 2009 (2 commits)
  10. 05 Jan 2009 (3 commits)
  11. 01 Jan 2009 (1 commit)
  12. 30 Dec 2008 (2 commits)
  13. 26 Dec 2008 (4 commits)
  14. 19 Dec 2008 (1 commit)
  15. 16 Dec 2008 (3 commits)
    •
      ipmr: merge common code · b1879204
      Committed by Ilpo Järvinen
      Also removes a redundant skb->len < x check, which can't
      be true once pskb_may_pull(skb, x) has succeeded.
      
      $ diff-funcs pim_rcv ipmr.c ipmr.c pim_rcv_v1
        --- ipmr.c:pim_rcv()
        +++ ipmr.c:pim_rcv_v1()
      @@ -1,22 +1,27 @@
      -static int pim_rcv(struct sk_buff * skb)
      +int pim_rcv_v1(struct sk_buff * skb)
       {
      -	struct pimreghdr *pim;
      +	struct igmphdr *pim;
       	struct iphdr   *encap;
       	struct net_device  *reg_dev = NULL;
      
       	if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(*encap)))
       		goto drop;
      
      -	pim = (struct pimreghdr *)skb_transport_header(skb);
      -	if (pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) ||
      -	    (pim->flags&PIM_NULL_REGISTER) ||
      -	    (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 &&
      -	     csum_fold(skb_checksum(skb, 0, skb->len, 0))))
      +	pim = igmp_hdr(skb);
      +
      +	if (!mroute_do_pim ||
      +	    skb->len < sizeof(*pim) + sizeof(*encap) ||
      +	    pim->group != PIM_V1_VERSION || pim->code != PIM_V1_REGISTER)
       		goto drop;
      
      -	/* check if the inner packet is destined to mcast group */
       	encap = (struct iphdr *)(skb_transport_header(skb) +
      -				 sizeof(struct pimreghdr));
      +				 sizeof(struct igmphdr));
      +	/*
      +	   Check that:
      +	   a. packet is really destinted to a multicast group
      +	   b. packet is not a NULL-REGISTER
      +	   c. packet is not truncated
      +	 */
       	if (!ipv4_is_multicast(encap->daddr) ||
       	    encap->tot_len == 0 ||
       	    ntohs(encap->tot_len) + sizeof(*pim) > skb->len)
      @@ -40,9 +45,9 @@
       	skb->ip_summed = 0;
       	skb->pkt_type = PACKET_HOST;
       	dst_release(skb->dst);
      +	skb->dst = NULL;
       	reg_dev->stats.rx_bytes += skb->len;
       	reg_dev->stats.rx_packets++;
      -	skb->dst = NULL;
       	nf_reset(skb);
       	netif_rx(skb);
       	dev_put(reg_dev);
      
      $ codiff net/ipv4/ipmr.o.old net/ipv4/ipmr.o.new
      
      net/ipv4/ipmr.c:
        pim_rcv_v1 | -283
        pim_rcv    | -284
       2 functions changed, 567 bytes removed
      
      net/ipv4/ipmr.c:
        __pim_rcv | +307
       1 function changed, 307 bytes added
      
      net/ipv4/ipmr.o.new:
       3 functions changed, 307 bytes added, 567 bytes removed, diff: -260
      
      (Tested on x86_64).
      
      It seems that the pimlen arg could be left out as well, with the
      equal-sizedness of the structs enforced with BUILD_BUG_ON, but
      I don't think that's more than a cosmetic flaw since there
      aren't that many args anyway. A sketch of the merged helper follows
      this entry.
      
      Compile tested.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1879204
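
      A hedged sketch of the shape of the merge (reconstructed from the
      diff and codiff output above, not the patch verbatim): each entry
      point keeps its protocol-specific header check, then hands the
      shared tail to one helper parameterized by the outer header length:

          /* Common tail of pim_rcv() and pim_rcv_v1(): validate the
           * inner packet and deliver it to the register device.
           * pimlen is the size of the outer PIM/IGMP header, the one
           * thing that differed between the two callers. */
          static int __pim_rcv(struct sk_buff *skb, unsigned int pimlen)
          {
                  struct iphdr *encap;

                  encap = (struct iphdr *)(skb_transport_header(skb) + pimlen);
                  if (!ipv4_is_multicast(encap->daddr) ||
                      encap->tot_len == 0 ||
                      ntohs(encap->tot_len) + pimlen > skb->len)
                          return 1;
                  /* ... look up reg_dev, reset skb state, netif_rx(skb),
                   * bump reg_dev stats, as in the diff above ... */
                  return 0;
          }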
    •
      tcp: Add GRO support · bf296b12
      Committed by Herbert Xu
      This patch adds the TCP-specific portion of GRO.  The criterion for
      merging is extremely strict (the TCP header must match exactly apart
      from the checksum) so as to allow refragmentation.  Otherwise this
      is pretty much identical to LRO, except that we support the merging
      of ECN packets. A sketch of the header comparison follows this entry.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf296b12
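
      An illustrative sketch of the merge criterion, simplified from the
      description above (the helper name and the datalen2 parameter are
      assumptions; the real code also carves out FIN/PSH flag handling
      and recomputes the checksum, which is why the checksum is the one
      field allowed to differ):

          static bool tcp_hdrs_mergeable(const struct tcphdr *th,  /* new  */
                                         const struct tcphdr *th2, /* held */
                                         unsigned int datalen2)
          {
                  unsigned int thlen = th->doff * 4;

                  return th->doff == th2->doff &&
                         th->source == th2->source &&
                         th->dest == th2->dest &&
                         th->ack_seq == th2->ack_seq &&
                         th->window == th2->window &&
                         /* seq must be the exact continuation */
                         ntohl(th->seq) == ntohl(th2->seq) + datalen2 &&
                         /* TCP options must match byte for byte */
                         !memcmp(th + 1, th2 + 1, thlen - sizeof(*th));
          }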
    •
      ipv4: Add GRO infrastructure · 73cc19f1
      Committed by Herbert Xu
      This patch adds GRO support for IPv4.
      
      The criteria for merging are more stringent than LRO's; in particular,
      we require all fields in the IP header to be identical except for
      the length, ID and checksum.  In addition, the IDs must form an
      arithmetic sequence with a difference of one.
      
      The ID requirement might seem overly strict, however, most hardware
      TSO solutions already obey this rule.  Linux itself also obeys this
      whether GSO is in use or not.
      
      In the future we could relax this rule by storing the IDs (or rather
      making sure that we don't drop them when pulling the aggregate
      skb's tail). A sketch of the merge rule follows this entry.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73cc19f1
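
      An illustrative sketch of the IPv4 merge rule (the helper name is an
      assumption; the real check lives in the IPv4 GRO receive path and
      additionally accounts for how many packets are already aggregated):

          static bool inet_hdrs_mergeable(const struct iphdr *iph,  /* new  */
                                          const struct iphdr *iph2) /* held */
          {
                  return iph->saddr == iph2->saddr &&
                         iph->daddr == iph2->daddr &&
                         iph->protocol == iph2->protocol &&
                         iph->tos == iph2->tos &&
                         iph->ttl == iph2->ttl &&
                         iph->frag_off == iph2->frag_off &&
                         /* IDs form an arithmetic sequence, difference 1 */
                         ntohs(iph->id) == (u16)(ntohs(iph2->id) + 1);
          }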
  16. 15 Dec 2008 (2 commits)
  17. 09 Dec 2008 (1 commit)
    •
      tcp: tcp_vegas cong avoid fix · 8d3a564d
      Committed by Doug Leith
      This patch addresses a book-keeping issue in tcp_vegas.c.  At present
      tcp_vegas does separate book-keeping of cwnd based on packet sequence
      numbers.  A mismatch can develop between this book-keeping and
      tp->snd_cwnd due, for example, to delayed acks acking multiple
      packets.  When vegas transitions to reno operation (e.g. following
      loss), then this mismatch leads to incorrect behaviour (akin to a cwnd
      backoff).  This seems mostly to affect operation at low cwnds where
      delayed acking can lead to a significant fraction of cwnd being
      covered by a single ack, leading to the book-keeping mismatch.  This
      patch modifies the congestion avoidance update to avoid the need for
      separate book-keeping while leaving vegas congestion avoidance
      functionally unchanged.  A secondary advantage of this modification is
      that the use of fixed-point (via V_PARAM_SHIFT) and 64-bit arithmetic
      is no longer necessary, simplifying the code; a sketch of the
      resulting update follows this entry.
      
      Some example test measurements with the patched code (confirming no functional
      change in the congestion avoidance algorithm) can be seen at:
      
      http://www.hamilton.ie/doug/vegaspatch/
      Signed-off-by: Doug Leith <doug.leith@nuim.ie>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d3a564d
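
      A hedged sketch of the resulting congestion-avoidance update
      (variable and parameter names follow tcp_vegas.c; this is a
      simplification assuming the standard Vegas alpha/beta/gamma
      thresholds in whole-packet units, not the patch verbatim):

          /* diff estimates how many packets we keep queued in the
           * network; snd_cwnd is nudged directly, so there is no
           * separate sequence-number book-keeping to drift out of
           * sync with tp->snd_cwnd. */
          u32 rtt = vegas->minRTT;
          u32 target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;
          u32 diff = tp->snd_cwnd * (rtt - vegas->baseRTT) / vegas->baseRTT;

          if (diff > gamma && tp->snd_cwnd <= tp->snd_ssthresh) {
                  /* Queue building during slow start: stop slow start. */
                  tp->snd_cwnd = min(tp->snd_cwnd, target_cwnd + 1);
          } else if (tp->snd_cwnd <= tp->snd_ssthresh) {
                  tcp_slow_start(tp);
          } else if (diff > beta) {
                  tp->snd_cwnd--;         /* too much queued: back off */
          } else if (diff < alpha) {
                  tp->snd_cwnd++;         /* spare capacity: speed up */
          }
          /* else: sending rate is just right, leave cwnd alone */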
  18. 06 Dec 2008 (10 commits)