1. 26 Jan, 2011 (1 commit)
    • TCP: fix a bug that triggers large number of TCP RST by mistake · 44f5324b
      Authored by Jerry Chu
      This patch fixes a bug that causes TCP RST packets to be generated
      on otherwise correctly behaved applications, e.g., no unread data
      on close, etc. To trigger the bug, at least two conditions must
      be met:
      
      1. The FIN flag is set on the last data packet, i.e., it is not sent
      on a separate, FIN-only packet.
      2. The size of the last data chunk on the receive side exactly matches
      the size of the buffer posted by the receiver, and the receiver
      closes the socket without any further read attempt.
      
      This bug was first noticed on our netperf-based testbed for our IW10
      proposal to the IETF, where a large number of RST packets were observed.
      netperf's read-side code meets condition 2 above 100% of the time.
      
      Before the fix, tcp_data_queue() would queue the last skb that meets
      condition 1 to sk_receive_queue even though it has already fully copied
      out the data (via skb_copy_datagram_iovec()). Then, if condition 2 is
      also met, tcp_recvmsg() often returns all the copied-out data
      successfully without actually consuming the skb, because the check
      "if ((chunk = len - tp->ucopy.len) != 0) {"
      followed by
      "len -= chunk;"
      after tcp_prequeue_process() causes "len" to become 0 and an
      early exit from the big while loop.
      
      I don't see any reason not to free the skb whose data have been fully
      consumed in tcp_data_queue(), regardless of the FIN flag.  We won't
      get there if MSG_PEEK is on. Am I missing some arcane cases related
      to urgent data?
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44f5324b
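      The two trigger conditions can be illustrated with a small loopback
      program (not part of the patch): the receiver posts a buffer exactly
      the size of the final data chunk and closes without a further read.
      Whether the FIN actually rides on the last data segment depends on the
      sending stack and on timing, so treat this as a sketch of the scenario
      rather than a guaranteed reproducer.

      /* Sketch: receiver reads exactly the bytes sent, then closes while the
       * FIN-carrying skb may still be queued -- the scenario described above. */
      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <sys/wait.h>
      #include <unistd.h>

      #define CHUNK 4096

      int main(void)
      {
          struct sockaddr_in addr = { .sin_family = AF_INET };
          socklen_t alen = sizeof(addr);
          int ls = socket(AF_INET, SOCK_STREAM, 0);
          char buf[CHUNK];

          addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
          bind(ls, (struct sockaddr *)&addr, sizeof(addr));   /* ephemeral port */
          listen(ls, 1);
          getsockname(ls, (struct sockaddr *)&addr, &alen);

          if (fork() == 0) {                  /* sender */
              int cs = socket(AF_INET, SOCK_STREAM, 0);
              connect(cs, (struct sockaddr *)&addr, sizeof(addr));
              memset(buf, 'x', CHUNK);
              send(cs, buf, CHUNK, 0);
              close(cs);                      /* FIN may piggyback on the data */
              _exit(0);
          }

          int rs = accept(ls, NULL, NULL);
          recv(rs, buf, CHUNK, MSG_WAITALL);  /* condition 2: read == last chunk */
          close(rs);                          /* close with no further read */
          wait(NULL);
          return 0;
      }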
  2. 25 Jan, 2011 (3 commits)
  3. 20 Jan, 2011 (1 commit)
  4. 12 Jan, 2011 (2 commits)
  5. 11 Jan, 2011 (2 commits)
  6. 10 Jan, 2011 (1 commit)
  7. 07 Jan, 2011 (2 commits)
    • netfilter: fix export secctx error handling · cba85b53
      Authored by Pablo Neira Ayuso
      In 1ae4de0c, the secctx was exported
      via the /proc/net/netfilter/nf_conntrack and ctnetlink interfaces
      instead of the secmark.
      
      That patch introduced the use of security_secid_to_secctx() which may
      return a non-zero value on error.
      
      In one of my setups, I have NF_CONNTRACK_SECMARK enabled but no
      security modules. Thus, security_secid_to_secctx() returns a negative
      value, which breaks the /proc and `conntrack -L' outputs. To fix
      this, we skip the inclusion of the secctx if the aforementioned
      function fails.
      
      This patch also fixes the dynamic netlink message size calculation
      if security_secid_to_secctx() returns an error, since its logic is
      also wrong.
      
      This problem exists in Linux kernel >= 2.6.37.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cba85b53
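      The error-handling pattern the fix applies (emit the secctx field only
      when the secid-to-secctx lookup succeeds) can be sketched in a
      self-contained way. The lookup below is a stand-in for
      security_secid_to_secctx(), not the kernel function itself.

      /* Sketch of the pattern: skip the secctx field when the lookup fails.
       * lookup_secctx() is a stand-in, not the real security_secid_to_secctx(). */
      #include <stdio.h>

      /* Returns 0 and fills *ctx on success, a negative value on error. */
      static int lookup_secctx(unsigned int secid, const char **ctx)
      {
          if (secid == 0)
              return -22;                  /* e.g. no security module loaded */
          *ctx = "system_u:object_r:etc_t:s0";
          return 0;
      }

      static void show_conntrack_entry(unsigned int secid)
      {
          const char *ctx;

          printf("tcp  6 src=10.0.0.1 dst=10.0.0.2");
          /* Before the fix the secctx was emitted unconditionally; now the
           * field is skipped entirely when the lookup reports an error. */
          if (lookup_secctx(secid, &ctx) == 0)
              printf(" secctx=%s", ctx);
          printf("\n");
      }

      int main(void)
      {
          show_conntrack_entry(0);         /* lookup fails: no secctx field */
          show_conntrack_entry(1);         /* lookup succeeds: secctx printed */
          return 0;
      }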
    • ipv4: IP defragmentation must be ECN aware · 6623e3b2
      Authored by Eric Dumazet
      RFC 3168 (The Addition of Explicit Congestion Notification to IP)
      states:
      
      5.3.  Fragmentation
      
         ECN-capable packets MAY have the DF (Don't Fragment) bit set.
         Reassembly of a fragmented packet MUST NOT lose indications of
         congestion.  In other words, if any fragment of an IP packet to be
         reassembled has the CE codepoint set, then one of two actions MUST be
         taken:
      
            * Set the CE codepoint on the reassembled packet.  However, this
              MUST NOT occur if any of the other fragments contributing to
              this reassembly carries the Not-ECT codepoint.
      
            * The packet is dropped, instead of being reassembled, for any
              other reason.
      
      This patch implements this requirement for IPv4, choosing the first
      action:
      
      If one fragment had NO-ECT codepoint
              reassembled frame has NO-ECT
      ElIf one fragment had CE codepoint
              reassembled frame has CE
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6623e3b2
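      The decision rule above can be written as a small self-contained
      function. This is a sketch of the rule as stated in the message, not
      the kernel's actual fragment-queue code.

      /* Fold the ECN codepoints of all fragments into the codepoint of the
       * reassembled packet: any Not-ECT fragment forces Not-ECT, otherwise
       * any CE fragment forces CE, otherwise the first fragment's codepoint
       * is kept. */
      #include <stdio.h>

      #define ECN_MASK    0x03             /* low two bits of the TOS byte */
      #define ECN_NOT_ECT 0x00
      #define ECN_CE      0x03

      static unsigned int reassembled_ecn(const unsigned char *tos, int nfrags)
      {
          unsigned int result = tos[0] & ECN_MASK;
          int saw_not_ect = 0, saw_ce = 0;

          for (int i = 0; i < nfrags; i++) {
              unsigned int ecn = tos[i] & ECN_MASK;
              if (ecn == ECN_NOT_ECT)
                  saw_not_ect = 1;
              else if (ecn == ECN_CE)
                  saw_ce = 1;
          }
          if (saw_not_ect)
              return ECN_NOT_ECT;
          if (saw_ce)
              return ECN_CE;
          return result;
      }

      int main(void)
      {
          unsigned char frags[] = { 0x02, 0x03, 0x02 };   /* ECT(0), CE, ECT(0) */

          /* Prints 0x03 (CE): one fragment saw congestion, none was Not-ECT. */
          printf("reassembled ECN codepoint: 0x%02x\n", reassembled_ecn(frags, 3));
          return 0;
      }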
  8. 05 Jan, 2011 (1 commit)
    • ipv4/route.c: respect prefsrc for local routes · 9fc3bbb4
      Authored by Joel Sing
      The preferred source address is currently ignored for local routes,
      which results in all local connections having a src address that is the
      same as the local dst address. Fix this by respecting the preferred source
      address when it is provided for local routes.
      
      This bug can be demonstrated as follows:
      
       # ifconfig dummy0 192.168.0.1
       # ip route show table local | grep local.*dummy0
       local 192.168.0.1 dev dummy0  proto kernel  scope host  src 192.168.0.1
       # ip route change table local local 192.168.0.1 dev dummy0 \
           proto kernel scope host src 127.0.0.1
       # ip route show table local | grep local.*dummy0
       local 192.168.0.1 dev dummy0  proto kernel  scope host  src 127.0.0.1
      
      We now establish a local connection and verify the source IP
      address selection:
      
       # nc -l 192.168.0.1 3128 &
       # nc 192.168.0.1 3128 &
       # netstat -ant | grep 192.168.0.1:3128.*EST
       tcp        0      0 192.168.0.1:3128        192.168.0.1:33228 ESTABLISHED
       tcp        0      0 192.168.0.1:33228       192.168.0.1:3128  ESTABLISHED
      Signed-off-by: Joel Sing <jsing@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fc3bbb4
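      The same check can be done programmatically (not part of the patch):
      connect to the local address and print the source address the kernel
      selected with getsockname(). With the route change shown above, the
      printed address mirrors the destination (192.168.0.1) while the bug is
      present and follows the configured prefsrc (127.0.0.1) once local
      routes respect it.

      /* Sketch: print the source address chosen for a local connection.
       * Assumes a listener on <dst>:3128, e.g. the nc -l from the example. */
      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <stdio.h>
      #include <sys/socket.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
          const char *dst_ip = argc > 1 ? argv[1] : "192.168.0.1";
          struct sockaddr_in dst = { .sin_family = AF_INET,
                                     .sin_port = htons(3128) };
          struct sockaddr_in src;
          socklen_t slen = sizeof(src);
          char buf[INET_ADDRSTRLEN];
          int fd = socket(AF_INET, SOCK_STREAM, 0);

          inet_pton(AF_INET, dst_ip, &dst.sin_addr);
          if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
              perror("connect");           /* needs a listener, e.g. nc -l */
              return 1;
          }
          getsockname(fd, (struct sockaddr *)&src, &slen);
          printf("selected source address: %s\n",
                 inet_ntop(AF_INET, &src.sin_addr, buf, sizeof(buf)));
          close(fd);
          return 0;
      }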
  9. 26 Dec, 2010 (1 commit)
  10. 24 Dec, 2010 (3 commits)
  11. 21 Dec, 2010 (2 commits)
  12. 17 Dec, 2010 (2 commits)
  13. 15 Dec, 2010 (1 commit)
  14. 14 Dec, 2010 (2 commits)
  15. 13 Dec, 2010 (2 commits)
    • ipv4: Don't pre-seed hoplimit metric. · 323e126f
      Authored by David S. Miller
      Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
      
      This allowed several simplifications:
      
      1) The interim dst_metric_hoplimit() can go, as it's no longer
         used.
      
      2) The sysctl_ip_default_ttl entry no longer needs to use
         ipv4_doint_and_flush, since the sysctl is not cached in
         routing cache metrics any longer.
      
      3) ipv4_doint_and_flush no longer needs to be exported and
         therefore can be marked static.
      
      When ipv4_doint_and_flush_strategy was removed some time ago,
      the external declaration in ip.h was mistakenly left around
      so kill that off too.
      
      We have to move the sysctl_ip_default_ttl declaration into ipv4's
      route cache definition header net/route.h, because net/ip.h (where
      the declaration currently lives) has a back dependency on
      net/route.h.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      323e126f
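      The helper boils down to a simple fallback: use the per-route hoplimit
      metric if one is set, otherwise fall back to the sysctl default. A
      minimal sketch of that logic follows; the struct and names are
      stand-ins, not the kernel's ip4_dst_hoplimit() or its types.

      /* Sketch of the fallback behind a hoplimit helper: prefer the per-route
       * metric, fall back to the global default when it is unset. */
      #include <stdio.h>

      static int sysctl_ip_default_ttl = 64;   /* global default */

      struct fake_dst {
          int hoplimit_metric;                 /* 0 means "not set" */
      };

      static int dst_hoplimit(const struct fake_dst *dst)
      {
          int hoplimit = dst->hoplimit_metric;

          if (hoplimit == 0)
              hoplimit = sysctl_ip_default_ttl;
          return hoplimit;
      }

      int main(void)
      {
          struct fake_dst with_metric = { .hoplimit_metric = 255 };
          struct fake_dst without_metric = { .hoplimit_metric = 0 };

          printf("route metric set:   ttl=%d\n", dst_hoplimit(&with_metric));
          printf("route metric unset: ttl=%d\n", dst_hoplimit(&without_metric));
          return 0;
      }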
    • 5170ae82
  16. 11 Dec, 2010 (1 commit)
  17. 10 Dec, 2010 (2 commits)
    • net: optimize INET input path further · 68835aba
      Authored by Eric Dumazet
      Follow-up to commit b178bb3d (net: reorder struct sock fields).
      
      Optimize the INET input path a bit further, by:
      
      1) moving sk_refcnt close to sk_lock.
      
      This reduces the number of dirtied cache lines by one on 64-bit arches
      (with a 64-byte cache line size).
      
      2) moving inet_daddr & inet_rcv_saddr to the beginning of sk
      
      (same cache line as hash / family / bound_dev_if / nulls_node).
      
      This reduces the number of cache lines accessed in lookups by one, and
      does not increase the size of inet and timewait socks.
      inet and tw sockets now share the same placeholder for these fields.
      
      Before patch:
      
      offsetof(struct sock, sk_refcnt) = 0x10
      offsetof(struct sock, sk_lock) = 0x40
      offsetof(struct sock, sk_receive_queue) = 0x60
      offsetof(struct inet_sock, inet_daddr) = 0x270
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x274
      
      After patch:
      
      offsetof(struct sock, sk_refcnt) = 0x44
      offsetof(struct sock, sk_lock) = 0x48
      offsetof(struct sock, sk_receive_queue) = 0x68
      offsetof(struct inet_sock, inet_daddr) = 0x0
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x4
      
      compute_score() (udp or tcp) now uses a single cache line per ignored
      item, instead of two.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68835aba
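      The before/after offsets above are exactly what offsetof() makes
      visible. A toy illustration of the same idea follows: group the hot
      lookup fields so they land on one 64-byte cache line, and check the
      layout by printing offsets. The structs are made up for illustration;
      they are not struct sock.

      /* Print field offsets with offsetof() and the 64-byte cache line each
       * field lands on.  Toy structs, not the kernel's struct sock. */
      #include <stddef.h>
      #include <stdio.h>

      #define CACHE_LINE 64

      struct sock_before {
          unsigned int refcnt;      /* offset 0 */
          char pad[0x5c];           /* unrelated fields in between */
          unsigned int daddr;       /* ends up on a different cache line */
          unsigned int rcv_saddr;
      };

      struct sock_after {
          unsigned int daddr;       /* lookup keys grouped at the front */
          unsigned int rcv_saddr;
          unsigned int hash;
          int bound_dev_if;
          unsigned int refcnt;
          char pad[0x5c];
      };

      #define SHOW(type, field) \
          printf("%-22s offset 0x%02zx  cache line %zu\n", #type "." #field, \
                 offsetof(struct type, field), \
                 offsetof(struct type, field) / CACHE_LINE)

      int main(void)
      {
          SHOW(sock_before, refcnt);     /* cache line 0 */
          SHOW(sock_before, daddr);      /* cache line 1: a second line touched */
          SHOW(sock_after, refcnt);      /* cache line 0 */
          SHOW(sock_after, daddr);       /* cache line 0: lookup touches one line */
          return 0;
      }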
    • net: Abstract away all dst_entry metrics accesses. · defb3519
      Authored by David S. Miller
      Use helper functions to hide all direct accesses, especially writes,
      to dst_entry metrics values.
      
      This will allow us to:
      
      1) More easily change how the metrics are stored.
      
      2) Implement COW for metrics.
      
      In particular this will help us put metrics into the inetpeer
      cache if that is what we end up doing.  We can make the _metrics
      member a pointer instead of an array, initially have it point
      at the read-only metrics in the FIB, and then on the first set
      grab an inetpeer entry and point the _metrics member there.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      defb3519
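      The copy-on-write plan described here can be sketched generically:
      readers go through an accessor that may point at shared read-only
      defaults, and the first write swaps in a private copy. This is a
      stand-alone sketch of the technique, not the dst_entry code itself;
      the struct, helpers, and metric names below are illustrative.

      /* Accessor + copy-on-write sketch: _metrics starts out pointing at
       * shared read-only defaults and is only replaced by a private writable
       * copy on the first set. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #define METRICS_MAX 4
      enum { METRIC_MTU, METRIC_RTT, METRIC_WINDOW, METRIC_HOPLIMIT };

      /* Shared read-only defaults, standing in for the FIB's metrics. */
      static const unsigned long default_metrics[METRICS_MAX] = { 1500, 0, 0, 64 };

      struct fake_dst {
          const unsigned long *_metrics;       /* pointer, not an embedded array */
      };

      static unsigned long dst_metric(const struct fake_dst *dst, int idx)
      {
          return dst->_metrics[idx];
      }

      static void dst_metric_set(struct fake_dst *dst, int idx, unsigned long val)
      {
          if (dst->_metrics == default_metrics) {
              /* First write: copy the shared defaults into private storage. */
              unsigned long *copy = malloc(sizeof(default_metrics));
              memcpy(copy, default_metrics, sizeof(default_metrics));
              dst->_metrics = copy;
          }
          ((unsigned long *)dst->_metrics)[idx] = val;
      }

      int main(void)
      {
          struct fake_dst dst = { ._metrics = default_metrics };

          printf("mtu before set: %lu\n", dst_metric(&dst, METRIC_MTU));
          dst_metric_set(&dst, METRIC_MTU, 1400);       /* triggers the copy */
          printf("mtu after set:  %lu\n", dst_metric(&dst, METRIC_MTU));
          printf("shared default unchanged: %lu\n", default_metrics[METRIC_MTU]);
          return 0;
      }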
  18. 09 Dec, 2010 (5 commits)
  19. 07 Dec, 2010 (2 commits)
  20. 03 Dec, 2010 (1 commit)
  21. 02 Dec, 2010 (3 commits)