1. 03 12月, 2009 2 次提交
    • W
      TCPCT part 1d: define TCP cookie option, extend existing struct's · 435cf559
      William Allen Simpson 提交于
      Data structures are carefully composed to require minimal additions.
      For example, the struct tcp_options_received cookie_plus variable fits
      between existing 16-bit and 8-bit variables, requiring no additional
      space (taking alignment into consideration).  There are no additions to
      tcp_request_sock, and only 1 pointer in tcp_sock.
      
      This is a significantly revised implementation of an earlier (year-old)
      patch that no longer applies cleanly, with permission of the original
      author (Adam Langley):
      
          http://thread.gmane.org/gmane.linux.network/102586
      
      The principle difference is using a TCP option to carry the cookie nonce,
      instead of a user configured offset in the data.  This is more flexible and
      less subject to user configuration error.  Such a cookie option has been
      suggested for many years, and is also useful without SYN data, allowing
      several related concepts to use the same extension option.
      
          "Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
          http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
      
          "Re: what a new TCP header might look like", May 12, 1998.
          ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
      
      These functions will also be used in subsequent patches that implement
      additional features.
      
      Requires:
         TCPCT part 1a: add request_values parameter for sending SYNACK
         TCPCT part 1b: generate Responder Cookie secret
         TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435cf559
    • W
      TCPCT part 1a: add request_values parameter for sending SYNACK · e6b4d113
      William Allen Simpson 提交于
      Add optional function parameters associated with sending SYNACK.
      These parameters are not needed after sending SYNACK, and are not
      used for retransmission.  Avoids extending struct tcp_request_sock,
      and avoids allocating kernel memory.
      
      Also affects DCCP as it uses common struct request_sock_ops,
      but this parameter is currently reserved for future use.
      
      Signed-off-by: William.Allen.Simpson@gmail.com
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6b4d113
  2. 14 11月, 2009 2 次提交
  3. 29 10月, 2009 1 次提交
  4. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  5. 13 10月, 2009 1 次提交
  6. 15 9月, 2009 1 次提交
  7. 03 9月, 2009 1 次提交
    • W
      tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang 提交于
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa133076
  8. 02 9月, 2009 2 次提交
  9. 01 9月, 2009 2 次提交
  10. 20 7月, 2009 2 次提交
  11. 03 6月, 2009 2 次提交
  12. 06 5月, 2009 1 次提交
  13. 27 4月, 2009 1 次提交
  14. 28 3月, 2009 1 次提交
    • P
      lsm: Relocate the IPv4 security_inet_conn_request() hooks · 284904aa
      Paul Moore 提交于
      The current placement of the security_inet_conn_request() hooks do not allow
      individual LSMs to override the IP options of the connection's request_sock.
      This is a problem as both SELinux and Smack have the ability to use labeled
      networking protocols which make use of IP options to carry security attributes
      and the inability to set the IP options at the start of the TCP handshake is
      problematic.
      
      This patch moves the IPv4 security_inet_conn_request() hooks past the code
      where the request_sock's IP options are set/reset so that the LSM can safely
      manipulate the IP options as needed.  This patch intentionally does not change
      the related IPv6 hooks as IPv6 based labeling protocols which use IPv6 options
      are not currently implemented, once they are we will have a better idea of
      the correct placement for the IPv6 hooks.
      Signed-off-by: NPaul Moore <paul.moore@hp.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      284904aa
  15. 12 3月, 2009 1 次提交
  16. 03 3月, 2009 1 次提交
  17. 23 2月, 2009 1 次提交
  18. 30 1月, 2009 1 次提交
    • H
      gro: Avoid copying headers of unmerged packets · 86911732
      Herbert Xu 提交于
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy it.  Because all drivers that use frags place the
      headers in the first frag this optimisation should be enough.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86911732
  19. 07 1月, 2009 1 次提交
  20. 30 12月, 2008 1 次提交
  21. 16 12月, 2008 1 次提交
  22. 26 11月, 2008 1 次提交
  23. 24 11月, 2008 1 次提交
    • E
      net: Convert TCP/DCCP listening hash tables to use RCU · c25eb3bf
      Eric Dumazet 提交于
      This is the last step to be able to perform full RCU lookups
      in __inet_lookup() : After established/timewait tables, we
      add RCU lookups to listening hash table.
      
      The only trick here is that a socket of a given type (TCP ipv4,
      TCP ipv6, ...) can now flight between two different tables
      (established and listening) during a RCU grace period, so we
      must use different 'nulls' end-of-chain values for two tables.
      
      We define a large value :
      
      #define LISTENING_NULLS_BASE (1U << 29)
      
      So that slots in listening table are guaranteed to have different
      end-of-chain values than slots in established table. A reader can
      still detect it finished its lookup in the right chain.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c25eb3bf
  24. 21 11月, 2008 1 次提交
  25. 20 11月, 2008 2 次提交
  26. 17 11月, 2008 1 次提交
    • E
      net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls · 3ab5aee7
      Eric Dumazet 提交于
      RCU was added to UDP lookups, using a fast infrastructure :
      - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
        price of call_rcu() at freeing time.
      - hlist_nulls permits to use few memory barriers.
      
      This patch uses same infrastructure for TCP/DCCP established
      and timewait sockets.
      
      Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
      using short lived TCP connections. A followup patch, converting
      rwlocks to spinlocks will even speedup this case.
      
      __inet_lookup_established() is pretty fast now we dont have to
      dirty a contended cache line (read_lock/read_unlock)
      
      Only established and timewait hashtable are converted to RCU
      (bind table and listen table are still using traditional locking)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ab5aee7
  27. 03 11月, 2008 1 次提交
  28. 31 10月, 2008 1 次提交
  29. 10 10月, 2008 1 次提交
  30. 09 10月, 2008 1 次提交
    • I
      tcp: fix length used for checksum in a reset · 52cd5750
      Ilpo Järvinen 提交于
      While looking for some common code I came across difference
      in checksum calculation between tcp_v6_send_(reset|ack) I
      couldn't explain. I checked both v4 and v6 and found out that
      both seem to have the same "feature". I couldn't find anything
      in rfc nor anywhere else which would state that md5 option
      should be ignored like it was in case of reset so I came to
      a conclusion that this is probably a genuine bug. I suspect
      that addition of md5 just was fooled by the excessive
      copy-paste code in those functions and the reset part was
      never tested well enough to find out the problem.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52cd5750
  31. 08 10月, 2008 1 次提交
  32. 01 10月, 2008 2 次提交
    • K
      tcp: Handle TCP SYN+ACK/ACK/RST transparency · 88ef4a5a
      KOVACS Krisztian 提交于
      The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
      incoming packets. The non-local source address check on output bites
      us again, as replies for transparently redirected traffic won't have a
      chance to leave the node.
      
      This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the
      route lookup for those replies. Transparent replies are enabled if the
      listening socket has the transparent socket flag set.
      Signed-off-by: NKOVACS Krisztian <hidden@sch.bme.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88ef4a5a
    • V
      tcp: Fix NULL dereference in tcp_4_send_ack() · 4dd7972d
      Vitaliy Gusev 提交于
      Fix NULL dereference in tcp_4_send_ack().
      
      As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0
      IP: [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
      
      Stack:  ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d
       0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8
       0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020
      Call Trace:
       <IRQ>  [<ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22
       [<ffffffff8049bce5>] tcp_check_req+0x108/0x14c
       [<ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c
       [<ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec
       [<ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272
       [<ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c
       [<ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701
       [<ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a
       [<ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c
       [<ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2
       [<ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e
       [<ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5
       [<ffffffff80462faa>] netif_receive_skb+0x293/0x303
       [<ffffffff80465a9b>] process_backlog+0x80/0xd0
       [<ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4
       [<ffffffff8046560e>] net_rx_action+0xb9/0x17f
       [<ffffffff80234cc5>] __do_softirq+0xa3/0x164
       [<ffffffff8020c52c>] call_softirq+0x1c/0x28
       <EOI>  [<ffffffff8020de1c>] do_softirq+0x34/0x72
       [<ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50
       [<ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14
       [<ffffffff804599cd>] release_sock+0xb8/0xc1
       [<ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c
       [<ffffffff80243078>] ? autoremove_wake_function+0x0/0x38
       [<ffffffff8045751f>] sys_connect+0x68/0x8e
       [<ffffffff80291818>] ? fd_install+0x5f/0x68
       [<ffffffff80457784>] ? sock_map_fd+0x55/0x62
       [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80
      
      Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48
      RIP  [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
       RSP <ffffffff80762b78>
      CR2: 00000000000004d0
      Signed-off-by: NVitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4dd7972d