1. 13 10月, 2009 1 次提交
  2. 15 9月, 2009 1 次提交
  3. 03 9月, 2009 1 次提交
    • W
      tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang 提交于
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa133076
  4. 02 9月, 2009 2 次提交
  5. 01 9月, 2009 2 次提交
  6. 20 7月, 2009 2 次提交
  7. 03 6月, 2009 2 次提交
  8. 06 5月, 2009 1 次提交
  9. 27 4月, 2009 1 次提交
  10. 28 3月, 2009 1 次提交
    • P
      lsm: Relocate the IPv4 security_inet_conn_request() hooks · 284904aa
      Paul Moore 提交于
      The current placement of the security_inet_conn_request() hooks do not allow
      individual LSMs to override the IP options of the connection's request_sock.
      This is a problem as both SELinux and Smack have the ability to use labeled
      networking protocols which make use of IP options to carry security attributes
      and the inability to set the IP options at the start of the TCP handshake is
      problematic.
      
      This patch moves the IPv4 security_inet_conn_request() hooks past the code
      where the request_sock's IP options are set/reset so that the LSM can safely
      manipulate the IP options as needed.  This patch intentionally does not change
      the related IPv6 hooks as IPv6 based labeling protocols which use IPv6 options
      are not currently implemented, once they are we will have a better idea of
      the correct placement for the IPv6 hooks.
      Signed-off-by: NPaul Moore <paul.moore@hp.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      284904aa
  11. 12 3月, 2009 1 次提交
  12. 03 3月, 2009 1 次提交
  13. 23 2月, 2009 1 次提交
  14. 30 1月, 2009 1 次提交
    • H
      gro: Avoid copying headers of unmerged packets · 86911732
      Herbert Xu 提交于
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy it.  Because all drivers that use frags place the
      headers in the first frag this optimisation should be enough.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86911732
  15. 07 1月, 2009 1 次提交
  16. 30 12月, 2008 1 次提交
  17. 16 12月, 2008 1 次提交
  18. 26 11月, 2008 1 次提交
  19. 24 11月, 2008 1 次提交
    • E
      net: Convert TCP/DCCP listening hash tables to use RCU · c25eb3bf
      Eric Dumazet 提交于
      This is the last step to be able to perform full RCU lookups
      in __inet_lookup() : After established/timewait tables, we
      add RCU lookups to listening hash table.
      
      The only trick here is that a socket of a given type (TCP ipv4,
      TCP ipv6, ...) can now flight between two different tables
      (established and listening) during a RCU grace period, so we
      must use different 'nulls' end-of-chain values for two tables.
      
      We define a large value :
      
      #define LISTENING_NULLS_BASE (1U << 29)
      
      So that slots in listening table are guaranteed to have different
      end-of-chain values than slots in established table. A reader can
      still detect it finished its lookup in the right chain.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c25eb3bf
  20. 21 11月, 2008 1 次提交
  21. 20 11月, 2008 2 次提交
  22. 17 11月, 2008 1 次提交
    • E
      net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls · 3ab5aee7
      Eric Dumazet 提交于
      RCU was added to UDP lookups, using a fast infrastructure :
      - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
        price of call_rcu() at freeing time.
      - hlist_nulls permits to use few memory barriers.
      
      This patch uses same infrastructure for TCP/DCCP established
      and timewait sockets.
      
      Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
      using short lived TCP connections. A followup patch, converting
      rwlocks to spinlocks will even speedup this case.
      
      __inet_lookup_established() is pretty fast now we dont have to
      dirty a contended cache line (read_lock/read_unlock)
      
      Only established and timewait hashtable are converted to RCU
      (bind table and listen table are still using traditional locking)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ab5aee7
  23. 03 11月, 2008 1 次提交
  24. 31 10月, 2008 1 次提交
  25. 10 10月, 2008 1 次提交
  26. 09 10月, 2008 1 次提交
    • I
      tcp: fix length used for checksum in a reset · 52cd5750
      Ilpo Järvinen 提交于
      While looking for some common code I came across difference
      in checksum calculation between tcp_v6_send_(reset|ack) I
      couldn't explain. I checked both v4 and v6 and found out that
      both seem to have the same "feature". I couldn't find anything
      in rfc nor anywhere else which would state that md5 option
      should be ignored like it was in case of reset so I came to
      a conclusion that this is probably a genuine bug. I suspect
      that addition of md5 just was fooled by the excessive
      copy-paste code in those functions and the reset part was
      never tested well enough to find out the problem.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52cd5750
  27. 08 10月, 2008 1 次提交
  28. 01 10月, 2008 2 次提交
    • K
      tcp: Handle TCP SYN+ACK/ACK/RST transparency · 88ef4a5a
      KOVACS Krisztian 提交于
      The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
      incoming packets. The non-local source address check on output bites
      us again, as replies for transparently redirected traffic won't have a
      chance to leave the node.
      
      This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the
      route lookup for those replies. Transparent replies are enabled if the
      listening socket has the transparent socket flag set.
      Signed-off-by: NKOVACS Krisztian <hidden@sch.bme.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88ef4a5a
    • V
      tcp: Fix NULL dereference in tcp_4_send_ack() · 4dd7972d
      Vitaliy Gusev 提交于
      Fix NULL dereference in tcp_4_send_ack().
      
      As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0
      IP: [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
      
      Stack:  ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d
       0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8
       0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020
      Call Trace:
       <IRQ>  [<ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22
       [<ffffffff8049bce5>] tcp_check_req+0x108/0x14c
       [<ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c
       [<ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec
       [<ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272
       [<ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c
       [<ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701
       [<ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a
       [<ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c
       [<ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2
       [<ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e
       [<ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5
       [<ffffffff80462faa>] netif_receive_skb+0x293/0x303
       [<ffffffff80465a9b>] process_backlog+0x80/0xd0
       [<ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4
       [<ffffffff8046560e>] net_rx_action+0xb9/0x17f
       [<ffffffff80234cc5>] __do_softirq+0xa3/0x164
       [<ffffffff8020c52c>] call_softirq+0x1c/0x28
       <EOI>  [<ffffffff8020de1c>] do_softirq+0x34/0x72
       [<ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50
       [<ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14
       [<ffffffff804599cd>] release_sock+0xb8/0xc1
       [<ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c
       [<ffffffff80243078>] ? autoremove_wake_function+0x0/0x38
       [<ffffffff8045751f>] sys_connect+0x68/0x8e
       [<ffffffff80291818>] ? fd_install+0x5f/0x68
       [<ffffffff80457784>] ? sock_map_fd+0x55/0x62
       [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80
      
      Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48
      RIP  [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
       RSP <ffffffff80762b78>
      CR2: 00000000000004d0
      Signed-off-by: NVitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4dd7972d
  29. 21 9月, 2008 1 次提交
    • T
      tcp: advertise MSS requested by user · f5fff5dc
      Tom Quetchenbach 提交于
      I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS
      for both sides of a bidirectional connection.
      
      man tcp says: "If this option is set before connection establishment, it
      also changes the MSS value announced to the other end in the initial
      packet."
      
      However, the kernel only uses the MTU/route cache to set the advertised
      MSS. That means if I set the MSS to, say, 500 before calling connect(),
      I will send at most 500-byte packets, but I will still receive 1500-byte
      packets in reply.
      
      This is a bug, either in the kernel or the documentation.
      
      This patch (applies to latest net-2.6) reduces the advertised value to
      that requested by the user as long as setsockopt() is called before
      connect() or accept(). This seems like the behavior that one would
      expect as well as that which is documented.
      
      I've tried to make sure that things that depend on the advertised MSS
      are set correctly.
      Signed-off-by: NTom Quetchenbach <virtualphtn@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5fff5dc
  30. 09 9月, 2008 1 次提交
    • D
      netns : fix kernel panic in timewait socket destruction · d315492b
      Daniel Lezcano 提交于
      How to reproduce ?
       - create a network namespace
       - use tcp protocol and get timewait socket
       - exit the network namespace
       - after a moment (when the timewait socket is destroyed), the kernel
         panics.
      
      # BUG: unable to handle kernel NULL pointer dereference at
      0000000000000007
      IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
      PGD 119985067 PUD 11c5c0067 PMD 0
      Oops: 0000 [1] SMP
      CPU 1
      Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
      edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
      sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
      Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
      RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>]
      inet_twdr_do_twkill_work+0x6e/0xb8
      RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
      RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
      RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
      RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
      R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
      FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
      knlGS:0000000000000000
      CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
      ffff88011ff76690)
      Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
      0000000000000008
      0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
      ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
      Call Trace:
      <IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
      [<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
      [<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
      [<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
      [<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
      [<ffffffff8200e611>] ? do_softirq+0x2c/0x68
      [<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
      [<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
      <EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
      [<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d
      
      
      Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
      65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
      48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
      RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
      RSP <ffff88011ff7fed0>
      CR2: 0000000000000007
      
      This patch provides a function to purge all timewait sockets related
      to a network namespace. The timewait sockets life cycle is not tied with
      the network namespace, that means the timewait sockets stay alive while
      the network namespace dies. The timewait sockets are for avoiding to
      receive a duplicate packet from the network, if the network namespace is
      freed, the network stack is removed, so no chance to receive any packets
      from the outside world. Furthermore, having a pending destruction timer
      on these sockets with a network namespace freed is not safe and will lead
      to an oops if the timer callback which try to access data belonging to 
      the namespace like for example in:
      	inet_twdr_do_twkill_work
      		-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);
      
      Purging the timewait sockets at the network namespace destruction will:
       1) speed up memory freeing for the namespace
       2) fix kernel panic on asynchronous timewait destruction
      Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: NDenis V. Lunev <den@openvz.org>
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d315492b
  31. 28 8月, 2008 1 次提交
    • A
      tcp: Skip empty hash buckets faster in /proc/net/tcp · 6eac5604
      Andi Kleen 提交于
      On most systems most of the TCP established/time-wait hash buckets are empty.
      When walking the hash table for /proc/net/tcp their read locks would
      always be aquired just to find out they're empty. This patch changes the code
      to check first if the buckets have any entries before taking the lock, which
      is much cheaper than taking a lock. Since the hash tables are large
      this makes a measurable difference on processing /proc/net/tcp, 
      especially on architectures with slow read_lock (e.g. PPC) 
      
      On a 2GB Core2 system time cat /proc/net/tcp > /dev/null (with a mostly
      empty hash table) goes from 0.046s to 0.005s.
      
      On systems with slower atomics (like P4 or POWER4) or larger hash tables
      (more RAM) the difference is much higher.
      
      This can be noticeable because there are some daemons around who regularly
      scan /proc/net/tcp.
      
      Original idea for this patch from Marcus Meissner, but redone by me.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6eac5604
  32. 07 8月, 2008 1 次提交
    • G
      tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup · 6edafaaf
      Gui Jianfeng 提交于
      If the following packet flow happen, kernel will panic.
      MathineA			MathineB
      		SYN
      	---------------------->    
              	SYN+ACK
      	<----------------------
      		ACK(bad seq)
      	---------------------->
      When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr))
      is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is 
      NULL at that moment, so kernel panic happens.
      This patch fixes this bug.
      
      OOPS output is as following:
      [  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
      [  302.817075] Oops: 0000 [#1] SMP 
      [  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
      [  302.849946] 
      [  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
      [  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
      [  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
      [  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
      [  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
      [  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
      [  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
      [  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
      [  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
      [  302.900140] Call Trace:
      [  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
      [  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
      [  302.910082]  [<c04250a3>] printk+0x14/0x18
      [  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
      [  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
      [  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
      [  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
      [  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
      [  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
      [  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
      [  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
      [  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
      [  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
      [  302.948999]  [<c040564b>] do_softirq+0x55/0x88
      [  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
      [  302.954986]  [<c04284da>] irq_exit+0x35/0x69
      [  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
      [  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
      [  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
      [  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
      [  302.972169]  =======================
      [  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
      [  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
      [  303.018360] Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6edafaaf
  33. 01 8月, 2008 1 次提交
  34. 30 7月, 2008 1 次提交