1. 25 2月, 2009 1 次提交
  2. 30 1月, 2009 1 次提交
    • H
      gro: Avoid copying headers of unmerged packets · 86911732
      Herbert Xu 提交于
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy it.  Because all drivers that use frags place the
      headers in the first frag this optimisation should be enough.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86911732
  3. 09 1月, 2009 1 次提交
  4. 07 1月, 2009 1 次提交
  5. 30 12月, 2008 1 次提交
  6. 26 11月, 2008 2 次提交
  7. 20 11月, 2008 1 次提交
  8. 17 11月, 2008 1 次提交
    • E
      net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls · 3ab5aee7
      Eric Dumazet 提交于
      RCU was added to UDP lookups, using a fast infrastructure :
      - sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
        price of call_rcu() at freeing time.
      - hlist_nulls permits to use few memory barriers.
      
      This patch uses same infrastructure for TCP/DCCP established
      and timewait sockets.
      
      Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
      using short lived TCP connections. A followup patch, converting
      rwlocks to spinlocks will even speedup this case.
      
      __inet_lookup_established() is pretty fast now we dont have to
      dirty a contended cache line (read_lock/read_unlock)
      
      Only established and timewait hashtable are converted to RCU
      (bind table and listen table are still using traditional locking)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ab5aee7
  9. 30 10月, 2008 1 次提交
  10. 29 10月, 2008 1 次提交
  11. 20 10月, 2008 1 次提交
  12. 10 10月, 2008 5 次提交
    • G
      tcpv6: fix error with CONFIG_TCP_MD5SIG disabled · fa3e5b4e
      Guo-Fu Tseng 提交于
      This patch fix error with CONFIG_TCP_MD5SIG disabled.
      Signed-off-by: NGuo-Fu Tseng <cooldavid@cooldavid.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa3e5b4e
    • I
      tcpv6: combine tcp_v6_send_(reset|ack) · 626e264d
      Ilpo Järvinen 提交于
      $ codiff tcp_ipv6.o.old tcp_ipv6.o.new
      net/ipv6/tcp_ipv6.c:
        tcp_v6_md5_hash_hdr | -144
        tcp_v6_send_ack     | -585
        tcp_v6_send_reset   | -540
       3 functions changed, 1269 bytes removed, diff: -1269
      
      net/ipv6/tcp_ipv6.c:
        tcp_v6_send_response | +791
       1 function changed, 791 bytes added, diff: +791
      
      tcp_ipv6.o.new:
       4 functions changed, 791 bytes added, 1269 bytes removed, diff: -478
      
      I choose to leave the reset related netns comment in place (not
      the one that is killed) as I cannot understand its English so
      it's a bit hard for me to evaluate its usefulness :-).
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      626e264d
    • I
      tcpv6: convert opt[] -> topt in tcp_v6_send_reset · 81ada62d
      Ilpo Järvinen 提交于
      after this I get:
      
      $ diff-funcs tcp_v6_send_reset tcp_ipv6.c tcp_ipv6.c tcp_v6_send_ack
       --- tcp_ipv6.c:tcp_v6_send_reset()
       +++ tcp_ipv6.c:tcp_v6_send_ack()
      @@ -1,4 +1,5 @@
      -static void tcp_v6_send_reset(struct sock *sk, struct sk_buff *skb)
      +static void tcp_v6_send_ack(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
      u32 ts,
      +                           struct tcp_md5sig_key *key)
       {
              struct tcphdr *th = tcp_hdr(skb), *t1;
              struct sk_buff *buff;
      @@ -7,31 +8,14 @@
              struct sock *ctl_sk = net->ipv6.tcp_sk;
              unsigned int tot_len = sizeof(struct tcphdr);
              __be32 *topt;
      -#ifdef CONFIG_TCP_MD5SIG
      -       struct tcp_md5sig_key *key;
      -#endif
      -
      -       if (th->rst)
      -               return;
      -
      -       if (!ipv6_unicast_destination(skb))
      -               return;
      
      +       if (ts)
      +               tot_len += TCPOLEN_TSTAMP_ALIGNED;
       #ifdef CONFIG_TCP_MD5SIG
      -       if (sk)
      -               key = tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr);
      -       else
      -               key = NULL;
      -
              if (key)
                      tot_len += TCPOLEN_MD5SIG_ALIGNED;
       #endif
      
      -       /*
      -        * We need to grab some memory, and put together an RST,
      -        * and then put it into the queue to be sent.
      -        */
      -
              buff = alloc_skb(MAX_HEADER + sizeof(struct ipv6hdr) + tot_len,
                               GFP_ATOMIC);
              if (buff == NULL)
      @@ -46,18 +30,20 @@
              t1->dest = th->source;
              t1->source = th->dest;
              t1->doff = tot_len / 4;
      -       t1->rst = 1;
      -
      -       if(th->ack) {
      -               t1->seq = th->ack_seq;
      -       } else {
      -               t1->ack = 1;
      -               t1->ack_seq = htonl(ntohl(th->seq) + th->syn + th->fin
      -                                   + skb->len - (th->doff<<2));
      -       }
      +       t1->seq = htonl(seq);
      +       t1->ack_seq = htonl(ack);
      +       t1->ack = 1;
      +       t1->window = htons(win);
      
              topt = (__be32 *)(t1 + 1);
      
      +       if (ts) {
      +               *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
      +                               (TCPOPT_TIMESTAMP << 8) |
      TCPOLEN_TIMESTAMP);
      +               *topt++ = htonl(tcp_time_stamp);
      +               *topt++ = htonl(ts);
      +       }
      +
       #ifdef CONFIG_TCP_MD5SIG
              if (key) {
                      *topt++ = htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16) |
      @@ -84,15 +70,10 @@
              fl.fl_ip_sport = t1->source;
              security_skb_classify_flow(skb, &fl);
      
      -       /* Pass a socket to ip6_dst_lookup either it is for RST
      -        * Underlying function will use this to retrieve the network
      -        * namespace
      -        */
              if (!ip6_dst_lookup(ctl_sk, &buff->dst, &fl)) {
                      if (xfrm_lookup(&buff->dst, &fl, NULL, 0) >= 0) {
                              ip6_xmit(ctl_sk, buff, &fl, NULL, 0);
                              TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
      -                       TCP_INC_STATS_BH(net, TCP_MIB_OUTRSTS);
                              return;
                      }
              }
      
      
      ...which starts to be trivial to combine.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81ada62d
    • I
    • I
      tcpv[46]: fix md5 pseudoheader address field ordering · 78e645cb
      Ilpo Järvinen 提交于
      Maybe it's just me but I guess those md5 people made a mess
      out of it by having *_md5_hash_* to use daddr, saddr order
      instead of the one that is natural (and equal to what csum
      functions use). For the segment were sending, the original
      addresses are reversed so buff's saddr == skb's daddr and
      vice-versa.
      
      Maybe I can finally proceed with unification of some code
      after fixing it first... :-)
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78e645cb
  13. 09 10月, 2008 3 次提交
  14. 08 10月, 2008 1 次提交
  15. 01 10月, 2008 1 次提交
  16. 13 9月, 2008 1 次提交
  17. 09 9月, 2008 1 次提交
    • D
      netns : fix kernel panic in timewait socket destruction · d315492b
      Daniel Lezcano 提交于
      How to reproduce ?
       - create a network namespace
       - use tcp protocol and get timewait socket
       - exit the network namespace
       - after a moment (when the timewait socket is destroyed), the kernel
         panics.
      
      # BUG: unable to handle kernel NULL pointer dereference at
      0000000000000007
      IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
      PGD 119985067 PUD 11c5c0067 PMD 0
      Oops: 0000 [1] SMP
      CPU 1
      Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
      edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
      sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
      Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
      RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>]
      inet_twdr_do_twkill_work+0x6e/0xb8
      RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
      RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
      RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
      RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
      R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
      FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
      knlGS:0000000000000000
      CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
      ffff88011ff76690)
      Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
      0000000000000008
      0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
      ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
      Call Trace:
      <IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
      [<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
      [<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
      [<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
      [<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
      [<ffffffff8200e611>] ? do_softirq+0x2c/0x68
      [<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
      [<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
      <EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
      [<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d
      
      
      Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
      65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
      48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
      RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
      RSP <ffff88011ff7fed0>
      CR2: 0000000000000007
      
      This patch provides a function to purge all timewait sockets related
      to a network namespace. The timewait sockets life cycle is not tied with
      the network namespace, that means the timewait sockets stay alive while
      the network namespace dies. The timewait sockets are for avoiding to
      receive a duplicate packet from the network, if the network namespace is
      freed, the network stack is removed, so no chance to receive any packets
      from the outside world. Furthermore, having a pending destruction timer
      on these sockets with a network namespace freed is not safe and will lead
      to an oops if the timer callback which try to access data belonging to 
      the namespace like for example in:
      	inet_twdr_do_twkill_work
      		-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);
      
      Purging the timewait sockets at the network namespace destruction will:
       1) speed up memory freeing for the namespace
       2) fix kernel panic on asynchronous timewait destruction
      Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: NDenis V. Lunev <den@openvz.org>
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d315492b
  18. 07 8月, 2008 1 次提交
    • G
      tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup · 6edafaaf
      Gui Jianfeng 提交于
      If the following packet flow happen, kernel will panic.
      MathineA			MathineB
      		SYN
      	---------------------->    
              	SYN+ACK
      	<----------------------
      		ACK(bad seq)
      	---------------------->
      When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr))
      is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is 
      NULL at that moment, so kernel panic happens.
      This patch fixes this bug.
      
      OOPS output is as following:
      [  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
      [  302.817075] Oops: 0000 [#1] SMP 
      [  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
      [  302.849946] 
      [  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
      [  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
      [  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
      [  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
      [  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
      [  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
      [  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
      [  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
      [  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
      [  302.900140] Call Trace:
      [  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
      [  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
      [  302.910082]  [<c04250a3>] printk+0x14/0x18
      [  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
      [  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
      [  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
      [  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
      [  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
      [  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
      [  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
      [  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
      [  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
      [  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
      [  302.948999]  [<c040564b>] do_softirq+0x55/0x88
      [  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
      [  302.954986]  [<c04284da>] irq_exit+0x35/0x69
      [  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
      [  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
      [  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
      [  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
      [  302.972169]  =======================
      [  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
      [  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
      [  303.018360] Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6edafaaf
  19. 01 8月, 2008 2 次提交
  20. 30 7月, 2008 1 次提交
  21. 26 7月, 2008 1 次提交
  22. 19 7月, 2008 1 次提交
  23. 17 7月, 2008 4 次提交
  24. 28 6月, 2008 1 次提交
  25. 18 6月, 2008 1 次提交
  26. 15 6月, 2008 1 次提交
  27. 12 6月, 2008 3 次提交