1. 19 Aug 2010 · 1 commit
  2. 08 Aug 2010 · 1 commit
  3. 03 Aug 2010 · 2 commits
  4. 02 Aug 2010 · 4 commits
  5. 31 Jul 2010 · 1 commit
  6. 23 Jul 2010 · 4 commits
  7. 22 Jul 2010 · 1 commit
  8. 20 Jul 2010 · 1 commit
  9. 16 Jul 2010 · 1 commit
  10. 15 Jul 2010 · 1 commit
  11. 13 Jul 2010 · 2 commits
  12. 09 Jul 2010 · 1 commit
    • gre: propagate ipv6 transport class · dd4ba83d
      Authored by Stephen Hemminger
      This patch makes IPv6-over-IPv4 GRE tunnels propagate the traffic
      class field from the inner IPv6 header to the IPv4 Type Of Service
      field. Without the patch, all IPv6 packets in a tunnel look the same
      to QoS.

      This assumes that the IPv6 traffic class is exactly the same as the
      IPv4 TOS. Not sure if that is always the case; maybe some bits need
      to be masked off.

      The mask and shift used to get tclass are copied from ipv6/datagram.c
      (a sketch follows below).
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dd4ba83d
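
      A minimal sketch of that extraction, assuming the standard IPv6
      header layout (4-bit version, 8-bit traffic class, 20-bit flow
      label); the helper name is illustrative, not the actual hunk added
      by the patch:

      /* Pull the 8-bit traffic class out of the first 32 bits of the IPv6
       * header so it can be used as the IPv4 TOS of the encapsulating GRE
       * packet. Same mask/shift as net/ipv6/datagram.c. */
      static inline u8 ip6_tclass_as_tos(const struct ipv6hdr *ip6h)
      {
              return (ntohl(*(const __be32 *)ip6h) >> 20) & 0xff;
      }
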
  13. 08 Jul 2010 · 1 commit
  14. 06 Jul 2010 · 1 commit
  15. 05 Jul 2010 · 3 commits
  16. 01 Jul 2010 · 2 commits
    • fragment: add fast path for in-order fragments · d6bebca9
      Authored by Changli Gao
      Since most OSes, such as Windows, Darwin and FreeBSD, send fragments
      in order, a new fragment is likely to belong at the tail of the
      inet_frag_queue. In the fast path, we check whether the skb at the
      tail of the inet_frag_queue is the 'prev' we expect and, if so, skip
      the walk over the fragment list (a sketch follows below).
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      ----
       include/net/inet_frag.h |    1 +
       net/ipv4/ip_fragment.c  |   12 ++++++++++++
       net/ipv6/reassembly.c   |   11 +++++++++++
       3 files changed, 24 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6bebca9
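
      A minimal, excerpt-style sketch of the fast path, using the names of
      the 2010-era IPv4 reassembly code (qp->q.fragments_tail, FRAG_CB,
      offset); it is not necessarily the exact upstream hunk:

      ...
              /* Fast path for in-order fragments: if the new fragment
               * starts beyond the last one queued, link it at the tail and
               * skip the walk over the whole fragment list. */
              prev = qp->q.fragments_tail;
              if (!prev || FRAG_CB(prev)->offset < offset) {
                      next = NULL;
                      goto found;
              }
      ...
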
    • snmp: 64bit ipstats_mib for all arches · 4ce3c183
      Authored by Eric Dumazet
      /proc/net/snmp and /proc/net/netstat expose SNMP counters.

      The width of these counters is either 32 or 64 bits, depending on the
      size of "unsigned long" in the kernel.

      This means a user program parsing these files must already be
      prepared to deal with 64-bit values, regardless of whether the
      program itself is 32-bit or 64-bit.

      This patch introduces 64-bit SNMP values for the IPSTAT MIB, where
      some counters can wrap pretty fast if they are only 32 bits wide
      (a parsing sketch follows below).

      # netstat -s|egrep "InOctets|OutOctets"
          InOctets: 244068329096
          OutOctets: 244069348848
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ce3c183
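
      Since the commit stresses that parsers must already cope with 64-bit
      values, here is a small hypothetical userspace sketch (not part of
      the patch) that reads the IpExt counters from /proc/net/netstat using
      64-bit arithmetic:

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical parser: /proc/net/netstat prints sections as a header
       * line ("IpExt: InNoRoutes ...") followed by a value line; counters
       * may exceed 32 bits, so parse them with strtoull(). */
      int main(void)
      {
              char hdr[4096], val[4096];
              FILE *f = fopen("/proc/net/netstat", "r");

              if (!f)
                      return 1;
              while (fgets(hdr, sizeof(hdr), f) && fgets(val, sizeof(val), f)) {
                      char *sh, *sv, *name, *value;

                      if (strncmp(hdr, "IpExt:", 6) != 0)
                              continue;
                      name  = strtok_r(hdr + 6, " \n", &sh);
                      value = strtok_r(val + 6, " \n", &sv);
                      while (name && value) {
                              printf("%s = %llu\n", name,
                                     strtoull(value, NULL, 10));
                              name  = strtok_r(NULL, " \n", &sh);
                              value = strtok_r(NULL, " \n", &sv);
                      }
              }
              fclose(f);
              return 0;
      }
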
  17. 29 Jun 2010 · 2 commits
  18. 28 Jun 2010 · 2 commits
  19. 27 Jun 2010 · 2 commits
  20. 26 Jun 2010 · 2 commits
  21. 25 Jun 2010 · 1 commit
    • tcp: do not send reset to already closed sockets · 565b7b2d
      Authored by Konstantin Khorenko
      I've found that tcp_close() can be called for an already closed
      socket, yet it still sends a reset in this case
      (tcp_send_active_reset()), which seems incorrect. Moreover, the reset
      packet is sent from a different source port, since the original port
      number has already been cleared on the socket. Incrementing the
      LINUX_MIB_TCPABORTONCLOSE counter also does not look correct in this
      case.

      Initially this issue was found on a 2.6.18-x RHEL5 kernel, but the
      same seems to be true for the current mainline kernel (checked on
      2.6.35-rc3). Please correct me if I missed something.
      
      How it happens:

      1) The server receives a packet for a socket in TCP_CLOSE_WAIT state,
         which triggers tcp_reset():
      
      Call Trace:
       <IRQ>  [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8
       [<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08
       [<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a
       [<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43
       [<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259
       [<ffffffff80037131>] ip_local_deliver+0x200/0x2f4
       [<ffffffff8003843c>] ip_rcv+0x64c/0x69f
       [<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa
       [<ffffffff80032eca>] process_backlog+0x90/0xec
       [<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1
       [<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce
       [<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0
       [<ffffffff8006334c>] call_softirq+0x1c/0x28
       [<ffffffff80070476>] do_softirq+0x2c/0x85
       [<ffffffff80070441>] do_IRQ+0x149/0x152
       [<ffffffff80062665>] ret_from_intr+0x0/0xa
       <EOI>  [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303
       [<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303
       [<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e
       [<ffffffff8006a263>] do_page_fault+0x49a/0x7dc
       [<ffffffff80066487>] thread_return+0x89/0x174
       [<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c
       [<ffffffff80062e39>] error_exit+0x0/0x84
      
      tcp_rcv_state_process()
      ...  // (sk_state == TCP_CLOSE_WAIT here)
      ...
              /* step 2: check RST bit */
              if(th->rst) {
                      tcp_reset(sk);
                      goto discard;
              }
      ...
      ---------------------------------
      tcp_rcv_state_process
       tcp_reset
        tcp_done
         tcp_set_state(sk, TCP_CLOSE);
           inet_put_port
            __inet_put_port
             inet_sk(sk)->num = 0;
      
         sk->sk_shutdown = SHUTDOWN_MASK;
      
      2) After that the process (the socket owner) tries to write something
         to that socket, and inet_autobind() sets a _new_ port number for
         the socket (which differs from the original!):
      
       Call Trace:
        [<ffffffff80255a12>] inet_bind_hash+0x33/0x5f
        [<ffffffff80257180>] inet_csk_get_port+0x216/0x268
        [<ffffffff8026bcc9>] inet_autobind+0x22/0x8f
        [<ffffffff80049140>] inet_sendmsg+0x27/0x57
        [<ffffffff8003a9d9>] do_sock_write+0xae/0xea
        [<ffffffff80226ac7>] sock_writev+0xdc/0xf6
        [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
        [<ffffffff8001fb49>] __pollwait+0x0/0xdd
        [<ffffffff8008d533>] default_wake_function+0x0/0xe
        [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
        [<ffffffff800f0b49>] do_readv_writev+0x163/0x274
        [<ffffffff80066538>] thread_return+0x13a/0x174
        [<ffffffff800145d8>] tcp_poll+0x0/0x1c9
        [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
        [<ffffffff800f0dd0>] sys_writev+0x49/0xe4
        [<ffffffff800622dd>] tracesys+0xd5/0xe0
      
      3) sendmsg() finally fails with -EPIPE (i.e. 'write' returns -EPIPE in
         userspace):
      
      F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3
      
      Call Trace:
       [<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87
       [<ffffffff80033300>] release_sock+0x10/0xae
       [<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7
       [<ffffffff8026bd32>] inet_autobind+0x8b/0x8f
       [<ffffffff8003a9d9>] do_sock_write+0xae/0xea
       [<ffffffff80226ac7>] sock_writev+0xdc/0xf6
       [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe
       [<ffffffff8001fb49>] __pollwait+0x0/0xdd
       [<ffffffff8008d533>] default_wake_function+0x0/0xe
       [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e
       [<ffffffff800f0b49>] do_readv_writev+0x163/0x274
       [<ffffffff80066538>] thread_return+0x13a/0x174
       [<ffffffff800145d8>] tcp_poll+0x0/0x1c9
       [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3
       [<ffffffff800f0dd0>] sys_writev+0x49/0xe4
       [<ffffffff800622dd>] tracesys+0xd5/0xe0
      
      tcp_sendmsg()
      ...
              /* Wait for a connection to finish. */
              if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
                      int old_state = sk->sk_state;
                      if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) {
      if (f_d && (err == -EPIPE)) {
              printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, "
                      "sk_err=%d, sk_shutdown=%d\n",
                      sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state,
                      sk->sk_err, sk->sk_shutdown);
              dump_stack();
      }
                              goto out_err;
                      }
              }
      ...
      
      4) Then the process (the socket owner) decides it is time to close
         that socket and does so, which triggers sending the reset packet:
      
      Call Trace:
      ...
       [<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6
       [<ffffffff80034698>] ip_output+0x351/0x384
       [<ffffffff80251ae9>] dst_output+0x0/0xe
       [<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2
       [<ffffffff80095700>] vprintk+0x21/0x33
       [<ffffffff800070f0>] check_poison_obj+0x2e/0x206
       [<ffffffff80013587>] poison_obj+0x36/0x45
       [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
       [<ffffffff80023481>] dbg_redzone1+0x1c/0x25
       [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d
       [<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8
       [<ffffffff80023405>] tcp_transmit_skb+0x764/0x786
       [<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d
       [<ffffffff80258ff1>] tcp_close+0x39a/0x960
       [<ffffffff8026be12>] inet_release+0x69/0x80
       [<ffffffff80059b31>] sock_release+0x4f/0xcf
       [<ffffffff80059d4c>] sock_close+0x2c/0x30
       [<ffffffff800133c9>] __fput+0xac/0x197
       [<ffffffff800252bc>] filp_close+0x59/0x61
       [<ffffffff8001eff6>] sys_close+0x85/0xc7
       [<ffffffff800622dd>] tracesys+0xd5/0xe0
      
      So, in brief:

      * a received packet for a socket in TCP_CLOSE_WAIT state triggers
        tcp_reset(), which clears inet_sk(sk)->num and puts the socket into
        TCP_CLOSE state

      * an attempt to write to that socket forces inet_autobind() to get a
        new port (but the write itself fails with -EPIPE)

      * tcp_close(), called for a socket already in TCP_CLOSE state, sends
        an active reset via the socket with the newly allocated port

      This patch adds an additional check in tcp_close() for already closed
      sockets: we do not want to send anything on such sockets (a sketch of
      the check follows below).
      Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      565b7b2d
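
      A minimal sketch of the guard described above, in the shape of the
      2010-era tcp_close(); placement and wording are illustrative and may
      differ from the exact upstream hunk:

      void tcp_close(struct sock *sk, long timeout)
      {
      ...
              /* If the socket has already been reset (e.g. in tcp_reset()),
               * skip straight to the death path instead of sending an
               * active reset from a freshly autobound port. */
              if (sk->sk_state == TCP_CLOSE)
                      goto adjudge_to_death;
      ...
      }
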
  22. 24 Jun 2010 · 1 commit
  23. 22 Jun 2010 · 1 commit
  24. 17 Jun 2010 · 2 commits
    • netfilter: nf_nat: support user-specified SNAT rules in LOCAL_IN · c68cd6cc
      Authored by Patrick McHardy
      2.6.34 introduced 'conntrack zones' to deal with cases where packets
      from multiple identical networks are handled by conntrack/NAT.
      Packets are looped through veth devices, during which they are NATed
      to private addresses, after which they can continue normally through
      the stack and possibly have NAT rules applied a second time.

      This works well, but is needlessly complicated for cases where only a
      single SNAT/DNAT mapping needs to be applied to these packets. In
      that case, all that needs to be done is to assign each network to a
      separate zone and perform NAT as usual. However, this doesn't work
      for packets destined for the machine performing NAT itself, since it
      is currently not possible to configure SNAT mappings for the LOCAL_IN
      chain.

      This patch adds a new INPUT chain to the NAT table and changes the
      targets performing SNAT to be usable in that chain (a sketch of the
      target change follows after the example).
      
      Example usage with two identical networks (192.168.0.0/24) on eth0/eth1:
      
      iptables -t raw -A PREROUTING -i eth0 -j CT --zone 1
      iptables -t raw -A PREROUTING -i eth0 -j MARK --set-mark 1
      iptables -t raw -A PREROUTING -i eth1 -j CT --zone 2
      iptables -t raw -A PREROUTING -i eth1 -j MARK --set-mark 2
      
      iptables -t nat -A INPUT       -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
      iptables -t nat -A POSTROUTING -m mark --mark 1 -j NETMAP --to 10.0.0.0/24
      iptables -t nat -A INPUT       -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
      iptables -t nat -A POSTROUTING -m mark --mark 2 -j NETMAP --to 10.0.1.0/24
      
      iptables -t raw -A PREROUTING -d 10.0.0.0/24 -j CT --zone 1
      iptables -t raw -A OUTPUT     -d 10.0.0.0/24 -j CT --zone 1
      iptables -t raw -A PREROUTING -d 10.0.1.0/24 -j CT --zone 2
      iptables -t raw -A OUTPUT     -d 10.0.1.0/24 -j CT --zone 2
      
      iptables -t nat -A PREROUTING -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
      iptables -t nat -A OUTPUT     -d 10.0.0.0/24 -j NETMAP --to 192.168.0.0/24
      iptables -t nat -A PREROUTING -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
      iptables -t nat -A OUTPUT     -d 10.0.1.0/24 -j NETMAP --to 192.168.0.0/24
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      c68cd6cc
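
      A minimal sketch of the kernel-side change described above, assuming
      the usual struct xt_target registration of the IPv4 SNAT target in
      nf_nat_rule.c; field values are illustrative and the exact upstream
      hunk may differ:

      static struct xt_target ipt_snat_reg __read_mostly = {
              .name           = "SNAT",
              .target         = ipt_snat_target,
              .targetsize     = sizeof(struct nf_nat_multi_range_compat),
              .table          = "nat",
              /* LOCAL_IN added so SNAT-style targets can be used in the
               * new "-t nat" INPUT chain. */
              .hooks          = (1 << NF_INET_POST_ROUTING) |
                                (1 << NF_INET_LOCAL_IN),
              .checkentry     = ipt_snat_checkentry,
              .family         = AF_INET,
      };
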
    • syncookies: check decoded options against sysctl settings · 8c763681
      Authored by Florian Westphal
      Discard the ACK if we find options that do not match the current
      sysctl settings.

      Previously it was possible to create a connection with SACK, window
      scaling, etc. enabled even if the feature was disabled via sysctl
      (a sketch of the check follows below).

      Also remove an unneeded call to tcp_sack_reset() in
      cookie_check_timestamp(): both call sites (cookie_v4_check,
      cookie_v6_check) zero "struct tcp_options_received", hand it to
      tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
      and then call cookie_check_timestamp().

      Even if num_sacks/dsack were changed, the structure is allocated on
      the stack, and after cookie_check_timestamp() returns only a few
      selected members are copied into the inet_request_sock.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c763681
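
      A minimal sketch of the validation described above, using the
      2010-era sysctl and tcp_options_received names; the helper is
      illustrative and the exact upstream hunk may differ:

      /* Illustrative helper (not the exact patch): after decoding the
       * options encoded in the syncookie, reject the ACK if a feature is
       * claimed by the cookie but disabled via sysctl. */
      static bool cookie_options_match_sysctl(const struct tcp_options_received *opt)
      {
              if (!sysctl_tcp_timestamps)
                      return false;   /* option encoding needs timestamps */
              if (opt->sack_ok && !sysctl_tcp_sack)
                      return false;
              if (opt->wscale_ok && !sysctl_tcp_window_scaling)
                      return false;
              return true;
      }
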