  1. 08 Oct, 2009 1 commit
  2. 07 Oct, 2009 2 commits
    • net: mark net_proto_ops as const · ec1b4cf7
      Stephen Hemminger committed
      All usages of structure net_proto_ops should be declared const.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • add vif using local interface index instead of IP · ee5e81f0
      Ilia K committed
      When a routing daemon wants to enable forwarding of multicast traffic, it
      performs something like:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = 0,
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     /* IP address of the physical interface, e.g. eth0 */
                     .vifc_lcl_addr = ip,
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
             };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      
      In the kernel, this leads to a call to vif_add(), which searches for
      the (physical) device using the assigned IP address:
             dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);

      The current API (struct vifctl) does not allow an interface to be
      specified in any way other than by its IP address, and if more than one
      interface has the specified IP, only the first one will be found.

      The attached patch (against 2.6.30.4) allows an interface to be specified
      by its index instead of its IP address:
      
             struct vifctl vc = {
                     .vifc_vifi  = 1,
                     .vifc_flags = VIFF_USE_IFINDEX,   /* NEW */
                     .vifc_threshold = 1,
                     .vifc_rate_limit = 0,
                     .vifc_lcl_ifindex = if_nametoindex("eth0"),   /* NEW */
                     .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
             };
             setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));
      Signed-off-by: Ilia K. <mail4ilia@gmail.com>
      
      === modified file 'include/linux/mroute.h'
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 05 Oct, 2009 3 commits
  4. 03 Oct, 2009 1 commit
  5. 02 Oct, 2009 3 commits
    • net: Use sk_mark for routing lookup in more places · 914a9ab3
      Atis Elsts committed
      This patch, against v2.6.31, adds support for route lookup using sk_mark in
      more places. It has two benefits.
      First, the SO_MARK option now takes effect on UDP sockets too.
      Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do the
      routing lookup correctly when TCP sockets with SO_MARK were used.
      Signed-off-by: Atis Elsts <atis@mikrotik.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    • IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61
      Ori Finkelman committed
      Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
      and SYN headers even if our window scale is zero.
      
      This fixes the following observed behavior:
      
      1. Client sends a SYN with the TCP window scaling option and a nonzero window
      scale value to a Linux box.
      2. Linux box notes the large receive window from the client.
      3. Linux decides on a zero value of window scale for its part.
      4. Because of a comparison against the requested window scale size option,
      Linux does not send the window scale TCP option header on the SYN/ACK at all.
      
      With the following result:

      The client box thinks TCP window scaling is not supported, since the SYN/ACK
      had no TCP window scale option, while Linux thinks that TCP window scaling is
      supported (and the scale might be nonzero), since the SYN had a TCP window
      scale option. The client and server thus end up with mismatched ideas about
      window sizes.
      
      It probably also fixes the following bug (not observed in practice):

      1. Linux box opens a TCP connection to some server.
      2. Linux decides on a zero value of window scale.
      3. Because of a comparison against the computed window scale size option,
      Linux does not set the window scale TCP option header on the SYN.

      The expected result is that the server OS does not use the window scale option,
      because it did not receive such an option in the SYN headers, leading to
      suboptimal performance.
      Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: Ori Finkelman <ori@comsleep.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
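The decision described in this commit can be modeled in a few lines of userspace C. This is only an illustrative sketch, not the kernel code: the function names are hypothetical, and the shift selection is a simplified version of the usual rule (shift until the receive space fits the 16-bit window field, capped at 14).

```c
#include <stdbool.h>
#include <stdint.h>

/* Pick the smallest shift (0..14) that makes rcv_space fit in the
 * 16-bit TCP window field. Small windows yield a shift of zero. */
static unsigned int pick_window_scale(uint32_t rcv_space)
{
    unsigned int wscale = 0;

    while (rcv_space > 65535 && wscale < 14) {
        rcv_space >>= 1;
        wscale++;
    }
    return wscale;
}

/* Model of the behaviour described as buggy: the option is omitted
 * whenever our own shift happens to be zero. */
static bool synack_has_wscale_old(bool peer_offered, unsigned int our_wscale)
{
    return peer_offered && our_wscale != 0;
}

/* Model of the fixed behaviour: the option is sent whenever the peer
 * offered it, even if our own shift is zero. */
static bool synack_has_wscale_new(bool peer_offered, unsigned int our_wscale)
{
    (void)our_wscale;   /* a zero shift is still a valid option value */
    return peer_offered;
}
```

With a small receive space such as 5840 bytes, `pick_window_scale()` returns 0, and the two models then disagree about whether the SYN/ACK carries the option, which is the mismatch the commit message describes.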
    • net/ipv4/tcp.c: fix min() type mismatch warning · 4fdb78d3
      Andrew Morton committed
      net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
      net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
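The warning quoted above comes from the type check built into a kernel-style min() macro. The sketch below is a simplified userspace copy of that pattern, written for GCC or Clang (it relies on the `__typeof__` and statement-expression extensions). The helper `clamp_copy_len` is hypothetical and does not reproduce the actual change made in net/ipv4/tcp.c.

```c
#include <stddef.h>

/* Simplified copy of a kernel-style min(): comparing the addresses of the
 * two temporaries makes the compiler warn ("comparison of distinct pointer
 * types lacks a cast") when x and y have different types. */
#define min(x, y) ({                     \
    __typeof__(x) _min1 = (x);           \
    __typeof__(y) _min2 = (y);           \
    (void)(&_min1 == &_min2);            \
    _min1 < _min2 ? _min1 : _min2; })

/* min_t() converts both operands to one explicit type first, so no
 * type-mismatch warning is produced. */
#define min_t(type, x, y) ({             \
    type _min1 = (x);                    \
    type _min2 = (y);                    \
    _min1 < _min2 ? _min1 : _min2; })

/* Hypothetical helper: clamp a caller-supplied length to a buffer size. */
static size_t clamp_copy_len(unsigned int user_len, size_t buf_size)
{
    /* min(user_len, buf_size) would warn on 64-bit Linux, where
     * unsigned int and size_t are distinct types. */
    return min_t(size_t, user_len, buf_size);
}
```

The usual ways to silence such a warning are to switch to min_t() with an explicit type or to make both operands the same type.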
  6. 01 Oct, 2009 1 commit
  7. 25 Sep, 2009 2 commits
  8. 24 Sep, 2009 1 commit
  9. 22 Sep, 2009 1 commit
  10. 16 Sep, 2009 1 commit
    • tcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG() · 657e9649
      Robert Varga committed
      I recently came across a preemption imbalance, detected by:
      
      <4>huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
      <0>------------[ cut here ]------------
      <2>kernel BUG at /usr/src/linux/kernel/timer.c:664!
      <0>invalid opcode: 0000 [1] PREEMPT SMP
      
      with ffffffff80644630 being inet_twdr_hangman().
      
      This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a
      bit, so I looked at what might have caused it.
      
      One thing that struck me as strange is tcp_twsk_destructor(), as it
      calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the
      detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well,
      as far as I can tell.
      Signed-off-by: Robert Varga <nite@hq.alert.sk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 15 Sep, 2009 3 commits
  12. 09 Sep, 2009 1 commit
  13. 03 Sep, 2009 2 commits
    • tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang committed
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip: Report qdisc packet drops · 6ce9e7b5
      Eric Dumazet committed
      Christoph Lameter pointed out that packet drops at the qdisc level were not
      accounted for in SNMP counters. Only if the application sets IP_RECVERR are
      drops reported to the user (-ENOBUFS errors) and SNMP counters updated.

      IP_RECVERR is used to enable extended reliable error message passing,
      but this is not needed to update system-wide SNMP stats.
      
      This patch changes things a bit to allow SNMP counters to be updated,
      regardless of IP_RECVERR being set or not on the socket.
      
      Example after a UDP tx flood:
      # netstat -s 
      ...
      IP:
          1487048 outgoing packets dropped
      ...
      Udp:
      ...
          SndbufErrors: 1487048
      
      
      send() syscalls do, however, still return an OK status, so as not to
      break applications.

      Note: the send() manual page explicitly says of the -ENOBUFS error:
      
       "The output queue for a network interface was full.
        This generally indicates that the interface has stopped sending,
        but may be caused by transient congestion.
        (Normally, this does not occur in Linux. Packets are just silently
        dropped when a device queue overflows.) "
      
      This is not true for IP_RECVERR-enabled sockets: a send() syscall
      that hits a qdisc drop returns an ENOBUFS error.

      Many thanks to Christoph, David, and last but not least, Alexey!
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
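As a companion to the IP_RECVERR discussion above, here is a minimal userspace sketch of how an application might opt in to extended error reporting on a UDP socket. The helper names are hypothetical; the sketch assumes a Linux system where `IP_RECVERR` is exposed through `<netinet/in.h>`.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enable extended error reporting on an IPv4 socket.
 * Returns 0 on success, or the errno value reported by setsockopt(). */
static int enable_ip_recverr(int fd)
{
    int on = 1;

    if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) < 0)
        return errno;
    return 0;
}

/* Hypothetical helper: open a UDP socket with IP_RECVERR already set.
 * Returns the file descriptor, or -1 on failure. */
static int open_udp_with_recverr(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (enable_ip_recverr(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Per the commit message, a send() on such a socket that hits a qdisc drop returns ENOBUFS, while sockets without the option still see an OK status; with this patch the SNMP counters are updated in both cases.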
  14. 02 Sep, 2009 3 commits
  15. 01 Sep, 2009 4 commits
  16. 29 Aug, 2009 6 commits
    • tcp: Remove redundant copy of MD5 authentication key · 9a7030b7
      John Dykstra committed
      Remove the copy of the MD5 authentication key from tcp_check_req().
      This key has already been copied by tcp_v4_syn_recv_sock() or
      tcp_v6_syn_recv_sock().
      Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix premature termination of FIN_WAIT2 time-wait sockets · 80a1096b
      Octavian Purdila committed
      There is a race condition in the time-wait sockets code that can lead
      to premature termination of FIN_WAIT2 and, subsequently, to RST
      generation when the FIN,ACK from the peer finally arrives:
      
      Time     TCP header
      0.000000 30755 > http [SYN] Seq=0 Win=2920 Len=0 MSS=1460 TSV=282912 TSER=0
      0.000008 http > 30755 [SYN, ACK] Seq=0 Ack=1 Win=2896 Len=0 MSS=1460 TSV=...
      0.136899 HEAD /1b.html?n1Lg=v1 HTTP/1.0 [Packet size limited during capture]
      0.136934 HTTP/1.0 200 OK [Packet size limited during capture]
      0.136945 http > 30755 [FIN, ACK] Seq=187 Ack=207 Win=2690 Len=0 TSV=270521...
      0.136974 30755 > http [ACK] Seq=207 Ack=187 Win=2734 Len=0 TSV=283049 TSER=...
      0.177983 30755 > http [ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283089 TSER=...
      0.238618 30755 > http [FIN, ACK] Seq=207 Ack=188 Win=2733 Len=0 TSV=283151...
      0.238625 http > 30755 [RST] Seq=188 Win=0 Len=0
      
      Say twdr->slot = 1 and we are running inet_twdr_hangman and in this
      instance inet_twdr_do_twkill_work returns 1. At that point we will
      mark slot 1 and schedule inet_twdr_twkill_work. We will also make
      twdr->slot = 2.
      
      Next, a connection is closed and tcp_time_wait(TCP_FIN_WAIT2, timeo)
      is called which will create a new FIN_WAIT2 time-wait socket and will
      place it in the last to be reached slot, i.e. twdr->slot = 1.
      
      At this point, say inet_twdr_twkill_work runs and starts destroying
      the time-wait sockets in slot 1, including the just-added
      TCP_FIN_WAIT2 one.
      
      To avoid this issue we increment the slot only if all entries in the
      slot have been purged.
      
      This change may delay the slot cleanup by a time-wait death row
      period, but only if the worker thread didn't have time to run/purge
      the current slot in the next period (6 seconds with default sysctl
      settings). However, on such a busy system we would probably see
      delays even without this change...
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: resize rework · 80b71b80
      Jens Låås committed
      Here is a rework and cleanup of the resize function.

      It fixes some bugs. We were using ->parent when we should have used
      node_parent(). We also used ->parent, which is not assigned by
      inflate, in the inflate loop.

      There is also a fix to set thresholds to powers of 2, to fit the halve
      and double strategy.

      max_resize is renamed to max_work, which better indicates
      its function.

      Reaching max_work is not an error, so the warning is removed.
      max_work only limits the amount of work done per resize
      (limits CPU usage, outstanding memory, etc.).

      The cleanup makes it relatively easy to add fixed-size
      root nodes if we would like to decrease the memory pressure
      on routers with large routing tables and dynamic routing.
      If we ever need that...

      It's been tested with 280k routes.
      
      Work done together with Robert Olsson.
      Signed-off-by: Jens Låås <jens.laas@its.uu.se>
      Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip_rt_send_redirect() optimization · 30038fc6
      Eric Dumazet committed
      While doing some forwarding benchmarks, I noticed
      ip_rt_send_redirect() is rather expensive, even if send_redirects is
      false for the device.
      
      The fix is to avoid two atomic ops: we don't really need to take a
      reference on in_dev.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: keepalive cleanups · df19a626
      Eric Dumazet committed
      Introduce a keepalive_probes(tp) helper and use it, in the same way as
      keepalive_time_when(tp) and keepalive_intvl_when(tp).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
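The pattern behind such a helper is "use the per-socket value if one was set, otherwise fall back to the system-wide default". The following userspace model illustrates that idea only; `struct tcp_sock_model` and `default_keepalive_probes` are hypothetical stand-ins for the kernel's socket structure and sysctl, and the default of 9 probes is assumed to match the usual Linux setting.

```c
/* Hypothetical stand-in for the per-socket keepalive settings;
 * a value of 0 means "not set by the application". */
struct tcp_sock_model {
    int keepalive_probes;
};

/* Assumed system-wide default number of keepalive probes. */
static const int default_keepalive_probes = 9;

/* Model of the helper: prefer the per-socket value, else the default. */
static int keepalive_probes(const struct tcp_sock_model *tp)
{
    return tp->keepalive_probes ? tp->keepalive_probes
                                : default_keepalive_probes;
}
```

Centralizing the fallback in one helper means callers no longer repeat the conditional at every use site.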
    • ipv4: af_inet.c cleanups · 3d1427f8
      Eric Dumazet committed
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 28 Aug, 2009 1 commit
  18. 25 Aug, 2009 3 commits
  19. 24 Aug, 2009 1 commit