1. 09 Sep, 2010: 2 commits
      net: blackhole route should always be recalculated · ae2688d5
      Jianzhao Wang committed
      Blackhole routes are used when xfrm_lookup() returns -EREMOTE (an error
      triggered by IKE, for example), hence this kind of route is always
      temporary, and so we should check whether a better route exists for the
      next packets.
      Bug has been introduced by commit d11a4dc1.
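      The reasoning can be modeled in userspace C: a cached blackhole entry is
      never treated as reusable, so every later packet triggers a fresh lookup
      that may find a better route once IKE has finished. This is a hypothetical
      sketch; struct cached_route, struct route_table, route_still_usable() and
      get_route() are invented for illustration and are not the kernel's
      dst_entry API or the actual diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical route kinds for this model; not kernel definitions. */
enum route_kind { ROUTE_NORMAL, ROUTE_BLACKHOLE };

struct cached_route {
    enum route_kind kind;
    int out_ifindex;   /* hypothetical output interface */
    bool valid;        /* false once the entry has been invalidated */
};

/* Hypothetical routing state: what a fresh lookup would return right now. */
struct route_table {
    struct cached_route best;
    unsigned lookups;  /* number of full recalculations performed */
};

/* A blackhole entry is only a placeholder (e.g. while IKE negotiates),
 * so it is never treated as reusable. */
static bool route_still_usable(const struct cached_route *rt)
{
    return rt != NULL && rt->valid && rt->kind != ROUTE_BLACKHOLE;
}

static struct cached_route route_lookup(struct route_table *t)
{
    t->lookups++;
    return t->best;
}

/* Reuse the cached route only if it is still usable; otherwise recalculate,
 * which lets later packets pick up a better route once one exists. */
static struct cached_route get_route(struct route_table *t,
                                     const struct cached_route *cached)
{
    if (route_still_usable(cached))
        return *cached;
    return route_lookup(t);
}
```

      A normal cached route is returned without a lookup, while a blackhole
      entry always costs one recalculation.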
      Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ipv4: Suppress lockdep-RCU false positive in FIB trie (3) · f6b085b6
      Jarek Poplawski committed
      Hi,
      Here is one more of these warnings and a patch below:
      
      Sep  5 23:52:33 del kernel: [46044.244833] ===================================================
      Sep  5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ]
      Sep  5 23:52:33 del kernel: [46044.277000] ---------------------------------------------------
      Sep  5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!
      Sep  5 23:52:33 del kernel: [46044.293627]
      Sep  5 23:52:33 del kernel: [46044.293632] other info that might help us debug this:
      Sep  5 23:52:33 del kernel: [46044.293634]
      Sep  5 23:52:33 del kernel: [46044.325333]
      Sep  5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0
      Sep  5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717:
      Sep  5 23:52:33 del kernel: [46044.357548]  #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20
      Sep  5 23:52:33 del kernel: [46044.367647]
      Sep  5 23:52:33 del kernel: [46044.367652] stack backtrace:
      Sep  5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3
      Sep  5 23:52:33 del kernel: [46044.398764] Call Trace:
      Sep  5 23:52:33 del kernel: [46044.409596]  [<c12f9aba>] ? printk+0x18/0x1e
      Sep  5 23:52:33 del kernel: [46044.420761]  [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
      Sep  5 23:52:33 del kernel: [46044.432229]  [<c12b7235>] trie_firstleaf+0x65/0x70
      Sep  5 23:52:33 del kernel: [46044.443941]  [<c12b74d4>] fib_table_flush+0x14/0x170
      Sep  5 23:52:33 del kernel: [46044.455823]  [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0
      Sep  5 23:52:33 del kernel: [46044.467995]  [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40
      Sep  5 23:52:33 del kernel: [46044.480404]  [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180
      Sep  5 23:52:33 del kernel: [46044.493025]  [<c12b069d>] fib_flush+0x2d/0x60
      Sep  5 23:52:33 del kernel: [46044.505796]  [<c12b06f5>] fib_disable_ip+0x25/0x50
      Sep  5 23:52:33 del kernel: [46044.518772]  [<c12b10d3>] fib_netdev_event+0x73/0xd0
      Sep  5 23:52:33 del kernel: [46044.531918]  [<c1048dfd>] notifier_call_chain+0x2d/0x70
      Sep  5 23:52:33 del kernel: [46044.545358]  [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20
      Sep  5 23:52:33 del kernel: [46044.559092]  [<c124f687>] call_netdevice_notifiers+0x27/0x60
      Sep  5 23:52:33 del kernel: [46044.573037]  [<c124faec>] __dev_notify_flags+0x5c/0x80
      Sep  5 23:52:33 del kernel: [46044.586489]  [<c124fb47>] dev_change_flags+0x37/0x60
      Sep  5 23:52:33 del kernel: [46044.599394]  [<c12a8a8d>] devinet_ioctl+0x54d/0x630
      Sep  5 23:52:33 del kernel: [46044.612277]  [<c12aabb7>] inet_ioctl+0x97/0xc0
      Sep  5 23:52:34 del kernel: [46044.625208]  [<c123f6af>] sock_ioctl+0x6f/0x270
      Sep  5 23:52:34 del kernel: [46044.638046]  [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0
      Sep  5 23:52:34 del kernel: [46044.650968]  [<c123f640>] ? sock_ioctl+0x0/0x270
      Sep  5 23:52:34 del kernel: [46044.663865]  [<c10c3188>] vfs_ioctl+0x28/0xa0
      Sep  5 23:52:34 del kernel: [46044.676556]  [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0
      Sep  5 23:52:34 del kernel: [46044.688989]  [<c1048676>] ? up_read+0x16/0x30
      Sep  5 23:52:34 del kernel: [46044.701411]  [<c1021376>] ? do_page_fault+0x1d6/0x3a0
      Sep  5 23:52:34 del kernel: [46044.714223]  [<c10b6588>] ? fget_light+0xf8/0x2f0
      Sep  5 23:52:34 del kernel: [46044.726601]  [<c1241f98>] ? sys_socketcall+0x208/0x2c0
      Sep  5 23:52:34 del kernel: [46044.739140]  [<c10c3eb3>] sys_ioctl+0x63/0x70
      Sep  5 23:52:34 del kernel: [46044.751967]  [<c12fca3d>] syscall_call+0x7/0xb
      Sep  5 23:52:34 del kernel: [46044.764734]  [<c12f0000>] ? cookie_v6_check+0x3d0/0x630
      
      -------------->
      
      This patch fixes the warning:
       ===================================================
       [ INFO: suspicious rcu_dereference_check() usage. ]
       ---------------------------------------------------
       net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 0
       1 lock held by pppd/1717:
        #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20
      
       stack backtrace:
       Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3
       Call Trace:
        [<c12f9aba>] ? printk+0x18/0x1e
        [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
        [<c12b7235>] trie_firstleaf+0x65/0x70
        [<c12b74d4>] fib_table_flush+0x14/0x170
        ...
      
      Allow trie_firstleaf() to be called either under rcu_read_lock()
      protection or with RTNL held. The same annotation is added to
      node_parent_rcu() to prevent a similar warning a bit later.
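      A userspace model of the annotation: the dereference is accepted when
      either an RCU read-side section or RTNL is held, and flagged otherwise.
      The flag variables, deref_check(), DEREF_RCU_OR_RTNL() and first_child()
      are hypothetical; in the kernel the check is rcu_dereference_check() with
      a condition built from helpers such as rcu_read_lock_held() and
      lockdep_rtnl_is_held().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for lockdep state in this model. */
static bool model_rcu_read_held;
static bool model_rtnl_held;
static unsigned model_warnings;

/* Model of rcu_dereference_check(p, cond): return p unchanged, but warn
 * when the caller holds none of the accepted protections. */
static void *deref_check(void *p, bool cond, const char *where)
{
    if (!cond) {
        model_warnings++;
        fprintf(stderr, "suspicious dereference in %s\n", where);
    }
    return p;
}

/* Either protection is acceptable, as the patch describes. */
#define DEREF_RCU_OR_RTNL(p) \
    deref_check((p), model_rcu_read_held || model_rtnl_held, __func__)

struct tnode {
    struct tnode *child;
};

/* Hypothetical analogue of trie_firstleaf(): read the root's first child. */
static struct tnode *first_child(struct tnode *root)
{
    return DEREF_RCU_OR_RTNL(root->child);
}
```

      With RTNL held (the fib_table_flush() path in the trace above) or inside
      rcu_read_lock(), no warning is counted; with neither, one is.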
      
      Followup of commits 634a4b20 and 4eaa0e3c.
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 08 Sep, 2010: 1 commit
  3. 02 Sep, 2010: 1 commit
  4. 28 Aug, 2010: 1 commit
      net/ipv4: Eliminate kstrdup memory leak · c34186ed
      Julia Lawall committed
      The string clone is only used as a temporary copy of the argument val
      within the while loop, and so it should be freed before leaving the
      function.  The call to strsep, however, modifies clone, so a pointer to the
      front of the string is kept in saved_clone, to make it possible to free it.
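      The pattern the fix relies on, sketched in userspace C with strdup() and
      free() standing in for kstrdup() and kfree(): because strsep() advances
      clone (and leaves it NULL at the end), the original pointer has to be
      saved and freed. count_tokens() is a hypothetical helper, not the
      function the patch touched.

```c
#define _DEFAULT_SOURCE   /* exposes strsep() and strdup() on glibc */
#include <stdlib.h>
#include <string.h>

/* Count non-empty, comma-separated tokens in val (hypothetical helper). */
static int count_tokens(const char *val)
{
    char *clone, *saved_clone, *tok;
    int n = 0;

    clone = strdup(val);          /* kstrdup() in the kernel */
    if (clone == NULL)
        return -1;
    saved_clone = clone;          /* strsep() moves clone; keep the start */

    while ((tok = strsep(&clone, ",")) != NULL) {
        if (*tok == '\0')
            continue;             /* skip empty fields such as "a,,b" */
        n++;
    }

    /* clone is NULL here, so free(clone) would leak the buffer. */
    free(saved_clone);
    return n;
}
```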
      
      The semantic match that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r exists@
      local idexpression x;
      expression E;
      identifier l;
      statement S;
      @@
      
      *x= \(kasprintf\|kstrdup\)(...);
      ...
      if (x == NULL) S
      ... when != kfree(x)
          when != E = x
      if (...) {
        <... when != kfree(x)
      * goto l;
        ...>
      * return ...;
      }
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 26 Aug, 2010: 2 commits
      tcp: select(writefds) don't hang up when a peer close connection · d84ba638
      KOSAKI Motohiro committed
      This issue comes from the Ruby language community. The test program
      below hangs only when run on Linux.
      
      	% uname -mrsv
      	Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686
      	% ruby -rsocket -ve '
      	BasicSocket.do_not_reverse_lookup = true
      	serv = TCPServer.open("127.0.0.1", 0)
      	s1 = TCPSocket.open("127.0.0.1", serv.addr[1])
      	s2 = serv.accept
      	s2.close
      	s1.write("a") rescue p $!
      	s1.write("a") rescue p $!
      	Thread.new {
      	  s1.write("a")
      	}.join'
      	ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux]
      	#<Errno::EPIPE: Broken pipe>
      	[Hang Here]
      
      FreeBSD, Solaris and Mac do not hang. This happens because Ruby's write()
      method calls select() internally, and tcp_poll() has a bug.
      
      SUS defines 'ready for writing' for select() as follows:
      
      |  A descriptor shall be considered ready for writing when a call to an output
      |  function with O_NONBLOCK clear would not block, whether or not the function
      |  would transfer data successfully.
      
      That is, the EPIPE situation clearly counts as 'ready for writing'.
      
      There is no read-side issue, because tcp_poll() already handles
      read-side shutdown:
      
      |        if (sk->sk_shutdown & RCV_SHUTDOWN)
      |                mask |= POLLIN | POLLRDNORM | POLLRDHUP;
      
      So, let's insert the same logic on the write side.
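      A userspace model of the write-side rule, not the actual tcp_poll() diff:
      once the send side is shut down, a write returns EPIPE immediately
      instead of blocking, so the socket is reported as writable. struct
      model_sock, the MODEL_* flags and model_poll_mask() are hypothetical;
      only POLLIN and POLLOUT from <poll.h> are real.

```c
#include <poll.h>

/* Hypothetical flags mirroring the kernel's RCV_SHUTDOWN / SEND_SHUTDOWN,
 * which live in struct sock and are not visible to userspace. */
#define MODEL_RCV_SHUTDOWN  1
#define MODEL_SEND_SHUTDOWN 2

struct model_sock {
    int shutdown;          /* MODEL_*_SHUTDOWN bits */
    int has_write_space;   /* nonzero if a write would not block */
};

static short model_poll_mask(const struct model_sock *sk)
{
    short mask = 0;

    /* Existing read-side rule (the kernel also sets POLLRDNORM/POLLRDHUP). */
    if (sk->shutdown & MODEL_RCV_SHUTDOWN)
        mask |= POLLIN;

    if (sk->shutdown & MODEL_SEND_SHUTDOWN)
        mask |= POLLOUT;   /* write fails fast with EPIPE: "ready" per SUS */
    else if (sk->has_write_space)
        mask |= POLLOUT;   /* ordinary case: buffer space is available */

    return mask;
}
```

      In the Ruby reproducer the socket's send buffer state does not matter
      after the EPIPE: the shutdown bit alone makes select() return.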
      
      - reference URLs
        http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065
        http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      tcp: fix three tcp sysctls tuning · c5ed63d6
      Eric Dumazet committed
      As discovered by Anton Blanchard, the current code to autotune
      tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
      sysctl_max_syn_backlog makes little sense.
      
      The bigger a page is, the smaller tcp_max_orphans gets: 4096 on a 512GB
      machine in Anton's case.
      
      (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
      is much bigger if spinlock debugging is on. It's wrong to select bigger
      limits in this case (where kernel structures are also bigger).
      
      The maximum bhash_size is 65536, and we get this value even on small machines.
      
      A better basis is the size of the ehash table; this also makes the code
      shorter and more obvious.
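      A sketch of the idea, assuming the three limits are derived as simple
      fractions of the number of ehash buckets. The divisors (2 and 256) and
      the floor of 128 are assumptions for illustration, and struct tcp_limits
      and limits_from_ehash() are hypothetical; the point is that the result no
      longer depends on the page size or on the sizeof() of a structure that
      spinlock debugging inflates.

```c
/* Hypothetical container for the three autotuned limits. */
struct tcp_limits {
    unsigned int max_tw_buckets;
    unsigned int max_orphans;
    unsigned int max_syn_backlog;
};

/* Derive limits from the number of established-hash entries, which scales
 * with memory size rather than with struct layout or page size. */
static struct tcp_limits limits_from_ehash(unsigned int ehash_entries)
{
    struct tcp_limits l;
    unsigned int syn = ehash_entries / 256;   /* assumed divisor */

    l.max_tw_buckets = ehash_entries / 2;     /* assumed divisor */
    l.max_orphans = ehash_entries / 2;        /* assumed divisor */
    l.max_syn_backlog = syn > 128 ? syn : 128;  /* assumed floor for small hosts */
    return l;
}
```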
      
      Based on a patch from Anton, and another from David.
      Reported-and-tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 25 Aug, 2010: 1 commit
  7. 24 Aug, 2010: 1 commit
  8. 18 Aug, 2010: 1 commit
  9. 08 Aug, 2010: 1 commit
  10. 03 Aug, 2010: 2 commits
  11. 02 Aug, 2010: 4 commits
  12. 31 Jul, 2010: 1 commit
  13. 23 Jul, 2010: 4 commits
  14. 22 Jul, 2010: 1 commit
  15. 20 Jul, 2010: 1 commit
  16. 16 Jul, 2010: 1 commit
  17. 15 Jul, 2010: 1 commit
  18. 13 Jul, 2010: 2 commits
  19. 09 Jul, 2010: 1 commit
      gre: propagate ipv6 transport class · dd4ba83d
      Stephen Hemminger committed
      This patch makes an IPv6-over-IPv4 GRE tunnel propagate the transport
      class field from the inner IPv6 header to the outer IPv4 Type Of Service
      field. Without the patch, all IPv6 packets in the tunnel look the same to QoS.
      
      This assumes that the IPv6 transport class is exactly the same
      as the IPv4 TOS. Not sure if that is always the case; maybe some
      bits need to be masked off.
      
      The mask and shift used to get tclass are copied from ipv6/datagram.c.
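      The mask and shift can be illustrated in userspace C: the first
      big-endian 32-bit word of an IPv6 header is version (4 bits), traffic
      class (8 bits) and flow label (20 bits), so the class is (word >> 20) &
      0xff. ipv6_tclass() is a hypothetical helper; per the message above, the
      patch copies this byte unchanged into the outer IPv4 TOS field.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl() */

/* Extract the 8-bit traffic class from the start of an IPv6 header. */
static uint8_t ipv6_tclass(const unsigned char *hdr)
{
    uint32_t word;

    memcpy(&word, hdr, sizeof(word));   /* avoid an unaligned load */
    return (uint8_t)((ntohl(word) >> 20) & 0xff);
}
```

      For example, header bytes 6b 80 00 00 give 0xb8, which is DSCP EF (46)
      shifted left by two bits.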
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 08 Jul, 2010: 1 commit
  21. 06 Jul, 2010: 1 commit
  22. 05 Jul, 2010: 3 commits
  23. 01 Jul, 2010: 2 commits
      fragment: add fast path for in-order fragments · d6bebca9
      Changli Gao committed
      add fast path for in-order fragments
      
      Since most OSes, such as Windows, Darwin and FreeBSD, send fragments in
      order, a new fragment most likely belongs at the end of the
      inet_frag_queue. In the fast path, we check whether the skb at the end of
      the inet_frag_queue is the prev we expect.
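      A simplified model of the fast path, using a singly linked list sorted by
      offset plus a tail pointer. struct frag, struct frag_queue and
      frag_insert() are hypothetical; overlap handling, which the real
      reassembly code must do, is left out.

```c
#include <stddef.h>

struct frag {
    int offset;
    struct frag *next;
};

struct frag_queue {
    struct frag *head;
    struct frag *tail;          /* last fragment, used by the fast path */
    unsigned slow_path_walks;   /* counts list walks, for illustration */
};

static void frag_insert(struct frag_queue *q, struct frag *f)
{
    struct frag *prev = NULL, *cur;

    /* Fast path: an empty queue, or f goes right after the current tail. */
    if (q->tail == NULL || q->tail->offset < f->offset) {
        prev = q->tail;
        goto found;
    }

    /* Slow path: walk to the last fragment with a smaller offset. */
    q->slow_path_walks++;
    for (cur = q->head; cur != NULL && cur->offset < f->offset; cur = cur->next)
        prev = cur;

found:
    if (prev == NULL) {
        f->next = q->head;
        q->head = f;
    } else {
        f->next = prev->next;
        prev->next = f;
    }
    if (f->next == NULL)
        q->tail = f;
}
```

      In-order arrivals never walk the list; only an out-of-order fragment
      pays for the scan.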
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      ----
       include/net/inet_frag.h |    1 +
       net/ipv4/ip_fragment.c  |   12 ++++++++++++
       net/ipv6/reassembly.c   |   11 +++++++++++
       3 files changed, 24 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
      snmp: 64bit ipstats_mib for all arches · 4ce3c183
      Eric Dumazet committed
      /proc/net/snmp and /proc/net/netstat expose SNMP counters.
      
      The width of these counters is either 32 or 64 bits, depending on the
      size of "unsigned long" in the kernel.
      
      This means user programs parsing these files must already be prepared to
      deal with 64-bit values, whether the program itself is 32-bit or 64-bit.
      
      This patch introduces 64-bit SNMP values for the IPSTAT MIB, where some
      counters can wrap quite quickly if they are only 32 bits wide.
      
      # netstat -s|egrep "InOctets|OutOctets"
          InOctets: 244068329096
          OutOctets: 244069348848
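      The wrap-around is easy to quantify with the InOctets value above: it
      exceeds 2^32 more than 56 times over, so a 32-bit counter would show only
      the remainder modulo 2^32. The sketch parses such a line with strtoull()
      into a uint64_t, as a user program should; parse_counter() and
      as_32bit_counter() are hypothetical helpers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse a "Name: value" line (leading spaces allowed) into a 64-bit
 * counter. Returns 0 on success, -1 on mismatch or missing number. */
static int parse_counter(const char *line, const char *name, uint64_t *out)
{
    size_t n = strlen(name);
    char *end;

    while (*line == ' ')
        line++;
    if (strncmp(line, name, n) != 0 || line[n] != ':')
        return -1;
    *out = strtoull(line + n + 1, &end, 10);   /* skips the space after ':' */
    return end == line + n + 1 ? -1 : 0;
}

/* What a 32-bit counter would report for the same total (mod 2^32). */
static uint32_t as_32bit_counter(uint64_t v)
{
    return (uint32_t)v;
}
```

      For 244068329096 the 32-bit view is 3550160520, so a 32-bit reader
      would silently under-report the traffic.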
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 29 Jun, 2010: 2 commits
  25. 28 Jun, 2010: 2 commits