1. 23 Nov 2012: 1 commit
    • ipv4: do not cache looped multicasts · 63617421
      Committed by Julian Anastasov
      Starting from 3.6, we cache output routes for multicasts only
      when using the route to 224/4. For local receivers we can set
      the RTCF_LOCAL flag depending on group membership, but in that
      case we use maddr and saddr, which, unlike before, are not
      caching keys. Additionally, we cannot use the same place to
      cache routes that differ in the value of the RTCF_LOCAL flag.
      
      Fix it by caching only RTCF_MULTICAST entries without
      RTCF_LOCAL (send-only, no loopback). As a side effect, we avoid
      an unneeded fnhe lookup when not caching, because multicasts
      are not redirected and do not learn PMTU.
      
      Thanks to Maxime Bizon for showing the caching problems in
      __mkroute_output for 3.6 kernels: a different RTCF_LOCAL flag in
      the cache can lead to the wrong ip_mc_output or ip_output call,
      and the visible problem is that traffic cannot reach local
      receivers via loopback.
      Reported-by: Maxime Bizon <mbizon@freebox.fr>
      Tested-by: Maxime Bizon <mbizon@freebox.fr>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
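A minimal sketch of the caching rule described in 63617421, assuming simplified flag handling. The SKETCH_* constants, the enum, and both functions are hypothetical; the flag values are assumed to match the RTCF_* bits in <linux/in_route.h>. The output-function choice is a simplified reading of __mkroute_output() that omits the broadcast and multicast-forwarding branches. It shows why a cached route with the wrong RTCF_LOCAL value selects the wrong output path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values, assumed to mirror RTCF_MULTICAST / RTCF_LOCAL. */
#define SKETCH_RTCF_MULTICAST 0x20000000u
#define SKETCH_RTCF_LOCAL     0x80000000u

enum sketch_output_fn { SKETCH_IP_OUTPUT, SKETCH_IP_MC_OUTPUT };

/* After the fix: a multicast output route is cacheable only when it is
 * send-only. RTCF_LOCAL depends on group membership, which is not part
 * of the cache key, so looped-back routes are never cached. */
static bool sketch_mc_route_cacheable(uint32_t rt_flags)
{
    return (rt_flags & SKETCH_RTCF_MULTICAST) && !(rt_flags & SKETCH_RTCF_LOCAL);
}

/* Simplified output selection: a local multicast route on a non-loopback
 * device needs ip_mc_output so a copy is looped back to local receivers. */
static enum sketch_output_fn sketch_pick_output(uint32_t rt_flags, bool dev_is_loopback)
{
    if ((rt_flags & SKETCH_RTCF_MULTICAST) && (rt_flags & SKETCH_RTCF_LOCAL) &&
        !dev_is_loopback)
        return SKETCH_IP_MC_OUTPUT;
    return SKETCH_IP_OUTPUT;
}
```

If a route built with RTCF_LOCAL were cached and reused for a sender with no local members (or the reverse), the cached output function would be the wrong one, which matches the symptom reported above.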
  2. 13 Nov 2012: 1 commit
  3. 19 Oct 2012: 1 commit
  4. 11 Oct 2012: 1 commit
  5. 09 Oct 2012: 7 commits
  6. 19 Sep 2012: 3 commits
  7. 11 Sep 2012: 1 commit
  8. 08 Sep 2012: 2 commits
  9. 01 Sep 2012: 1 commit
    • ipv4: Minor logic clean-up in ipv4_mtu · 98d75c37
      Committed by Alexander Duyck
      In ipv4_mtu there is some logic where we test for a non-zero value
      and a timer expiration, then set the value to zero, and then, if the
      value is zero, set it to a value based on the dst. Instead of
      bothering with the extra steps, it is easier to just clean up the
      logic so that we set it to the dst-based value if it is zero or if
      the timer has expired.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
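The clean-up in 98d75c37 can be checked with a small model, assuming a pared-down route that holds only a learned PMTU, its expiry time, and a metric-based MTU. The struct and functions are hypothetical; a plain `>=` stands in for the kernel's wraparound-safe time_after_eq(), and the later fallback to the device MTU is not modeled. Both versions return the same value in every case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields ipv4_mtu() consults. */
struct sketch_mtu_rt {
    unsigned int pmtu;       /* learned path MTU, 0 if none */
    unsigned long expires;   /* when the learned PMTU stops being valid */
    unsigned int metric_mtu; /* MTU derived from the dst metric */
};

/* Plain comparison; time_after_eq() additionally handles jiffies wraparound. */
static bool sketch_expired(unsigned long now, unsigned long expires)
{
    return now >= expires;
}

/* Before: zero the value on expiry, then test for zero. */
static unsigned int sketch_mtu_before(const struct sketch_mtu_rt *rt, unsigned long now)
{
    unsigned int mtu = rt->pmtu;
    if (mtu && sketch_expired(now, rt->expires))
        mtu = 0;
    if (!mtu)
        mtu = rt->metric_mtu;
    return mtu;
}

/* After: one condition covers both "unset" and "expired". */
static unsigned int sketch_mtu_after(const struct sketch_mtu_rt *rt, unsigned long now)
{
    unsigned int mtu = rt->pmtu;
    if (!mtu || sketch_expired(now, rt->expires))
        mtu = rt->metric_mtu;
    return mtu;
}
```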
  10. 31 Aug 2012: 1 commit
    • ipv4: must use rcu protection while calling fib_lookup · c5ae7d41
      Committed by Eric Dumazet
      The following lockdep splat was reported by Pavel Roskin:
      
      [ 1570.586223] ===============================
      [ 1570.586225] [ INFO: suspicious RCU usage. ]
      [ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
      [ 1570.586229] -------------------------------
      [ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
      [ 1570.586233]
      [ 1570.586233] other info that might help us debug this:
      [ 1570.586233]
      [ 1570.586236]
      [ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
      [ 1570.586238] 2 locks held by Chrome_IOThread/4467:
      [ 1570.586240]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff814f2c0c>] release_sock+0x2c/0xa0
      [ 1570.586253]  #1:  (fnhe_lock){+.-...}, at: [<ffffffff815302fc>] update_or_create_fnhe+0x2c/0x270
      [ 1570.586260]
      [ 1570.586260] stack backtrace:
      [ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
      [ 1570.586265] Call Trace:
      [ 1570.586271]  [<ffffffff810976ed>] lockdep_rcu_suspicious+0xfd/0x130
      [ 1570.586275]  [<ffffffff8153042c>] update_or_create_fnhe+0x15c/0x270
      [ 1570.586278]  [<ffffffff815305b3>] __ip_rt_update_pmtu+0x73/0xb0
      [ 1570.586282]  [<ffffffff81530619>] ip_rt_update_pmtu+0x29/0x90
      [ 1570.586285]  [<ffffffff815411dc>] inet_csk_update_pmtu+0x2c/0x80
      [ 1570.586290]  [<ffffffff81558d1e>] tcp_v4_mtu_reduced+0x2e/0xc0
      [ 1570.586293]  [<ffffffff81553bc4>] tcp_release_cb+0xa4/0xb0
      [ 1570.586296]  [<ffffffff814f2c35>] release_sock+0x55/0xa0
      [ 1570.586300]  [<ffffffff815442ef>] tcp_sendmsg+0x4af/0xf50
      [ 1570.586305]  [<ffffffff8156fc60>] inet_sendmsg+0x120/0x230
      [ 1570.586308]  [<ffffffff8156fb40>] ? inet_sk_rebuild_header+0x40/0x40
      [ 1570.586312]  [<ffffffff814f4bdd>] ? sock_update_classid+0xbd/0x3b0
      [ 1570.586315]  [<ffffffff814f4c50>] ? sock_update_classid+0x130/0x3b0
      [ 1570.586320]  [<ffffffff814ec435>] do_sock_write+0xc5/0xe0
      [ 1570.586323]  [<ffffffff814ec4a3>] sock_aio_write+0x53/0x80
      [ 1570.586328]  [<ffffffff8114bc83>] do_sync_write+0xa3/0xe0
      [ 1570.586332]  [<ffffffff8114c5a5>] vfs_write+0x165/0x180
      [ 1570.586335]  [<ffffffff8114c805>] sys_write+0x45/0x90
      [ 1570.586340]  [<ffffffff815d2722>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Pavel Roskin <proski@gnu.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
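A toy userspace model of the bug class in c5ae7d41, assuming a single global read-side nesting counter. Every sketch_* name is hypothetical; these are not the kernel's rcu_read_lock()/rcu_dereference_check() APIs, and real lockdep tracking is far more involved. The model only shows that a lookup walking RCU-protected data must run inside a read-side section, which the fix adds around fib_lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the read-side nesting depth that lockdep checks. */
static int sketch_rcu_depth;

static void sketch_rcu_read_lock(void)   { sketch_rcu_depth++; }
static void sketch_rcu_read_unlock(void) { assert(sketch_rcu_depth > 0); sketch_rcu_depth--; }

/* Stand-in for a checked dereference: safe only inside a read-side section. */
static bool sketch_deref_is_safe(void) { return sketch_rcu_depth > 0; }

/* Stand-in for fib_lookup(): reports whether its RCU-protected accesses were covered. */
static bool sketch_fib_lookup(void) { return sketch_deref_is_safe(); }

/* Buggy pattern: the caller never enters a read-side section. */
static bool sketch_update_pmtu_buggy(void) { return sketch_fib_lookup(); }

/* Fixed pattern: wrap the lookup (and its use of the result) in a read-side section. */
static bool sketch_update_pmtu_fixed(void)
{
    bool ok;
    sketch_rcu_read_lock();
    ok = sketch_fib_lookup();
    sketch_rcu_read_unlock();
    return ok;
}
```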
  11. 24 Aug 2012: 1 commit
  12. 23 Aug 2012: 1 commit
    • ipv4: properly update pmtu · 9b04f350
      Committed by Eric Dumazet
      Sylvain Munaut reported the following:
      
       - TCP connection get "stuck" with data in send queue when doing
         "large" transfers ( like typing 'ps ax' on a ssh connection )
       - Only happens on path where the PMTU is lower than the MTU of
         the interface
       - Is not present right after boot, it only appears 10-20min after
         boot or so. (and that's inside the _same_ TCP connection, it works
         fine at first and then in the same ssh session, it'll get stuck)
       - Definitely seems related to fragments somehow since I see a router
         sending ICMP message saying fragmentation is needed.
       - Exact same setup works fine with kernel 3.5.1
      
      Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
      period is over.
      
      ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
      but dst_set_expires() does nothing because dst.expires is already set.
      
      It seems we want to set the expires field to a new value, regardless
      of prior one.
      
      With help from Julian Anastasov.
      Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      CC: Julian Anastasov <ja@ssi.bg>
      Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
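Why the rearm in 9b04f350 silently failed can be shown with a simplified model, assuming dst_set_expires() only installs a new deadline when none is set or when the new one is earlier. Both functions below are hypothetical sketches; a plain `<` stands in for the wraparound-safe time_before(). Once the old deadline has passed, any new deadline is later, so the "set if earlier" rule never fires, and the fix is to assign the new expiry unconditionally.

```c
#include <assert.h>

/* Simplified model of dst_set_expires(): only moves the deadline earlier,
 * or sets it when it is unset (0 means "no expiry"). */
static void sketch_dst_set_expires(unsigned long *expires, unsigned long now,
                                   unsigned long timeout)
{
    unsigned long e = now + timeout;
    if (e == 0)
        e = 1; /* never store 0, which would mean "no expiry" */
    if (*expires == 0 || e < *expires)
        *expires = e;
}

/* Rearm regardless of the prior value, as the commit message says is wanted. */
static void sketch_rearm_expires(unsigned long *expires, unsigned long now,
                                 unsigned long timeout)
{
    unsigned long e = now + timeout;
    *expires = e ? e : 1;
}
```

With a deadline of 600 that has already passed at time 700, sketch_dst_set_expires() leaves it at 600, while sketch_rearm_expires() moves it to 1300.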
  13. 15 Aug 2012: 1 commit
  14. 10 Aug 2012: 1 commit
  15. 04 Aug 2012: 1 commit
    • ipv4: Introduce IN_DEV_NET_ROUTE_LOCALNET · 9eb43e76
      Committed by Eric Dumazet
      Performance profiles show a high cost in the IN_DEV_ROUTE_LOCALNET()
      call done in ip_route_input_slow(), because of multiple dereferences,
      even if the cache lines are clean and available in CPU caches.
      
      Since we already have the 'net' pointer, introduce the
      IN_DEV_NET_ROUTE_LOCALNET() macro, which avoids two dereferences
      (dev_net(in_dev->dev)).
      
      Also change the tests to use IN_DEV_NET_ROUTE_LOCALNET() only if saddr
      and/or daddr are loopback addresses.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
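The reordered test in 9eb43e76 can be sketched as follows, assuming host-byte-order addresses (the kernel's ipv4_is_loopback() takes a __be32). sketch_route_localnet() is a hypothetical stand-in for IN_DEV_NET_ROUTE_LOCALNET() that counts its calls, and the destination-then-source ordering is a simplified assumption. The point is that the common, non-loopback case never reaches the per-device setting lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 127.0.0.0/8 check in host byte order. */
static bool sketch_is_loopback(uint32_t addr)
{
    return (addr & 0xff000000u) == 0x7f000000u;
}

static int sketch_localnet_calls; /* how often the "expensive" check runs */

/* Hypothetical stand-in for IN_DEV_NET_ROUTE_LOCALNET(in_dev, net). */
static bool sketch_route_localnet(bool enabled)
{
    sketch_localnet_calls++;
    return enabled;
}

enum sketch_verdict { SKETCH_ROUTE_OK, SKETCH_MARTIAN_DST, SKETCH_MARTIAN_SRC };

/* Consult the setting at most once, and only for loopback addresses. */
static enum sketch_verdict sketch_check_loopback(uint32_t saddr, uint32_t daddr,
                                                 bool localnet_enabled)
{
    if (sketch_is_loopback(daddr)) {
        if (!sketch_route_localnet(localnet_enabled))
            return SKETCH_MARTIAN_DST;
    } else if (sketch_is_loopback(saddr)) {
        if (!sketch_route_localnet(localnet_enabled))
            return SKETCH_MARTIAN_SRC;
    }
    return SKETCH_ROUTE_OK;
}
```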
  16. 02 Aug 2012: 1 commit
  17. 01 Aug 2012: 4 commits
  18. 31 Jul 2012: 1 commit
    • net: ipv4: fix RCU races on dst refcounts · 404e0a8b
      Committed by Eric Dumazet
      Commit c6cffba4 ("ipv4: Fix input route performance regression.")
      added various fatal races with dst refcounts.
      
      Crashes happen on TCP workloads if routes are added or deleted at the
      same time.
      
      The dst_free() calls from free_fib_info_rcu() are clearly racy.
      
      Instead, we need regular dst refcounting (dst_release()), and we must
      make sure dst_release() is aware of RCU grace periods:
      
      Add a DST_RCU_FREE flag so that dst_release() respects an RCU grace
      period before destroying a cached dst.
      
      Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
      to make sure we don't increase a zero refcount (on a dst currently
      waiting for an RCU grace period before destruction).
      
      rt_cache_route() must take a reference on the new cached route, and
      release it if it was not able to install it.
      
      With this patch, my machines survive various benchmarks.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
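The "don't increase a zero refcount" rule from 404e0a8b can be sketched in userspace with C11 atomics, assuming a bare integer refcount. sketch_inc_not_zero() mimics the semantics of the kernel's atomic_inc_not_zero(); the structs and sketch_rx_dst_set() are hypothetical, pared-down stand-ins for the inet_sk_rx_dst_set() idea, not its real body. A reference is taken only with a compare-and-swap that fails if the count has already dropped to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Take a reference only if the count is still non-zero, i.e. the object
 * is not already waiting to be freed after an RCU grace period. */
static bool sketch_inc_not_zero(atomic_int *refcnt)
{
    int old = atomic_load(refcnt);
    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Hypothetical, pared-down stand-ins for struct dst_entry and struct sock. */
struct sketch_dst  { atomic_int refcnt; };
struct sketch_sock { struct sketch_dst *rx_dst; };

/* Cache the RX dst on the socket only if a reference could be taken. */
static void sketch_rx_dst_set(struct sketch_sock *sk, struct sketch_dst *dst)
{
    if (sketch_inc_not_zero(&dst->refcnt))
        sk->rx_dst = dst;
}
```

A plain atomic increment would "revive" a dst whose count already hit zero and whose destruction is pending; the CAS loop closes that window.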
  19. 27 Jul 2012: 1 commit
  20. 26 Jul 2012: 1 commit
  21. 24 Jul 2012: 3 commits
    • ipv4: Change rt->rt_iif encoding. · 13378cad
      Committed by David S. Miller
      On input packet processing, rt->rt_iif will be zero if we should
      use skb->dev->ifindex.
      
      Since we access rt->rt_iif consistently via inet_iif(), that is
      the only spot whose interpretation has to be adjusted.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Prepare for change of rt->rt_iif encoding. · 92101b3b
      Committed by David S. Miller
      Use inet_iif() consistently, and for TCP record the input interface of
      cached RX dst in inet sock.
      
      rt->rt_iif is going to be encoded differently, so that we can
      legitimately cache input routes in the FIB info more aggressively.
      
      When the input interface is "use SKB device index" the rt->rt_iif will
      be set to zero.
      
      This forces us to move the TCP RX dst cache installation into the
      ipv4-specific code, as it should be anyway, since doing the route
      caching for ipv6 is pointless at the moment: it is not yet inspected
      in the ipv6 input paths.
      
      Also, remove the unlikely() on dst->obsolete; all ipv4 dsts have
      obsolete set to a non-zero value to force invocation of the check
      callback.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Remove all RTCF_DIRECTSRC handling. · fe3edf45
      Committed by David S. Miller
      The last and final kernel user, ICMP address replies,
      has been removed.
      Signed-off-by: David S. Miller <davem@davemloft.net>
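The rt_iif encoding described in 13378cad and 92101b3b can be sketched as a lookup with a fallback, assuming pared-down structures. All sketch_* types and the function are hypothetical; only the rule "rt_iif == 0 means use the receiving device's ifindex, read through inet_iif()" comes from the commit messages. A zero rt_iif lets one cached input route be shared across packets arriving on different devices.

```c
#include <assert.h>

/* Hypothetical, pared-down stand-ins for net_device, rtable and sk_buff. */
struct sketch_dev    { int ifindex; };
struct sketch_iif_rt { int rt_iif; }; /* 0 means "use the skb's device" */
struct sketch_skb {
    const struct sketch_dev *dev;
    const struct sketch_iif_rt *rt;
};

/* Modeled on the described inet_iif() behavior after the encoding change:
 * the single accessor that interprets rt_iif. */
static int sketch_inet_iif(const struct sketch_skb *skb)
{
    int iif = skb->rt->rt_iif;
    return iif ? iif : skb->dev->ifindex;
}
```

Because every reader goes through the one accessor, only that function has to change when the encoding changes.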
  22. 21 Jul 2012: 5 commits