1. 31 Dec, 2011 1 commit
    • IPv6: Avoid taking write lock for /proc/net/ipv6_route · 32b293a5
      Committed by Josh Hunt
      During some debugging I needed to look into how /proc/net/ipv6_route
      operated, and in my digging I found it's calling fib6_clean_all(), which
      uses "write_lock_bh(&table->tb6_lock)" before doing the walk of the table.
      I found this on 2.6.32, but reading the code I believe the same basic idea
      exists currently. The rtnetlink code, by contrast, only calls
      "read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
      reading from proc isn't the recommended way of fetching the ipv6 route
      table, taking a write lock seems unnecessary and would probably cause
      network performance issues.
      
      To verify this I loaded up the ipv6 route table and then ran iperf in 3
      cases:
        * doing nothing
        * reading ipv6 route table via proc
          (while :; do cat /proc/net/ipv6_route > /dev/null; done)
        * reading ipv6 route table via rtnetlink
          (while :; do ip -6 route show table all > /dev/null; done)
      
      * Load the ipv6 route table up with:
        * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
      
      * iperf commands:
        * client: iperf -i 1 -V -c <ipv6 addr>
        * server: iperf -V -s
      
      * iperf results - 3 runs each (in Mbits/sec)
        * nothing: client: 927,927,927 server: 927,927,927
        * proc: client: 179,97,96,113 server: 142,112,133
        * iproute: client: 928,927,928 server: 927,927,927
      
      lock_stat shows taking the write lock is causing the slowdown. Using this
      info I decided to write a version of fib6_clean_all() which replaces
      write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
      this new function I see the same results as with my rtnetlink iperf test.
      Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Dec, 2011 2 commits
  3. 29 Dec, 2011 2 commits
  4. 27 Dec, 2011 1 commit
  5. 14 Dec, 2011 2 commits
  6. 07 Dec, 2011 2 commits
  7. 06 Dec, 2011 1 commit
  8. 05 Dec, 2011 1 commit
  9. 04 Dec, 2011 2 commits
  10. 27 Nov, 2011 3 commits
  11. 23 Nov, 2011 1 commit
  12. 15 Nov, 2011 1 commit
  13. 01 Nov, 2011 1 commit
  14. 29 Oct, 2011 1 commit
  15. 28 Sep, 2011 1 commit
  16. 17 Sep, 2011 1 commit
  17. 03 Aug, 2011 1 commit
  18. 22 Jul, 2011 1 commit
    • ipv6: unshare inetpeers · 21efcfa0
      Committed by Eric Dumazet
      We currently COW metrics a bit too soon in the IPv6 case: all routes are
      tied to a single inetpeer entry.
      
      Change ip6_rt_copy() to get destination address as second argument, so
      that we fill rt6i_dst before the dst_copy_metrics() call.
      
      icmp6_dst_alloc() must set rt6i_dst before calling dst_metric_set(), or
      else the cow is done while rt6i_dst is still NULL.
      
      If orig route points to readonly metrics, we can share the pointer
      instead of performing the memory allocation and copy.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 18 Jul, 2011 3 commits
  20. 02 Jul, 2011 2 commits
    • ipv6: Don't put artificial limit on routing table size. · 957c665f
      Committed by David S. Miller
      IPV6, unlike IPV4, doesn't have a routing cache.
      
      Routing table entries, as well as clones made in response
      to route lookup requests, all live in the same table.  And
      all of these things are together collected in the destination
      cache table for ipv6.
      
      This means that routing table entries count against the garbage
      collection limits, even though such entries cannot ever be reclaimed
      and are added explicitly by the administrator (rather than being
      created in response to lookups).
      
      Therefore it makes no sense to count ipv6 routing table entries
      against the GC limits.
      
      Add a DST_NOCOUNT destination cache entry flag, and skip the counting
      if it is set.  Use this flag bit in ipv6 when adding routing table
      entries.
      Signed-off-by: David S. Miller <davem@davemloft.net>
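The counting rule this commit describes can be sketched as follows. This is an illustrative model only: the `demo_` structures, the flag value and the threshold are hypothetical, and the real `dst_entry` accounting in the kernel is more involved.

```c
#include <assert.h>

#define DEMO_DST_NOCOUNT 0x1u   /* hypothetical flag value */

/* Hypothetical, reduced per-protocol accounting state. */
struct demo_dst_ops {
    int entries;     /* entries counted against the GC limit */
    int gc_thresh;   /* GC is considered once entries exceeds this */
};

struct demo_dst_entry {
    unsigned int flags;
    struct demo_dst_ops *ops;
};

/* Allocation-time accounting: entries flagged NOCOUNT are not counted. */
void demo_dst_init(struct demo_dst_entry *dst, struct demo_dst_ops *ops,
                   unsigned int flags)
{
    dst->ops = ops;
    dst->flags = flags;
    if (!(flags & DEMO_DST_NOCOUNT))
        ops->entries++;
}

/* Release-time accounting must mirror the allocation-time rule. */
void demo_dst_release(struct demo_dst_entry *dst)
{
    if (!(dst->flags & DEMO_DST_NOCOUNT))
        dst->ops->entries--;
}

int demo_gc_needed(const struct demo_dst_ops *ops)
{
    return ops->entries > ops->gc_thresh;
}

/* Usage: administrator-added routes never push the table over the limit. */
void demo_nocount_usage(void)
{
    struct demo_dst_ops ops = { 0, 1 };
    struct demo_dst_entry admin[3], clone[2];

    for (int i = 0; i < 3; i++)
        demo_dst_init(&admin[i], &ops, DEMO_DST_NOCOUNT);
    assert(!demo_gc_needed(&ops));      /* 0 counted entries */

    for (int i = 0; i < 2; i++)
        demo_dst_init(&clone[i], &ops, 0);
    assert(demo_gc_needed(&ops));       /* 2 counted entries > threshold 1 */

    demo_dst_release(&clone[0]);
    assert(!demo_gc_needed(&ops));      /* back to 1 counted entry */
}
```

Checking the same flag on both the increment and the decrement keeps the counter balanced no matter how many uncounted entries come and go.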
    • ipv6: Don't change dst->flags using assignments. · 11d53b49
      Committed by David S. Miller
      This blows away any flags already set in the entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
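The bug class named in this commit is easy to reproduce in isolation. In the sketch below the flag names and values are hypothetical; only the contrast between `=` and `|=` is the point.

```c
#include <assert.h>

#define DEMO_FLAG_HOST    0x1u   /* hypothetical flag values */
#define DEMO_FLAG_NOCOUNT 0x2u

/* Plain assignment: every bit that was set before is lost. */
unsigned int demo_set_by_assignment(unsigned int flags, unsigned int flag)
{
    flags = flag;
    return flags;
}

/* Bitwise OR: the new bit is added and existing bits are kept. */
unsigned int demo_set_by_or(unsigned int flags, unsigned int flag)
{
    flags |= flag;
    return flags;
}

/* Usage: start with NOCOUNT set, then try to add HOST both ways. */
void demo_flags_usage(void)
{
    unsigned int start = DEMO_FLAG_NOCOUNT;

    assert(demo_set_by_assignment(start, DEMO_FLAG_HOST) == DEMO_FLAG_HOST);
    assert(demo_set_by_or(start, DEMO_FLAG_HOST) ==
           (DEMO_FLAG_HOST | DEMO_FLAG_NOCOUNT));
}
```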
  21. 10 Jun, 2011 1 commit
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Committed by Greg Rose
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
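The sizing idea in this commit, computing how much room a dump needs instead of assuming one page, might look roughly like the sketch below. All byte counts are hypothetical placeholders chosen for illustration; they are not the real netlink attribute sizes, and the function names are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE    4096u  /* assumed page size */
#define DEMO_BASE_BYTES    256u  /* hypothetical fixed per-interface part */
#define DEMO_PER_VF_BYTES  100u  /* hypothetical per-VF part */

/* Bytes needed to describe one interface with num_vfs virtual functions. */
size_t demo_min_ifinfo_dump_size(unsigned int num_vfs)
{
    return DEMO_BASE_BYTES + (size_t)num_vfs * DEMO_PER_VF_BYTES;
}

/* Allocate at least a page, but grow when the computed minimum is larger. */
size_t demo_dump_alloc_size(unsigned int num_vfs)
{
    size_t need = demo_min_ifinfo_dump_size(num_vfs);

    return need > DEMO_PAGE_SIZE ? need : DEMO_PAGE_SIZE;
}

/* Usage: a few VFs fit in one page, many VFs need a larger buffer. */
void demo_dump_usage(void)
{
    assert(demo_dump_alloc_size(8) == DEMO_PAGE_SIZE);  /* 256 + 800 < 4096 */
    assert(demo_dump_alloc_size(64) == 6656);           /* 256 + 6400 */
}
```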
  22. 21 May, 2011 1 commit
  23. 29 Apr, 2011 2 commits
  24. 26 Apr, 2011 1 commit
    • net: provide cow_metrics() methods to blackhole dst_ops · 0972ddb2
      Committed by Held Bernhard
      Since commit 62fa8a84 (net: Implement read-only protection and COW'ing
      of metrics.) the kernel throws an oops.
      
      [  101.620985] BUG: unable to handle kernel NULL pointer dereference at
                 (null)
      [  101.621050] IP: [<          (null)>]           (null)
      [  101.621084] PGD 6e53c067 PUD 3dd6a067 PMD 0
      [  101.621122] Oops: 0010 [#1] SMP
      [  101.621153] last sysfs file: /sys/devices/virtual/ppp/ppp/uevent
      [  101.621192] CPU 2
      [  101.621206] Modules linked in: l2tp_ppp pppox ppp_generic slhc
      l2tp_netlink l2tp_core deflate zlib_deflate twofish_x86_64
      twofish_common des_generic cbc ecb sha1_generic hmac af_key
      iptable_filter snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device loop
      snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
      snd_pcm snd_timer snd i2c_i801 iTCO_wdt psmouse soundcore snd_page_alloc
      evdev uhci_hcd ehci_hcd thermal
      [  101.621552]
      [  101.621567] Pid: 5129, comm: openl2tpd Not tainted 2.6.39-rc4-Quad #3
      Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R
      [  101.621637] RIP: 0010:[<0000000000000000>]  [<          (null)>]   (null)
      [  101.621684] RSP: 0018:ffff88003ddeba60  EFLAGS: 00010202
      [  101.621716] RAX: ffff88003ddb5600 RBX: ffff88003ddb5600 RCX:
      0000000000000020
      [  101.621758] RDX: ffffffff81a69a00 RSI: ffffffff81b7ee61 RDI:
      ffff88003ddb5600
      [  101.621800] RBP: ffff8800537cd900 R08: 0000000000000000 R09:
      ffff88003ddb5600
      [  101.621840] R10: 0000000000000005 R11: 0000000000014b38 R12:
      ffff88003ddb5600
      [  101.621881] R13: ffffffff81b7e480 R14: ffffffff81b7e8b8 R15:
      ffff88003ddebad8
      [  101.621924] FS:  00007f06e4182700(0000) GS:ffff88007fd00000(0000)
      knlGS:0000000000000000
      [  101.621971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  101.622005] CR2: 0000000000000000 CR3: 0000000045274000 CR4:
      00000000000006e0
      [  101.622046] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [  101.622087] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
      0000000000000400
      [  101.622129] Process openl2tpd (pid: 5129, threadinfo
      ffff88003ddea000, task ffff88003de9a280)
      [  101.622177] Stack:
      [  101.622191]  ffffffff81447efa ffff88007d3ded80 ffff88003de9a280
      ffff88007d3ded80
      [  101.622245]  0000000000000001 ffff88003ddebbb8 ffffffff8148d5a7
      0000000000000212
      [  101.622299]  ffff88003dcea000 ffff88003dcea188 ffffffff00000001
      ffffffff81b7e480
      [  101.622353] Call Trace:
      [  101.622374]  [<ffffffff81447efa>] ? ipv4_blackhole_route+0x1ba/0x210
      [  101.622415]  [<ffffffff8148d5a7>] ? xfrm_lookup+0x417/0x510
      [  101.622450]  [<ffffffff8127672a>] ? extract_buf+0x9a/0x140
      [  101.622485]  [<ffffffff8144c6a0>] ? __ip_flush_pending_frames+0x70/0x70
      [  101.622526]  [<ffffffff8146fbbf>] ? udp_sendmsg+0x62f/0x810
      [  101.622562]  [<ffffffff813f98a6>] ? sock_sendmsg+0x116/0x130
      [  101.622599]  [<ffffffff8109df58>] ? find_get_page+0x18/0x90
      [  101.622633]  [<ffffffff8109fd6a>] ? filemap_fault+0x12a/0x4b0
      [  101.622668]  [<ffffffff813fb5c4>] ? move_addr_to_kernel+0x64/0x90
      [  101.622706]  [<ffffffff81405d5a>] ? verify_iovec+0x7a/0xf0
      [  101.622739]  [<ffffffff813fc772>] ? sys_sendmsg+0x292/0x420
      [  101.622774]  [<ffffffff810b994a>] ? handle_pte_fault+0x8a/0x7c0
      [  101.622810]  [<ffffffff810b76fe>] ? __pte_alloc+0xae/0x130
      [  101.622844]  [<ffffffff810ba2f8>] ? handle_mm_fault+0x138/0x380
      [  101.622880]  [<ffffffff81024af9>] ? do_page_fault+0x189/0x410
      [  101.622915]  [<ffffffff813fbe03>] ? sys_getsockname+0xf3/0x110
      [  101.622952]  [<ffffffff81450c4d>] ? ip_setsockopt+0x4d/0xa0
      [  101.622986]  [<ffffffff813f9932>] ? sockfd_lookup_light+0x22/0x90
      [  101.623024]  [<ffffffff814b61fb>] ? system_call_fastpath+0x16/0x1b
      [  101.623060] Code:  Bad RIP value.
      [  101.623090] RIP  [<          (null)>]           (null)
      [  101.623125]  RSP <ffff88003ddeba60>
      [  101.623146] CR2: 0000000000000000
      [  101.650871] ---[ end trace ca3856a7d8e8dad4 ]---
      [  101.651011] __sk_free: optmem leakage (160 bytes) detected.
      
      The oops happens in dst_metrics_write_ptr()
      include/net/dst.h:124: return dst->ops->cow_metrics(dst, p);
      
      dst->ops->cow_metrics is NULL and causes the oops.
      
      Provide cow_metrics() methods, like we did in commit 214f45c9
      (net: provide default_advmss() methods to blackhole dst_ops)
      Signed-off-by: Held Bernhard <berny156@gmx.de>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
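The failure mode in this commit is a call through a method-table slot that was never filled in. The sketch below models it with hypothetical `demo_` types; what the stub returns (here `NULL`, meaning no writable metrics are handed out) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dst_entry;

/* Hypothetical, reduced version of a dst_ops method table. */
struct demo_dst_ops {
    unsigned int *(*cow_metrics)(struct demo_dst_entry *dst, unsigned long old);
};

struct demo_dst_entry {
    const struct demo_dst_ops *ops;
};

/* Stub method: a blackhole entry declines to provide writable metrics. */
unsigned int *demo_blackhole_cow_metrics(struct demo_dst_entry *dst,
                                         unsigned long old)
{
    (void)dst;
    (void)old;
    return NULL;
}

/* Before the fix the slot is empty; after it, the stub fills the slot. */
const struct demo_dst_ops demo_blackhole_ops_before = { .cow_metrics = NULL };
const struct demo_dst_ops demo_blackhole_ops_after = {
    .cow_metrics = demo_blackhole_cow_metrics,
};

int demo_ops_complete(const struct demo_dst_ops *ops)
{
    return ops->cow_metrics != NULL;
}

/* Same shape as the quoted dst.h line: the caller does not check for NULL. */
unsigned int *demo_metrics_write_ptr(struct demo_dst_entry *dst,
                                     unsigned long old)
{
    return dst->ops->cow_metrics(dst, old);
}

/* Usage: only the completed table is safe to dispatch through. */
void demo_cow_usage(void)
{
    struct demo_dst_entry dst = { &demo_blackhole_ops_after };

    assert(!demo_ops_complete(&demo_blackhole_ops_before));
    assert(demo_ops_complete(&demo_blackhole_ops_after));
    assert(demo_metrics_write_ptr(&dst, 0) == NULL);
}
```

Dispatching through `demo_blackhole_ops_before` would jump to address zero, which matches the `RIP: 0010:[<0000000000000000>]` line in the oops. Filling the slot keeps the unchecked call site valid without touching every caller.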
  25. 23 Apr, 2011 1 commit
  26. 22 Apr, 2011 1 commit
  27. 16 Apr, 2011 1 commit
  28. 23 Mar, 2011 1 commit
  29. 13 Mar, 2011 1 commit