1. 23 Jan, 2012 1 commit
  2. 19 Jan, 2012 1 commit
    • net: race condition in ipv6 forwarding and disable_ipv6 parameters · 013d97e9
      Francesco Ruggeri committed
      There is a race condition in addrconf_sysctl_forward() and
      addrconf_sysctl_disable().
      These functions change idev->cnf.forwarding (resp. idev->cnf.disable_ipv6)
      and then try to grab the rtnl lock before performing any actions.
      If that fails they restore the original value and restart the syscall.
      This creates race conditions if ipv6 code tries to access
      these parameters, or if multiple instances try to do the same operation.
      As an example of the former, if __ipv6_ifa_notify() finds a 0 in
      idev->cnf.forwarding when invoked by addrconf_ifdown() it may not free
      anycast addresses, ultimately resulting in the net_device not being freed.
      This patch reads the user parameters into a temporary location and only
      writes the actual parameters when the rtnl lock is acquired.
      Tested in 2.6.38.8.
      Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
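The fix described above can be modelled in user space as "stage the new value, commit it only once the lock is held". This is a minimal sketch, not the kernel code: `rtnl_lock`, `conf`, and `set_forwarding` are hypothetical stand-ins for the rtnl lock, `idev->cnf`, and the sysctl handler.

```python
import threading

rtnl_lock = threading.Lock()   # hypothetical stand-in for the kernel's rtnl lock
conf = {"forwarding": 0}       # hypothetical stand-in for idev->cnf


def set_forwarding(user_value):
    """Return True if the value was committed, False if the caller should retry."""
    staged = int(user_value)   # parse the user input into a temporary first
    if not rtnl_lock.acquire(blocking=False):
        # The shared value was never modified, so there is nothing to restore
        # and no window in which other code could observe a transient value.
        return False
    try:
        conf["forwarding"] = staged   # written only while the lock is held
        return True
    finally:
        rtnl_lock.release()
```

If the lock is busy, the function returns without ever touching `conf`, which is the property the original "write, then try to lock, then roll back" sequence lacked.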
  3. 18 Jan, 2012 1 commit
  4. 17 Jan, 2012 1 commit
  5. 14 Jan, 2012 1 commit
  6. 13 Jan, 2012 1 commit
  7. 06 Jan, 2012 1 commit
  8. 05 Jan, 2012 2 commits
    • ipv6/addrconf: speedup /proc/net/if_inet6 filling · 1d578303
      Mihai Maruseac committed
      This ensures a linear behaviour when filling /proc/net/if_inet6 thus making
      ifconfig run really fast on IPv6 only addresses. In fact, with this patch and
      the IPv4 one sent a while ago, ifconfig will run in linear time regardless of
      address type.
      
      IPv4 related patch: f04565dd
      	 dev: use name hash for dev_seq_ops
      	 ...
      
      Some statistics (running ifconfig > /dev/null on a different setup):
      
      iface count | IPv6 no-patch time | IPv6 patched time | IPv4 time
      ------------+--------------------+-------------------+----------
             6250 |       0.23 s       |      0.13 s       |  0.11 s
            12500 |       0.62 s       |      0.28 s       |  0.22 s
            25000 |       2.91 s       |      0.57 s       |  0.46 s
            50000 |      11.37 s       |      1.21 s       |  0.94 s
           128000 |      86.78 s       |      3.05 s       |  2.54 s
      Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
      Cc: Daniel Baluta <dbaluta@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Check RA for sllao when configuring optimistic ipv6 address (v2) · e6bff995
      Neil Horman committed
      Recently Dave noticed that a test we did in ipv6_add_addr, to see if we had a
      next hop route for the interface we're adding an address to, was wrong (see
      commit 7ffbcecb). For one, it never triggers, and two, it was completely wrong
      to begin with. This test was meant to cover this section of RFC 4429:
      
      3.3 Modifications to RFC 2462 Stateless Address Autoconfiguration
      
         * (modifies section 5.5) A host MAY choose to configure a new address
              as an Optimistic Address.  A host that does not know the SLLAO
              of its router SHOULD NOT configure a new address as Optimistic.
              A router SHOULD NOT configure an Optimistic Address.
      
      This patch should bring us into proper compliance with the above clause.  Since
      we only add a SLAAC address after we've received an RA which may or may not
      contain a source link layer address option, we can pass a pointer to that option
      to addrconf_prefix_rcv (which may be null if the option is not present), and
      only set the optimistic flag if the option was found in the RA.
      
      Change notes:
      (v2) modified the new parameter to addrconf_prefix_rcv to be a bool rather than
      a pointer to make its use more clear as per request from davem.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
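The rule this commit enforces can be written as a single predicate. The sketch below is only an illustration: `slaac_addr_flags`, its parameters, and the bit value chosen for `IFA_F_OPTIMISTIC` are assumptions made here, not the kernel's definitions.

```python
IFA_F_OPTIMISTIC = 0x04  # illustrative bit value; treat it as an assumption


def slaac_addr_flags(sllao_present, optimistic_dad_enabled, is_router):
    """Compute the flags for a new SLAAC address following RFC 4429, section 3.3."""
    flags = 0
    # A host that does not know its router's SLLAO SHOULD NOT be optimistic,
    # and a router SHOULD NOT configure an Optimistic Address at all.
    if optimistic_dad_enabled and sllao_present and not is_router:
        flags |= IFA_F_OPTIMISTIC
    return flags
```

Passing the presence of the option as a boolean, rather than a pointer to it, mirrors the v2 change note above.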
  9. 31 Dec, 2011 1 commit
    • IPv6: Avoid taking write lock for /proc/net/ipv6_route · 32b293a5
      Josh Hunt committed
      During some debugging I needed to look into how /proc/net/ipv6_route
      operated, and in my digging I found it's calling fib6_clean_all(), which uses
      "write_lock_bh(&table->tb6_lock)" before doing the walk of the table. I
      found this on 2.6.32, but reading the code I believe the same basic idea
      exists currently. Looking at the rtnetlink code, they are only calling
      "read_lock_bh(&table->tb6_lock);" via fib6_dump_table(). While I realize
      reading from proc isn't the recommended way of fetching the ipv6 route
      table, taking a write lock seems unnecessary and would probably cause
      network performance issues.
      
      To verify this I loaded up the ipv6 route table and then ran iperf in 3
      cases:
        * doing nothing
        * reading ipv6 route table via proc
          (while :; do cat /proc/net/ipv6_route > /dev/null; done)
        * reading ipv6 route table via rtnetlink
          (while :; do ip -6 route show table all > /dev/null; done)
      
      * Load the ipv6 route table up with:
        * for ((i = 0;i < 4000;i++)); do ip route add unreachable 2000::$i; done
      
      * iperf commands:
        * client: iperf -i 1 -V -c <ipv6 addr>
        * server: iperf -V -s
      
      * iperf results - 3 runs each (in Mbits/sec)
        * nothing: client: 927,927,927 server: 927,927,927
        * proc: client: 179,97,96,113 server: 142,112,133
        * iproute: client: 928,927,928 server: 927,927,927
      
      lock_stat shows taking the write lock is causing the slowdown. Using this
      info I decided to write a version of fib6_clean_all() which replaces
      write_lock_bh(&table->tb6_lock) with read_lock_bh(&table->tb6_lock). With
      this new function I see the same results as with my rtnetlink iperf test.
      Signed-off-by: Josh Hunt <joshhunt00@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
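Why the choice of lock mode matters can be shown with a toy readers-writer lock. This is a sketch under stated assumptions: `TableLock` and `lookup_proceeds_during_dump` are hypothetical names, the lock offers only non-blocking "try" operations so the outcome is deterministic, and it assumes that route lookups on the forwarding path take the table lock in read mode.

```python
import threading


class TableLock:
    """Toy readers-writer lock standing in for table->tb6_lock."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._readers = 0
        self._writer = False

    def try_read(self):
        with self._mutex:
            if self._writer:
                return False          # readers are excluded by a writer
            self._readers += 1
            return True

    def try_write(self):
        with self._mutex:
            if self._writer or self._readers:
                return False          # a writer needs exclusive access
            self._writer = True
            return True


def lookup_proceeds_during_dump(dump_takes_write_lock):
    """Report whether a route lookup can run while the table dump holds the lock."""
    lock = TableLock()
    if dump_takes_write_lock:
        assert lock.try_write()       # the /proc walk via fib6_clean_all()
    else:
        assert lock.try_read()        # a read-only walk, as fib6_dump_table() does
    return lock.try_read()            # the lookup done for each forwarded packet
```

With a read-mode dump the lookup proceeds; with a write-mode dump it has to wait, which is consistent with the iperf numbers reported above.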
  10. 30 Dec, 2011 2 commits
  11. 29 Dec, 2011 4 commits
  12. 27 Dec, 2011 1 commit
  13. 23 Dec, 2011 1 commit
    • net: introduce DST_NOPEER dst flag · e688a604
      Eric Dumazet committed
      Chris Boot reported crashes occurring in ipv6_select_ident().
      
      [  461.457562] RIP: 0010:[<ffffffff812dde61>]  [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7

      [  461.578229] Call Trace:
      [  461.580742] <IRQ>
      [  461.582870]  [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2
      [  461.589054]  [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155
      [  461.595140]  [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b
      [  461.601198]  [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6]
      [  461.608786]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
      [  461.614227]  [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543
      [  461.620659]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
      [  461.626440]  [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge]
      [  461.633581]  [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459
      [  461.639577]  [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge]
      [  461.646887]  [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge]
      [  461.653997]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
      [  461.659473]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
      [  461.665485]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
      [  461.671234]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
      [  461.677299]  [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge]
      [  461.684891]  [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack]
      [  461.691520]  [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge]
      [  461.697572]  [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge]
      [  461.704616]  [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge]
      [  461.712329]  [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge]
      [  461.719490]  [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge]
      [  461.727223]  [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge]
      [  461.734292]  [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77
      [  461.739758]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
      [  461.746203]  [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111
      [  461.751950]  [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge]
      [  461.758378]  [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge]
      
      This is caused by bridge netfilter special dst_entry (fake_rtable), a
      special shared entry, where attaching an inetpeer makes no sense.
      
      The problem has been present since commit 87c48fa3 (ipv6: make fragment
      identifications less predictable).

      Introduce the DST_NOPEER dst flag and make sure ipv6_select_ident() and
      __ip_select_ident() fall back to the 'no peer attached' handling.
      Reported-by: Chris Boot <bootc@bootc.net>
      Tested-by: Chris Boot <bootc@bootc.net>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
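The fallback described above amounts to checking a flag before touching per-destination peer state. The following is a rough model, not the kernel implementation: `Dst`, `Peer`, `select_ident`, the bit value of `DST_NOPEER`, and the simple counters are all assumptions made for illustration.

```python
import itertools

DST_NOPEER = 0x1                              # illustrative bit value, an assumption
_fallback_counter = itertools.count(1)        # stand-in for the 'no peer attached' path


class Peer:
    def __init__(self):
        self.next_id = 0


class Dst:
    def __init__(self, flags=0):
        self.flags = flags
        self.peer = None


def select_ident(dst):
    """Pick a fragment identification, never attaching a peer to a NOPEER dst."""
    if not (dst.flags & DST_NOPEER):
        if dst.peer is None:
            dst.peer = Peer()                 # safe: this dst is not a shared fake entry
        dst.peer.next_id += 1
        return dst.peer.next_id
    return next(_fallback_counter)            # shared entry: use the peer-less handling
```

A shared entry such as the bridge netfilter fake_rtable would carry the flag, so its `peer` field is never populated.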
  14. 20 Dec, 2011 1 commit
  15. 14 Dec, 2011 2 commits
  16. 13 Dec, 2011 6 commits
    • netfilter: add ipv6 reverse path filter match · e26f9a48
      Florian Westphal committed
      This is not merged with the ipv4 match into xt_rpfilter.c
      to avoid ipv6 module dependency issues.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • per-netns ipv4 sysctl_tcp_mem · 3dc43e3e
      Glauber Costa committed
      This patch allows each namespace to independently set up
      its levels for tcp memory pressure thresholds. This patch
      alone does not buy much: we need to make these values
      per group of processes somehow. This is achieved in the
      patches that follow in this patchset.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp memory pressure controls · d1a4c0b3
      Glauber Costa committed
      This patch introduces memory pressure controls for the tcp
      protocol. It uses the generic socket memory pressure code
      introduced in earlier patches, and fills in the
      necessary data in cg_proto struct.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • foundations of per-cgroup memory pressure controlling. · 180d8cd9
      Glauber Costa committed
      This patch replaces all uses of the struct sock fields memory_pressure,
      memory_allocated, sockets_allocated, and sysctl_mem with accessor
      macros. Those macros can receive either a socket argument or a mem_cgroup
      argument, depending on the context they live in.
      
      Since we're only doing a macro wrapping here, no performance impact at all is
      expected in the case where we don't have cgroups disabled.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Eric W. Biederman <ebiederm@xmission.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipip, sit: copy parms.name after register_netdevice · 72b36015
      Ted Feng committed
      Same fix as 731abb9c for ipip and sit tunnel.
      Commit 1c5cae81 removed an explicit call to dev_alloc_name in
      ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
      will now create a valid name. However, the tunnel keeps a copy of the
      name in the private parms structure. Fix this by copying the name back
      after register_netdevice has successfully returned.
      
      This shows up if you do a simple tunnel add, followed by a tunnel show:
      
      $ sudo ip tunnel add mode ipip remote 10.2.20.211
      $ ip tunnel
      tunl0: ip/ip  remote any  local any  ttl inherit  nopmtudisc
      tunl%d: ip/ip  remote 10.2.20.211  local any  ttl inherit
      $ sudo ip tunnel add mode sit remote 10.2.20.212
      $ ip tunnel
      sit0: ipv6/ip  remote any  local any  ttl 64  nopmtudisc 6rd-prefix 2002::/16
      sit%d: ioctl 89f8 failed: No such device
      sit%d: ipv6/ip  remote 10.2.20.212  local any  ttl inherit
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ted Feng <artisdom@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
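The stale "tunl%d" and "sit%d" names in the output above come from a private copy of the name that is never refreshed. A small model of the fix, assuming hypothetical helpers (`register_netdevice` here is a simplified stand-in that only expands a "%d" template, and `add_tunnel` is invented for the example):

```python
def register_netdevice(dev, taken_names):
    """Hypothetical stand-in: expand a "%d" template to the first free name."""
    if "%d" in dev["name"]:
        index = 0
        while dev["name"] % index in taken_names:
            index += 1
        dev["name"] = dev["name"] % index
    taken_names.add(dev["name"])


def add_tunnel(template, taken_names):
    parms = {"name": template}        # private copy of the name kept by the tunnel
    dev = {"name": parms["name"]}
    register_netdevice(dev, taken_names)
    parms["name"] = dev["name"]       # the fix: copy the resolved name back
    return parms
```

Without the copy-back line, `parms["name"]` would still hold the unexpanded template after registration.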
    • ipv6: Fix for adding multicast route for loopback device automatically. · 4af04aba
      Li Wei committed
      There is no obvious reason to add a default multicast route for loopback
      devices; otherwise there would be a route entry whose dst.error is set to
      -ENETUNREACH, which would block all multicast packets.
      
      ====================
      
      [ more detailed explanation ]
      
      The problem is that the resulting routing table depends on the order in
      which interfaces are initialized, and in some situations that would block
      all multicast packets. Suppose there are two interfaces on my computer
      (lo and eth0). If we initialize 'lo' before 'eth0', the resulting routing
      table (for multicast) would be
      
      # ip -6 route show | grep ff00::
      unreachable ff00::/8 dev lo metric 256 error -101
      ff00::/8 dev eth0 metric 256
      
      When sending multicast packets, the routing subsystem will return the first
      route entry, which has an error set to -101 (ENETUNREACH).

      I know the kernel will set the default ipv6 address for 'lo' when it is up
      and won't set the default multicast route for it, but there is no reason to
      stop the 'init' program from setting an address for 'lo', and that is exactly
      what systemd did.

      I am sure there is something wrong with either the kernel or systemd;
      currently I believe the kernel caused this problem.
      
      ====================
      Signed-off-by: Li Wei <lw@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 10 Dec, 2011 1 commit
  18. 07 Dec, 2011 2 commits
  19. 06 Dec, 2011 1 commit
  20. 05 Dec, 2011 1 commit
  21. 04 Dec, 2011 4 commits
  22. 02 Dec, 2011 1 commit
  23. 01 Dec, 2011 2 commits
  24. 29 Nov, 2011 1 commit
    • net: dont call jump_label_dec from irq context · b90e5794
      Eric Dumazet committed
      Igor Maravic reported an error caused by jump_label_dec() being called
      from IRQ context:
      
       BUG: sleeping function called from invalid context at kernel/mutex.c:271
       in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
       1 lock held by swapper/0:
        #0:  (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340
       Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1
      Call Trace:
       <IRQ>  [<ffffffff8104f417>] __might_sleep+0x137/0x1f0
       [<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370
       [<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10
       [<ffffffff8109a37f>] ? local_clock+0x6f/0x80
       [<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0
       [<ffffffff81557929>] ? sock_def_write_space+0x59/0x160
       [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
       [<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80
       [<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50
       [<ffffffff81566525>] net_disable_timestamp+0x15/0x20
       [<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50
       [<ffffffff81557b00>] __sk_free+0x80/0x200
       [<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70
       [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
       [<ffffffff81557cba>] sock_wfree+0x3a/0x70
       [<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120
       [<ffffffff8155c0b6>] __kfree_skb+0x16/0x30
       [<ffffffff8155c119>] kfree_skb+0x49/0x170
       [<ffffffff815e936e>] arp_error_report+0x3e/0x90
       [<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0
       [<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0
       [<ffffffff81578d20>] ? neigh_update+0x640/0x640
       [<ffffffff81073558>] __do_softirq+0xc8/0x3a0
      
      Since jump_label_{inc|dec} must be called from process context only,
      we must defer jump_label_dec() if net_disable_timestamp() is called
      from interrupt context.
      Reported-by: Igor Maravic <igorm@etf.rs>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
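One way to picture the deferral is a counter of postponed decrements that is drained the next time the code runs in process context. This is a sketch of the idea only: `TimestampKey` and its fields are hypothetical, the `in_interrupt` argument replaces the kernel's own context check, and plain integers stand in for the atomic operations a real implementation would need.

```python
class TimestampKey:
    """Toy model of a reference-counted key that may only change in process context."""

    def __init__(self):
        self.enabled = 0    # stand-in for the jump label reference count
        self.deferred = 0   # decrements postponed from interrupt context

    def _flush_deferred(self):
        # Runs in process context only: apply the postponed decrements.
        self.enabled -= self.deferred
        self.deferred = 0

    def enable(self):
        self._flush_deferred()
        self.enabled += 1

    def disable(self, in_interrupt):
        if in_interrupt:
            self.deferred += 1   # record the request without taking any sleeping lock
            return
        self._flush_deferred()
        self.enabled -= 1
```

The count is briefly higher than strictly necessary after an interrupt-context disable, which is harmless for a feature switch and avoids sleeping where sleeping is not allowed.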