  1. 29 Nov 2013, 1 commit
  2. 24 Nov 2013, 5 commits
  3. 19 Nov 2013, 3 commits
  4. 18 Nov 2013, 1 commit
  5. 15 Nov 2013, 5 commits
  6. 11 Nov 2013, 3 commits
  7. 09 Nov 2013, 4 commits
  8. 06 Nov 2013, 5 commits
    • ipv6: Fix possible ipv6 seqlock deadlock · 5ac68e7c
      John Stultz committed
      While enabling lockdep on seqlocks, I ran across the warning below
      caused by the ipv6 stats being updated in both irq and non-irq context.
      
      This patch switches from IP6_INC_STATS_BH to IP6_INC_STATS (as suggested
      by Eric Dumazet) to resolve this problem.
      
      [   11.120383] =================================
      [   11.121024] [ INFO: inconsistent lock state ]
      [   11.121663] 3.12.0-rc1+ #68 Not tainted
      [   11.122229] ---------------------------------
      [   11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [   11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes:
      [   11.124505]  (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
      [   11.125736] {SOFTIRQ-ON-W} state was registered at:
      [   11.126447]   [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0
      [   11.127222]   [<c10e2996>] lock_acquire+0x96/0xd0
      [   11.127925]   [<c1a9a2c3>] write_seqcount_begin+0x33/0x40
      [   11.128766]   [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460
      [   11.129582]   [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80
      [   11.130014]   [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0
      [   11.130014]   [<c1a4d0b5>] inet_dgram_connect+0x25/0x70
      [   11.130014]   [<c198dd61>] SYSC_connect+0xa1/0xc0
      [   11.130014]   [<c198f571>] SyS_connect+0x11/0x20
      [   11.130014]   [<c198fe6b>] SyS_socketcall+0x12b/0x300
      [   11.130014]   [<c1bbf880>] syscall_call+0x7/0xb
      [   11.130014] irq event stamp: 1184
      [   11.130014] hardirqs last  enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110
      [   11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110
      [   11.130014] softirqs last  enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0
      [   11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0
      [   11.130014]
      [   11.130014] other info that might help us debug this:
      [   11.130014]  Possible unsafe locking scenario:
      [   11.130014]
      [   11.130014]        CPU0
      [   11.130014]        ----
      [   11.130014]   lock(&stats->syncp.seq#6);
      [   11.130014]   <Interrupt>
      [   11.130014]     lock(&stats->syncp.seq#6);
      [   11.130014]
      [   11.130014]  *** DEADLOCK ***
      [   11.130014]
      [   11.130014] 3 locks held by init/4483:
      [   11.130014]  #0:  (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620
      [   11.130014]  #1:  (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0
      [   11.130014]  #2:  (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0
      [   11.130014]
      [   11.130014] stack backtrace:
      [   11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68
      [   11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [   11.130014]  00000000 00000000 c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123
      [   11.130014]  c1ec1484 00001183 00000000 00000000 00000001 00000003 00000001 00000000
      [   11.130014]  c1ec1484 00000004 c5712dcc 00000000 c55e5c84 c10de492 00000004 c10755f2
      [   11.130014] Call Trace:
      [   11.130014]  [<c1bb0e71>] dump_stack+0x4b/0x66
      [   11.130014]  [<c1badf79>] print_usage_bug+0x1d3/0x1dd
      [   11.130014]  [<c10de492>] mark_lock+0x282/0x2f0
      [   11.130014]  [<c10755f2>] ? kvm_clock_read+0x22/0x30
      [   11.130014]  [<c10dd8b0>] ? check_usage_backwards+0x150/0x150
      [   11.130014]  [<c10e0e74>] __lock_acquire+0x584/0x1af0
      [   11.130014]  [<c10b1baf>] ? sched_clock_cpu+0xef/0x190
      [   11.130014]  [<c10de58c>] ? mark_held_locks+0x8c/0xf0
      [   11.130014]  [<c10e2996>] lock_acquire+0x96/0xd0
      [   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
      [   11.130014]  [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0
      [   11.130014]  [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130
      [   11.130014]  [<c1ab80c2>] ndisc_send_ns+0xe2/0x130
      [   11.130014]  [<c108cc32>] ? mod_timer+0xf2/0x160
      [   11.130014]  [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150
      [   11.130014]  [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150
      [   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
      [   11.130014]  [<c108c233>] call_timer_fn+0x73/0xf0
      [   11.130014]  [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0
      [   11.130014]  [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0
      [   11.130014]  [<c108c5b1>] run_timer_softirq+0x141/0x1e0
      [   11.130014]  [<c1086b20>] ? __do_softirq+0x70/0x1b0
      [   11.130014]  [<c1086b70>] __do_softirq+0xc0/0x1b0
      [   11.130014]  [<c1086e05>] irq_exit+0xa5/0xb0
      [   11.130014]  [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50
      [   11.130014]  [<c1bbfbca>] apic_timer_interrupt+0x32/0x38
      [   11.130014]  [<c10936ed>] ? SyS_setpriority+0xfd/0x620
      [   11.130014]  [<c10e26c9>] ? lock_release+0x9/0x240
      [   11.130014]  [<c10936d7>] ? SyS_setpriority+0xe7/0x620
      [   11.130014]  [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30
      [   11.130014]  [<c1093701>] SyS_setpriority+0x111/0x620
      [   11.130014]  [<c109363c>] ? SyS_setpriority+0x4c/0x620
      [   11.130014]  [<c1bbf880>] syscall_call+0x7/0xb
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1381186321-4906-5-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
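A minimal userspace sketch of why a sequence counter must not be written from both process and softirq context on the same CPU. `struct seq_model` and the `model_*` helpers are hypothetical stand-ins, not the kernel's seqcount API, and the softirq is simulated by a direct nested call. The model assumes that the non-`_BH` stats macro keeps softirqs from running inside the write section; it shows one concrete hazard: the nested writer turns the counter even while the outer write is still in progress.

```c
#include <stdbool.h>

/* Hypothetical model of a seqcount-protected 64-bit stat (not the kernel API).
 * An odd sequence value means "write in progress". */
struct seq_model {
	unsigned int seq;
	unsigned long long value;
};

static void model_write_begin(struct seq_model *s) { s->seq++; }
static void model_write_end(struct seq_model *s) { s->seq++; }

/* A reader accepts a snapshot only when no write appears to be active. */
static bool model_looks_stable(const struct seq_model *s)
{
	return (s->seq & 1) == 0;
}

/* Softirq-context update; reports whether a reader would trust the
 * data at the midpoint of this write. */
static bool model_softirq_inc(struct seq_model *s)
{
	bool looks_stable;

	model_write_begin(s);
	looks_stable = model_looks_stable(s);
	s->value++;
	model_write_end(s);
	return looks_stable;
}

/* _BH-style update from process context with softirqs enabled:
 * the softirq fires in the middle of our own write section. */
static bool model_unsafe_update(struct seq_model *s)
{
	bool looks_stable;

	model_write_begin(s);                /* seq becomes odd */
	looks_stable = model_softirq_inc(s); /* nested begin: odd -> even */
	s->value++;
	model_write_end(s);
	return looks_stable;                 /* true: a reader may see torn data */
}

/* Update with softirqs held off: the softirq runs only afterwards. */
static bool model_safe_update(struct seq_model *s)
{
	model_write_begin(s);
	s->value++;
	model_write_end(s);
	return model_softirq_inc(s);         /* its begin makes seq odd again */
}
```

Both variants end with the same counter value; the difference is only whether a reader could accept a snapshot during the window where two writers overlap.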
    • net: Explicitly initialize u64_stats_sync structures for lockdep · 827da44c
      John Stultz committed
      In order to enable lockdep on seqcount/seqlock structures, we
      must explicitly initialize any locks.
      
      The u64_stats_sync structure uses a seqcount, so we need to
      introduce a u64_stats_init() function and use it to initialize
      the structure.
      
      This unfortunately adds a lot of fairly trivial initialization code
      to a number of drivers, but the benefit of ensuring correctness makes
      this worthwhile.
      
      Because these changes are required for lockdep to be enabled, and the
      changes are quite trivial, I've not yet split this patch out into 30-some
      separate patches, as I figured it would be better to get the various
      maintainers' thoughts on how best to merge this change along with
      the seqcount lockdep enablement.
      
      Feedback would be appreciated!
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mirko Lindner <mlindner@marvell.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Roger Luethi <rl@hellgate.ch>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Wensong Zhang <wensong@linux-vs.org>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
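A minimal single-threaded sketch of the explicit-initialization pattern. `struct stats_sync_model`, `stats_init_model()` and the fetch/retry helpers are hypothetical stand-ins for `struct u64_stats_sync`, `u64_stats_init()` and the `u64_stats_fetch_begin()`/`u64_stats_fetch_retry()` pair. The `registered` field only marks where the real `seqcount_init()` would register the lock with lockdep, and the model has no memory barriers, so it is not safe for concurrent use.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct u64_stats_sync (32-bit SMP flavour). */
struct stats_sync_model {
	unsigned int seq;
	int registered; /* where seqcount_init() would tell lockdep about the lock */
};

struct dev_stats_model {
	uint64_t rx_packets;
	struct stats_sync_model syncp;
};

/* Counterpart of u64_stats_init(): must run before the first update. */
static void stats_init_model(struct stats_sync_model *s)
{
	s->seq = 0;
	s->registered = 1;
}

static void stats_update_begin_model(struct stats_sync_model *s) { s->seq++; }
static void stats_update_end_model(struct stats_sync_model *s) { s->seq++; }

static unsigned int stats_fetch_begin_model(const struct stats_sync_model *s)
{
	return s->seq;
}

/* Retry if a writer was active when we started or finished meanwhile. */
static int stats_fetch_retry_model(const struct stats_sync_model *s,
				   unsigned int start)
{
	return (start & 1) || s->seq != start;
}

/* Driver setup path: the explicit init call the patch adds per driver. */
static void dev_stats_setup(struct dev_stats_model *st)
{
	st->rx_packets = 0;
	stats_init_model(&st->syncp);
}

static uint64_t read_rx_packets(const struct dev_stats_model *st)
{
	unsigned int start;
	uint64_t v;

	do {
		start = stats_fetch_begin_model(&st->syncp);
		v = st->rx_packets;
	} while (stats_fetch_retry_model(&st->syncp, start));
	return v;
}
```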
    • ipv6: drop the judgement in rt6_alloc_cow() · 249a3630
      Duan Jiong committed
      Now rt6_alloc_cow() is only called by ip6_pol_route() when
      rt->rt6i_flags contains neither RTF_NONEXTHOP nor RTF_GATEWAY,
      and rt->rt6i_flags is not changed in ip6_rt_copy(). So there is
      no need to check whether rt->rt6i_flags contains RTF_GATEWAY.
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
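The reasoning above can be checked exhaustively over the two relevant bits. A minimal sketch, assuming the caller condition is `!(flags & (RTF_NONEXTHOP | RTF_GATEWAY))` and that ip6_rt_copy() leaves these bits untouched. The `MODEL_*` constants are believed to match the uapi route header values, but only the fact that the bits are distinct matters here; the helper names are hypothetical.

```c
#include <stdbool.h>

/* Believed to match the uapi values; only bit disjointness matters. */
#define MODEL_RTF_UP        0x0001u
#define MODEL_RTF_GATEWAY   0x0002u
#define MODEL_RTF_NONEXTHOP 0x00200000u

/* Caller-side condition, as described for ip6_pol_route(). */
static bool caller_uses_cow(unsigned int flags)
{
	return !(flags & (MODEL_RTF_NONEXTHOP | MODEL_RTF_GATEWAY));
}

/* The check that the patch removes from rt6_alloc_cow(). */
static bool removed_check(unsigned int flags)
{
	return !(flags & MODEL_RTF_GATEWAY);
}

/* True if removed_check() holds for every flag combination that can
 * reach the callee, i.e. the check is redundant. */
static bool check_is_redundant(void)
{
	for (unsigned int mask = 0; mask < 8; mask++) {
		unsigned int flags = 0;

		if (mask & 1)
			flags |= MODEL_RTF_GATEWAY;
		if (mask & 2)
			flags |= MODEL_RTF_NONEXTHOP;
		if (mask & 4)
			flags |= MODEL_RTF_UP; /* unrelated bit */
		if (caller_uses_cow(flags) && !removed_check(flags))
			return false;
	}
	return true;
}
```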
    • ipv6: fix headroom calculation in udp6_ufo_fragment · 0e033e04
      Hannes Frederic Sowa committed
      Commit 1e2bd517 ("udp6: Fix udp fragmentation for tunnel traffic.")
      changed the check for whether there is enough space to include a
      fragment header in the skb from one derived from skb->mac_header to
      one based on skb_headroom. Because we have already pulled the skb up
      to the transport_header, this is wrong. Change this back to check
      whether we have enough room before the mac_header.
      
      This fixes a panic reported by Saran Neti. He used the tbf scheduler,
      which calls skb_gso_segment() on the skb. The offsets become negative
      and we panic in memcpy because the skb was erroneously not expanded
      at the head.
      Reported-by: Saran Neti <Saran.Neti@telus.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
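A sketch of the two headroom checks, modelling skb offsets as plain numbers measured from `skb->head`. `struct skb_model`, `tnl_hlen` and the helpers are illustrative assumptions, not the kernel's skb API. The point is that once `skb->data` has moved to the transport header, `skb_headroom()` counts bytes that are already occupied by the MAC and IPv6 headers, while only the space in front of the MAC header can take the 8-byte fragment header.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FRAG_HDR_LEN 8 /* size of the IPv6 fragment header */

/* Hypothetical skb model: offsets are measured from skb->head. */
struct skb_model {
	size_t mac_header; /* offset of the MAC header */
	size_t data;       /* offset of skb->data (here: transport header) */
};

/* skb_headroom(): space in front of skb->data. */
static size_t model_headroom(const struct skb_model *skb)
{
	return skb->data;
}

/* The check reintroduced by the patch: room in front of the MAC header. */
static bool room_before_mac(const struct skb_model *skb, size_t tnl_hlen)
{
	return skb->mac_header >= tnl_hlen + MODEL_FRAG_HDR_LEN;
}

/* The check from commit 1e2bd517: headroom in front of skb->data. */
static bool room_by_headroom(const struct skb_model *skb, size_t tnl_hlen)
{
	return model_headroom(skb) >= tnl_hlen + MODEL_FRAG_HDR_LEN;
}
```

For example, with the MAC header 2 bytes after `head` and `data` at offset 56 (14-byte Ethernet plus 40-byte IPv6 header), the headroom test passes (56 >= 8) while the MAC-based test correctly reports that the head must be expanded (2 < 8).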
    • ipv6: remove old conditions on flow label sharing · b579035f
      Florent Fourcot committed
      The flow label code in the Linux kernel follows
      the rules of RFC 1809 (an informational one) for
      conditions on flow label sharing. These rules are
      not part of the latest proposed standard for flow
      labels (RFC 6437), nor of the previous one (RFC 3697).
      
      Since this code does not follow any current or
      old standard, we can remove it.
      
      With this removal, the ipv6_opt_cmp function is
      now dead code and can be removed too.
      
      Changelog to v1:
       * add justification for the change
       * remove the condition on IPv6 options
      
      [ Remove ipv6_hdr_cmp as it is now unused as well. -DaveM ]
      Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 01 Nov 2013, 1 commit
  10. 31 Oct 2013, 1 commit
  11. 29 Oct 2013, 3 commits
  12. 28 Oct 2013, 1 commit
    • xfrm: Increase the garbage collector threshold · eeb1b733
      Steffen Klassert committed
      With the removal of the routing cache, we lost the
      option to tweak the garbage collector threshold
      along with the maximum routing cache size. So git
      commit 703fb94e ("xfrm: Fix the gc threshold value
      for ipv4") moved back to a static threshold.
      
      It turned out that the current threshold before we
      start garbage collecting is much too small for some
      workloads, so increase it from 1024 to 32768. This
      means that we start the garbage collector if we have
      more than 32768 dst entries in the system and refuse
      new allocations if we are above 65536.
      Reported-by: Wolfgang Walter <linux@stwm.de>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
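A sketch of the resulting policy as the commit message states it. `dst_alloc_policy()` and the enum are hypothetical. As we understand it, the kernel runs the garbage collector first and fails the allocation only if the entry count is still above twice the threshold afterwards; this model simplifies that to a direct comparison on the current count.

```c
/* Threshold from the commit message; the hard limit is twice this. */
#define MODEL_GC_THRESH 32768u

enum dst_alloc_action { DST_ALLOC_OK, DST_ALLOC_RUN_GC, DST_ALLOC_REFUSE };

/* Decide what happens when a new dst entry is requested while
 * `entries` entries already exist. */
static enum dst_alloc_action dst_alloc_policy(unsigned int entries)
{
	if (entries > 2u * MODEL_GC_THRESH)
		return DST_ALLOC_REFUSE;  /* above 65536 */
	if (entries > MODEL_GC_THRESH)
		return DST_ALLOC_RUN_GC;  /* above 32768 */
	return DST_ALLOC_OK;
}
```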
  13. 26 Oct 2013, 1 commit
    • ipv6: ip6_dst_check needs to check for expired dst_entries · e3bc10bd
      Hannes Frederic Sowa committed
      On receiving an ICMPv6 Packet Too Big error, we check whether the
      dst_entry cached in the socket is still valid. This validation check
      did not take the expiration of the (cached) route into account.
      
      The error path I traced down:
      The socket receives a Packet Too Big MTU notification. It still has a
      valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry,
      setting RTF_EXPIRES and updating the dst.expires value (which could
      fail because of out-of-date expiration values, see previous patch).
      
      In rare cases we race with a) the ip6_fib gc or b) another routing
      lookup, either of which would recreate the cached rt6_info from its
      parent non-cached rt6_info. While copying the rt6_info we reinitialize
      the metrics store by copying it over from the parent, thus invalidating
      the just-installed pmtu update (both dsts use the same key to the
      inetpeer storage). The dst_entry with the just-invalidated metrics data
      would simply have its RTF_EXPIRES flag cleared and would remain valid
      for the socket.
      
      We should not have issued the pmtu update on the already expired
      dst_entry in the first place. By checking the expiration of the
      dst_entry and doing a relookup when it is out of date, we install a
      new rt6_info into the fib before we issue the pmtu update, which
      closes this race.
      
      The unreliable update of the dst.expires value was fixed by the patch
      "ipv6: reset dst.expires value when clearing expire flag".
      Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
      Reported-by: Valentijn Sessink <valentyn@blub.net>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Valentijn Sessink <valentyn@blub.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
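A sketch of a validity check that also honours expiry. `struct rt_model`, `dst_check_model()` and the tick counter are hypothetical; `ticks_after()` mirrors the wraparound-safe comparison used by the kernel's `time_after()` macro, and the flag value is illustrative. Returning NULL stands for "force a relookup", which is how the check closes the race described above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RTF_EXPIRES 0x00400000u /* illustrative bit value */

/* Hypothetical cached route. */
struct rt_model {
	unsigned int flags;
	unsigned long expires; /* in jiffies-like ticks */
};

/* Wraparound-safe "a is after b", in the spirit of time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static bool rt_model_expired(const struct rt_model *rt, unsigned long now)
{
	return (rt->flags & MODEL_RTF_EXPIRES) && ticks_after(now, rt->expires);
}

/* Return the cached route if still usable, or NULL to force a relookup. */
static const struct rt_model *dst_check_model(const struct rt_model *rt,
					      unsigned long now)
{
	if (rt_model_expired(rt, now))
		return NULL;
	return rt;
}
```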
  14. 24 Oct 2013, 1 commit
  15. 23 Oct 2013, 1 commit
    • netfilter: ip6t_REJECT: skip checksum verification for outgoing ipv6 packets · f2020b27
      Stanislav Fomichev committed
      Don't verify the checksum of outgoing packets, because the checksum
      may be calculated later by the device.
      
      Without this patch:
      $ ip6tables -I OUTPUT -p tcp --dport 80 -j REJECT --reject-with tcp-reset
      $ time telnet ipv6.google.com 80
      Trying 2a00:1450:4010:c03::67...
      telnet: Unable to connect to remote host: Connection timed out
      
      real    0m7.201s
      user    0m0.000s
      sys     0m0.000s
      
      With the patch applied:
      $ ip6tables -I OUTPUT -p tcp --dport 80 -j REJECT --reject-with tcp-reset
      $ time telnet ipv6.google.com 80
      Trying 2a00:1450:4010:c03::67...
      telnet: Unable to connect to remote host: Connection refused
      
      real    0m0.085s
      user    0m0.000s
      sys     0m0.000s
      Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
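A minimal sketch of the decision, assuming the fix amounts to skipping verification when the REJECT target runs at the LOCAL_OUT hook. The `hook_model` enum and both helpers are hypothetical and only loosely mirror netfilter's `NF_INET_*` hooks; they are not netfilter's API.

```c
#include <stdbool.h>

/* Hypothetical hook identifiers mirroring netfilter's NF_INET_* hooks. */
enum hook_model {
	HOOK_PRE_ROUTING,
	HOOK_LOCAL_IN,
	HOOK_FORWARD,
	HOOK_LOCAL_OUT,
	HOOK_POST_ROUTING,
};

/* Should the original packet's checksum be verified before replying? */
static bool should_verify_csum(enum hook_model hook)
{
	/* Locally generated packets may carry only a partial checksum
	 * that the device completes later (checksum offload). */
	return hook != HOOK_LOCAL_OUT;
}

/* Simplified: send the TCP reset unless verification applies and fails. */
static bool send_reset(enum hook_model hook, bool csum_ok)
{
	if (should_verify_csum(hook) && !csum_ok)
		return false; /* treat as corrupted and stay silent */
	return true;
}
```

With the check skipped on output, a locally generated packet whose checksum is not yet final still triggers an immediate reset, which matches the "Connection refused" result shown above instead of a timeout.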
  16. 22 Oct 2013, 4 commits
    • netfilter: x_tables: fix ordering of jumpstack allocation and table update · b416c144
      Will Deacon committed
      During kernel stability testing on an SMP ARMv7 system, Yalin Wang
      reported the following panic from the netfilter code:
      
        1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
        [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
        [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
        [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
        [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
        [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
        [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
        [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
        [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
        [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
        [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
        [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
        [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
        [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
        Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
        ---[ end trace da227214a82491bd ]---
        Kernel panic - not syncing: Fatal exception in interrupt
      
      This comes about because CPU1 is executing xt_replace_table in response
      to a setsockopt syscall, resulting in:
      
      	ret = xt_jumpstack_alloc(newinfo);
      		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);
      
      	[...]
      
      	table->private = newinfo;
      	newinfo->initial_entries = private->initial_entries;
      
      Meanwhile, CPU0 is handling the network receive path and ends up in
      ipt_do_table, resulting in:
      
      	private = table->private;
      
      	[...]
      
      	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
      
      On weakly ordered memory architectures, the writes to table->private
      and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
      Furthermore, on architectures which don't respect ordering of address
      dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.
      
      This patch adds an smp_wmb() before the assignment to table->private
      (which is essentially publishing newinfo) to ensure that all writes to
      newinfo will be observed before plugging it into the table structure.
      A dependent-read barrier is also added on the consumer side, to ensure
      the same ordering requirements are also respected there.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
      Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
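The publish/consume ordering can be sketched in portable C11. This swaps in a release store and an acquire load for the kernel's `smp_wmb()` and dependent-read barrier (acquire is somewhat stronger than dependency ordering). `struct info_model`, `struct table_model` and both functions are hypothetical stand-ins for `xt_table_info`, `xt_table`, `xt_replace_table()` and `ipt_do_table()`; freeing the old table is left out.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xt_table_info. */
struct info_model {
	void **jumpstack;
	unsigned int initial_entries;
};

/* Hypothetical stand-in for struct xt_table. */
struct table_model {
	_Atomic(struct info_model *) private_info;
};

/* Writer (xt_replace_table side): fully initialize, then publish.
 * The release store plays the role of smp_wmb() + plain store. */
static int publish_info(struct table_model *t, unsigned int ncpus)
{
	struct info_model *ni = malloc(sizeof(*ni));

	if (!ni)
		return -1;
	ni->jumpstack = calloc(ncpus, sizeof(*ni->jumpstack));
	if (!ni->jumpstack) {
		free(ni);
		return -1;
	}
	ni->initial_entries = 0;
	atomic_store_explicit(&t->private_info, ni, memory_order_release);
	return 0;
}

/* Reader (ipt_do_table side): the acquire load guarantees that the
 * later read of ->jumpstack sees the writer's initialization. */
static void **read_jumpstack(struct table_model *t)
{
	struct info_model *p =
		atomic_load_explicit(&t->private_info, memory_order_acquire);

	return p ? p->jumpstack : NULL;
}
```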
    • ipv6: probe routes asynchronous in rt6_probe · c2f17e82
      Hannes Frederic Sowa committed
      Routes need to be probed asynchronously; otherwise the call stack can
      be exhausted when the kernel attempts to deliver another skb inline
      (as xt_TEE does, for example) while we probe at the same time.
      
      We still update neigh->updated immediately; otherwise we would send
      too many probes.
      
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
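A minimal sketch of the deferral, assuming a fixed-size queue stands in for the kernel workqueue. `struct neigh_model`, `rt6_probe_model()` and `run_deferred_probes()` are hypothetical. The timestamp is updated when the probe is queued, not when it is sent, so repeated lookups within the interval do not queue extra probes.

```c
#include <stdbool.h>

#define MODEL_QUEUE_LEN 16

/* Hypothetical neighbour entry. */
struct neigh_model {
	unsigned long updated; /* time of the last scheduled probe */
	int probes_sent;
};

/* Hypothetical deferred-work queue. */
struct work_queue_model {
	struct neigh_model *items[MODEL_QUEUE_LEN];
	int count;
};

/* Called from the routing path: never sends inline. */
static bool rt6_probe_model(struct work_queue_model *wq, struct neigh_model *n,
			    unsigned long now, unsigned long interval)
{
	if (now - n->updated < interval)
		return false;           /* rate limited */
	if (wq->count == MODEL_QUEUE_LEN)
		return false;           /* queue full, skip this probe */
	n->updated = now;           /* update at once, before the probe runs */
	wq->items[wq->count++] = n; /* defer the actual send */
	return true;
}

/* Runs later from a fresh stack (a workqueue in the kernel). */
static void run_deferred_probes(struct work_queue_model *wq)
{
	for (int i = 0; i < wq->count; i++)
		wq->items[i]->probes_sent++;
	wq->count = 0;
}
```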
    • ipv6: sit: add GSO/TSO support · 61c1db7f
      Eric Dumazet committed
      Now that ipv6_gso_segment() is stackable, it's relatively easy to
      implement GSO/TSO support for SIT tunnels.
      
      Performance results, with segmentation done after the tunnel
      device (as no NIC yet supports TSO for SIT tunnels):
      
      Before patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      3168.31   4.81     4.64     2.988   2.877
      
      After patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      5525.00   7.76     5.17     2.763   1.840
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
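The gain in the netperf tables above, worked out as after/before ratios and rounded to three decimals (service demand is in us/KB, so lower is better):

```latex
\begin{aligned}
\text{throughput:}\quad & \frac{5525.00}{3168.31} \approx 1.744 && (\approx 74\%\ \text{higher})\\
\text{send service demand, local:}\quad & \frac{2.763}{2.988} \approx 0.925 && (\approx 7.5\%\ \text{lower})\\
\text{recv service demand, remote:}\quad & \frac{1.840}{2.877} \approx 0.640 && (\approx 36\%\ \text{lower})
\end{aligned}
```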
    • ipv6: gso: make ipv6_gso_segment() stackable · d3e5e006
      Eric Dumazet committed
      In order to support GSO on SIT tunnels, we need to make
      ipv6_gso_segment() stackable.
      
      It should not assume that the network header starts right after the
      mac header.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
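A sketch of why the assumption breaks for a SIT packet (Ethernet, outer IPv4, inner IPv6), using hypothetical offset fields measured from `skb->head`. The "assumed" variant reproduces the removed assumption and lands on the outer IPv4 header; the "recorded" variant uses the network header offset set for the current layer, as a stackable handler must. The struct and helpers are illustrative, not the kernel's skb accessors.

```c
#include <stddef.h>

#define MODEL_ETH_HLEN  14 /* Ethernet header length */
#define MODEL_IPV4_HLEN 20 /* outer IPv4 header without options */

/* Hypothetical skb offsets, measured from skb->head. */
struct gso_skb_model {
	size_t mac_header;
	size_t network_header; /* set for the layer currently being segmented */
};

/* The assumption the patch removes: IPv6 header right after the MAC header. */
static size_t ipv6_off_assumed(const struct gso_skb_model *skb)
{
	return skb->mac_header + MODEL_ETH_HLEN;
}

/* Stackable version: trust the recorded network header offset. */
static size_t ipv6_off_recorded(const struct gso_skb_model *skb)
{
	return skb->network_header;
}
```

For a SIT packet with the MAC header at offset 0, the assumed offset is 14 (the outer IPv4 header) while the inner IPv6 header actually starts at 34.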