1. 30 Oct 2013: 1 commit
  2. 29 Oct 2013: 2 commits
  3. 28 Oct 2013: 4 commits
  4. 26 Oct 2013: 2 commits
    • ipv6: ip6_dst_check needs to check for expired dst_entries · e3bc10bd
      Committed by Hannes Frederic Sowa
      On receiving a packet too big icmp error we check if our current cached
      dst_entry in the socket is still valid. This validation check did not
      care about the expiration of the (cached) route.
      
      The error path I traced down:
      The socket receives a packet too big mtu notification. It still has a
      valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry,
      setting RTF_EXPIRES and updating the dst.expires value (which could
      fail because of not up-to-date expiration values, see previous patch).
      
      In some rare cases we race with a) the ip6_fib gc or b) another routing
      lookup which would result in a recreation of the cached rt6_info from its
      parent non-cached rt6_info. While copying the rt6_info we reinitialize the
      metrics store by copying it over from the parent thus invalidating the
      just installed pmtu update (both dsts use the same key to the inetpeer
      storage). The dst_entry with the just invalidated metrics data would
      just get its RTF_EXPIRES flag cleared and would continue to stay valid
      for the socket.
      
      We should not have issued the pmtu update on the already expired dst_entry
      in the first place. By checking the expiration on the dst entry and
      doing a relookup in case it is out of date, we install a new rt6_info
      into the fib before we issue the pmtu update, thus closing this race.
      
      Not reliably updating the dst.expires value was fixed by the patch "ipv6:
      reset dst.expires value when clearing expire flag".
      Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
      Reported-by: Valentijn Sessink <valentyn@blub.net>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Valentijn Sessink <valentyn@blub.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
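      As an illustration of the check described above, the sketch below treats a cached route as unusable once it carries an expiry flag and its expiry time has passed. The caller is then forced to relook up the route before applying a PMTU update. All names are hypothetical stand-ins, not the kernel's rt6_info/dst_entry code. The flag value mirrors RTF_EXPIRES and is used only for illustration. The wrap-safe comparison follows the idea of the kernel's time_after().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rt6_info/dst_entry; not kernel definitions. */
#define SKETCH_RTF_EXPIRES 0x00400000u

struct route_sketch {
    unsigned int  flags;    /* SKETCH_RTF_EXPIRES set if the route can expire */
    unsigned long expires;  /* absolute expiry time in ticks */
    bool          obsolete; /* models the pre-existing obsolescence check */
};

/* Wrap-safe "a is later than b", modelled on the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

static bool route_expired(const struct route_sketch *rt, unsigned long now)
{
    return (rt->flags & SKETCH_RTF_EXPIRES) && ticks_after(now, rt->expires);
}

/* Return the cached route if still usable, or NULL to force a relookup
 * (so that a fresh route exists before any PMTU update is applied). */
static struct route_sketch *cached_route_check(struct route_sketch *rt,
                                               unsigned long now)
{
    if (rt == NULL || rt->obsolete || route_expired(rt, now))
        return NULL;
    return rt;
}
```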
    • netpoll: fix rx_hook() interface by passing the skb · 8fb479a4
      Committed by Antonio Quartulli
      Right now skb->data is passed to rx_hook() even if the skb
      has not been linearised and without giving rx_hook() a way
      to linearise it.
      
      Change the rx_hook() interface and make it accept the skb
      and the offset to the UDP payload as arguments. rx_hook() is
      also renamed to rx_skb_hook() to ensure that out-of-tree
      users notice the API change.
      
      In this way any rx_skb_hook() implementation can perform all
      the needed operations to properly (and safely) access the
      skb data.
      Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Oct 2013: 1 commit
  6. 23 Oct 2013: 1 commit
    • Revert "bridge: only expire the mdb entry when query is received" · 454594f3
      Committed by Linus Lüssing
      While this commit was a good attempt to fix issues occurring when no
      multicast querier is present, this commit still has two more issues:
      
      1) There are cases where mdb entries do not expire even if there is a
      querier present. The bridge will unnecessarily continue flooding
      multicast packets on the corresponding ports.
      
      2) Never removing an mdb entry could be exploited for a Denial of
      Service by an attacker on the local link, slowly, but steadily eating up
      all memory.
      
      Actually, this commit became obsolete with
      "bridge: disable snooping if there is no querier" (b00589af)
      which included fixes for a few more cases.
      
      Therefore reverting the following commits (the commit stated in the
      commit message plus three of its follow up fixes):
      
      ====================
      Revert "bridge: update mdb expiration timer upon reports."
      This reverts commit f144febd.
      Revert "bridge: do not call setup_timer() multiple times"
      This reverts commit 1faabf2a.
      Revert "bridge: fix some kernel warning in multicast timer"
      This reverts commit c7e8e8a8.
      Revert "bridge: only expire the mdb entry when query is received"
      This reverts commit 9f00b2e7.
      ====================
      
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: Linus Lüssing <linus.luessing@web.de>
      Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 22 Oct 2013: 5 commits
    • netfilter: x_tables: fix ordering of jumpstack allocation and table update · b416c144
      Committed by Will Deacon
      During kernel stability testing on an SMP ARMv7 system, Yalin Wang
      reported the following panic from the netfilter code:
      
        1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
        [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
        [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
        [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
        [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
        [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
        [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
        [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
        [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
        [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
        [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
        [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
        [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
        [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
        Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
        ---[ end trace da227214a82491bd ]---
        Kernel panic - not syncing: Fatal exception in interrupt
      
      This comes about because CPU1 is executing xt_replace_table in response
      to a setsockopt syscall, resulting in:
      
      	ret = xt_jumpstack_alloc(newinfo);
      		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);
      
      	[...]
      
      	table->private = newinfo;
      	newinfo->initial_entries = private->initial_entries;
      
      Meanwhile, CPU0 is handling the network receive path and ends up in
      ipt_do_table, resulting in:
      
      	private = table->private;
      
      	[...]
      
      	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
      
      On weakly ordered memory architectures, the writes to table->private
      and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
      Furthermore, on architectures which don't respect ordering of address
      dependencies (e.g. Alpha), the reads from CPU0 can also be re-ordered.
      
      This patch adds an smp_wmb() before the assignment to table->private
      (which is essentially publishing newinfo) to ensure that all writes to
      newinfo will be observed before plugging it into the table structure.
      A dependent-read barrier is also added on the consumer sides, to ensure
      the same ordering requirements are also respected there.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
      Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
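      The publish/consume ordering described above can be sketched in userspace C11. The writer fully initialises the new table info and then publishes the pointer with a release store, which plays the role of smp_wmb() before the table->private assignment. The reader loads the pointer with acquire semantics. The kernel consumer used a dependent-read barrier, and acquire is a stronger stand-in for it. The struct and function names are hypothetical simplifications of xt_table/xt_table_info, and the per-CPU jumpstack is reduced to a single array.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for xt_table_info / xt_table. */
struct table_info_sketch {
    unsigned int size;
    void **jumpstack;        /* must be fully set up before publication */
};

struct table_sketch {
    _Atomic(struct table_info_sketch *) private_info;
};

/* Writer (cf. xt_replace_table): initialise first, then publish with a
 * release store, so the jumpstack write cannot be observed after the pointer. */
static int publish_table(struct table_sketch *t,
                         struct table_info_sketch *newinfo,
                         unsigned int stack_entries)
{
    newinfo->jumpstack = calloc(stack_entries, sizeof(void *));
    if (newinfo->jumpstack == NULL)
        return -1;
    atomic_store_explicit(&t->private_info, newinfo, memory_order_release);
    return 0;
}

/* Reader (cf. ipt_do_table): the acquire load pairs with the release store,
 * so seeing the new pointer implies seeing its initialised jumpstack. */
static void **reader_jumpstack(struct table_sketch *t)
{
    struct table_info_sketch *info =
        atomic_load_explicit(&t->private_info, memory_order_acquire);
    return info ? info->jumpstack : NULL;
}
```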
    • tcp: initialize passive-side sk_pacing_rate after 3WHS · 02cf4ebd
      Committed by Neal Cardwell
      For passive TCP connections, upon receiving the ACK that completes the
      3WHS, make sure we set our pacing rate after we get our first RTT
      sample.
      
      On passive TCP connections, when we receive the ACK completing the
      3WHS we do not take an RTT sample in tcp_ack(), but rather in
      tcp_synack_rtt_meas(). So upon receiving the ACK that completes the
      3WHS, tcp_ack() leaves sk_pacing_rate at its initial value.
      
      Originally the initial sk_pacing_rate value was 0, so passive-side
      connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs
      made in the first RTT. With a default initial cwnd of 10 packets, this
      happened to be correct for RTTs 5ms or bigger, so it was hard to
      see problems in WAN or emulated WAN testing.
      
      Since 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing"), the
      initial sk_pacing_rate is 0xffffffff. So after that change, passive
      TCP connections were keeping this value (and using large numbers of
      segments per skbuff) until receiving an ACK for data.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
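      The arithmetic behind "correct for RTTs 5ms or bigger" can be checked with a simplified model. This model is an assumption about the kernel of that era, not its source. It takes the pacing rate to be about 2 * cwnd * mss / srtt. It takes the TSO size goal to be roughly half a millisecond of data at that rate, floored at sysctl_tcp_min_tso_segs. With cwnd = 10, the goal is about 10 / srtt_ms segments, so the 2-segment floor matches the goal only once the RTT reaches 5 ms. A stale rate of 0xffffffff inflates the goal to well over a thousand segments. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed model: pacing rate ~= 2 * cwnd * mss / srtt, in bytes per second. */
static uint64_t pacing_rate_Bps(uint32_t mss, uint32_t cwnd, uint32_t srtt_us)
{
    return (uint64_t)2 * cwnd * mss * 1000000u / srtt_us;
}

/* Assumed TSO sizing: about 0.5 ms of data at rate_Bps, never fewer than
 * min_tso_segs segments (the sysctl_tcp_min_tso_segs default is 2). */
static uint32_t tso_segs(uint64_t rate_Bps, uint32_t mss, uint32_t min_tso_segs)
{
    uint64_t bytes = rate_Bps / 2000;   /* half a millisecond worth */
    uint64_t segs = bytes / mss;
    return segs < min_tso_segs ? min_tso_segs : (uint32_t)segs;
}
```

      For example, with mss = 1448 and cwnd = 10, an RTT of 5 ms gives 2 segments per skb and an RTT of 1 ms gives 10. An initial rate of 0 falls back to the 2-segment floor.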
    • ipv6: probe routes asynchronous in rt6_probe · c2f17e82
      Committed by Hannes Frederic Sowa
      Routes need to be probed asynchronously, otherwise the call stack gets
      exhausted when the kernel attempts to deliver another skb inline (as
      e.g. xt_TEE does) while we probe at the same time.
      
      We still update neigh->updated immediately, otherwise we would send too
      many probes.
      
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
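      The pattern described above splits into two steps. On the possibly deeply nested transmit path, the neighbour timestamp is updated at once so repeated calls are rate-limited, but the probe itself is only queued. The queued probes are sent later from a shallow stack. The sketch below models this with a hypothetical fixed-size queue. The deferral mechanism the kernel patch actually uses is not reproduced, and all names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

#define PROBE_QUEUE_LEN 16

struct neigh_sketch {
    unsigned long updated;          /* last time a probe was scheduled */
};

struct probe_queue {
    struct neigh_sketch *items[PROBE_QUEUE_LEN];
    size_t head, count;
};

/* Transmit path: stamp the neighbour now (rate limiting), defer the probe. */
static bool schedule_probe(struct probe_queue *q, struct neigh_sketch *n,
                           unsigned long now, unsigned long interval)
{
    if (now - n->updated < interval)
        return false;               /* probed recently: nothing to do */
    if (q->count == PROBE_QUEUE_LEN)
        return false;               /* queue full: skip this probe */
    n->updated = now;
    q->items[(q->head + q->count) % PROBE_QUEUE_LEN] = n;
    q->count++;
    return true;
}

/* Later, from a shallow stack: fetch the next neighbour to probe, or NULL. */
static struct neigh_sketch *next_probe(struct probe_queue *q)
{
    struct neigh_sketch *n;

    if (q->count == 0)
        return NULL;
    n = q->items[q->head];
    q->head = (q->head + 1) % PROBE_QUEUE_LEN;
    q->count--;
    return n;
}
```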
    • netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper · 56e42441
      Committed by Julian Anastasov
      Now that rt6_nexthop() can return the nexthop address, we can use it
      for proper nexthop comparison of directly connected destinations.
      For more information refer to commit bbb5823c
      ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper").
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fill rt6i_gateway with nexthop address · 550bab42
      Committed by Julian Anastasov
      Make sure rt6i_gateway contains nexthop information in
      all routes returned from lookup or when routes are directly
      attached to skb for generated ICMP packets.
      
      The effect of this patch should be a faster version of
      rt6_nexthop() and the consideration of local addresses as
      nexthop.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 20 Oct 2013: 4 commits
    • ip_output: do skb ufo init for peeked non ufo skb as well · e93b7d74
      Committed by Jiri Pirko
      Currently, if a user application does:
      sendto len<mtu flag MSG_MORE
      sendto len>mtu flag 0
      the skb is not treated as a fragmented one because it is not initialized
      that way. Move the initialization to fix this.
      
      introduced by:
      commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
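      The call sequence above (a short send with MSG_MORE followed by a send without it) can be reproduced from userspace. The corked data is appended to one pending datagram, and the second call flushes it. The sketch below runs the sequence over IPv4 loopback and returns the size of the single datagram received. Loopback usually has a 64 KiB MTU, so this shows the API pattern rather than triggering the UFO path. That path needs a UFO-capable device with a smaller MTU. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send `first` bytes with MSG_MORE, then `second` bytes without it, over a
 * connected loopback UDP socket; return the received datagram size or -1. */
static ssize_t corked_udp_roundtrip(size_t first, size_t second)
{
    static char buf[65536];
    ssize_t got = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET };
    socklen_t alen = sizeof(addr);

    if (rx < 0 || tx < 0 || first + second > sizeof(buf))
        goto out;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                          /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    memset(buf, 'x', sizeof(buf));
    if (send(tx, buf, first, MSG_MORE) < 0 ||   /* corked: nothing sent yet */
        send(tx, buf, second, 0) < 0)           /* flushes one datagram */
        goto out;
    got = recv(rx, buf, sizeof(buf), 0);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return got;
}
```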
    • ip6_output: do skb ufo init for peeked non ufo skb as well · c547dbf5
      Committed by Jiri Pirko
      Currently, if a user application does:
      sendto len<mtu flag MSG_MORE
      sendto len>mtu flag 0
      the skb is not treated as a fragmented one because it is not initialized
      that way. Move the initialization to fix this.
      
      introduced by:
      commit e89e9cf5 "[IPv4/IPv6]: UFO Scatter-gather approach"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp6: respect IPV6_DONTFRAG sockopt in case there are pending frames · e36d3ff9
      Committed by Jiri Pirko
      If up->pending != 0, dontfrag is left at its default value of -1. As a
      result, applications that do:
      sendto len>mtu flag MSG_MORE
      sendto len>mtu flag 0
      receive EMSGSIZE as the result of the second sendto.
      
      This patch fixes it by respecting IPV6_DONTFRAG socket option.
      
      introduced by:
      commit 4b340ae2 "IPv6: Complete IPV6_DONTFRAG support"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race · 90c6bd34
      Committed by Daniel Borkmann
      In the case of credentials passing in unix stream sockets (dgram
      sockets seem not affected), we get a rather rare race after
      commit 16e57262 ("af_unix: dont send SCM_CREDENTIALS by default").
      
      We have a stream server on receiver side that requests credential
      passing from senders (e.g. nc -U). Since we need to set SO_PASSCRED
      on each spawned/accepted socket on server side to 1 first (as it's
      not inherited), it can happen that in the time between accept() and
      setsockopt() we get interrupted, the sender is being scheduled and
      continues with passing data to our receiver. At that time SO_PASSCRED
      is neither set on sender nor receiver side, hence in cmsg's
      SCM_CREDENTIALS we get eventually pid:0, uid:65534, gid:65534
      (== overflow{u,g}id) instead of what we actually would like to see.
      
      On the sender side, here nc -U, the tests in maybe_add_creds()
      invoked through unix_stream_sendmsg() would fail, as at that exact
      time, as mentioned, the sender has neither SO_PASSCRED on his side
      nor sees it on the server side, and we have a valid 'other' socket
      in place. Thus, sender believes it would just look like a normal
      connection, not needing/requesting SO_PASSCRED at that time.
      
      As reverting 16e57262 would not be an option due to the significant
      performance regression reported when having creds always passed,
      one way/trade-off to prevent that would be to set SO_PASSCRED on
      the listener socket and allow inheriting these flags to the spawned
      socket on server side in accept(). It seems also logical to do so
      if we'd tell the listener socket to pass those flags onwards, and
      would fix the race.
      
      Before, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=0, uid=65534, gid=65534}},
              msg_flags=0}, 0) = 5
      
      After, strace:
      
      recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"blub\n", 4096}],
              msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
              cmsg_type=SCM_CREDENTIALS{pid=11580, uid=1000, gid=1000}},
              msg_flags=0}, 0) = 5
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
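      The race itself cannot be shown deterministically. The sketch below shows the receiving-side pattern the fix relies on: SO_PASSCRED is enabled before any data arrives, so the kernel attaches SCM_CREDENTIALS to what the peer sends. For brevity it uses socketpair() instead of listen()/accept(). That choice is an assumption of the sketch. With the fix, setting SO_PASSCRED on the listening socket has the same effect for sockets returned by accept(). The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one message on fd and copy any SCM_CREDENTIALS into *out.
 * Returns 1 if credentials were present, 0 if not, -1 on error. */
static int recv_with_creds(int fd, struct ucred *out)
{
    char data[64];
    union {                                 /* aligned control buffer */
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *c;

    if (recvmsg(fd, &msg, 0) < 0)
        return -1;
    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_CREDENTIALS) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 1;
        }
    }
    return 0;
}

/* Enable SO_PASSCRED on the receiver *before* the peer writes, then read
 * back the credentials the kernel attached to the message. */
static int demo_passcred(struct ucred *out)
{
    int sv[2], one = 1, ret = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &one, sizeof(one)) == 0 &&
        write(sv[0], "blub\n", 5) == 5)
        ret = recv_with_creds(sv[1], out);
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```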
  9. 19 Oct 2013: 4 commits
  10. 18 Oct 2013: 6 commits
  11. 14 Oct 2013: 2 commits
    • mac80211: fix crash if bitrate calculation goes wrong · d86aa4f8
      Committed by Johannes Berg
      If a frame's timestamp is calculated, and the bitrate
      calculation goes wrong and returns zero, the system
      will attempt to divide by zero and crash. Catch this
      case and print the rate information that the driver
      reported when this happens.
      
      Cc: stable@vger.kernel.org
      Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
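      The guard described above amounts to checking the computed bitrate before dividing by it. The sketch below computes a frame's airtime in microseconds. It assumes bitrates in units of 100 kbit/s, as cfg80211 reports them. If the rate comes back as zero, it prints a diagnostic and returns 0 instead of dividing. The helper is hypothetical and is not the mac80211 timestamp code.

```c
#include <stdio.h>

/* Airtime in microseconds for len bytes at rate_100kbps (units of 100 kbit/s):
 * len * 8 bits / (rate * 100 kbit/s) = len * 80 / rate microseconds. */
static unsigned int airtime_us(unsigned int len, unsigned int rate_100kbps)
{
    if (rate_100kbps == 0) {
        /* Bitrate calculation failed: report it rather than divide by zero. */
        fprintf(stderr, "airtime_us: zero bitrate for %u-byte frame\n", len);
        return 0;
    }
    return len * 80u / rate_100kbps;
}
```

      For example, a 1500-byte frame at 54 Mbit/s (rate value 540) takes about 222 microseconds.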
    • wireless: radiotap: fix parsing buffer overrun · f5563318
      Committed by Johannes Berg
      When parsing an invalid radiotap header, the parser can overrun
      the buffer that is passed in because it doesn't correctly check
       1) the minimum radiotap header size
       2) the space for extended bitmaps
      
      The first issue doesn't affect any in-kernel user as they all
      check the minimum size before calling the radiotap function.
      The second issue could potentially affect the kernel if an skb
      is passed in that consists only of the radiotap header with a
      lot of extended bitmaps that extend past the SKB. In that case
      a read-only buffer overrun by at most 4 bytes is possible.
      
      Fix this by adding the appropriate checks to the parser.
      
      Cc: stable@vger.kernel.org
      Reported-by: Evan Huus <eapache@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
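      The two missing checks can be sketched against the radiotap header layout. A header has an 8-byte minimum: version, pad, a little-endian 16-bit length, and a little-endian 32-bit presence bitmap. Bit 31 of each presence word announces another word. The sketch first checks the minimum size and the declared length against the buffer. It then makes sure every extended bitmap word lies inside the header before reading it. The function and constant names are hypothetical. The in-kernel parser (ieee80211_radiotap_iterator_init() in net/wireless/radiotap.c) is not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

#define RT_MIN_HDR_LEN 8u           /* version, pad, len (le16), present (le32) */
#define RT_PRESENT_EXT (1u << 31)   /* another presence word follows */

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Bounds-check a radiotap header (fields are not parsed). Returns the number
 * of presence words, or -1 if reading them would overrun buf. */
static int radiotap_check(const uint8_t *buf, size_t buflen)
{
    size_t hdr_len, off = 4;
    int words = 0;

    if (buflen < RT_MIN_HDR_LEN)            /* check 1: minimum header size */
        return -1;
    hdr_len = get_le16(buf + 2);
    if (hdr_len < RT_MIN_HDR_LEN || hdr_len > buflen)
        return -1;
    for (;;) {
        if (off + 4 > hdr_len)              /* check 2: bitmap word fits */
            return -1;
        words++;
        if (!(get_le32(buf + off) & RT_PRESENT_EXT))
            return words;
        off += 4;
    }
}
```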
  12. 12 Oct 2013: 5 commits
    • ipv6: Initialize ip6_tnl.hlen in gre tunnel even if no route is found · bf581759
      Committed by Oussama Ghorbel
      The ip6_tnl.hlen (gre and ipv6 headers length) is independent from the
      outgoing interface, so it would be better to initialize it even when no
      route is found, otherwise its value will be zero.
      While I'm not sure this could happen in real life, doing so avoids
      calling skb_push() with a zero length in ip6gre_header().
      Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netem: free skb's in tree on reset · ff704050
      Committed by stephen hemminger
      Netem can leak memory because packets get stored in a red-black
      tree, which is not cleared on reset.
      
      Reported by: Сергеев Сергей <adron@yapic.net>
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netem: update backlog after drop · 638a52b8
      Committed by stephen hemminger
      When a packet is dropped from the netem rb-tree, the backlog statistic
      should also be updated.
      Reported-by: Сергеев Сергей <adron@yapic.net>
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
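      Both netem fixes above keep the queue's bookkeeping consistent with its time-ordered packet tree. A drop must lower the byte backlog as well as the packet count. A reset must free every queued packet rather than only zeroing the counters. The sketch below models this with a plain unbalanced binary search tree. This is a simplification: netem itself uses an rbtree. All names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for netem's time-ordered packet tree. */
struct pkt_node {
    unsigned long time_to_send;
    unsigned int len;
    struct pkt_node *left, *right;
};

struct qdisc_sketch {
    struct pkt_node *root;
    unsigned int qlen;
    unsigned long backlog;          /* bytes queued */
};

static int sketch_enqueue(struct qdisc_sketch *q, unsigned long t, unsigned int len)
{
    struct pkt_node **link = &q->root, *n = calloc(1, sizeof(*n));

    if (n == NULL)
        return -1;
    n->time_to_send = t;
    n->len = len;
    while (*link)
        link = t < (*link)->time_to_send ? &(*link)->left : &(*link)->right;
    *link = n;
    q->qlen++;
    q->backlog += len;
    return 0;
}

/* Drop the earliest packet, updating both qlen and backlog. */
static void sketch_drop_first(struct qdisc_sketch *q)
{
    struct pkt_node **link = &q->root, *n;

    if (*link == NULL)
        return;
    while ((*link)->left)
        link = &(*link)->left;
    n = *link;
    *link = n->right;
    q->qlen--;
    q->backlog -= n->len;
    free(n);
}

static void sketch_free_tree(struct pkt_node *n)
{
    if (n == NULL)
        return;
    sketch_free_tree(n->left);
    sketch_free_tree(n->right);
    free(n);
}

/* Reset releases every queued packet, then clears the counters. */
static void sketch_reset(struct qdisc_sketch *q)
{
    sketch_free_tree(q->root);
    q->root = NULL;
    q->qlen = 0;
    q->backlog = 0;
}
```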
    • l2tp: must disable bh before calling l2tp_xmit_skb() · 455cc32b
      Committed by Eric Dumazet
      François Cachereul made a very nice bug report and suspected that
      the bh_lock_sock() / bh_unlock_sock() pair used in l2tp_xmit_skb() from
      process context was not good.
      
      This problem was added by commit 6af88da1
      ("l2tp: Fix locking in l2tp_core.c").
      
      l2tp_eth_dev_xmit() runs from BH context, so we must disable BH
      from other l2tp_xmit_skb() users.
      
      [  452.060011] BUG: soft lockup - CPU#1 stuck for 23s! [accel-pppd:6662]
      [  452.061757] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox
      ppp_generic slhc ipv6 ext3 mbcache jbd virtio_balloon xfs exportfs dm_mod
      virtio_blk ata_generic virtio_net floppy ata_piix libata virtio_pci virtio_ring virtio [last unloaded: scsi_wait_scan]
      [  452.064012] CPU 1
      [  452.080015] BUG: soft lockup - CPU#2 stuck for 23s! [accel-pppd:6643]
      [  452.080015] CPU 2
      [  452.080015]
      [  452.080015] Pid: 6643, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs
      [  452.080015] RIP: 0010:[<ffffffff81059f6c>]  [<ffffffff81059f6c>] do_raw_spin_lock+0x17/0x1f
      [  452.080015] RSP: 0018:ffff88007125fc18  EFLAGS: 00000293
      [  452.080015] RAX: 000000000000aba9 RBX: ffffffff811d0703 RCX: 0000000000000000
      [  452.080015] RDX: 00000000000000ab RSI: ffff8800711f6896 RDI: ffff8800745c8110
      [  452.080015] RBP: ffff88007125fc18 R08: 0000000000000020 R09: 0000000000000000
      [  452.080015] R10: 0000000000000000 R11: 0000000000000280 R12: 0000000000000286
      [  452.080015] R13: 0000000000000020 R14: 0000000000000240 R15: 0000000000000000
      [  452.080015] FS:  00007fdc0cc24700(0000) GS:ffff8800b6f00000(0000) knlGS:0000000000000000
      [  452.080015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  452.080015] CR2: 00007fdb054899b8 CR3: 0000000074404000 CR4: 00000000000006a0
      [  452.080015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  452.080015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  452.080015] Process accel-pppd (pid: 6643, threadinfo ffff88007125e000, task ffff8800b27e6dd0)
      [  452.080015] Stack:
      [  452.080015]  ffff88007125fc28 ffffffff81256559 ffff88007125fc98 ffffffffa01b2bd1
      [  452.080015]  ffff88007125fc58 000000000000000c 00000000029490d0 0000009c71dbe25e
      [  452.080015]  000000000000005c 000000080000000e 0000000000000000 ffff880071170600
      [  452.080015] Call Trace:
      [  452.080015]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
      [  452.080015]  [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
      [  452.080015]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
      [  452.080015]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
      [  452.080015]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
      [  452.080015]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
      [  452.080015]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
      [  452.080015]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
      [  452.080015]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
      [  452.080015]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
      [  452.080015]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
      [  452.080015] Code: 81 48 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 <8a> 07 eb f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3
      [  452.080015] Call Trace:
      [  452.080015]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
      [  452.080015]  [<ffffffffa01b2bd1>] l2tp_xmit_skb+0x189/0x4ac [l2tp_core]
      [  452.080015]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
      [  452.080015]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
      [  452.080015]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
      [  452.080015]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
      [  452.080015]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
      [  452.080015]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
      [  452.080015]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
      [  452.080015]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
      [  452.080015]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
      [  452.064012]
      [  452.064012] Pid: 6662, comm: accel-pppd Not tainted 3.2.46.mini #1 Bochs Bochs
      [  452.064012] RIP: 0010:[<ffffffff81059f6e>]  [<ffffffff81059f6e>] do_raw_spin_lock+0x19/0x1f
      [  452.064012] RSP: 0018:ffff8800b6e83ba0  EFLAGS: 00000297
      [  452.064012] RAX: 000000000000aaa9 RBX: ffff8800b6e83b40 RCX: 0000000000000002
      [  452.064012] RDX: 00000000000000aa RSI: 000000000000000a RDI: ffff8800745c8110
      [  452.064012] RBP: ffff8800b6e83ba0 R08: 000000000000c802 R09: 000000000000001c
      [  452.064012] R10: ffff880071096c4e R11: 0000000000000006 R12: ffff8800b6e83b18
      [  452.064012] R13: ffffffff8125d51e R14: ffff8800b6e83ba0 R15: ffff880072a589c0
      [  452.064012] FS:  00007fdc0b81e700(0000) GS:ffff8800b6e80000(0000) knlGS:0000000000000000
      [  452.064012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  452.064012] CR2: 0000000000625208 CR3: 0000000074404000 CR4: 00000000000006a0
      [  452.064012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  452.064012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  452.064012] Process accel-pppd (pid: 6662, threadinfo ffff88007129a000, task ffff8800744f7410)
      [  452.064012] Stack:
      [  452.064012]  ffff8800b6e83bb0 ffffffff81256559 ffff8800b6e83bc0 ffffffff8121c64a
      [  452.064012]  ffff8800b6e83bf0 ffffffff8121ec7a ffff880072a589c0 ffff880071096c62
      [  452.064012]  0000000000000011 ffffffff81430024 ffff8800b6e83c80 ffffffff8121f276
      [  452.064012] Call Trace:
      [  452.064012]  <IRQ>
      [  452.064012]  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
      [  452.064012]  [<ffffffff8121c64a>] spin_lock+0x9/0xb
      [  452.064012]  [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269
      [  452.064012]  [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae
      [  452.064012]  [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0
      [  452.064012]  [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c
      [  452.064012]  [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5
      [  452.064012]  [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84
      [  452.064012]  [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3
      [  452.064012]  [<ffffffff811fe78f>] ip_rcv+0x210/0x269
      [  452.064012]  [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb
      [  452.064012]  [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7
      [  452.064012]  [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e
      [  452.064012]  [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b
      [  452.064012]  [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net]
      [  452.064012]  [<ffffffff811d9417>] net_rx_action+0x73/0x184
      [  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
      [  452.064012]  [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8
      [  452.064012]  [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12
      [  452.064012]  [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10
      [  452.064012]  [<ffffffff8125e0ac>] call_softirq+0x1c/0x26
      [  452.064012]  [<ffffffff81003587>] do_softirq+0x45/0x82
      [  452.064012]  [<ffffffff81034667>] irq_exit+0x42/0x9c
      [  452.064012]  [<ffffffff8125e146>] do_IRQ+0x8e/0xa5
      [  452.064012]  [<ffffffff8125676e>] common_interrupt+0x6e/0x6e
      [  452.064012]  <EOI>
      [  452.064012]  [<ffffffff810b82a1>] ? kfree+0x8a/0xa3
      [  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
      [  452.064012]  [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core]
      [  452.064012]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
      [  452.064012]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
      [  452.064012]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
      [  452.064012]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
      [  452.064012]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
      [  452.064012]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
      [  452.064012]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
      [  452.064012]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
      [  452.064012]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
      [  452.064012] Code: 89 e5 72 0c 31 c0 48 81 ff 45 66 25 81 0f 92 c0 5d c3 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 0f b6 d4 38 d0 74 06 f3 90 8a 07 <eb> f6 5d c3 90 90 55 48 89 e5 9c 58 0f 1f 44 00 00 5d c3 55 48
      [  452.064012] Call Trace:
      [  452.064012]  <IRQ>  [<ffffffff81256559>] _raw_spin_lock+0xe/0x10
      [  452.064012]  [<ffffffff8121c64a>] spin_lock+0x9/0xb
      [  452.064012]  [<ffffffff8121ec7a>] udp_queue_rcv_skb+0x186/0x269
      [  452.064012]  [<ffffffff8121f276>] __udp4_lib_rcv+0x297/0x4ae
      [  452.064012]  [<ffffffff8121c178>] ? raw_rcv+0xe9/0xf0
      [  452.064012]  [<ffffffff8121f4a7>] udp_rcv+0x1a/0x1c
      [  452.064012]  [<ffffffff811fe385>] ip_local_deliver_finish+0x12b/0x1a5
      [  452.064012]  [<ffffffff811fe54e>] ip_local_deliver+0x53/0x84
      [  452.064012]  [<ffffffff811fe1d0>] ip_rcv_finish+0x2bc/0x2f3
      [  452.064012]  [<ffffffff811fe78f>] ip_rcv+0x210/0x269
      [  452.064012]  [<ffffffff8101911e>] ? kvm_clock_get_cycles+0x9/0xb
      [  452.064012]  [<ffffffff811d88cd>] __netif_receive_skb+0x3a5/0x3f7
      [  452.064012]  [<ffffffff811d8eba>] netif_receive_skb+0x57/0x5e
      [  452.064012]  [<ffffffff811cf30f>] ? __netdev_alloc_skb+0x1f/0x3b
      [  452.064012]  [<ffffffffa0049126>] virtnet_poll+0x4ba/0x5a4 [virtio_net]
      [  452.064012]  [<ffffffff811d9417>] net_rx_action+0x73/0x184
      [  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
      [  452.064012]  [<ffffffff810343b9>] __do_softirq+0xc3/0x1a8
      [  452.064012]  [<ffffffff81013b56>] ? ack_APIC_irq+0x10/0x12
      [  452.064012]  [<ffffffff81256559>] ? _raw_spin_lock+0xe/0x10
      [  452.064012]  [<ffffffff8125e0ac>] call_softirq+0x1c/0x26
      [  452.064012]  [<ffffffff81003587>] do_softirq+0x45/0x82
      [  452.064012]  [<ffffffff81034667>] irq_exit+0x42/0x9c
      [  452.064012]  [<ffffffff8125e146>] do_IRQ+0x8e/0xa5
      [  452.064012]  [<ffffffff8125676e>] common_interrupt+0x6e/0x6e
      [  452.064012]  <EOI>  [<ffffffff810b82a1>] ? kfree+0x8a/0xa3
      [  452.064012]  [<ffffffffa01b2cc2>] ? l2tp_xmit_skb+0x27a/0x4ac [l2tp_core]
      [  452.064012]  [<ffffffffa01b2c25>] ? l2tp_xmit_skb+0x1dd/0x4ac [l2tp_core]
      [  452.064012]  [<ffffffffa01c2d36>] pppol2tp_sendmsg+0x15e/0x19c [l2tp_ppp]
      [  452.064012]  [<ffffffff811c7872>] __sock_sendmsg_nosec+0x22/0x24
      [  452.064012]  [<ffffffff811c83bd>] sock_sendmsg+0xa1/0xb6
      [  452.064012]  [<ffffffff81254e88>] ? __schedule+0x5c1/0x616
      [  452.064012]  [<ffffffff8103c7c6>] ? __dequeue_signal+0xb7/0x10c
      [  452.064012]  [<ffffffff810bbd21>] ? fget_light+0x75/0x89
      [  452.064012]  [<ffffffff811c8444>] ? sockfd_lookup_light+0x20/0x56
      [  452.064012]  [<ffffffff811c9b34>] sys_sendto+0x10c/0x13b
      [  452.064012]  [<ffffffff8125cac2>] system_call_fastpath+0x16/0x1b
      Reported-by: François Cachereul <f.cachereul@alphalink.fr>
      Tested-by: François Cachereul <f.cachereul@alphalink.fr>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vti: get rid of nf mark rule in prerouting · 7263a518
      Committed by Christophe Gouault
      This patch fixes and improves the use of vti interfaces (while
      slightly changing the way they are configured).
      
      Currently:
      
      - it is necessary to identify and mark inbound IPsec
        packets destined to each vti interface, via netfilter rules in
        the mangle table at prerouting hook.
      
      - the vti module cannot retrieve the right tunnel in input since
        commit b9959fd3: vti tunnels all have an i_key, but the tunnel lookup
        is done with flag TUNNEL_NO_KEY, so there is no chance to retrieve them.
      
      - the i_key is used by the outbound processing as a mark to lookup
        for the right SP and SA bundle.
      
      This patch uses the o_key to store the vti mark (instead of i_key), which
      makes it possible:
      
      - to avoid the need for previously marking the inbound skbuffs via a
        netfilter rule.
      - to properly retrieve the right tunnel in input, only based on the IPsec
        packet outer addresses.
      - to properly perform an inbound policy check (using the tunnel o_key
        as a mark).
      - to properly perform an outbound SPD and SAD lookup (using the tunnel
        o_key as a mark).
      - to keep the current mark of the skbuff. The skbuff mark is neither
        used nor changed by the vti interface. Only the vti interface o_key
        is used.
      
      SAs have a wildcard mark.
      SPs have a mark equal to the vti interface o_key.
      
      The vti interface must be created as follows (i_key = 0, o_key = mark):
      
         ip link add vti1 mode vti local 1.1.1.1 remote 2.2.2.2 okey 1
      
      The SPs attached to vti1 must be created as follows (mark = vti1 o_key):
      
         ip xfrm policy add dir out mark 1 tmpl src 1.1.1.1 dst 2.2.2.2 \
            proto esp mode tunnel
         ip xfrm policy add dir in  mark 1 tmpl src 2.2.2.2 dst 1.1.1.1 \
            proto esp mode tunnel
      
      The SAs are created with the default wildcard mark. There is no
      distinction between global vs. vti SAs. Just their addresses will
      possibly link them to a vti interface:
      
         ip xfrm state add src 1.1.1.1 dst 2.2.2.2 proto esp spi 1000 mode tunnel \
                       enc "cbc(aes)" "azertyuiopqsdfgh"
      
         ip xfrm state add src 2.2.2.2 dst 1.1.1.1 proto esp spi 2000 mode tunnel \
                       enc "cbc(aes)" "sqbdhgqsdjqjsdfh"
      
      To avoid matching "global" (not vti) SPs in vti interfaces, global SPs
      should not use the default wildcard mark, but explicitly match mark 0.
      
      To avoid a double SPD lookup in input and output (in global and vti SPDs),
      the NOPOLICY and NOXFRM options should be set on the vti interfaces:
      
         echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_policy
         echo 1 > /proc/sys/net/ipv4/conf/vti1/disable_xfrm
      
      The outgoing traffic is steered to vti1 by a route via the vti interface:
      
         ip route add 192.168.0.0/16 dev vti1
      
      The incoming IPsec traffic is steered to vti1 because its outer addresses
      match the vti1 tunnel configuration.
      Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 11 Oct 2013: 1 commit
    • bridge: update mdb expiration timer upon reports. · f144febd
      Committed by Vlad Yasevich
      commit 9f00b2e7
      	bridge: only expire the mdb entry when query is received
      changed the mdb expiration timer to be armed only when QUERY is
      received.  However, this causes issues in an environment where
      the multicast server socket comes and goes very fast while a client
      is trying to send traffic to it.
      
      The root cause is a race where a sequence of LEAVE followed by REPORT
      messages can race against QUERY messages generated in response to LEAVE.
      The QUERY ends up starting the expiration timer, and that timer can
      potentially expire after the new REPORT message has been received signaling
      the new join operation.  This leads to a significant drop in multicast
      traffic and possible complete stall.
      
      The solution is to have REPORT messages update the expiration timer
      on entries that already exist.
      
      CC: Cong Wang <xiyou.wangcong@gmail.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 10 Oct 2013: 2 commits