1. 26 11月, 2014 2 次提交
  2. 25 11月, 2014 1 次提交
  3. 24 11月, 2014 2 次提交
    • L
      ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic · 20ea60ca
      lucien 提交于
      Now the vti_link_ops do not point the .dellink, for fb tunnel device
      (ip_vti0), the net_device will be removed as the default .dellink is
      unregister_netdevice_queue,but the tunnel still in the tunnel list,
      then if we add a new vti tunnel, in ip_tunnel_find():
      
              hlist_for_each_entry_rcu(t, head, hash_node) {
                      if (local == t->parms.iph.saddr &&
                          remote == t->parms.iph.daddr &&
                          link == t->parms.link &&
      ==>                 type == t->dev->type &&
                          ip_tunnel_key_match(&t->parms, flags, key))
                              break;
              }
      
      the panic will happen, cause dev of ip_tunnel *t is null:
      [ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
      [ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
      [ 3835.073008] Oops: 0000 [#1] SMP
      .....
      [ 3835.073008] Stack:
      [ 3835.073008]  ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
      [ 3835.073008]  ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
      [ 3835.073008]  ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
      [ 3835.073008] Call Trace:
      [ 3835.073008]  [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
      [ 3835.073008]  [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
      [ 3835.073008]  [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
      [ 3835.073008]  [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
      [ 3835.073008]  [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
      [ 3835.073008]  [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
      [ 3835.073008]  [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
      [ 3835.073008]  [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
      [ 3835.073008]  [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
      [ 3835.073008]  [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
      [ 3835.073008]  [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
      ....
      
      modprobe ip_vti
      ip link del ip_vti0 type vti
      ip link add ip_vti0 type vti
      rmmod ip_vti
      
      do that one or more times, kernel will panic.
      
      fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
      which we skip the unregister of fb tunnel device. do the same on ip6_vti.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20ea60ca
    • A
      ipv6: Do not treat a GSO_TCPV4 request from UDP tunnel over IPv6 as invalid · b6fef4c6
      Alexander Duyck 提交于
      This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
      the IPv6 GSO offloads.  Without this change VXLAN tunnels running over IPv6
      do not currently handle IPv4 TCP TSO requests correctly and end up handing
      the non-segmented frame off to the device.
      
      Below is the before and after for a simple netperf TCP_STREAM test between
      two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
      a 1Gb/s network adapter.
      
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.29       0.88      Before
       87380  16384  16384    10.03     895.69      After
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6fef4c6
  4. 20 11月, 2014 1 次提交
  5. 17 11月, 2014 1 次提交
    • D
      ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · feb91a02
      Daniel Borkmann 提交于
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size, since commit 72e09ad1 ("ipv6: avoid high order allocations")
      we have changed the limit in order to be less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: NWei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feb91a02
  6. 11 11月, 2014 1 次提交
  7. 04 11月, 2014 4 次提交
  8. 31 10月, 2014 2 次提交
  9. 30 10月, 2014 1 次提交
  10. 23 10月, 2014 2 次提交
  11. 21 10月, 2014 1 次提交
    • F
      net: gso: use feature flag argument in all protocol gso handlers · 1e16aa3d
      Florian Westphal 提交于
      skb_gso_segment() has a 'features' argument representing offload features
      available to the output path.
      
      A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use
      those instead of the provided ones when handing encapsulation/tunnels.
      
      Depending on dev->hw_enc_features of the output device skb_gso_segment() can
      then return NULL even when the caller has disabled all GSO feature bits,
      as segmentation of inner header thinks device will take care of segmentation.
      
      This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs
      that did not fit the remaining token quota as the segmentation does not work
      when device supports corresponding hw offload capabilities.
      
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e16aa3d
  12. 19 10月, 2014 2 次提交
  13. 18 10月, 2014 1 次提交
  14. 15 10月, 2014 1 次提交
  15. 14 10月, 2014 1 次提交
    • P
      netfilter: nf_tables: restrict nat/masq expressions to nat chain type · 7210e4e3
      Pablo Neira Ayuso 提交于
      This adds the missing validation code to avoid the use of nat/masq from
      non-nat chains. The validation assumes two possible configuration
      scenarios:
      
      1) Use of nat from base chain that is not of nat type. Reject this
         configuration from the nft_*_init() path of the expression.
      
      2) Use of nat from non-base chain. In this case, we have to wait until
         the non-base chain is referenced by at least one base chain via
         jump/goto. This is resolved from the nft_*_validate() path which is
         called from nf_tables_check_loops().
      
      The user gets an -EOPNOTSUPP in both cases.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      7210e4e3
  16. 11 10月, 2014 1 次提交
  17. 08 10月, 2014 2 次提交
    • D
      ipv6: Do not warn for informational ICMP messages, regardless of type. · ea85a0a2
      David S. Miller 提交于
      There is no reason to emit a log message for these.
      
      Based upon a suggestion from Hannes Frederic Sowa.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      ea85a0a2
    • E
      net: better IFF_XMIT_DST_RELEASE support · 02875878
      Eric Dumazet 提交于
      Testing xmit_more support with netperf and connected UDP sockets,
      I found strange dst refcount false sharing.
      
      Current handling of IFF_XMIT_DST_RELEASE is not optimal.
      
      Dropping dst in validate_xmit_skb() is certainly too late in case
      packet was queued by cpu X but dequeued by cpu Y
      
      The logical point to take care of drop/force is in __dev_queue_xmit()
      before even taking qdisc lock.
      
      As Julian Anastasov pointed out, need for skb_dst() might come from some
      packet schedulers or classifiers.
      
      This patch adds new helper to cleanly express needs of various drivers
      or qdiscs/classifiers.
      
      Drivers that need skb_dst() in their ndo_start_xmit() should call
      following helper in their setup instead of the prior :
      
      	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
      ->
      	netif_keep_dst(dev);
      
      Instead of using a single bit, we use two bits, one being
      eventually rebuilt in bonding/team drivers.
      
      The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
      rebuilt in bonding/team. Eventually, we could add something
      smarter later.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02875878
  18. 07 10月, 2014 5 次提交
  19. 05 10月, 2014 1 次提交
  20. 04 10月, 2014 1 次提交
    • T
      fou: eliminate IPv4,v6 specific GRO functions · efc98d08
      Tom Herbert 提交于
      This patch removes fou[46]_gro_receive and fou[46]_gro_complete
      functions. The v4 or v6 variants were chosen for the UDP offloads
      based on the address family of the socket this is not necessary
      or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb.
      This is set in udp6_gro_receive and unset in udp4_gro_receive. In
      fou_gro_receive the value is used to select the correct inet_offloads
      for the protocol of the outer IP header.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efc98d08
  21. 03 10月, 2014 3 次提交
  22. 02 10月, 2014 3 次提交
    • T
      gre: Set inner protocol in v4 and v6 GRE transmit · 54bc9bac
      Tom Herbert 提交于
      Call skb_set_inner_protocol to set inner Ethernet protocol to
      protocol being encapsulation by GRE before tunnel_xmit. This is
      needed for GSO if UDP encapsulation (fou) is being done.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54bc9bac
    • T
      sit: Set inner IP protocol in sit · 469471cd
      Tom Herbert 提交于
      Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPV6
      before tunnel_xmit. This is needed if UDP encapsulation (fou) is
      being done.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      469471cd
    • T
      udp: Generalize skb_udp_segment · 8bce6d7d
      Tom Herbert 提交于
      skb_udp_segment is the function called from udp4_ufo_fragment to
      segment a UDP tunnel packet. This function currently assumes
      segmentation is transparent Ethernet bridging (i.e. VXLAN
      encapsulation). This patch generalizes the function to
      operate on either Ethertype or IP protocol.
      
      The inner_protocol field must be set to the protocol of the inner
      header. This can now be either an Ethertype or an IP protocol
      (in a union). A new flag in the skbuff indicates which type is
      effective. skb_set_inner_protocol and skb_set_inner_ipproto
      helper functions were added to set the inner_protocol. These
      functions are called from the point where the tunnel encapsulation
      is occuring.
      
      When skb_udp_tunnel_segment is called, the function to segment the
      inner packet is selected based on the inner IP or Ethertype. In the
      case of an IP protocol encapsulation, the function is derived from
      inet[6]_offloads. In the case of Ethertype, skb->protocol is
      set to the inner_protocol and skb_mac_gso_segment is called. (GRE
      currently does this, but it might be possible to lookup the protocol
      in offload_base and call the appropriate segmenation function
      directly).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8bce6d7d
  23. 01 10月, 2014 1 次提交
    • H
      ipv6: remove rt6i_genid · 705f1c86
      Hannes Frederic Sowa 提交于
      Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
      are already marked DST_HOST (e.g. input routes routes) will always be
      invalidated during sk_dst_check. Thus per-socket dst caching absolutely
      had no effect and early demuxing had no effect.
      
      Thus this patch removes rt6i_genid: fn_sernum already gets modified during
      add operations, so we only must ensure we mutate fn_sernum during ipv6
      address remove operations. This is a fairly cost extensive operations,
      but address removal should not happen that often. Also our mtu update
      functions do the same and we heard no complains so far. xfrm policy
      changes also cause a call into fib6_flush_trees. Also plug a hole in
      rt6_info (no cacheline changes).
      
      I verified via tracing that this change has effect.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      705f1c86