1. 29 May, 2018 5 commits
  2. 25 May, 2018 1 commit
  3. 24 May, 2018 5 commits
    • ipv6: sr: Add seg6local action End.BPF · 004d4b27
      Mathieu Xhonneux committed
      This patch adds the End.BPF action to the LWT seg6local infrastructure.
      This action works like any other seg6local End action, meaning that an IPv6
      header with SRH is needed, whose DA has to be equal to the SID of the
      action. It will also advance the SRH to the next segment; the BPF program
      does not have to take care of this.
      
      Since the BPF program must not be a source of instability in the kernel, it
      is important to ensure that the integrity of the packet is maintained
      before yielding it back to the IPv6 layer. The hook hence keeps track of
      whether the SRH has been altered through the helpers, and re-validates its
      content if needed with seg6_validate_srh. The state kept for validation is
      stored in a per-CPU buffer. The BPF program is not allowed to directly
      write into the packet, and only some fields of the SRH can be altered
      through the helper bpf_lwt_seg6_store_bytes.
      
      Performance profiling has shown that the SRH re-validation does not induce
      a significant overhead. If the altered SRH is deemed invalid, the packet
      is dropped.
      
      This validation is also done before executing any action through
      bpf_lwt_seg6_action, and will not be performed again if the SRH is not
      modified after calling the action.
      
      The BPF program may return 3 types of return codes:
          - BPF_OK: the End.BPF action will look up the next destination through
                   seg6_lookup_nexthop.
          - BPF_REDIRECT: if an action has been executed through the
                bpf_lwt_seg6_action helper, the BPF program should return this
                value, as the skb's destination is already set and the default
                lookup should not be performed.
          - BPF_DROP : the packet will be dropped.
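      
      The handling of the three return codes, together with the SRH
      re-validation described above, can be sketched roughly as follows. This is
      a minimal user-space model, not the kernel code: the enum names and the
      function end_bpf_post() are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for BPF_OK, BPF_REDIRECT and BPF_DROP. */
enum sketch_ret { SKETCH_OK, SKETCH_REDIRECT, SKETCH_DROP };

/* What the hook does with the packet after the BPF program returns. */
enum sketch_next {
    NEXT_DEFAULT_LOOKUP,   /* perform the default next-hop lookup */
    NEXT_USE_EXISTING_DST, /* destination was already set by an action */
    NEXT_DROP              /* discard the packet */
};

/*
 * srh_modified: the program altered the SRH through a helper.
 * srh_valid:    result of re-validating the SRH (only meaningful when
 *               srh_modified is non-zero).
 */
static enum sketch_next end_bpf_post(enum sketch_ret ret,
                                     int srh_modified, int srh_valid)
{
    assert(ret == SKETCH_OK || ret == SKETCH_REDIRECT || ret == SKETCH_DROP);

    if (ret == SKETCH_DROP)
        return NEXT_DROP;

    /* An altered SRH that no longer validates is dropped in all cases. */
    if (srh_modified && !srh_valid)
        return NEXT_DROP;

    return ret == SKETCH_REDIRECT ? NEXT_USE_EXISTING_DST
                                  : NEXT_DEFAULT_LOOKUP;
}
```

      For instance, end_bpf_post(SKETCH_OK, 1, 0) models a program that
      returned BPF_OK but left the SRH in an invalid state, so the packet is
      dropped instead of being looked up.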
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Add IPv6 Segment Routing helpers · fe94cc29
      Mathieu Xhonneux committed
      The BPF seg6local hook should be powerful enough to enable users to
      implement most of the use-cases one could think of. After some thinking,
      we figured out that the following actions should be possible on a SRv6
      packet, requiring 3 specific helpers:
          - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
          - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
                                     (to add/delete TLVs)
          - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
                                 (specifically End.X, End.T, End.B6 and
                                  End.B6.Encap)
      
      The specifications of these helpers are provided in the patch (see
      include/uapi/linux/bpf.h).
      
      The non-sensitive fields of the SRH are the following: flags, tag and
      TLVs. The other fields cannot be modified, to maintain the SRH
      integrity. Flags, tag and TLVs can easily be modified as their validity
      can be checked afterwards via seg6_validate_srh. It is not allowed to
      modify the segments directly. To add segments on the path, one should
      stack a new SRH using the End.B6 action via bpf_lwt_seg6_action.
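      
      The "flags, tag and TLVs only" rule can be illustrated with a bounds
      check over byte offsets within the SRH. The sketch below assumes the
      standard SRH layout (8 fixed bytes with flags at offset 5 and tag at
      offsets 6-7, followed by 16-byte segments, then TLVs). The function name
      srh_write_allowed() and the macros are hypothetical, and this is not the
      helper's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SRH_FIXED_LEN 8   /* bytes before the segment list */
#define SRH_OFF_FLAGS 5   /* flags: 1 byte, tag: the 2 bytes after it */
#define SRH_SEG_SIZE  16  /* one IPv6 address per segment */

/*
 * Return 1 if writing len bytes at offset off (relative to the start of
 * the SRH) only touches flags, tag or the TLV area, 0 otherwise.
 * first_segment is the index of the last entry of the segment list and
 * srh_len is the current SRH length in bytes.
 */
static int srh_write_allowed(size_t off, size_t len,
                             unsigned first_segment, size_t srh_len)
{
    size_t tlv_start = SRH_FIXED_LEN +
                       ((size_t)first_segment + 1) * SRH_SEG_SIZE;

    assert(srh_len >= tlv_start);

    /* Reject empty writes and writes running past the end of the SRH. */
    if (len == 0 || len > srh_len || off > srh_len - len)
        return 0;

    /* Flags and tag are contiguous: offsets 5 through 7. */
    if (off >= SRH_OFF_FLAGS && off + len <= SRH_FIXED_LEN)
        return 1;

    /* TLVs follow the last segment. */
    if (off >= tlv_start)
        return 1;

    return 0;
}
```

      With two segments and one 8-byte TLV (srh_len of 48), a write at offset
      3 (segments_left) or offset 8 (the first segment) is refused, while a
      write at offset 40 lands in the TLV area and is accepted.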
      
      Growing, shrinking or editing TLVs via the helpers will flag the SRH as
      invalid, and it will have to be re-validated before re-entering the IPv6
      layer. This flag is stored in a per-CPU buffer, along with the current
      header length in bytes.
      
      Storing the SRH len in bytes in the control block is mandatory when using
      bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
      len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-byte
      boundary). When adding/deleting TLVs within the BPF program, the SRH may
      temporarily be in an invalid state where its length cannot be rounded to 8
      bytes without remainder, hence the need to store the length in bytes
      separately. The caller of the BPF program can then ensure that the SRH's
      final length is valid using this value. Again, a final SRH modified by a
      BPF program which doesn’t respect the 8-byte boundary will be discarded
      as it will be considered invalid.
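      
      The relation between the byte length and the Header Ext. Length field
      can be made concrete with a small sketch. It assumes the usual IPv6
      extension header convention, where the field counts 8-octet units and
      excludes the first 8 octets. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert an SRH length in bytes to the value of the 8-bit Header Ext.
 * Length field, or return -1 when the length is not representable
 * (not a multiple of 8, shorter than 8 bytes, or too long).
 */
static int srh_hdrlen_from_bytes(size_t len_bytes)
{
    size_t units;

    if (len_bytes < 8 || (len_bytes & 7) != 0)
        return -1;

    units = (len_bytes >> 3) - 1;   /* first 8 octets are not counted */
    if (units > 255)
        return -1;

    return (int)units;
}

/* Number of padding bytes needed to reach the next 8-byte boundary. */
static size_t srh_pad_needed(size_t len_bytes)
{
    size_t pad = (8 - (len_bytes & 7)) & 7;

    assert(pad < 8);
    return pad;
}
```

      A program that grew a 40-byte SRH by 5 bytes ends up at 45 bytes, which
      srh_hdrlen_from_bytes() rejects. Adding the 3 bytes reported by
      srh_pad_needed() as padding brings it to 48 bytes, a valid length.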
      
      Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
      available from the LWT BPF IN hook, but not from the seg6local BPF one.
      This helper allows encapsulating a Segment Routing Header (either with
      a new outer IPv6 header, or by inlining it directly in the existing IPv6
      header) into a non-SRv6 packet. This helper is required if we want to
      offer the possibility to dynamically encapsulate a SRH for non-SRv6 packets,
      as the BPF seg6local hook only works on traffic already containing a SRH.
      This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
      the same purpose but with a static SRH per route.
      
      These helpers require CONFIG_IPV6=y (and not =m).
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • ipv6: sr: export function lookup_nexthop · 1c1e761e
      Mathieu Xhonneux committed
      The function lookup_nexthop is essential to implement most of the seg6local
      actions. As we want to provide a BPF helper allowing to apply some of these
      actions on the packet being processed, the helper should be able to call
      this function, hence the need to make it public.
      
      Moreover, if one argument is incorrect or if the next hop can not be found,
      an error should be returned by the BPF helper so the BPF program can adapt
      its processing of the packet (return an error, properly force the drop,
      ...). This patch hence makes this function return dst->error to indicate a
      possible error.
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • ipv6: support sport, dport and ip_proto in RTM_GETROUTE · eacb9384
      Roopa Prabhu committed
      This is a followup to fib6 rules sport, dport and ipproto
      match support. Only supports tcp, udp and icmp for ipproto.
      Used by fib rule self tests.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: exclude gso from xfrm paths · ff06342c
      Willem de Bruijn committed
      UDP GSO delays final datagram construction to the GSO layer. This
      conflicts with protocol transformations.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 23 May, 2018 6 commits
    • netfilter: ip6t_rpfilter: provide input interface for route lookup · cede24d1
      Vincent Bernat committed
      In commit 47b7e7f8, this bit was removed at the same time the
      RT6_LOOKUP_F_IFACE flag was removed. However, it is needed when
      link-local addresses are used, which is a very common case: when
      packets are routed, neighbor solicitations are done using link-local
      addresses. For example, the following neighbor solicitation is not
      matched by "-m rpfilter":
      
          IP6 fe80::5254:33ff:fe00:1 > ff02::1:ff00:3: ICMP6, neighbor
          solicitation, who has 2001:db8::5254:33ff:fe00:3, length 32
      
      Commit 47b7e7f8 doesn't quite explain why we shouldn't use
      RT6_LOOKUP_F_IFACE in the rpfilter case. I suppose the interface check
      later in the function would make it redundant. However, the rest
      of the routing code is using RT6_LOOKUP_F_IFACE when there is no
      source address (which matches rpfilter's case with a non-unicast
      destination, like with neighbor solicitation).
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Fixes: 47b7e7f8 ("netfilter: don't set F_IFACE on ipv6 fib lookups")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_nat: add nat type hooks to nat core · 9971a514
      Florian Westphal committed
      Currently the packet rewrite and instantiation of nat NULL bindings
      happens from the protocol specific nat backend.
      
      Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type.
      
      Invocation looks like this (simplified):
      NF_HOOK()
         |
         `---iptable_nat
      	 |
      	 `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
      	               |
                new packet? pass skb through iptables nat chain
                             |
      		       `---> iptable_nat: ipt_do_table
      
      In nft case, this looks the same (nft_chain_nat_ipv4 instead of
      iptable_nat).
      
      This is a problem for two reasons:
      1. Can't use iptables nat and nf_tables nat at the same time,
         as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a
         NULL binding if do_table() did not find a matching nat rule so we
         can detect post-nat tuple collisions).
      2. If you use e.g. nft_masq, snat, redir, etc., users must also register
         an empty base chain so that the nat core gets called from NF_HOOK()
         to do the reverse translation, which is neither obvious nor user
         friendly.
      
      After this change, the base hook gets registered not from iptable_nat or
      nftables nat hooks, but from the l3 nat core.
      
      iptables/nft nat base hooks get registered with the nat core instead:
      
      NF_HOOK()
         |
         `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
      		|
               new packet? pass skb through iptables/nftables nat chains
                      |
      		+-> iptables_nat: ipt_do_table
      	        +-> nft nat chain x
      	        `-> nft nat chain y
      
      The nat core deals with null bindings and reverse translation.
      When no mapping exists, it calls the registered nat lookup hooks until
      one creates a new mapping.
      If both iptables and nftables nat hooks exist, the first matching
      one is used (i.e., higher priority wins).
      
      Also, nft users do not need to create empty nat hooks anymore,
      nat core always registers the base hooks that take care of reverse/reply
      translation.
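      
      The "call the registered lookup hooks until one creates a mapping" step
      can be modelled in a few lines. This is only an illustrative user-space
      sketch: struct sketch_pkt, the two sample hooks and nat_core_lookup()
      are hypothetical names, and the null binding is modelled as an identity
      mapping.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_pkt {
    int port;       /* original destination port */
    int mapped_to;  /* 0 while no NAT mapping exists */
};

typedef void (*nat_lookup_fn)(struct sketch_pkt *pkt);

/* Stand-in for an iptables nat chain: only knows about port 80. */
static void ipt_like_hook(struct sketch_pkt *pkt)
{
    if (pkt->port == 80)
        pkt->mapped_to = 8080;
}

/* Stand-in for an nft nat chain: knows about ports 80 and 22. */
static void nft_like_hook(struct sketch_pkt *pkt)
{
    if (pkt->port == 80)
        pkt->mapped_to = 9090;
    else if (pkt->port == 22)
        pkt->mapped_to = 2222;
}

/*
 * Call the hooks in registration order until one creates a mapping.
 * Returns the index of the hook that did, or -1 when none matched and
 * a null binding was installed instead.
 */
static int nat_core_lookup(const nat_lookup_fn *hooks, size_t n,
                           struct sketch_pkt *pkt)
{
    assert(pkt != NULL && pkt->mapped_to == 0);

    for (size_t i = 0; i < n; i++) {
        hooks[i](pkt);
        if (pkt->mapped_to != 0)
            return (int)i;      /* first matching hook wins */
    }

    pkt->mapped_to = pkt->port; /* null binding: keep the tuple as-is */
    return -1;
}
```

      With both hooks registered in the order shown, a packet to port 80 is
      mapped by the first hook and the second one is never consulted, which
      mirrors the "first matching one is used" rule.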
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: allow chain type to override hook register · 4e25ceb8
      Florian Westphal committed
      Will be used in a followup patch when nat types no longer
      use nf_register_net_hook() but will instead register with the nat core.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xtables: allow table definitions not backed by hook_ops · ba7d284a
      Florian Westphal committed
      The ip(6)tables nat table is currently receiving skbs from the netfilter
      core. After a followup patch, skbs will be coming from the netfilter nat
      core instead, so the table is no longer backed by normal hook_ops.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_nat: move common nat code to nat core · 1f55236b
      Florian Westphal committed
      Copy-pasted; both l3 helpers use almost the same code here.
      Split out the common part into an 'inet' helper.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net/ipv6: Simplify route replace and appending into multipath route · f34436a4
      David Ahern committed
      Bring consistency to ipv6 route replace and append semantics.
      
      Remove rt6_qualify_for_ecmp which is just guess work. It fails in 2 cases:
      1. can not replace a route with a reject route. Existing code appends
         a new route instead of replacing the existing one.
      
      2. can not have a multipath route where a leg uses a dev only nexthop
      
      Existing use cases affected by this change:
      1. adding a route with existing prefix and metric using NLM_F_CREATE
         without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
         'prepend'). Existing code auto-determines that the new nexthop can
         be appended to an existing route to create a multipath route. This
         change breaks that by requiring the APPEND flag for the new route
         to be added to an existing one. Instead the prepend just adds another
         route entry.
      
      2. route replace. Existing code replaces the first matching multipath route
         if the new route is multipath capable and falls back to the first matching
         non-ECMP route (reject or dev only route) in case one isn't available.
         New behavior replaces the first matching route. (Thanks to Ido for spotting
         this one)
      
      Note: Newer iproute2 is needed to display multipath routes with a dev-only
            nexthop. This is due to a bug in iproute2 and parsing nexthops.
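      
      The flag combinations mentioned above can be summarised in a small
      classifier. The sketch below copies the flag values from
      <linux/netlink.h> under SK_-prefixed names so that it builds anywhere.
      The mapping to iproute2 command names reflects iproute2's usual
      behaviour and should be treated as an assumption, and the enum and
      function names are hypothetical.

```c
#include <assert.h>

/* Values mirror NLM_F_* from <linux/netlink.h>. */
#define SK_NLM_F_REPLACE 0x100u
#define SK_NLM_F_EXCL    0x200u
#define SK_NLM_F_CREATE  0x400u
#define SK_NLM_F_APPEND  0x800u
#define SK_NLM_F_MASK    (SK_NLM_F_REPLACE | SK_NLM_F_EXCL | \
                          SK_NLM_F_CREATE | SK_NLM_F_APPEND)

static_assert(SK_NLM_F_MASK == 0xF00u, "expected four distinct flag bits");

enum route_req {
    REQ_ADD, REQ_PREPEND, REQ_APPEND, REQ_REPLACE, REQ_CHANGE, REQ_OTHER
};

/* Classify a route request by its netlink flags. */
static enum route_req route_request_kind(unsigned flags)
{
    switch (flags & SK_NLM_F_MASK) {
    case SK_NLM_F_CREATE | SK_NLM_F_EXCL:    return REQ_ADD;
    case SK_NLM_F_CREATE:                    return REQ_PREPEND;
    case SK_NLM_F_CREATE | SK_NLM_F_APPEND:  return REQ_APPEND;
    case SK_NLM_F_CREATE | SK_NLM_F_REPLACE: return REQ_REPLACE;
    case SK_NLM_F_REPLACE:                   return REQ_CHANGE;
    default:                                 return REQ_OTHER;
    }
}

/*
 * After this change, only a request carrying the APPEND flag may be
 * merged into an existing route to form a multipath route.
 */
static int may_join_existing_route(unsigned flags)
{
    return (flags & SK_NLM_F_APPEND) != 0;
}
```

      A 'prepend' request (CREATE alone) is therefore classified as
      REQ_PREPEND and may_join_existing_route() returns 0 for it, matching the
      statement that the prepend just adds another route entry.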
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 22 May, 2018 1 commit
  6. 20 May, 2018 1 commit
    • net: ip6_gre: fix tunnel metadata device sharing. · b80d0b93
      William Tu committed
      Currently ip6gre and ip6erspan share a single metadata mode device,
      using 'collect_md_tun'.  Thus, when doing:
        ip link add dev ip6gre11 type ip6gretap external
        ip link add dev ip6erspan12 type ip6erspan external
        RTNETLINK answers: File exists
      the second command fails because it tries to create the same collect_md_tun.
      
      The patch fixes it by adding a separate collect md tunnel device
      for the ip6erspan, 'collect_md_tun_erspan'.  As a result, a couple
      of places need to be refactored/split up in order to distinguish ip6gre
      and ip6erspan.
      
      First, move the collect_md check at ip6gre_tunnel_{unlink,link} and
      create separate functions {ip6gre,ip6erspan}_tunnel_{link_md,unlink_md}.
      Then before link/unlink, make sure the link_md/unlink_md is called.
      Finally, a separate ndo_uninit is created for ip6erspan.  Tested it
      using the samples/bpf/test_tunnel_bpf.sh.
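      
      The effect of giving each tunnel kind its own metadata slot can be
      sketched as follows. This is a simplified user-space model with
      hypothetical names (struct md_net, md_link()); it only illustrates why
      one shared slot produces "File exists" for the second device while
      separate slots do not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum md_kind { MD_GRE, MD_ERSPAN, MD_KINDS };

/* One collect_md device per tunnel kind, identified here by its name. */
struct md_net {
    const char *slot[MD_KINDS];
};

/*
 * Register name as the metadata-mode device for kind.
 * Returns 0 on success or -EEXIST when that kind already has one.
 */
static int md_link(struct md_net *net, enum md_kind kind, const char *name)
{
    assert(kind == MD_GRE || kind == MD_ERSPAN);

    if (net->slot[kind] != NULL)
        return -EEXIST;

    net->slot[kind] = name;
    return 0;
}
```

      In this model, linking "ip6gre11" as MD_GRE and then "ip6erspan12" as
      MD_ERSPAN both succeed, whereas a second MD_GRE device is still refused
      with -EEXIST.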
      
      Fixes: ef7baf5e ("ip6_gre: add ip6 erspan collect_md mode")
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 18 May, 2018 9 commits
    • net: test tailroom before appending to linear skb · 113f99c3
      Willem de Bruijn committed
      Device features may change during transmission. In particular with
      corking, a device may toggle scatter-gather in between allocating
      and writing to an skb.
      
      Do not unconditionally assume that !NETIF_F_SG at write time implies
      that the same held at alloc time and thus the skb has sufficient
      tailroom.
      
      This issue predates git history.
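      
      The check described above boils down to testing the available tailroom
      at write time rather than inferring it from the device features. A
      minimal sketch, using a hypothetical struct sketch_buf in place of the
      skb's linear area:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified linear buffer: len bytes used out of cap allocated. */
struct sketch_buf {
    size_t len;
    size_t cap;
};

static size_t sketch_tailroom(const struct sketch_buf *b)
{
    assert(b->len <= b->cap);
    return b->cap - b->len;
}

/*
 * Return 1 when copy bytes may be appended to the linear area.
 * dev_has_sg is the scatter-gather capability as seen at write time,
 * which can differ from what held when the buffer was allocated, so
 * the tailroom is tested explicitly instead of being assumed.
 */
static int use_linear_append(const struct sketch_buf *b,
                             int dev_has_sg, size_t copy)
{
    return !dev_has_sg && sketch_tailroom(b) >= copy;
}
```

      When use_linear_append() returns 0, the caller is expected to take the
      other path (for example, placing the data in page fragments) instead of
      writing past the end of the linear area.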
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Fix ip6erspan hlen calculation · 2d665034
      Petr Machata committed
      Even though ip6erspan_tap_init() sets up hlen and tun_hlen according to
      what ERSPAN needs, it goes ahead to call ip6gre_tnl_link_config() which
      overwrites these settings with GRE-specific ones.
      
      Similarly for the changelink callbacks: ip6gre_changelink() calls
      ip6gre_tnl_change(), which calls ip6gre_tnl_link_config() as well.
      
      The difference ends up being 12 vs. 20 bytes, and this is generally not
      a problem, because a 12-byte request likely ends up allocating more and
      the extra 8 bytes are thus available. However correct it is not.
      
      So replace the newlink and changelink callbacks with ERSPAN-specific
      ones, reusing the newly-introduced _common() functions.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Split up ip6gre_changelink() · c8632fc3
      Petr Machata committed
      Extract from ip6gre_changelink() a reusable function
      ip6gre_changelink_common(). This will allow introduction of
      ERSPAN-specific _changelink() function with not a lot of code
      duplication.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Split up ip6gre_newlink() · 7fa38a7c
      Petr Machata committed
      Extract from ip6gre_newlink() a reusable function
      ip6gre_newlink_common(). The ip6gre_tnl_link_config() call needs to be
      made customizable for ERSPAN, thus reorder it with calls to
      ip6_tnl_change_mtu() and dev_hold(), and extract the whole tail to the
      caller, ip6gre_newlink(). Thus enable an ERSPAN-specific _newlink()
      function without a lot of duplication.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Split up ip6gre_tnl_change() · a6465350
      Petr Machata committed
      Split a reusable function ip6gre_tnl_copy_tnl_parm() from
      ip6gre_tnl_change(). This will allow ERSPAN-specific code to
      reuse the common parts while customizing the behavior for ERSPAN.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Split up ip6gre_tnl_link_config() · a483373e
      Petr Machata committed
      The function ip6gre_tnl_link_config() is used for setting up
      configuration of both ip6gretap and ip6erspan tunnels. Split the
      function into the common part and the route-lookup part. The latter then
      takes the calculated header length as an argument. This split will allow
      the patches down the line to sneak in a custom header length computation
      for the ERSPAN tunnel.
      
      Fixes: 5a963eb6 ("ip6_gre: Add ERSPAN native tunnel support")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Fix headroom request in ip6erspan_tunnel_xmit() · 5691484d
      Petr Machata committed
      dev->needed_headroom is not primed until ip6_tnl_xmit(), so it starts
      out zero. Thus the call to skb_cow_head() fails to actually make sure
      there's enough headroom to push the ERSPAN headers to. That can lead to
      the panic cited below. (Reproducer below that).
      
      Fix by requesting either needed_headroom if already primed, or just the
      bare minimum needed for the header otherwise.
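      
      The selection rule in the fix ("needed_headroom if already primed,
      otherwise the bare minimum for the header") can be written as a single
      expression. The sketch below is a stand-alone illustration.
      headroom_to_request() is a hypothetical name, and its two parameters
      stand for the device's needed_headroom and the tunnel header length.

```c
#include <assert.h>

/*
 * needed_headroom: the device's cached headroom requirement, which is
 *                  still 0 before it has been primed.
 * hdr_len:         bytes required for the tunnel header being pushed.
 *
 * Returns the amount of headroom to request before pushing the header.
 */
static unsigned headroom_to_request(unsigned needed_headroom,
                                    unsigned hdr_len)
{
    assert(hdr_len > 0);
    return needed_headroom ? needed_headroom : hdr_len;
}
```

      Before the first transmission the function returns hdr_len, so the
      header push always has room. Once needed_headroom has been primed, that
      larger value is requested instead.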
      
      [  190.703567] kernel BUG at net/core/skbuff.c:104!
      [  190.708384] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      [  190.714007] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
      [  190.728975] CPU: 1 PID: 959 Comm: kworker/1:2 Not tainted 4.17.0-rc4-net_master-custom-139 #10
      [  190.737647] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
      [  190.747006] Workqueue: ipv6_addrconf addrconf_dad_work
      [  190.752222] RIP: 0010:skb_panic+0xc3/0x100
      [  190.756358] RSP: 0018:ffff8801d54072f0 EFLAGS: 00010282
      [  190.761629] RAX: 0000000000000085 RBX: ffff8801c1a8ecc0 RCX: 0000000000000000
      [  190.768830] RDX: 0000000000000085 RSI: dffffc0000000000 RDI: ffffed003aa80e54
      [  190.776025] RBP: ffff8801bd1ec5a0 R08: ffffed003aabce19 R09: ffffed003aabce19
      [  190.783226] R10: 0000000000000001 R11: ffffed003aabce18 R12: ffff8801bf695dbe
      [  190.790418] R13: 0000000000000084 R14: 00000000000006c0 R15: ffff8801bf695dc8
      [  190.797621] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
      [  190.805786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  190.811582] CR2: 000055fa929aced0 CR3: 0000000003228004 CR4: 00000000001606e0
      [  190.818790] Call Trace:
      [  190.821264]  <IRQ>
      [  190.823314]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
      [  190.828940]  ? ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
      [  190.834562]  skb_push+0x78/0x90
      [  190.837749]  ip6erspan_tunnel_xmit+0x5e4/0x1982 [ip6_gre]
      [  190.843219]  ? ip6gre_tunnel_ioctl+0xd90/0xd90 [ip6_gre]
      [  190.848577]  ? debug_check_no_locks_freed+0x210/0x210
      [  190.853679]  ? debug_check_no_locks_freed+0x210/0x210
      [  190.858783]  ? print_irqtrace_events+0x120/0x120
      [  190.863451]  ? sched_clock_cpu+0x18/0x210
      [  190.867496]  ? cyc2ns_read_end+0x10/0x10
      [  190.871474]  ? skb_network_protocol+0x76/0x200
      [  190.875977]  dev_hard_start_xmit+0x137/0x770
      [  190.880317]  ? do_raw_spin_trylock+0x6d/0xa0
      [  190.884624]  sch_direct_xmit+0x2ef/0x5d0
      [  190.888589]  ? pfifo_fast_dequeue+0x3fa/0x670
      [  190.892994]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
      [  190.898455]  ? __lock_is_held+0xa0/0x160
      [  190.902422]  __qdisc_run+0x39e/0xfc0
      [  190.906041]  ? _raw_spin_unlock+0x29/0x40
      [  190.910090]  ? pfifo_fast_enqueue+0x24b/0x3e0
      [  190.914501]  ? sch_direct_xmit+0x5d0/0x5d0
      [  190.918658]  ? pfifo_fast_dequeue+0x670/0x670
      [  190.923047]  ? __dev_queue_xmit+0x172/0x1770
      [  190.927365]  ? preempt_count_sub+0xf/0xd0
      [  190.931421]  __dev_queue_xmit+0x410/0x1770
      [  190.935553]  ? ___slab_alloc+0x605/0x930
      [  190.939524]  ? print_irqtrace_events+0x120/0x120
      [  190.944186]  ? memcpy+0x34/0x50
      [  190.947364]  ? netdev_pick_tx+0x1c0/0x1c0
      [  190.951428]  ? __skb_clone+0x2fd/0x3d0
      [  190.955218]  ? __copy_skb_header+0x270/0x270
      [  190.959537]  ? rcu_read_lock_sched_held+0x93/0xa0
      [  190.964282]  ? kmem_cache_alloc+0x344/0x4d0
      [  190.968520]  ? cyc2ns_read_end+0x10/0x10
      [  190.972495]  ? skb_clone+0x123/0x230
      [  190.976112]  ? skb_split+0x820/0x820
      [  190.979747]  ? tcf_mirred+0x554/0x930 [act_mirred]
      [  190.984582]  tcf_mirred+0x554/0x930 [act_mirred]
      [  190.989252]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
      [  190.996109]  ? __lock_acquire+0x706/0x26e0
      [  191.000239]  ? sched_clock_cpu+0x18/0x210
      [  191.004294]  tcf_action_exec+0xcf/0x2a0
      [  191.008179]  tcf_classify+0xfa/0x340
      [  191.011794]  __netif_receive_skb_core+0x8e1/0x1c60
      [  191.016630]  ? debug_check_no_locks_freed+0x210/0x210
      [  191.021732]  ? nf_ingress+0x500/0x500
      [  191.025458]  ? process_backlog+0x347/0x4b0
      [  191.029619]  ? print_irqtrace_events+0x120/0x120
      [  191.034302]  ? lock_acquire+0xd8/0x320
      [  191.038089]  ? process_backlog+0x1b6/0x4b0
      [  191.042246]  ? process_backlog+0xc2/0x4b0
      [  191.046303]  process_backlog+0xc2/0x4b0
      [  191.050189]  net_rx_action+0x5cc/0x980
      [  191.053991]  ? napi_complete_done+0x2c0/0x2c0
      [  191.058386]  ? mark_lock+0x13d/0xb40
      [  191.062001]  ? clockevents_program_event+0x6b/0x1d0
      [  191.066922]  ? print_irqtrace_events+0x120/0x120
      [  191.071593]  ? __lock_is_held+0xa0/0x160
      [  191.075566]  __do_softirq+0x1d4/0x9d2
      [  191.079282]  ? ip6_finish_output2+0x524/0x1460
      [  191.083771]  do_softirq_own_stack+0x2a/0x40
      [  191.087994]  </IRQ>
      [  191.090130]  do_softirq.part.13+0x38/0x40
      [  191.094178]  __local_bh_enable_ip+0x135/0x190
      [  191.098591]  ip6_finish_output2+0x54d/0x1460
      [  191.102916]  ? ip6_forward_finish+0x2f0/0x2f0
      [  191.107314]  ? ip6_mtu+0x3c/0x2c0
      [  191.110674]  ? ip6_finish_output+0x2f8/0x650
      [  191.114992]  ? ip6_output+0x12a/0x500
      [  191.118696]  ip6_output+0x12a/0x500
      [  191.122223]  ? ip6_route_dev_notify+0x5b0/0x5b0
      [  191.126807]  ? ip6_finish_output+0x650/0x650
      [  191.131120]  ? ip6_fragment+0x1a60/0x1a60
      [  191.135182]  ? icmp6_dst_alloc+0x26e/0x470
      [  191.139317]  mld_sendpack+0x672/0x830
      [  191.143021]  ? igmp6_mcf_seq_next+0x2f0/0x2f0
      [  191.147429]  ? __local_bh_enable_ip+0x77/0x190
      [  191.151913]  ipv6_mc_dad_complete+0x47/0x90
      [  191.156144]  addrconf_dad_completed+0x561/0x720
      [  191.160731]  ? addrconf_rs_timer+0x3a0/0x3a0
      [  191.165036]  ? mark_held_locks+0xc9/0x140
      [  191.169095]  ? __local_bh_enable_ip+0x77/0x190
      [  191.173570]  ? addrconf_dad_work+0x50d/0xa20
      [  191.177886]  ? addrconf_dad_work+0x529/0xa20
      [  191.182194]  addrconf_dad_work+0x529/0xa20
      [  191.186342]  ? addrconf_dad_completed+0x720/0x720
      [  191.191088]  ? __lock_is_held+0xa0/0x160
      [  191.195059]  ? process_one_work+0x45d/0xe20
      [  191.199302]  ? process_one_work+0x51e/0xe20
      [  191.203531]  ? rcu_read_lock_sched_held+0x93/0xa0
      [  191.208279]  process_one_work+0x51e/0xe20
      [  191.212340]  ? pwq_dec_nr_in_flight+0x200/0x200
      [  191.216912]  ? get_lock_stats+0x4b/0xf0
      [  191.220788]  ? preempt_count_sub+0xf/0xd0
      [  191.224844]  ? worker_thread+0x219/0x860
      [  191.228823]  ? do_raw_spin_trylock+0x6d/0xa0
      [  191.233142]  worker_thread+0xeb/0x860
      [  191.236848]  ? process_one_work+0xe20/0xe20
      [  191.241095]  kthread+0x206/0x300
      [  191.244352]  ? process_one_work+0xe20/0xe20
      [  191.248587]  ? kthread_stop+0x570/0x570
      [  191.252459]  ret_from_fork+0x3a/0x50
      [  191.256082] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
      [  191.275327] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d54072f0
      [  191.281024] ---[ end trace 7ea51094e099e006 ]---
      [  191.285724] Kernel panic - not syncing: Fatal exception in interrupt
      [  191.292168] Kernel Offset: disabled
      [  191.295697] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Reproducer:
      
      	ip link add h1 type veth peer name swp1
      	ip link add h3 type veth peer name swp3
      
      	ip link set dev h1 up
      	ip address add 192.0.2.1/28 dev h1
      
      	ip link add dev vh3 type vrf table 20
      	ip link set dev h3 master vh3
      	ip link set dev vh3 up
      	ip link set dev h3 up
      
      	ip link set dev swp3 up
      	ip address add dev swp3 2001:db8:2::1/64
      
      	ip link set dev swp1 up
      	tc qdisc add dev swp1 clsact
      
      	ip link add name gt6 type ip6erspan \
      		local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
      	ip link set dev gt6 up
      
      	sleep 1
      
      	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
      		action mirred egress mirror dev gt6
      	ping -I h1 192.0.2.2
      
      Fixes: e41c7c68 ("ip6erspan: make sure enough headroom at xmit.")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ip6_gre: Request headroom in __gre6_xmit() · 01b8d064
      Petr Machata committed
      __gre6_xmit() pushes GRE headers before handing over to ip6_tnl_xmit()
      for generic IP-in-IP processing. However it doesn't make sure that there
      is enough headroom to push the header to. That can lead to the panic
      cited below. (Reproducer below that).
      
      Fix by requesting either needed_headroom if already primed, or just the
      bare minimum needed for the header otherwise.
      
      [  158.576725] kernel BUG at net/core/skbuff.c:104!
      [  158.581510] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      [  158.587174] Modules linked in: act_mirred cls_matchall ip6_gre ip6_tunnel tunnel6 gre sch_ingress vrf veth x86_pkg_temp_thermal mlx_platform nfsd e1000e leds_mlxcpld
      [  158.602268] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 4.17.0-rc4-net_master-custom-139 #10
      [  158.610938] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
      [  158.620426] RIP: 0010:skb_panic+0xc3/0x100
      [  158.624586] RSP: 0018:ffff8801d3f27110 EFLAGS: 00010286
      [  158.629882] RAX: 0000000000000082 RBX: ffff8801c02cc040 RCX: 0000000000000000
      [  158.637127] RDX: 0000000000000082 RSI: dffffc0000000000 RDI: ffffed003a7e4e18
      [  158.644366] RBP: ffff8801bfec8020 R08: ffffed003aabce19 R09: ffffed003aabce19
      [  158.651574] R10: 000000000000000b R11: ffffed003aabce18 R12: ffff8801c364de66
      [  158.658786] R13: 000000000000002c R14: 00000000000000c0 R15: ffff8801c364de68
      [  158.666007] FS:  0000000000000000(0000) GS:ffff8801d5400000(0000) knlGS:0000000000000000
      [  158.674212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  158.680036] CR2: 00007f4b3702dcd0 CR3: 0000000003228002 CR4: 00000000001606e0
      [  158.687228] Call Trace:
      [  158.689752]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
      [  158.694475]  ? __gre6_xmit+0x246/0xd80 [ip6_gre]
      [  158.699141]  skb_push+0x78/0x90
      [  158.702344]  __gre6_xmit+0x246/0xd80 [ip6_gre]
      [  158.706872]  ip6gre_tunnel_xmit+0x3bc/0x610 [ip6_gre]
      [  158.711992]  ? __gre6_xmit+0xd80/0xd80 [ip6_gre]
      [  158.716668]  ? debug_check_no_locks_freed+0x210/0x210
      [  158.721761]  ? print_irqtrace_events+0x120/0x120
      [  158.726461]  ? sched_clock_cpu+0x18/0x210
      [  158.730572]  ? sched_clock_cpu+0x18/0x210
      [  158.734692]  ? cyc2ns_read_end+0x10/0x10
      [  158.738705]  ? skb_network_protocol+0x76/0x200
      [  158.743216]  ? netif_skb_features+0x1b2/0x550
      [  158.747648]  dev_hard_start_xmit+0x137/0x770
      [  158.752010]  sch_direct_xmit+0x2ef/0x5d0
      [  158.755992]  ? pfifo_fast_dequeue+0x3fa/0x670
      [  158.760460]  ? pfifo_fast_change_tx_queue_len+0x810/0x810
      [  158.765975]  ? __lock_is_held+0xa0/0x160
      [  158.770002]  __qdisc_run+0x39e/0xfc0
      [  158.773673]  ? _raw_spin_unlock+0x29/0x40
      [  158.777781]  ? pfifo_fast_enqueue+0x24b/0x3e0
      [  158.782191]  ? sch_direct_xmit+0x5d0/0x5d0
      [  158.786372]  ? pfifo_fast_dequeue+0x670/0x670
      [  158.790818]  ? __dev_queue_xmit+0x172/0x1770
      [  158.795195]  ? preempt_count_sub+0xf/0xd0
      [  158.799313]  __dev_queue_xmit+0x410/0x1770
      [  158.803512]  ? ___slab_alloc+0x605/0x930
      [  158.807525]  ? ___slab_alloc+0x605/0x930
      [  158.811540]  ? memcpy+0x34/0x50
      [  158.814768]  ? netdev_pick_tx+0x1c0/0x1c0
      [  158.818895]  ? __skb_clone+0x2fd/0x3d0
      [  158.822712]  ? __copy_skb_header+0x270/0x270
      [  158.827079]  ? rcu_read_lock_sched_held+0x93/0xa0
      [  158.831903]  ? kmem_cache_alloc+0x344/0x4d0
      [  158.836199]  ? skb_clone+0x123/0x230
      [  158.839869]  ? skb_split+0x820/0x820
      [  158.843521]  ? tcf_mirred+0x554/0x930 [act_mirred]
      [  158.848407]  tcf_mirred+0x554/0x930 [act_mirred]
      [  158.853104]  ? tcf_mirred_act_wants_ingress.part.2+0x10/0x10 [act_mirred]
      [  158.860005]  ? __lock_acquire+0x706/0x26e0
      [  158.864162]  ? mark_lock+0x13d/0xb40
      [  158.867832]  tcf_action_exec+0xcf/0x2a0
      [  158.871736]  tcf_classify+0xfa/0x340
      [  158.875402]  __netif_receive_skb_core+0x8e1/0x1c60
      [  158.880334]  ? nf_ingress+0x500/0x500
      [  158.884059]  ? process_backlog+0x347/0x4b0
      [  158.888241]  ? lock_acquire+0xd8/0x320
      [  158.892050]  ? process_backlog+0x1b6/0x4b0
      [  158.896228]  ? process_backlog+0xc2/0x4b0
      [  158.900291]  process_backlog+0xc2/0x4b0
      [  158.904210]  net_rx_action+0x5cc/0x980
      [  158.908047]  ? napi_complete_done+0x2c0/0x2c0
      [  158.912525]  ? rcu_read_unlock+0x80/0x80
      [  158.916534]  ? __lock_is_held+0x34/0x160
      [  158.920541]  __do_softirq+0x1d4/0x9d2
      [  158.924308]  ? trace_event_raw_event_irq_handler_exit+0x140/0x140
      [  158.930515]  run_ksoftirqd+0x1d/0x40
      [  158.934152]  smpboot_thread_fn+0x32b/0x690
      [  158.938299]  ? sort_range+0x20/0x20
      [  158.941842]  ? preempt_count_sub+0xf/0xd0
      [  158.945940]  ? schedule+0x5b/0x140
      [  158.949412]  kthread+0x206/0x300
      [  158.952689]  ? sort_range+0x20/0x20
      [  158.956249]  ? kthread_stop+0x570/0x570
      [  158.960164]  ret_from_fork+0x3a/0x50
      [  158.963823] Code: 14 3e ff 8b 4b 78 55 4d 89 f9 41 56 41 55 48 c7 c7 a0 cf db 82 41 54 44 8b 44 24 2c 48 8b 54 24 30 48 8b 74 24 20 e8 16 94 13 ff <0f> 0b 48 c7 c7 60 8e 1f 85 48 83 c4 20 e8 55 ef a6 ff 89 74 24
      [  158.983235] RIP: skb_panic+0xc3/0x100 RSP: ffff8801d3f27110
      [  158.988935] ---[ end trace 5af56ee845aa6cc8 ]---
      [  158.993641] Kernel panic - not syncing: Fatal exception in interrupt
      [  159.000176] Kernel Offset: disabled
      [  159.003767] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Reproducer:
      
      	ip link add h1 type veth peer name swp1
      	ip link add h3 type veth peer name swp3
      
      	ip link set dev h1 up
      	ip address add 192.0.2.1/28 dev h1
      
      	ip link add dev vh3 type vrf table 20
      	ip link set dev h3 master vh3
      	ip link set dev vh3 up
      	ip link set dev h3 up
      
      	ip link set dev swp3 up
      	ip address add dev swp3 2001:db8:2::1/64
      
      	ip link set dev swp1 up
      	tc qdisc add dev swp1 clsact
      
      	ip link add name gt6 type ip6gretap \
      		local 2001:db8:2::1 remote 2001:db8:2::2
      	ip link set dev gt6 up
      
      	sleep 1
      
      	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
      		action mirred egress mirror dev gt6
      	ping -I h1 192.0.2.2
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      01b8d064
    • W
      erspan: fix invalid erspan version. · 02f99df1
      William Tu committed
      ERSPAN only supports versions 1 and 2.  When packets are sent to an
      erspan device that does not have a proper version number set,
      drop the packet.  In a real case, we observed multicast packets
      sent to the erspan pernet device, erspan0, which does not have
      an erspan version configured.
      Reported-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02f99df1
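The version check described in the erspan commit above can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel patch itself: the names `erspan_version_valid`, `erspan_xmit_verdict`, and the `xmit_verdict` enum are hypothetical, and treating an unconfigured device as version 0 is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical verdicts for this sketch; these are not kernel constants. */
enum xmit_verdict {
	XMIT_DROP = 0,
	XMIT_BUILD_V1 = 1,
	XMIT_BUILD_V2 = 2
};

/* Per the commit message, ERSPAN supports only versions 1 and 2. */
static bool erspan_version_valid(uint8_t ver)
{
	return ver == 1 || ver == 2;
}

/*
 * Decide what to do with a packet based on the version configured on
 * the sending device. Any other value (assumed here to include 0 for
 * a device such as erspan0 with no version set) drops the packet.
 */
static enum xmit_verdict erspan_xmit_verdict(uint8_t configured_ver)
{
	if (!erspan_version_valid(configured_ver))
		return XMIT_DROP;

	return configured_ver == 1 ? XMIT_BUILD_V1 : XMIT_BUILD_V2;
}
```

Calling `erspan_xmit_verdict()` with the device's configured version before building the header keeps the drop decision in one place, so a packet never gets a header for an unsupported version.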
  8. 12 May, 2018 1 commit
  9. 11 May, 2018 11 commits