1. 12 Jun, 2018 2 commits
    • D
      net/ipv6: Ensure cfg is properly initialized in ipv6_create_tempaddr · 3f2d67b6
      Committed by David Ahern
      Valdis reported a BUG in ipv6_add_addr:
      
      [ 1820.832682] BUG: unable to handle kernel NULL pointer dereference at 0000000000000209
      [ 1820.832728] RIP: 0010:ipv6_add_addr+0x280/0xd10
      [ 1820.832732] Code: 49 8b 1f 0f 84 6a 0a 00 00 48 85 db 0f 84 4e 0a 00 00 48 8b 03 48 8b 53 08 49 89 45 00 49 8b 47 10 49 89 55 08 48 85 c0 74 15 <48> 8b 50 08 48 8b 00 49 89 95 b8 01 00 00 49 89 85 b0 01 00 00 4c
      [ 1820.832847] RSP: 0018:ffffaa07c2fd7880 EFLAGS: 00010202
      [ 1820.832853] RAX: 0000000000000201 RBX: ffffaa07c2fd79b0 RCX: 0000000000000000
      [ 1820.832858] RDX: a4cfbfba2cbfa64c RSI: 0000000000000000 RDI: ffffffff8a8e9fa0
      [ 1820.832862] RBP: ffffaa07c2fd7920 R08: 000000000000017a R09: ffffffff8a555300
      [ 1820.832866] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888d18e71c00
      [ 1820.832871] R13: ffff888d0a9b1200 R14: 0000000000000000 R15: ffffaa07c2fd7980
      [ 1820.832876] FS:  00007faa51bdb800(0000) GS:ffff888d1d400000(0000) knlGS:0000000000000000
      [ 1820.832880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1820.832885] CR2: 0000000000000209 CR3: 000000021e8f8001 CR4: 00000000001606e0
      [ 1820.832888] Call Trace:
      [ 1820.832898]  ? __local_bh_enable_ip+0x119/0x260
      [ 1820.832904]  ? ipv6_create_tempaddr+0x259/0x5a0
      [ 1820.832912]  ? __local_bh_enable_ip+0x139/0x260
      [ 1820.832921]  ipv6_create_tempaddr+0x2da/0x5a0
      [ 1820.832926]  ? ipv6_create_tempaddr+0x2da/0x5a0
      [ 1820.832941]  manage_tempaddrs+0x1a5/0x240
      [ 1820.832951]  inet6_addr_del+0x20b/0x3b0
      [ 1820.832959]  ? nla_parse+0xce/0x1e0
      [ 1820.832968]  inet6_rtm_deladdr+0xd9/0x210
      [ 1820.832981]  rtnetlink_rcv_msg+0x1d4/0x5f0
      
      Looking at the code I found 1 element (peer_pfx) of the newly introduced
      ifa6_config struct that is not initialized. Use a memset rather than hard
      coding an init for each struct element.
      Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Fixes: e6464b8c ("net/ipv6: Convert ipv6_add_addr to struct ifa6_config")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f2d67b6
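A minimal sketch of the zero-then-fill pattern the fix describes. The struct below is a hypothetical, simplified stand-in for `struct ifa6_config` (member names follow the mapping quoted in commit e6464b8c; the types and the helper `init_tempaddr_cfg` are assumptions made for illustration, not the kernel's code).

```c
#include <string.h>

/* Hypothetical stand-in for an IPv6 address; not the kernel type. */
struct in6_addr_stub {
	unsigned char s6_addr[16];
};

/* Hypothetical, simplified version of struct ifa6_config. */
struct ifa6_config_sketch {
	const struct in6_addr_stub *pfx;
	const struct in6_addr_stub *peer_pfx;
	unsigned int plen;
	unsigned int ifa_flags;
	unsigned int preferred_lft;
	unsigned int valid_lft;
	unsigned short scope;
};

/*
 * Zero the whole struct first, then set only the members the caller
 * needs. Members that are never assigned (peer_pfx here) end up as
 * NULL/0 instead of stack garbage, including members added later.
 */
void init_tempaddr_cfg(struct ifa6_config_sketch *cfg,
		       const struct in6_addr_stub *addr,
		       unsigned int plen)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->pfx = addr;
	cfg->plen = plen;
}
```

With per-member assignment, a forgotten member such as `peer_pfx` keeps whatever was on the stack, which is consistent with the non-NULL but bogus pointer value (RAX: 0x201) in the trace above.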
    • J
      ipv6: allow PMTU exceptions to local routes · 09757646
      Committed by Julian Anastasov
      IPVS setups with a local client and a remote tunnel server need
      to create an exception for the local virtual IP. What we do is
      change the PMTU from 64KB (on "lo") to 1460 in the common case.
      Suggested-by: Martin KaFai Lau <kafai@fb.com>
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Fixes: 7343ff31 ("ipv6: Don't create clones of host routes.")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: David Ahern <dsahern@gmail.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09757646
  2. 09 Jun, 2018 1 commit
    • P
      udp: fix rx queue len reported by diag and proc interface · 6c206b20
      Committed by Paolo Abeni
      After commit 6b229cf7 ("udp: add batching to udp_rmem_release()")
      the sk_rmem_alloc field no longer measures the receive queue
      length exactly, because we batch the rmem release. The issue
      is really apparent only after commit 0d4a6608 ("udp: do rmem bulk
      free even if the rx sk queue is empty"): user space can easily
      observe an empty socket with a non-zero queue length reported by the 'ss'
      tool or the procfs interface.
      
      We need to use a custom UDP helper to report the correct queue length,
      taking into account the forward allocation deficit.
      
      Reported-by: trevor.francis@46labs.com
      Fixes: 6b229cf7 ("UDP: add batching to udp_rmem_release()")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c206b20
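The correction described in this commit can be sketched as a simple subtraction. The struct and function names below are hypothetical; the sketch assumes the "forward allocation deficit" is the amount of receive memory that has already been dequeued but not yet released because of batching.

```c
/* Hypothetical model of the two counters named in the commit message. */
struct udp_sock_sketch {
	int rmem_alloc;      /* stands in for sk_rmem_alloc */
	int forward_deficit; /* dequeued but not yet released (assumption) */
};

/*
 * Report the receive queue length as user space expects it: memory
 * still charged to the socket minus the part that belongs to packets
 * the application has already consumed.
 */
int udp_rqueue_len(const struct udp_sock_sketch *sk)
{
	return sk->rmem_alloc - sk->forward_deficit;
}
```

For a socket whose queue has been fully drained while the batched release is still pending, both counters are equal and the reported length is 0 rather than the stale `rmem_alloc` value.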
  3. 08 Jun, 2018 1 commit
    • F
      netfilter: x_tables: initialise match/target check parameter struct · c568503e
      Committed by Florian Westphal
      syzbot reports following splat:
      
      BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450
       net/bridge/netfilter/ebt_stp.c:162
       ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162
       xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506
       ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline]
       ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline]
      
      The uninitialised access is
         xt_mtchk_param->nft_compat
      
      ... which should be set to 0.
      Fix it by zeroing the struct beforehand, same for tgchk.
      
      ip(6)tables targetinfo uses c99-style initialiser, so no change
      needed there.
      
      Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com
      Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      c568503e
  4. 06 Jun, 2018 2 commits
  5. 05 Jun, 2018 6 commits
    • A
      netfilter: provide udp*_lib_lookup for nf_tproxy · 6e86000c
      Committed by Arnd Bergmann
      It is now possible to enable the libified nf_tproxy modules without
      also enabling NETFILTER_XT_TARGET_TPROXY, which throws off the
      ifdef logic in the udp core code:
      
      net/ipv6/netfilter/nf_tproxy_ipv6.o: In function `nf_tproxy_get_sock_v6':
      nf_tproxy_ipv6.c:(.text+0x1a8): undefined reference to `udp6_lib_lookup'
      net/ipv4/netfilter/nf_tproxy_ipv4.o: In function `nf_tproxy_get_sock_v4':
      nf_tproxy_ipv4.c:(.text+0x3d0): undefined reference to `udp4_lib_lookup'
      
      We can actually simplify the conditions now to provide the two functions
      exactly when they are needed.
      
      Fixes: 45ca4e0c ("netfilter: Libify xt_TPROXY")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Máté Eckl <ecklm94@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e86000c
    • S
      net: ipv6: Generate random IID for addresses on RAWIP devices · 9deb441c
      Committed by Subash Abhinov Kasiviswanathan
      RAWIP devices such as rmnet do not have a hardware address and
      instead require the kernel to generate a random IID for the
      IPv6 addresses.
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9deb441c
    • D
      net/ipv6: prevent use after free in ip6_route_mpath_notify · f7225172
      Committed by David Ahern
      syzbot reported a use-after-free:
      
      BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
      Read of size 4 at addr ffff8801bf789cf0 by task syz-executor756/4555
      
      CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
       ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
       ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Allocated by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:104
       __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
       ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
       ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
       ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      Freed by task 4555:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:140
       dst_release_immediate+0x71/0x9e net/core/dst.c:205
       fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
       __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
       ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
       inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
       ...
      
      The problem is that rt_last can point to a deleted route if the insert
      fails.
      
      One reproducer is to insert a route and then add a multipath route that
      has a duplicate nexthop, e.g.:
          $ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
          $ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 nexthop via 2001:db8:1::2
      
      Fix by not setting rt_last until it is verified that the insert succeeded.
      
      Fixes: 3b1137fe ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
      Cc: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7225172
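The ordering fix can be illustrated with a small user-space model. Everything here is hypothetical (`route_sketch`, `insert_route`, `add_multipath`); a `freed` flag stands in for the route being released on a failed insert, so the sketch never touches freed memory itself.

```c
#include <stddef.h>

/* Hypothetical model of one nexthop route. */
struct route_sketch {
	int nexthop_id;
	int freed; /* set when a failed insert releases the route */
};

/* Hypothetical insert: a duplicate nexthop fails and releases the route. */
static int insert_route(struct route_sketch *rt,
			const int *installed, size_t n_installed)
{
	for (size_t i = 0; i < n_installed; i++) {
		if (installed[i] == rt->nexthop_id) {
			rt->freed = 1;
			return -1;
		}
	}
	return 0;
}

/*
 * Insert each nexthop in turn. rt_last is updated only after the
 * insert has succeeded, so it can never name a released route.
 */
struct route_sketch *add_multipath(struct route_sketch *routes, size_t count,
				   const int *installed, size_t n_installed)
{
	struct route_sketch *rt_last = NULL;

	for (size_t i = 0; i < count; i++) {
		if (insert_route(&routes[i], installed, n_installed) != 0)
			break; /* rt_last still points at a live route */
		rt_last = &routes[i];
	}
	return rt_last;
}
```

Mirroring the reproducer above, with nexthop 2 already installed and nexthops 4 and 2 appended, the second insert fails and `rt_last` keeps pointing at the route for nexthop 4. Setting `rt_last` before checking the result would leave it pointing at the released entry.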
    • M
      ipv6: omit traffic class when calculating flow hash · fa1be7e0
      Committed by Michal Kubecek
      Some of the code paths calculating flow hash for IPv6 use flowlabel member
      of struct flowi6 which, despite its name, encodes both flow label and
      traffic class. If traffic class changes within a TCP connection (as e.g.
      ssh does), ECMP route can switch between paths. It's also inconsistent with
      other code paths where ip6_flowlabel() (returning only flow label) is used
      to feed the key.
      
      Use only flow label everywhere, including one place where hash key is set
      using ip6_flowinfo().
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Fixes: f70ea018 ("net: Add functions to get skb->hash based on flow structures")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fa1be7e0
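A short sketch of why masking matters. It assumes the usual IPv6 header layout, in which the low 28 bits of the first 32-bit word hold an 8-bit traffic class followed by a 20-bit flow label; the helper names and the host-byte-order mask are illustrative assumptions, not kernel symbols.

```c
#include <stdint.h>

/* Low 20 bits of the flow info word, in host byte order (assumption). */
#define FLOWLABEL_MASK_HOST 0x000FFFFFu

/* Combine traffic class and flow label the way a "flowinfo" value does. */
uint32_t make_flowinfo(uint8_t tclass, uint32_t flowlabel)
{
	return ((uint32_t)tclass << 20) | (flowlabel & FLOWLABEL_MASK_HOST);
}

/* Hash input that ignores the traffic class and keeps only the label. */
uint32_t hash_key_flowlabel(uint32_t flowinfo)
{
	return flowinfo & FLOWLABEL_MASK_HOST;
}
```

Two packets of the same connection that differ only in traffic class produce different flow info words but the same masked key, so an ECMP hash fed with the masked value stays on one path.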
    • D
      Revert "ipv6: omit traffic class when calculating flow hash" · a925ab48
      Committed by David S. Miller
      This reverts commit 87ae68c8.
      
      Applied the wrong version of this fix, correct version
      coming up.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a925ab48
    • M
      ipv6: omit traffic class when calculating flow hash · 87ae68c8
      Committed by Michal Kubecek
      Some of the code paths calculating flow hash for IPv6 use flowlabel member
      of struct flowi6 which, despite its name, encodes both flow label and
      traffic class. If traffic class changes within a TCP connection (as e.g.
      ssh does), ECMP route can switch between paths. It's also inconsistent with
      other code paths where ip6_flowlabel() (returning only flow label) is used
      to feed the key.
      
      Use only flow label everywhere, including one place where hash key is set
      using ip6_flowinfo().
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Fixes: f70ea018 ("net: Add functions to get skb->hash based on flow structures")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      87ae68c8
  6. 04 Jun, 2018 1 commit
  7. 03 Jun, 2018 1 commit
    • M
      netfilter: Libify xt_TPROXY · 45ca4e0c
      Committed by Máté Eckl
      The extracted functions will likely be useful to implement tproxy
      support in nf_tables.
      
      Extracted functions:
      	- nf_tproxy_sk_is_transparent
      	- nf_tproxy_laddr4
      	- nf_tproxy_handle_time_wait4
      	- nf_tproxy_get_sock_v4
      	- nf_tproxy_laddr6
      	- nf_tproxy_handle_time_wait6
      	- nf_tproxy_get_sock_v6
      
      (nf_)tproxy_handle_time_wait6 also needed some refactor as its current
      implementation was xtables-specific.
      Signed-off-by: Máté Eckl <ecklm94@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      45ca4e0c
  8. 02 Jun, 2018 1 commit
  9. 01 Jun, 2018 1 commit
  10. 29 May, 2018 7 commits
    • D
      net/ipv6: Add support for specifying metric of connected routes · 8308f3ff
      Committed by David Ahern
      Add support for IFA_RT_PRIORITY to ipv6 addresses.
      
      If the metric is changed on an existing address then the new route
      is inserted before removing the old one. Since the metric is one
      of the route keys, the prefix route can not be atomically replaced.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8308f3ff
    • D
      net/ipv6: Pass ifa6_config struct to inet6_addr_modify · d169a1f8
      Committed by David Ahern
      Update inet6_addr_modify to take ifa6_config argument versus a parameter
      list. This is an argument move only; no functional change intended.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d169a1f8
    • D
      net/ipv6: Pass ifa6_config struct to inet6_addr_add · 19b1518c
      Committed by David Ahern
      Move the creation of struct ifa6_config up to callers of inet6_addr_add.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19b1518c
    • D
      net/ipv6: Convert ipv6_add_addr to struct ifa6_config · e6464b8c
      Committed by David Ahern
      Move config parameters for adding an ipv6 address to a struct. struct
      names stem from inet6_rtm_newaddr which is the modern handler for
      adding an address.
      
      Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
      move only; no functional change intended. Mapping of variable changes:
      
          addr      -->  cfg->pfx
          peer_addr -->  cfg->peer_pfx
          pfxlen    -->  cfg->plen
          flags     -->  cfg->ifa_flags
      
      scope, valid_lft, prefered_lft have the same names within cfg
      (with corrected spelling).
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6464b8c
    • Y
      net: remove unnecessary genlmsg_cancel() calls · c1c9a3c9
      Committed by YueHaibing
      The message will be freed immediately, so there is no need to trim it
      back to the previous size.
      
      Inspired by commit 7a9b3ec1 ("nl80211: remove unnecessary genlmsg_cancel() calls")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1c9a3c9
    • M
      ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline · bbb40a0b
      Committed by Mathieu Xhonneux
      seg6_do_srh_encap and seg6_do_srh_inline can possibly do an
      out-of-bounds access when adding the SRH to the packet. This no longer
      happens when expanding the skb not only by the size of the SRH (+
      outer IPv6 header), but also by skb->mac_len.
      
      [   53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620
      [   53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674
      
      [   53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90
      [   53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      [   53.796673] Call Trace:
      [   53.796679]  <IRQ>
      [   53.796689]  dump_stack+0x71/0xab
      [   53.796700]  print_address_description+0x6a/0x270
      [   53.796707]  kasan_report+0x258/0x380
      [   53.796715]  ? seg6_do_srh_encap+0x284/0x620
      [   53.796722]  memmove+0x34/0x50
      [   53.796730]  seg6_do_srh_encap+0x284/0x620
      [   53.796741]  ? seg6_do_srh+0x29b/0x360
      [   53.796747]  seg6_do_srh+0x29b/0x360
      [   53.796756]  seg6_input+0x2e/0x2e0
      [   53.796765]  lwtunnel_input+0x93/0xd0
      [   53.796774]  ipv6_rcv+0x690/0x920
      [   53.796783]  ? ip6_input+0x170/0x170
      [   53.796791]  ? eth_gro_receive+0x2d0/0x2d0
      [   53.796800]  ? ip6_input+0x170/0x170
      [   53.796809]  __netif_receive_skb_core+0xcc0/0x13f0
      [   53.796820]  ? netdev_info+0x110/0x110
      [   53.796827]  ? napi_complete_done+0xb6/0x170
      [   53.796834]  ? e1000_clean+0x6da/0xf70
      [   53.796845]  ? process_backlog+0x129/0x2a0
      [   53.796853]  process_backlog+0x129/0x2a0
      [   53.796862]  net_rx_action+0x211/0x5c0
      [   53.796870]  ? napi_complete_done+0x170/0x170
      [   53.796887]  ? run_rebalance_domains+0x11f/0x150
      [   53.796891]  __do_softirq+0x10e/0x39e
      [   53.796894]  do_softirq_own_stack+0x2a/0x40
      [   53.796895]  </IRQ>
      [   53.796898]  do_softirq.part.16+0x54/0x60
      [   53.796900]  __local_bh_enable_ip+0x5b/0x60
      [   53.796903]  ip6_finish_output2+0x416/0x9f0
      [   53.796906]  ? ip6_dst_lookup_flow+0x110/0x110
      [   53.796909]  ? ip6_sk_dst_lookup_flow+0x390/0x390
      [   53.796911]  ? __rcu_read_unlock+0x66/0x80
      [   53.796913]  ? ip6_mtu+0x44/0xf0
      [   53.796916]  ? ip6_output+0xfc/0x220
      [   53.796918]  ip6_output+0xfc/0x220
      [   53.796921]  ? ip6_finish_output+0x2b0/0x2b0
      [   53.796923]  ? memcpy+0x34/0x50
      [   53.796926]  ip6_send_skb+0x43/0xc0
      [   53.796929]  rawv6_sendmsg+0x1216/0x1530
      [   53.796932]  ? __orc_find+0x6b/0xc0
      [   53.796934]  ? rawv6_rcv_skb+0x160/0x160
      [   53.796937]  ? __rcu_read_unlock+0x66/0x80
      [   53.796939]  ? __rcu_read_unlock+0x66/0x80
      [   53.796942]  ? is_bpf_text_address+0x1e/0x30
      [   53.796944]  ? kernel_text_address+0xec/0x100
      [   53.796946]  ? __kernel_text_address+0xe/0x30
      [   53.796948]  ? unwind_get_return_address+0x2f/0x50
      [   53.796950]  ? __save_stack_trace+0x92/0x100
      [   53.796954]  ? save_stack+0x89/0xb0
      [   53.796956]  ? kasan_kmalloc+0xa0/0xd0
      [   53.796958]  ? kmem_cache_alloc+0xd2/0x1f0
      [   53.796961]  ? prepare_creds+0x23/0x160
      [   53.796963]  ? __x64_sys_capset+0x252/0x3e0
      [   53.796966]  ? do_syscall_64+0x69/0x160
      [   53.796968]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   53.796971]  ? __alloc_pages_nodemask+0x170/0x380
      [   53.796973]  ? __alloc_pages_slowpath+0x12c0/0x12c0
      [   53.796977]  ? tty_vhangup+0x20/0x20
      [   53.796979]  ? policy_nodemask+0x1a/0x90
      [   53.796982]  ? __mod_node_page_state+0x8d/0xa0
      [   53.796986]  ? __check_object_size+0xe7/0x240
      [   53.796989]  ? __sys_sendto+0x229/0x290
      [   53.796991]  ? rawv6_rcv_skb+0x160/0x160
      [   53.796993]  __sys_sendto+0x229/0x290
      [   53.796996]  ? __ia32_sys_getpeername+0x50/0x50
      [   53.796999]  ? commit_creds+0x2de/0x520
      [   53.797002]  ? security_capset+0x57/0x70
      [   53.797004]  ? __x64_sys_capset+0x29f/0x3e0
      [   53.797007]  ? __x64_sys_rt_sigsuspend+0xe0/0xe0
      [   53.797011]  ? __do_page_fault+0x664/0x770
      [   53.797014]  __x64_sys_sendto+0x74/0x90
      [   53.797017]  do_syscall_64+0x69/0x160
      [   53.797019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   53.797022] RIP: 0033:0x7f43b7a6714a
      [   53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a
      [   53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004
      [   53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c
      [   53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
      [   53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661
      
      [   53.797171] Allocated by task 642:
      [   53.797460]  kasan_kmalloc+0xa0/0xd0
      [   53.797463]  kmem_cache_alloc+0xd2/0x1f0
      [   53.797465]  getname_flags+0x40/0x210
      [   53.797467]  user_path_at_empty+0x1d/0x40
      [   53.797469]  do_faccessat+0x12a/0x320
      [   53.797471]  do_syscall_64+0x69/0x160
      [   53.797473]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [   53.797607] Freed by task 642:
      [   53.797869]  __kasan_slab_free+0x130/0x180
      [   53.797871]  kmem_cache_free+0xa8/0x230
      [   53.797872]  filename_lookup+0x15b/0x230
      [   53.797874]  do_faccessat+0x12a/0x320
      [   53.797876]  do_syscall_64+0x69/0x160
      [   53.797878]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [   53.798014] The buggy address belongs to the object at ffff88011975e600
                      which belongs to the cache names_cache of size 4096
      [   53.799043] The buggy address is located 1786 bytes inside of
                      4096-byte region [ffff88011975e600, ffff88011975f600)
      [   53.800013] The buggy address belongs to the page:
      [   53.800414] page:ffffea000465d600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
      [   53.801259] flags: 0x17fff0000008100(slab|head)
      [   53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000 0000000100070007
      [   53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40 0000000000000000
      [   53.803787] page dumped because: kasan: bad access detected
      
      [   53.804384] Memory state around the buggy address:
      [   53.804788]  ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   53.805384]  ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   53.806577]                                                                 ^
      [   53.807165]  ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   53.807762]  ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   53.808356] ==================================================================
      [   53.808949] Disabling lock debugging due to kernel taint
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbb40a0b
    • F
      netfilter: nat: merge ipv4/ipv6 masquerade code into main nat module · 0168e8b3
      Committed by Florian Westphal
      Instead of using extra modules for these, turn the config options into
      an implicit dependency that adds masq feature to the protocol specific nf_nat module.
      
      before:
         text    data     bss     dec     hex filename
         2001     860       4    2865     b31 net/ipv4/netfilter/nf_nat_masquerade_ipv4.ko
         5579     780       2    6361    18d9 net/ipv4/netfilter/nf_nat_ipv4.ko
         2860     836       8    3704     e78 net/ipv6/netfilter/nf_nat_masquerade_ipv6.ko
         6648     780       2    7430    1d06 net/ipv6/netfilter/nf_nat_ipv6.ko
      
      after:
         text    data     bss     dec     hex filename
         7245     872       8    8125    1fbd net/ipv4/netfilter/nf_nat_ipv4.ko
         9165     848      12   10025    2729 net/ipv6/netfilter/nf_nat_ipv6.ko
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0168e8b3
  11. 28 May, 2018 1 commit
    • A
      bpf: Hooks for sys_sendmsg · 1cedee13
      Committed by Andrey Ignatov
      In addition to already existing BPF hooks for sys_bind and sys_connect,
      the patch provides new hooks for sys_sendmsg.
      
      It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
      that provides access to the socket itself (properties like family, type,
      protocol) and user-passed `struct sockaddr *` so that BPF program can
      override destination IP and port for system calls such as sendto(2) or
      sendmsg(2) and/or assign source IP to the socket.
      
      The hooks are implemented as two new attach types:
      `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
      UDPv6, respectively.
      
      UDPv4 and UDPv6 use separate attach types for the same reason as the sys_bind and
      sys_connect hooks, i.e. to prevent reading from / writing to e.g.
      user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.
      
      The difference from the already existing hooks is that the sys_sendmsg
      hooks are implemented only for unconnected UDP.
      
      For TCP it doesn't make sense to change user-provided `struct sockaddr *`
      at sendto(2)/sendmsg(2) time since socket either was already connected
      and has source/destination set or wasn't connected and call to
      sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.
      
      Connected UDP is already handled by sys_connect hooks that can override
      source/destination at connect time and use fast-path later, i.e. these
      hooks don't affect UDP fast-path.
      
      Rewriting source IP is implemented differently than that in sys_connect
      hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
      just bind socket to desired local IP address since source IP can be set
      on per-packet basis by using ancillary data (cmsg(3)). So no matter if
      socket is bound or not, source IP has to be rewritten on every call to
      sys_sendmsg.
      
      To do so, two new fields are added to UAPI `struct bpf_sock_addr`:
      * `msg_src_ip4` to set source IPv4 for UDPv4;
      * `msg_src_ip6` to set source IPv6 for UDPv6.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1cedee13
  12. 26 May, 2018 2 commits
  13. 25 May, 2018 1 commit
  14. 24 May, 2018 5 commits
    • M
      ipv6: sr: Add seg6local action End.BPF · 004d4b27
      Committed by Mathieu Xhonneux
      This patch adds the End.BPF action to the LWT seg6local infrastructure.
      This action works like any other seg6local End action, meaning that an IPv6
      header with SRH is needed, whose DA has to be equal to the SID of the
      action. It will also advance the SRH to the next segment, the BPF program
      does not have to take care of this.
      
      Since the BPF program must not become a source of instability in the kernel, it
      is important to ensure that the integrity of the packet is maintained
      before yielding it back to the IPv6 layer. The hook hence keeps track if
      the SRH has been altered through the helpers, and re-validates its
      content if needed with seg6_validate_srh. The state kept for validation is
      stored in a per-CPU buffer. The BPF program is not allowed to directly
      write into the packet, and only some fields of the SRH can be altered
      through the helper bpf_lwt_seg6_store_bytes.
      
      Performance profiling has shown that the SRH re-validation does not induce
      a significant overhead. If the altered SRH is deemed as invalid, the packet
      is dropped.
      
      This validation is also done before executing any action through
      bpf_lwt_seg6_action, and will not be performed again if the SRH is not
      modified after calling the action.
      
      The BPF program may return 3 types of return codes:
          - BPF_OK: the End.BPF action will look up the next destination through
                   seg6_lookup_nexthop.
          - BPF_REDIRECT: if an action has been executed through the
                bpf_lwt_seg6_action helper, the BPF program should return this
                value, as the skb's destination is already set and the default
                lookup should not be performed.
          - BPF_DROP : the packet will be dropped.
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      004d4b27
    • M
      bpf: Add IPv6 Segment Routing helpers · fe94cc29
      Committed by Mathieu Xhonneux
      The BPF seg6local hook should be powerful enough to enable users to
      implement most of the use-cases one could think of. After some thinking,
      we figured out that the following actions should be possible on a SRv6
      packet, requiring 3 specific helpers :
          - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
          - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
                                     (to add/delete TLVs)
          - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
                                 (specifically End.X, End.T, End.B6 and
                                  End.B6.Encap)
      
      The specifications of these helpers are provided in the patch (see
      include/uapi/linux/bpf.h).
      
      The non-sensitive fields of the SRH are the following : flags, tag and
      TLVs. The other fields can not be modified, to maintain the SRH
      integrity. Flags, tag and TLVs can easily be modified as their validity
      can be checked afterwards via seg6_validate_srh. It is not allowed to
      modify the segments directly. If one wants to add segments on the path,
      he should stack a new SRH using the End.B6 action via
      bpf_lwt_seg6_action.
      
      Growing, shrinking or editing TLVs via the helpers will flag the SRH as
      invalid, and it will have to be re-validated before re-entering the IPv6
      layer. This flag is stored in a per-CPU buffer, along with the current
      header length in bytes.
      
      Storing the SRH len in bytes in the control block is mandatory when using
      bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
      len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
      boundary). When adding/deleting TLVs within the BPF program, the SRH may
      temporarily be in an invalid state where its length cannot be rounded to 8
      bytes without remainder, hence the need to store the length in bytes
      separately. The caller of the BPF program can then ensure that the SRH's
      final length is valid using this value. Again, a final SRH modified by a
      BPF program which doesn’t respect the 8-bytes boundary will be discarded
      as it will be considered as invalid.
      
      Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
      available from the LWT BPF IN hook, but not from the seg6local BPF one.
      This helper encapsulates a Segment Routing Header (either with
      a new outer IPv6 header, or by inlining it directly in the existing IPv6
      header) into a non-SRv6 packet. This helper is required if we want to
      offer the possibility to dynamically encapsulate an SRH for non-SRv6 packets,
      as the BPF seg6local hook only works on traffic already containing an SRH.
      This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
      the same purpose but with a static SRH per route.
      
      These helpers require CONFIG_IPV6=y (and not =m).
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      fe94cc29
    • M
      ipv6: sr: export function lookup_nexthop · 1c1e761e
      Mathieu Xhonneux committed
      The function lookup_nexthop is essential to implement most of the seg6local
      actions. As we want to provide a BPF helper that applies some of these
      actions to the packet being processed, the helper should be able to call
      this function, hence the need to make it public.
      
      Moreover, if an argument is incorrect or if the next hop cannot be found,
      an error should be returned by the BPF helper so the BPF program can adapt
      its processing of the packet (return an error, properly force the drop,
      ...). This patch hence makes this function return dst->error to indicate a
      possible error.
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1c1e761e
    • R
      ipv6: support sport, dport and ip_proto in RTM_GETROUTE · eacb9384
      Roopa Prabhu committed
      This is a followup to fib6 rules sport, dport and ipproto
      match support. Only supports tcp, udp and icmp for ipproto.
      Used by fib rule self tests.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eacb9384
    • W
      udp: exclude gso from xfrm paths · ff06342c
      Willem de Bruijn committed
      UDP GSO delays final datagram construction to the GSO layer. This
      conflicts with protocol transformations.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ff06342c
  15. 23 May, 2018 6 commits
    • V
      netfilter: ip6t_rpfilter: provide input interface for route lookup · cede24d1
      Vincent Bernat committed
      In commit 47b7e7f8, this bit was removed at the same time the
      RT6_LOOKUP_F_IFACE flag was removed. However, it is needed when
      link-local addresses are used, which is a very common case: when
      packets are routed, neighbor solicitations are done using link-local
      addresses. For example, the following neighbor solicitation is not
      matched by "-m rpfilter":
      
          IP6 fe80::5254:33ff:fe00:1 > ff02::1:ff00:3: ICMP6, neighbor
          solicitation, who has 2001:db8::5254:33ff:fe00:3, length 32
      
      Commit 47b7e7f8 doesn't quite explain why we shouldn't use
      RT6_LOOKUP_F_IFACE in the rpfilter case. I suppose the interface check
      later in the function would make it redundant. However, the rest
      of the routing code is using RT6_LOOKUP_F_IFACE when there is no
      source address (which matches rpfilter's case with a non-unicast
      destination, like with neighbor solicitation).
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Fixes: 47b7e7f8 ("netfilter: don't set F_IFACE on ipv6 fib lookups")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      cede24d1
    • F
      netfilter: nf_nat: add nat type hooks to nat core · 9971a514
      Florian Westphal committed
      Currently the packet rewrite and instantiation of nat NULL bindings
      happen from the protocol-specific nat backend.
      
      Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type.
      
      Invocation looks like this (simplified):
      NF_HOOK()
         |
         `---iptable_nat
               |
               `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
                             |
                new packet? pass skb through iptables nat chain
                             |
                             `---> iptable_nat: ipt_do_table
      
      In nft case, this looks the same (nft_chain_nat_ipv4 instead of
      iptable_nat).
      
      This is a problem for two reasons:
      1. Can't use iptables nat and nf_tables nat at the same time,
         as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a
         NULL binding if do_table() did not find a matching nat rule so we
         can detect post-nat tuple collisions).
      2. If you use e.g. nft_masq, snat, redir, etc., users must also register
         an empty base chain so that the nat core gets called from NF_HOOK()
         to do the reverse translation, which is neither obvious nor user
         friendly.
      
      After this change, the base hook gets registered not from iptable_nat or
      nftables nat hooks, but from the l3 nat core.
      
      iptables/nft nat base hooks get registered with the nat core instead:
      
      NF_HOOK()
         |
         `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
                      |
               new packet? pass skb through iptables/nftables nat chains
                      |
                      +-> iptables_nat: ipt_do_table
                      +-> nft nat chain x
                      `-> nft nat chain y
      
      The nat core deals with null bindings and reverse translation.
      When no mapping exists, it calls the registered nat lookup hooks until
      one creates a new mapping.
      If both iptables and nftables nat hooks exist, the first matching
      one is used (i.e., higher priority wins).
      
      Also, nft users do not need to create empty nat hooks anymore,
      nat core always registers the base hooks that take care of reverse/reply
      translation.
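
      The dispatch described above can be modelled roughly as follows. This is
      a simplified Python sketch, not the kernel implementation: the NatCore
      class, its method names and the string-based "flows" are hypothetical,
      and it assumes that hooks with a lower numeric priority value run first.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A lookup hook returns a translated address, or None if no rule matched.
LookupHook = Callable[[str], Optional[str]]


class NatCore:
    """Toy model: the core owns the base hook and the bindings."""

    def __init__(self) -> None:
        self._hooks: List[Tuple[int, str, LookupHook]] = []
        # flow -> mapping; None records a null binding (no translation).
        self.bindings: Dict[str, Optional[str]] = {}

    def register(self, priority: int, name: str, hook: LookupHook) -> None:
        """iptables/nft nat chains register here instead of with NF_HOOK()."""
        self._hooks.append((priority, name, hook))
        self._hooks.sort(key=lambda entry: entry[0])  # assumed ordering

    def handle(self, flow: str) -> str:
        """Return the (possibly translated) address for a flow."""
        if flow not in self.bindings:
            mapping = None
            # New flow: call the lookup hooks until one creates a mapping.
            for _prio, _name, hook in self._hooks:
                mapping = hook(flow)
                if mapping is not None:
                    break
            self.bindings[flow] = mapping  # null binding if nothing matched
        return self.bindings[flow] or flow


core = NatCore()
core.register(0, "iptables_nat", lambda f: "A" if f == "x" else None)
core.register(1, "nft nat chain", lambda f: "B")
```

      With both hooks registered, flow "x" is translated by the first hook and
      any other flow falls through to the second; with no hooks at all, the
      core still records a null binding for each new flow.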
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      9971a514
    • F
      netfilter: nf_tables: allow chain type to override hook register · 4e25ceb8
      Florian Westphal committed
      Will be used in followup patch when nat types no longer
      use nf_register_net_hook() but will instead register with the nat core.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      4e25ceb8
    • F
      netfilter: xtables: allow table definitions not backed by hook_ops · ba7d284a
      Florian Westphal committed
      The ip(6)tables nat table currently receives skbs from the netfilter
      core. After a followup patch, skbs will come from the netfilter nat
      core instead, so the table is no longer backed by normal hook_ops.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      ba7d284a
    • F
      netfilter: nf_nat: move common nat code to nat core · 1f55236b
      Florian Westphal committed
      Copy-pasted; both l3 helpers use almost the same code here.
      Split out the common part into an 'inet' helper.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      1f55236b
    • D
      net/ipv6: Simplify route replace and appending into multipath route · f34436a4
      David Ahern committed
      Bring consistency to ipv6 route replace and append semantics.
      
      Remove rt6_qualify_for_ecmp, which is just guesswork. It fails in 2 cases:
      1. cannot replace a route with a reject route. Existing code appends
         a new route instead of replacing the existing one.
      
      2. cannot have a multipath route where a leg uses a dev-only nexthop
      
      Existing use cases affected by this change:
      1. adding a route with existing prefix and metric using NLM_F_CREATE
         without NLM_F_APPEND or NLM_F_EXCL (ie., what iproute2 calls
         'prepend'). Existing code auto-determines that the new nexthop can
         be appended to an existing route to create a multipath route. This
         change breaks that by requiring the APPEND flag for the new route
         to be added to an existing one. Instead the prepend just adds another
         route entry.
      
      2. route replace. Existing code replaces the first matching multipath route
         if the new route is multipath capable and falls back to the first matching
         non-ECMP route (reject or dev-only route) in case one isn't available.
         New behavior replaces the first matching route. (Thanks to Ido for spotting
         this one)
      
      Note: Newer iproute2 is needed to display multipath routes with a dev-only
            nexthop. This is due to a bug in how iproute2 parses nexthops.
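
      The new semantics can be summarised with a toy model. This Python sketch
      is not the kernel logic: apply_route and the table layout are
      hypothetical, ordering of entries is not modelled, and the behaviour
      when no NLM_F_CREATE is given is an assumption. Only the netlink flag
      values are taken from the uapi netlink header.

```python
from typing import List, Tuple

# Flag values as defined in include/uapi/linux/netlink.h.
NLM_F_REPLACE = 0x100
NLM_F_EXCL = 0x200
NLM_F_CREATE = 0x400
NLM_F_APPEND = 0x800

Key = Tuple[str, int]            # (prefix, metric)
Entry = Tuple[Key, List[str]]    # route entry with its nexthop legs


def apply_route(table: List[Entry], key: Key, nexthop: str, flags: int) -> str:
    """Apply one route request to the table and report what happened."""
    idx = next((i for i, (k, _) in enumerate(table) if k == key), None)
    if idx is not None:
        if flags & NLM_F_EXCL:
            raise FileExistsError(key)
        if flags & NLM_F_REPLACE:
            # Replace the first matching route, whatever its type.
            table[idx] = (key, [nexthop])
            return "replaced"
        if flags & NLM_F_APPEND:
            # Only an explicit APPEND joins an existing (multipath) route.
            table[idx][1].append(nexthop)
            return "appended"
    if not flags & NLM_F_CREATE:
        raise FileNotFoundError(key)  # assumption for this sketch
    # CREATE alone ('prepend' in iproute2 terms): a separate route entry.
    table.append((key, [nexthop]))
    return "added"


table: List[Entry] = []
key = ("2001:db8::/64", 1024)
results = [
    apply_route(table, key, "fe80::1", NLM_F_CREATE),
    apply_route(table, key, "fe80::2", NLM_F_CREATE | NLM_F_APPEND),
    apply_route(table, key, "fe80::3", NLM_F_CREATE),
    apply_route(table, key, "reject", NLM_F_CREATE | NLM_F_REPLACE),
]
```

      In this model the third request no longer merges into the multipath
      route, and the final replace succeeds even though the new route is a
      reject route.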
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f34436a4
  16. 22 May, 2018 1 commit
  17. 20 May, 2018 1 commit
    • W
      net: ip6_gre: fix tunnel metadata device sharing. · b80d0b93
      William Tu committed
      Currently ip6gre and ip6erspan share single metadata mode device,
      using 'collect_md_tun'.  Thus, when doing:
        ip link add dev ip6gre11 type ip6gretap external
        ip link add dev ip6erspan12 type ip6erspan external
        RTNETLINK answers: File exists
      the second command fails because it tries to create the same collect_md_tun.
      
      The patch fixes it by adding a separate collect md tunnel device
      for the ip6erspan, 'collect_md_tun_erspan'.  As a result, a couple
      of places need to be refactored/split up in order to distinguish ip6gre
      and ip6erspan.
      
      First, move the collect_md check at ip6gre_tunnel_{unlink,link} and
      create separate functions {ip6gre,ip6erspan}_tunnel_{link_md,unlink_md}.
      Then before link/unlink, make sure the link_md/unlink_md is called.
      Finally, a separate ndo_uninit is created for ip6erspan.  Tested it
      using the samples/bpf/test_tunnel_bpf.sh.
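
      The before/after behaviour can be sketched with a toy model. This Python
      snippet is illustrative only: the Ip6GreNet class and add_external
      method are hypothetical stand-ins for the per-namespace state, and only
      the two slot names come from the description above.

```python
from typing import Dict


class Ip6GreNet:
    """Toy per-namespace state holding the metadata-mode tunnel slots."""

    def __init__(self, separate_erspan_slot: bool) -> None:
        self.separate_erspan_slot = separate_erspan_slot
        self.slots: Dict[str, str] = {}

    def add_external(self, name: str, kind: str) -> None:
        """Model 'ip link add dev NAME type KIND external'."""
        if self.separate_erspan_slot and kind == "ip6erspan":
            slot = "collect_md_tun_erspan"
        else:
            slot = "collect_md_tun"
        if slot in self.slots:
            raise FileExistsError("RTNETLINK answers: File exists")
        self.slots[slot] = name


def try_both(net: Ip6GreNet) -> bool:
    """Create one device of each type; return True if both succeed."""
    try:
        net.add_external("ip6gre11", "ip6gretap")
        net.add_external("ip6erspan12", "ip6erspan")
    except FileExistsError:
        return False
    return True
```

      With a single shared slot the second device cannot be created; with the
      separate ip6erspan slot both devices coexist.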
      
      Fixes: ef7baf5e ("ip6_gre: add ip6 erspan collect_md mode")
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b80d0b93