1. 18 12月, 2018 1 次提交
  2. 16 12月, 2018 1 次提交
  3. 15 12月, 2018 1 次提交
  4. 08 12月, 2018 2 次提交
    • S
      ipv6: Check available headroom in ip6_xmit() even without options · 66033f47
      Stefano Brivio 提交于
      Even if we send an IPv6 packet without options, MAX_HEADER might not be
      enough to account for the additional headroom required by alignment of
      hardware headers.
      
      On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
      sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
      100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
      bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().
      
      Those would be enough to append our 14 bytes header, but we're going to
      align that to 16 bytes, and write 2 bytes out of the allocated slab in
      neigh_hh_output().
      
      KASan says:
      
      [  264.967848] ==================================================================
      [  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
      [  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
      [  264.967870]
      [  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
      [  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      [  264.967887] Call Trace:
      [  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
      [  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
      [  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
      [  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
      [  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
      [  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
      [  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
      [  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
      [  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
      [  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
      [  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
      [  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
      [  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
      [  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
      [  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
      [  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
      [  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
      [  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
      [  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
      [  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
      [  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
      [  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
      [  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
      [  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
      [  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
      [  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
      [  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
      [  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
      [  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0
      
      [...]
      
      Just like ip_finish_output2() does for IPv4, check that we have enough
      headroom in ip6_xmit(), and reallocate it if we don't.
      
      This issue is older than git history.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66033f47
    • S
      ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output · 1b4e5ad5
      Shmulik Ladkani 提交于
      In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
      initialization.
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b4e5ad5
  5. 06 12月, 2018 2 次提交
    • J
      ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes · ebaf39e6
      Jiri Wiesner 提交于
      The *_frag_reasm() functions are susceptible to miscalculating the byte
      count of packet fragments in case the truesize of a head buffer changes.
      The truesize member may be changed by the call to skb_unclone(), leaving
      the fragment memory limit counter unbalanced even if all fragments are
      processed. This miscalculation goes unnoticed as long as the network
      namespace which holds the counter is not destroyed.
      
      Should an attempt be made to destroy a network namespace that holds an
      unbalanced fragment memory limit counter the cleanup of the namespace
      never finishes. The thread handling the cleanup gets stuck in
      inet_frags_exit_net() waiting for the percpu counter to reach zero. The
      thread is usually in running state with a stacktrace similar to:
      
       PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
        #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
        #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
        #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
        #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
        #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
       #10 [ffff880621563e38] process_one_work at ffffffff81096f14
      
      It is not possible to create new network namespaces, and processes
      that call unshare() end up being stuck in uninterruptible sleep state
      waiting to acquire the net_mutex.
      
      The bug was observed in the IPv6 netfilter code by Per Sundstrom.
      I thank him for his analysis of the problem. The parts of this patch
      that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.
      Signed-off-by: NJiri Wiesner <jwiesner@suse.com>
      Reported-by: NPer Sundstrom <per.sundstrom@redqube.se>
      Acked-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebaf39e6
    • E
      net: use skb_list_del_init() to remove from RX sublists · 22f6bbb7
      Edward Cree 提交于
      list_del() leaves the skb->next pointer poisoned, which can then lead to
       a crash in e.g. OVS forwarding.  For example, setting up an OVS VXLAN
       forwarding bridge on sfc as per:
      
      ========
      $ ovs-vsctl show
      5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30
          Bridge "br0"
              Port "br0"
                  Interface "br0"
                      type: internal
              Port "enp6s0f0"
                  Interface "enp6s0f0"
              Port "vxlan0"
                  Interface "vxlan0"
                      type: vxlan
                      options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"}
          ovs_version: "2.5.0"
      ========
      (where 10.0.0.5 is an address on enp6s0f1)
      and sending traffic across it will lead to the following panic:
      ========
      general protection fault: 0000 [#1] SMP PTI
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701
      Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
      RIP: 0010:dev_hard_start_xmit+0x38/0x200
      Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95
      RSP: 0018:ffff888627b437e0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000
      RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8
      R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000
      R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000
      FS:  0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0
      Call Trace:
       <IRQ>
       __dev_queue_xmit+0x623/0x870
       ? masked_flow_lookup+0xf7/0x220 [openvswitch]
       ? ep_poll_callback+0x101/0x310
       do_execute_actions+0xaba/0xaf0 [openvswitch]
       ? __wake_up_common+0x8a/0x150
       ? __wake_up_common_lock+0x87/0xc0
       ? queue_userspace_packet+0x31c/0x5b0 [openvswitch]
       ovs_execute_actions+0x47/0x120 [openvswitch]
       ovs_dp_process_packet+0x7d/0x110 [openvswitch]
       ovs_vport_receive+0x6e/0xd0 [openvswitch]
       ? dst_alloc+0x64/0x90
       ? rt_dst_alloc+0x50/0xd0
       ? ip_route_input_slow+0x19a/0x9a0
       ? __udp_enqueue_schedule_skb+0x198/0x1b0
       ? __udp4_lib_rcv+0x856/0xa30
       ? __udp4_lib_rcv+0x856/0xa30
       ? cpumask_next_and+0x19/0x20
       ? find_busiest_group+0x12d/0xcd0
       netdev_frame_hook+0xce/0x150 [openvswitch]
       __netif_receive_skb_core+0x205/0xae0
       __netif_receive_skb_list_core+0x11e/0x220
       netif_receive_skb_list+0x203/0x460
       ? __efx_rx_packet+0x335/0x5e0 [sfc]
       efx_poll+0x182/0x320 [sfc]
       net_rx_action+0x294/0x3c0
       __do_softirq+0xca/0x297
       irq_exit+0xa6/0xb0
       do_IRQ+0x54/0xd0
       common_interrupt+0xf/0xf
       </IRQ>
      ========
      So, in all listified-receive handling, instead pull skbs off the lists with
       skb_list_del_init().
      
      Fixes: 9af86f93 ("net: core: fix use-after-free in __netif_receive_skb_list_core")
      Fixes: 7da517a3 ("net: core: Another step of skb receive list processing")
      Fixes: a4ca8b7d ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()")
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Signed-off-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22f6bbb7
  6. 27 11月, 2018 3 次提交
    • T
      netfilter: nat: fix double register in masquerade modules · 095faf45
      Taehee Yoo 提交于
      There is a reference counter to ensure that masquerade modules register
      notifiers only once. However, the existing reference counter approach is
      not safe, test commands are:
      
         while :
         do
         	   modprobe ip6t_MASQUERADE &
      	   modprobe nft_masq_ipv6 &
      	   modprobe -rv ip6t_MASQUERADE &
      	   modprobe -rv nft_masq_ipv6 &
         done
      
      numbers below represent the reference counter.
      --------------------------------------------------------
      CPU0        CPU1        CPU2        CPU3        CPU4
      [insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]
      --------------------------------------------------------
      0->1
      register    1->2
                  returns     2->1
      			returns     1->0
                                                      0->1
                                                      register <--
                                          unregister
      --------------------------------------------------------
      
      The unregistation of CPU3 should be processed before the
      registration of CPU4.
      
      In order to fix this, use a mutex instead of reference counter.
      
      splat looks like:
      [  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
      [  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
      [  323.869574] irq event stamp: 194074
      [  323.898930] hardirqs last  enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
      [  323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  323.898930] softirqs last  enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
      [  323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
      [  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
      [  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
      [  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
      [  323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      [  323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
      [  323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
      [  323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
      [  323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
      [  323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
      [  323.898930] FS:  00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  323.898930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
      [  323.898930] Call Trace:
      [  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
      [  323.898930]  ? down_read+0x150/0x150
      [  323.898930]  ? sched_clock_cpu+0x126/0x170
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  register_netdevice_notifier+0xbb/0x790
      [  323.898930]  ? __dev_close_many+0x2d0/0x2d0
      [  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
      [  323.898930]  ? wait_for_completion+0x710/0x710
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? up_write+0x6c/0x210
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
      [  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
      [ ... ]
      
      Fixes: 8dd33cc9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
      Fixes: be6b635c ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      095faf45
    • T
      netfilter: add missing error handling code for register functions · 584eab29
      Taehee Yoo 提交于
      register_{netdevice/inetaddr/inet6addr}_notifier may return an error
      value, this patch adds the code to handle these error paths.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      584eab29
    • A
      netfilter: ipv6: Preserve link scope traffic original oif · 508b0904
      Alin Nastac 提交于
      When ip6_route_me_harder is invoked, it resets outgoing interface of:
        - link-local scoped packets sent by neighbor discovery
        - multicast packets sent by MLD host
        - multicast packets send by MLD proxy daemon that sets outgoing
          interface through IPV6_PKTINFO ipi6_ifindex
      
      Link-local and multicast packets must keep their original oif after
      ip6_route_me_harder is called.
      Signed-off-by: NAlin Nastac <alin.nastac@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      508b0904
  7. 25 11月, 2018 1 次提交
    • W
      net: always initialize pagedlen · aba36930
      Willem de Bruijn 提交于
      In ip packet generation, pagedlen is initialized for each skb at the
      start of the loop in __ip(6)_append_data, before label alloc_new_skb.
      
      Depending on compiler options, code can be generated that jumps to
      this label, triggering use of an an uninitialized variable.
      
      In practice, at -O2, the generated code moves the initialization below
      the label. But the code should not rely on that for correctness.
      
      Fixes: 15e36f5b ("udp: paged allocation with gso")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aba36930
  8. 24 11月, 2018 1 次提交
    • H
      net/ipv6: re-do dad when interface has IFF_NOARP flag change · 896585d4
      Hangbin Liu 提交于
      When we add a new IPv6 address, we should also join corresponding solicited-node
      multicast address, unless the interface has IFF_NOARP flag, as function
      addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do
      not do dad and add the mcast address. So we will drop corresponding neighbour
      discovery message that came from other nodes.
      
      A typical example is after creating a ipvlan with mode l3, setting up an ipv6
      address and changing the mode to l2. Then we will not be able to ping this
      address as the interface doesn't join related solicited-node mcast address.
      
      Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add
      corresponding mcast group and check if there is a duplicate address on the
      network.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      896585d4
  9. 19 11月, 2018 1 次提交
  10. 17 11月, 2018 1 次提交
    • X
      ipv6: fix a dst leak when removing its exception · 761f6026
      Xin Long 提交于
      These is no need to hold dst before calling rt6_remove_exception_rt().
      The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(),
      which has been removed in Commit 93531c67 ("net/ipv6: separate
      handling of FIB entries from dst based routes"). Otherwise, it will
      cause a dst leak.
      
      This patch is to simply remove the dst_hold_safe() call before calling
      rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt().
      It's safe, because the removal of the exception that holds its dst's
      refcnt is protected by rt6_exception_lock.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Fixes: 23fb93a4 ("net/ipv6: Cleanup exception and cache route handling")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      761f6026
  11. 06 11月, 2018 2 次提交
  12. 03 11月, 2018 1 次提交
    • J
      net/ipv6: Add anycast addresses to a global hashtable · 2384d025
      Jeff Barnhill 提交于
      icmp6_send() function is expensive on systems with a large number of
      interfaces. Every time it’s called, it has to verify that the source
      address does not correspond to an existing anycast address by looping
      through every device and every anycast address on the device.  This can
      result in significant delays for a CPU when there are a large number of
      neighbors and ND timers are frequently timing out and calling
      neigh_invalidate().
      
      Add anycast addresses to a global hashtable to allow quick searching for
      matching anycast addresses.  This is based on inet6_addr_lst in addrconf.c.
      Signed-off-by: NJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2384d025
  13. 27 10月, 2018 2 次提交
    • M
      net: allow traceroute with a specified interface in a vrf · f64bf6b8
      Mike Manning 提交于
      Traceroute executed in a vrf succeeds if no device is given or if the
      vrf is given as the device, but fails if the interface is given as the
      device. This is for default UDP probes, it succeeds for TCP SYN or ICMP
      ECHO probes. As the skb bound dev is the interface and the sk dev is
      the vrf, sk lookup fails for ICMP_DEST_UNREACH and ICMP_TIME_EXCEEDED
      messages. The solution is for the secondary dev to be passed so that
      the interface is available for the device match to succeed, in the same
      way as is already done for non-error cases.
      Signed-off-by: NMike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f64bf6b8
    • S
      ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called · ee1abcf6
      Stefano Brivio 提交于
      Commit a61bbcf2 ("[NET]: Store skb->timestamp as offset to a base
      timestamp") introduces a neighbour control buffer and zeroes it out in
      ndisc_rcv(), as ndisc_recv_ns() uses it.
      
      Commit f2776ff0 ("[IPV6]: Fix address/interface handling in UDP and
      DCCP, according to the scoping architecture.") introduces the usage of the
      IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in
      present-day __udp6_lib_err()).
      
      Now, with commit b94f1c09 ("ipv6: Use icmpv6_notify() to propagate
      redirect, instead of rt6_redirect()."), we call protocol error handlers
      from ndisc_redirect_rcv(), after the control buffer is already stolen and
      some parts are already zeroed out. This implies that inet6_iif() on this
      path will always return zero.
      
      This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as
      we might actually need to match sockets for a given interface.
      
      Instead of always claiming the control buffer in ndisc_rcv(), do that only
      when needed.
      
      Fixes: b94f1c09 ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee1abcf6
  14. 26 10月, 2018 1 次提交
  15. 25 10月, 2018 5 次提交
    • F
      netfilter: ipv6: fix oops when defragmenting locally generated fragments · 61792b67
      Florian Westphal 提交于
      Unlike ipv4 and normal ipv6 defrag, netfilter ipv6 defragmentation did
      not save/restore skb->dst.
      
      This causes oops when handling locally generated ipv6 fragments, as
      output path needs a valid dst.
      Reported-by: NMaciej Żenczykowski <zenczykowski@gmail.com>
      Fixes: 84379c9a ("netfilter: ipv6: nf_defrag: drop skb dst before queueing")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      61792b67
    • D
      net/ipv6: Allow onlink routes to have a device mismatch if it is the default route · 4ed591c8
      David Ahern 提交于
      The intent of ip6_route_check_nh_onlink is to make sure the gateway
      given for an onlink route is not actually on a connected route for
      a different interface (e.g., 2001:db8:1::/64 is on dev eth1 and then
      an onlink route has a via 2001:db8:1::1 dev eth2). If the gateway
      lookup hits the default route then it most likely will be a different
      interface than the onlink route which is ok.
      
      Update ip6_route_check_nh_onlink to disregard the device mismatch
      if the gateway lookup hits the default route. Turns out the existing
      onlink tests are passing because there is no default route or it is
      an unreachable default, so update the onlink tests to have a default
      route other than unreachable.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ed591c8
    • S
      net: udp: fix handling of CHECKSUM_COMPLETE packets · db4f1be3
      Sean Tranchetti 提交于
      Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
      incorrect for any packet that has an incorrect checksum value.
      
      udp4/6_csum_init() will both make a call to
      __skb_checksum_validate_complete() to initialize/validate the csum
      field when receiving a CHECKSUM_COMPLETE packet. When this packet
      fails validation, skb->csum will be overwritten with the pseudoheader
      checksum so the packet can be fully validated by software, but the
      skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
      the stack can later warn the user about their hardware spewing bad
      checksums. Unfortunately, leaving the SKB in this state can cause
      problems later on in the checksum calculation.
      
      Since the the packet is still marked as CHECKSUM_COMPLETE,
      udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
      from skb->csum instead of adding it, leaving us with a garbage value
      in that field. Once we try to copy the packet to userspace in the
      udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
      to checksum the packet data and add it in the garbage skb->csum value
      to perform our final validation check.
      
      Since the value we're validating is not the proper checksum, it's possible
      that the folded value could come out to 0, causing us not to drop the
      packet. Instead, we believe that the packet was checksummed incorrectly
      by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
      to warn the user with netdev_rx_csum_fault(skb->dev);
      
      Unfortunately, since this is the UDP path, skb->dev has been overwritten
      by skb->dev_scratch and is no longer a valid pointer, so we end up
      reading invalid memory.
      
      This patch addresses this problem in two ways:
      	1) Do not use the dev pointer when calling netdev_rx_csum_fault()
      	   from skb_copy_and_csum_datagram_msg(). Since this gets called
      	   from the UDP path where skb->dev has been overwritten, we have
      	   no way of knowing if the pointer is still valid. Also for the
      	   sake of consistency with the other uses of
      	   netdev_rx_csum_fault(), don't attempt to call it if the
      	   packet was checksummed by software.
      
      	2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
      	   If we receive a packet that's CHECKSUM_COMPLETE that fails
      	   verification (i.e. skb->csum_valid == 0), check who performed
      	   the calculation. It's possible that the checksum was done in
      	   software by the network stack earlier (such as Netfilter's
      	   CONNTRACK module), and if that says the checksum is bad,
      	   we can drop the packet immediately instead of waiting until
      	   we try and copy it to userspace. Otherwise, we need to
      	   mark the SKB as CHECKSUM_NONE, since the skb->csum field
      	   no longer contains the full packet checksum after the
      	   call to __skb_checksum_validate_complete().
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Fixes: c84d9490 ("udp: copy skb->truesize in the first cache line")
      Cc: Sam Kumar <samanthakumar@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NSean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db4f1be3
    • D
      net: Don't return invalid table id error when dumping all families · ae677bbb
      David Ahern 提交于
      When doing a route dump across all address families, do not error out
      if the table does not exist. This allows a route dump for AF_UNSPEC
      with a table id that may only exist for some of the families.
      
      Do return the table does not exist error if dumping routes for a
      specific family and the table does not exist.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae677bbb
    • D
      net/ipv6: Put target net when address dump fails due to bad attributes · 242afaa6
      David Ahern 提交于
      If tgt_net is set based on IFA_TARGET_NETNSID attribute in the dump
      request, make sure all error paths call put_net.
      
      Fixes: 6371a71f ("net/ipv6: Add support for dumping addresses for a specific device")
      Fixes: ed6eff11 ("net/ipv6: Update inet6_dump_addr for strict data checking")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      242afaa6
  16. 23 10月, 2018 2 次提交
  17. 21 10月, 2018 1 次提交
    • D
      net/ipv6: Fix index counter for unicast addresses in in6_dump_addrs · 4ba4c566
      David Ahern 提交于
      The loop wants to skip previously dumped addresses, so loops until
      current index >= saved index. If the message fills it wants to save
      the index for the next address to dump - ie., the one that did not
      fit in the current message.
      
      Currently, it is incrementing the index counter before comparing to the
      saved index, and then the saved index is off by 1 - it assumes the
      current address is going to fit in the message.
      
      Change the index handling to increment only after a succesful dump.
      
      Fixes: 502a2ffd ("ipv6: convert idev_list to list macros")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ba4c566
  18. 19 10月, 2018 1 次提交
    • S
      ip6_tunnel: Fix encapsulation layout · d4d576f5
      Stefano Brivio 提交于
      Commit 058214a4 ("ip6_tun: Add infrastructure for doing
      encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
      the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
      Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.
      
      As long as the option didn't actually end up in generated packets, this
      wasn't an issue. Then commit 89a23c8b ("ip6_tunnel: Fix missing tunnel
      encapsulation limit option") fixed sending of this option, and the
      resulting layout, e.g. for FoU, is:
      
      .-------------------.------------.----------.-------------------.----- - -
      | Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
      '-------------------'------------'----------'-------------------'----- - -
      
      Needless to say, FoU and GUE (at least) won't work over IPv6. The option
      is appended by default, and I couldn't find a way to disable it with the
      current iproute2.
      
      Turn this into a more reasonable:
      
      .-------------------.----------.------------.-------------------.----- - -
      | Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
      '-------------------'----------'------------'-------------------'----- - -
      
      With this, and with 84dad559 ("udp6: fix encap return code for
      resubmitting"), FoU and GUE work again over IPv6.
      
      Fixes: 058214a4 ("ip6_tun: Add infrastructure for doing encapsulation")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4d576f5
  19. 18 10月, 2018 1 次提交
  20. 16 10月, 2018 8 次提交
    • D
      net/ipv6: Bail early if user only wants cloned entries · 08e814c9
      David Ahern 提交于
      Similar to IPv4, IPv6 fib no longer contains cloned routes. If a user
      requests a route dump for only cloned entries, no sense walking the FIB
      and returning everything.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08e814c9
    • D
      net: Enable kernel side filtering of route dumps · effe6792
      David Ahern 提交于
      Update parsing of route dump request to enable kernel side filtering.
      Allow filtering results by protocol (e.g., which routing daemon installed
      the route), route type (e.g., unicast), table id and nexthop device. These
      amount to the low hanging fruit, yet a huge improvement, for dumping
      routes.
      
      ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can
      be used to look up the device index without taking a reference. From
      there filter->dev is only used during dump loops with the lock still held.
      
      Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results
      have been filtered should no entries be returned.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      effe6792
    • D
      net: Plumb support for filtering ipv4 and ipv6 multicast route dumps · cb167893
      David Ahern 提交于
      Implement kernel side filtering of routes by egress device index and
      table id. If the table id is given in the filter, lookup table and
      call mr_table_dump directly for it.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb167893
    • D
      net/ipv6: Plumb support for filtering route dumps · 13e38901
      David Ahern 提交于
      Implement kernel side filtering of routes by table id, egress device
      index, protocol, and route type. If the table id is given in the filter,
      lookup the table and call fib6_dump_table directly for it.
      
      Move the existing route flags check for prefix only routes to the new
      filter.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13e38901
    • D
      net: Add struct for fib dump filter · 4724676d
      David Ahern 提交于
      Add struct fib_dump_filter for options on limiting which routes are
      returned in a dump request. The current list is table id, protocol,
      route type, rtm_flags and nexthop device index. struct net is needed
      to lookup the net_device from the index.
      
      Declare the filter for each route dump handler and plumb the new
      arguments from dump handlers to ip_valid_fib_dump_req.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4724676d
    • E
      ipv6: mcast: fix a use-after-free in inet6_mc_check · dc012f36
      Eric Dumazet 提交于
      syzbot found a use-after-free in inet6_mc_check [1]
      
      The problem here is that inet6_mc_check() uses rcu
      and read_lock(&iml->sflock)
      
      So the fact that ip6_mc_leave_src() is called under RTNL
      and the socket lock does not help us, we need to acquire
      iml->sflock in write mode.
      
      In the future, we should convert all this stuff to RCU.
      
      [1]
      BUG: KASAN: use-after-free in ipv6_addr_equal include/net/ipv6.h:521 [inline]
      BUG: KASAN: use-after-free in inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
      Read of size 8 at addr ffff8801ce7f2510 by task syz-executor0/22432
      
      CPU: 1 PID: 22432 Comm: syz-executor0 Not tainted 4.19.0-rc7+ #280
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       ipv6_addr_equal include/net/ipv6.h:521 [inline]
       inet6_mc_check+0xae7/0xb40 net/ipv6/mcast.c:649
       __raw_v6_lookup+0x320/0x3f0 net/ipv6/raw.c:98
       ipv6_raw_deliver net/ipv6/raw.c:183 [inline]
       raw6_local_deliver+0x3d3/0xcb0 net/ipv6/raw.c:240
       ip6_input_finish+0x467/0x1aa0 net/ipv6/ip6_input.c:345
       NF_HOOK include/linux/netfilter.h:289 [inline]
       ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:426
       ip6_mc_input+0x48a/0xd20 net/ipv6/ip6_input.c:503
       dst_input include/net/dst.h:450 [inline]
       ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
       NF_HOOK include/linux/netfilter.h:289 [inline]
       ipv6_rcv+0x120/0x640 net/ipv6/ip6_input.c:271
       __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
       netif_receive_skb_internal+0x12c/0x620 net/core/dev.c:5126
       napi_frags_finish net/core/dev.c:5664 [inline]
       napi_gro_frags+0x75a/0xc90 net/core/dev.c:5737
       tun_get_user+0x3189/0x4250 drivers/net/tun.c:1923
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1968
       call_write_iter include/linux/fs.h:1808 [inline]
       do_iter_readv_writev+0x8b0/0xa80 fs/read_write.c:680
       do_iter_write+0x185/0x5f0 fs/read_write.c:959
       vfs_writev+0x1f1/0x360 fs/read_write.c:1004
       do_writev+0x11a/0x310 fs/read_write.c:1039
       __do_sys_writev fs/read_write.c:1112 [inline]
       __se_sys_writev fs/read_write.c:1109 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457421
      Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b5 fb ff c3 48 83 ec 08 e8 1a 2d 00 00 48 89 04 24 b8 14 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 63 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
      RSP: 002b:00007f2d30ecaba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 000000000000003e RCX: 0000000000457421
      RDX: 0000000000000001 RSI: 00007f2d30ecabf0 RDI: 00000000000000f0
      RBP: 0000000020000500 R08: 00000000000000f0 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000293 R12: 00007f2d30ecb6d4
      R13: 00000000004c4890 R14: 00000000004d7b90 R15: 00000000ffffffff
      
      Allocated by task 22437:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       __do_kmalloc mm/slab.c:3718 [inline]
       __kmalloc+0x14e/0x760 mm/slab.c:3727
       kmalloc include/linux/slab.h:518 [inline]
       sock_kmalloc+0x15a/0x1f0 net/core/sock.c:1983
       ip6_mc_source+0x14dd/0x1960 net/ipv6/mcast.c:427
       do_ipv6_setsockopt.isra.9+0x3afb/0x45d0 net/ipv6/ipv6_sockglue.c:743
       ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:933
       rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1069
       sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3038
       __sys_setsockopt+0x1ba/0x3c0 net/socket.c:1902
       __do_sys_setsockopt net/socket.c:1913 [inline]
       __se_sys_setsockopt net/socket.c:1910 [inline]
       __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1910
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 22430:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kfree+0xcf/0x230 mm/slab.c:3813
       __sock_kfree_s net/core/sock.c:2004 [inline]
       sock_kfree_s+0x29/0x60 net/core/sock.c:2010
       ip6_mc_leave_src+0x11a/0x1d0 net/ipv6/mcast.c:2448
       __ipv6_sock_mc_close+0x20b/0x4e0 net/ipv6/mcast.c:310
       ipv6_sock_mc_close+0x158/0x1d0 net/ipv6/mcast.c:328
       inet6_release+0x40/0x70 net/ipv6/af_inet6.c:452
       __sock_release+0xd7/0x250 net/socket.c:579
       sock_close+0x19/0x20 net/socket.c:1141
       __fput+0x385/0xa30 fs/file_table.c:278
       ____fput+0x15/0x20 fs/file_table.c:309
       task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:193 [inline]
       exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
       prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
       do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801ce7f2500
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 16 bytes inside of
       192-byte region [ffff8801ce7f2500, ffff8801ce7f25c0)
      The buggy address belongs to the page:
      page:ffffea000739fc80 count:1 mapcount:0 mapping:ffff8801da800040 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006f6e548 ffffea000737b948 ffff8801da800040
      raw: 0000000000000000 ffff8801ce7f2000 0000000100000010 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801ce7f2400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801ce7f2480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      >ffff8801ce7f2500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
       ffff8801ce7f2580: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8801ce7f2600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc012f36
    • S
      ipv6: rate-limit probes for neighbourless routes · f547fac6
      Sabrina Dubroca 提交于
      When commit 27097255 ("[IPV6]: ROUTE: Add Router Reachability
      Probing (RFC4191).") introduced router probing, the rt6_probe() function
      required that a neighbour entry existed. This neighbour entry is used to
      record the timestamp of the last probe via the ->updated field.
      
      Later, commit 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
      removed the requirement for a neighbour entry. Neighbourless routes skip
      the interval check and are not rate-limited.
      
      This patch adds rate-limiting for neighbourless routes, by recording the
      timestamp of the last probe in the fib6_info itself.
      
      Fixes: 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f547fac6
    • J
      bpf: Allow sk_lookup with IPv6 module · 8a615c6b
      Joe Stringer 提交于
      This is a more complete fix than d71019b5 ("net: core: Fix build
      with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6
      module is loaded (not just if it's compiled in).
      Signed-off-by: NJoe Stringer <joe@wand.net.nz>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      8a615c6b
  21. 13 10月, 2018 2 次提交
    • D
      net: Evict neighbor entries on carrier down · 859bd2ef
      David Ahern 提交于
      When a link's carrier goes down it could be a sign of the port changing
      networks. If the new network has overlapping addresses with the old one,
      then the kernel will continue trying to use neighbor entries established
      based on the old network until the entries finally age out - meaning a
      potentially long delay with communications not working.
      
      This patch evicts neighbor entries on carrier down with the exception of
      those marked permanent. Permanent entries are managed by userspace (either
      an admin or a routing daemon such as FRR).
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      859bd2ef
    • D
      net/ipv6: Add knob to skip DELROUTE message on device down · 7c6bb7d2
      David Ahern 提交于
      Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
      notifications when a device is taken down (admin down) or deleted. IPv4
      does not generate a message for routes evicted by the down or delete;
      IPv6 does. A NOS at scale really needs to avoid these messages and have
      IPv4 and IPv6 behave similarly, relying on userspace to handle link
      notifications and evict the routes.
      
      At this point existing user behavior needs to be preserved. Since
      notifications are a global action (not per app) the only way to preserve
      existing behavior and allow the messages to be skipped is to add a new
      sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
      disable the notificatioons.
      
      IPv6 route code already supports the option to skip the message (it is
      used for multipath routes for example). Besides the new sysctl we need
      to pass the skip_notify setting through the generic fib6_clean and
      fib6_walk functions to fib6_clean_node and to set skip_notify on calls
      to __ip_del_rt for the addrconf_ifdown path.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c6bb7d2