1. 09 8月, 2019 1 次提交
  2. 28 7月, 2019 2 次提交
  3. 10 7月, 2019 2 次提交
    • G
      netfilter: ipv6: nf_defrag: accept duplicate fragments again · ac0024ba
      Guillaume Nault 提交于
      [ Upstream commit 8a3dca632538c550930ce8bafa8c906b130d35cf ]
      
      When fixing the skb leak introduced by the conversion to rbtree, I
      forgot about the special case of duplicate fragments. The condition
      under the 'insert_error' label isn't effective anymore as
      nf_ct_frg6_gather() doesn't override the returned value anymore. So
      duplicate fragments now get NF_DROP verdict.
      
      To accept duplicate fragments again, handle them specially as soon as
      inet_frag_queue_insert() reports them. Return -EINPROGRESS which will
      translate to NF_STOLEN verdict, like any accepted fragment. However,
      such packets don't carry any new information and aren't queued, so we
      just drop them immediately.
      
      Fixes: a0d56cb911ca ("netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ac0024ba
    • G
      netfilter: ipv6: nf_defrag: fix leakage of unqueued fragments · 318244f3
      Guillaume Nault 提交于
      [ Upstream commit a0d56cb911ca301de81735f1d73c2aab424654ba ]
      
      With commit 997dd9647164 ("net: IP6 defrag: use rbtrees in
      nf_conntrack_reasm.c"), nf_ct_frag6_reasm() is now called from
      nf_ct_frag6_queue(). With this change, nf_ct_frag6_queue() can fail
      after the skb has been added to the fragment queue and
      nf_ct_frag6_gather() was adapted to handle this case.
      
      But nf_ct_frag6_queue() can still fail before the fragment has been
      queued. nf_ct_frag6_gather() can't handle this case anymore, because it
      has no way to know if nf_ct_frag6_queue() queued the fragment before
      failing. If it didn't, the skb is lost as the error code is overwritten
      with -EINPROGRESS.
      
      Fix this by setting -EINPROGRESS directly in nf_ct_frag6_queue(), so
      that nf_ct_frag6_gather() can propagate the error as is.
      
      Fixes: 997dd9647164 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      318244f3
  4. 03 7月, 2019 3 次提交
    • M
      bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err · ba6340a7
      Martin KaFai Lau 提交于
      commit 4ac30c4b3659efac031818c418beb51e630d512d upstream.
      
      __udp6_lib_err() may be called when handling icmpv6 message. For example,
      the icmpv6 toobig(type=2).  __udp6_lib_lookup() is then called
      which may call reuseport_select_sock().  reuseport_select_sock() will
      call into a bpf_prog (if there is one).
      
      reuseport_select_sock() is expecting the skb->data pointing to the
      transport header (udphdr in this case).  For example, run_bpf_filter()
      is pulling the transport header.
      
      However, in the __udp6_lib_err() path, the skb->data is pointing to the
      ipv6hdr instead of the udphdr.
      
      One option is to pull and push the ipv6hdr in __udp6_lib_err().
      Instead of doing this, this patch follows how the original
      commit 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
      was done in IPv4, which has passed a NULL skb pointer to
      reuseport_select_sock().
      
      Fixes: 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
      Cc: Craig Gallek <kraig@google.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Acked-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba6340a7
    • M
      bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro · 79c6a8c0
      Martin KaFai Lau 提交于
      commit 257a525fe2e49584842c504a92c27097407f778f upstream.
      
      When the commit a6024562 ("udp: Add GRO functions to UDP socket")
      added udp[46]_lib_lookup_skb to the udp_gro code path, it broke
      the reuseport_select_sock() assumption that skb->data is pointing
      to the transport header.
      
      This patch follows an earlier __udp6_lib_err() fix by
      passing a NULL skb to avoid calling the reuseport's bpf_prog.
      
      Fixes: a6024562 ("udp: Add GRO functions to UDP socket")
      Cc: Tom Herbert <tom@herbertland.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79c6a8c0
    • D
      bpf: fix unconnected udp hooks · 613bc37f
      Daniel Borkmann 提交于
      commit 983695fa676568fc0fe5ddd995c7267aabc24632 upstream.
      
      Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently
      to applications as also stated in original motivation in 7828f20e ("Merge
      branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter
      two hooks into Cilium to enable host based load-balancing with Kubernetes,
      I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes
      typically sets up DNS as a service and is thus subject to load-balancing.
      
      Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API
      is currently insufficient and thus not usable as-is for standard applications
      shipped with most distros. To break down the issue we ran into with a simple
      example:
      
        # cat /etc/resolv.conf
        nameserver 147.75.207.207
        nameserver 147.75.207.208
      
      For the purpose of a simple test, we set up above IPs as service IPs and
      transparently redirect traffic to a different DNS backend server for that
      node:
      
        # cilium service list
        ID   Frontend            Backend
        1    147.75.207.207:53   1 => 8.8.8.8:53
        2    147.75.207.208:53   1 => 8.8.8.8:53
      
      The attached BPF program is basically selecting one of the backends if the
      service IP/port matches on the cgroup hook. DNS breaks here, because the
      hooks are not transparent enough to applications which have built-in msg_name
      address checks:
      
        # nslookup 1.1.1.1
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        [...]
        ;; connection timed out; no servers could be reached
      
        # dig 1.1.1.1
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53
        ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53
        [...]
      
        ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
        ;; global options: +cmd
        ;; connection timed out; no servers could be reached
      
      For comparison, if none of the service IPs is used, and we tell nslookup
      to use 8.8.8.8 directly it works just fine, of course:
      
        # nslookup 1.1.1.1 8.8.8.8
        1.1.1.1.in-addr.arpa	name = one.one.one.one.
      
      In order to fix this and thus act more transparent to the application,
      this needs reverse translation on recvmsg() side. A minimal fix for this
      API is to add similar recvmsg() hooks behind the BPF cgroups static key
      such that the program can track state and replace the current sockaddr_in{,6}
      with the original service IP. From BPF side, this basically tracks the
      service tuple plus socket cookie in an LRU map where the reverse NAT can
      then be retrieved via map value as one example. Side-note: the BPF cgroups
      static key should be converted to a per-hook static key in future.
      
      Same example after this fix:
      
        # cilium service list
        ID   Frontend            Backend
        1    147.75.207.207:53   1 => 8.8.8.8:53
        2    147.75.207.208:53   1 => 8.8.8.8:53
      
      Lookups work fine now:
      
        # nslookup 1.1.1.1
        1.1.1.1.in-addr.arpa    name = one.one.one.one.
      
        Authoritative answers can be found from:
      
        # dig 1.1.1.1
      
        ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550
        ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
      
        ;; OPT PSEUDOSECTION:
        ; EDNS: version: 0, flags:; udp: 512
        ;; QUESTION SECTION:
        ;1.1.1.1.                       IN      A
      
        ;; AUTHORITY SECTION:
        .                       23426   IN      SOA     a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400
      
        ;; Query time: 17 msec
        ;; SERVER: 147.75.207.207#53(147.75.207.207)
        ;; WHEN: Tue May 21 12:59:38 UTC 2019
        ;; MSG SIZE  rcvd: 111
      
      And from an actual packet level it shows that we're using the back end
      server when talking via 147.75.207.20{7,8} front end:
      
        # tcpdump -i any udp
        [...]
        12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
        12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38)
        12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
        12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67)
        [...]
      
      In order to be flexible and to have same semantics as in sendmsg BPF
      programs, we only allow return codes in [1,1] range. In the sendmsg case
      the program is called if msg->msg_name is present which can be the case
      in both, connected and unconnected UDP.
      
      The former only relies on the sockaddr_in{,6} passed via connect(2) if
      passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar
      way to call into the BPF program whenever a non-NULL msg->msg_name was
      passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note
      that for TCP case, the msg->msg_name is ignored in the regular recvmsg
      path and therefore not relevant.
      
      For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE,
      the hook is not called. This is intentional as it aligns with the same
      semantics as in case of TCP cgroup BPF hooks right now. This might be
      better addressed in future through a different bpf_attach_type such
      that this case can be distinguished from the regular recvmsg paths,
      for example.
      
      Fixes: 1cedee13 ("bpf: Hooks for sys_sendmsg")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NMartynas Pumputis <m@lambda.lt>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      613bc37f
  5. 22 6月, 2019 1 次提交
  6. 11 6月, 2019 2 次提交
  7. 04 6月, 2019 3 次提交
  8. 26 5月, 2019 4 次提交
  9. 17 5月, 2019 1 次提交
  10. 05 5月, 2019 4 次提交
    • W
      ipv6: invert flowlabel sharing check in process and user mode · f78ec0cd
      Willem de Bruijn 提交于
      [ Upstream commit 95c169251bf734aa555a1e8043e4d88ec97a04ec ]
      
      A request for a flowlabel fails in process or user exclusive mode must
      fail if the caller pid or uid does not match. Invert the test.
      
      Previously, the test was unsafe wrt PID recycling, but indeed tested
      for inequality: fl1->owner != fl->owner
      
      Fixes: 4f82f457 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f78ec0cd
    • E
      ipv6/flowlabel: wait rcu grace period before put_pid() · 39eddbb7
      Eric Dumazet 提交于
      [ Upstream commit 6c0afef5fb0c27758f4d52b2210c61b6bd8b4470 ]
      
      syzbot was able to catch a use-after-free read in pid_nr_ns() [1]
      
      ip6fl_seq_show() seems to use RCU protection, dereferencing fl->owner.pid
      but fl_free() releases fl->owner.pid before rcu grace period is started.
      
      [1]
      
      BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407
      Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087
      
      CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
       pid_nr_ns+0x128/0x140 kernel/pid.c:407
       ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794
       seq_read+0xad3/0x1130 fs/seq_file.c:268
       proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227
       do_loop_readv_writev fs/read_write.c:701 [inline]
       do_loop_readv_writev fs/read_write.c:688 [inline]
       do_iter_read+0x4a9/0x660 fs/read_write.c:922
       vfs_readv+0xf0/0x160 fs/read_write.c:984
       kernel_readv fs/splice.c:358 [inline]
       default_file_splice_read+0x475/0x890 fs/splice.c:413
       do_splice_to+0x12a/0x190 fs/splice.c:876
       splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1062
       do_sendfile+0x597/0xd00 fs/read_write.c:1443
       __do_sys_sendfile64 fs/read_write.c:1498 [inline]
       __se_sys_sendfile64 fs/read_write.c:1490 [inline]
       __x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9
      RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4
      R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff
      
      Allocated by task 17543:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc mm/kasan/common.c:497 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc mm/slab.c:3393 [inline]
       kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
       alloc_pid+0x55/0x8f0 kernel/pid.c:168
       copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932
       copy_process kernel/fork.c:1709 [inline]
       _do_fork+0x257/0xfd0 kernel/fork.c:2226
       __do_sys_clone kernel/fork.c:2333 [inline]
       __se_sys_clone kernel/fork.c:2327 [inline]
       __x64_sys_clone+0xbf/0x150 kernel/fork.c:2327
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 7789:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
       __cache_free mm/slab.c:3499 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3765
       put_pid.part.0+0x111/0x150 kernel/pid.c:111
       put_pid+0x20/0x30 kernel/pid.c:105
       fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102
       ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152
       call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
       expire_timers kernel/time/timer.c:1362 [inline]
       __run_timers kernel/time/timer.c:1681 [inline]
       __run_timers kernel/time/timer.c:1649 [inline]
       run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
       __do_softirq+0x266/0x95a kernel/softirq.c:293
      
      The buggy address belongs to the object at ffff888094012a00
       which belongs to the cache pid_2 of size 88
      The buggy address is located 4 bytes inside of
       88-byte region [ffff888094012a00, ffff888094012a58)
      The buggy address belongs to the page:
      page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
      raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
      >ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
                         ^
       ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
       ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
      
      Fixes: 4f82f457 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39eddbb7
    • E
      ipv6: fix races in ip6_dst_destroy() · 1a9e0134
      Eric Dumazet 提交于
      [ Upstream commit 0e2338749192ce0e52e7174c5352f627632f478a ]
      
      We had many syzbot reports that seem to be caused by use-after-free
      of struct fib6_info.
      
      ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
      are writers vs rt->from, and use non consistent synchronization among
      themselves.
      
      Switching to xchg() will solve the issues with no possible
      lockdep issues.
      
      BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
      BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
      BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
      BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
      Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649
      
      CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
       check_memory_region_inline mm/kasan/generic.c:185 [inline]
       check_memory_region+0x123/0x190 mm/kasan/generic.c:191
       kasan_check_write+0x14/0x20 mm/kasan/common.c:108
       atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
       fib6_info_release include/net/ip6_fib.h:294 [inline]
       fib6_info_release include/net/ip6_fib.h:292 [inline]
       fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
       fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
       fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
       fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
       fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
       fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
       fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
       fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
       __fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
       fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
       rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
       rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
       addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
       addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
       notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
       call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
       call_netdevice_notifiers net/core/dev.c:1779 [inline]
       dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
       rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
       rollback_registered+0x109/0x1d0 net/core/dev.c:8242
       unregister_netdevice_queue net/core/dev.c:9289 [inline]
       unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
       unregister_netdevice include/linux/netdevice.h:2658 [inline]
       __tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
       tun_detach drivers/net/tun.c:744 [inline]
       tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
       __fput+0x2e5/0x8d0 fs/file_table.c:278
       ____fput+0x16/0x20 fs/file_table.c:309
       task_work_run+0x14a/0x1c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x90a/0x2fa0 kernel/exit.c:876
       do_group_exit+0x135/0x370 kernel/exit.c:980
       __do_sys_exit_group kernel/exit.c:991 [inline]
       __se_sys_exit_group kernel/exit.c:989 [inline]
       __x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458da9
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
      RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
      RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
      R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a9e0134
    • M
      ipv6: A few fixes on dereferencing rt->from · 7ea4f000
      Martin KaFai Lau 提交于
      [ Upstream commit 886b7a50100a50f1cbd08a6f8ec5884dfbe082dc ]
      
      It is a followup after the fix in
      commit 9c69a1320515 ("route: Avoid crash from dereferencing NULL rt->from")
      
      rt6_do_redirect():
      1. NULL checking is needed on rt->from because a parallel
         fib6_info delete could happen that sets rt->from to NULL.
         (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
      
      2. fib6_info_hold() is not enough.  Same reason as (1).
         Meaning, holding dst->__refcnt cannot ensure
         rt->from is not NULL or rt->from->fib6_ref is not 0.
      
         Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
         is already doing, this patch chooses to extend the rcu section
         to keep "from" dereference-able after checking for NULL.
      
      inet6_rtm_getroute():
      1. NULL checking is also needed on rt->from for a similar reason.
         Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ea4f000
  11. 04 5月, 2019 1 次提交
  12. 27 4月, 2019 3 次提交
  13. 20 4月, 2019 2 次提交
    • L
      net: ip6_gre: fix possible NULL pointer dereference in ip6erspan_set_version · 8c5e9ea1
      Lorenzo Bianconi 提交于
      [ Upstream commit efcc9bcaf77c07df01371a7c34e50424c291f3ac ]
      
      Fix a possible NULL pointer dereference in ip6erspan_set_version checking
      nlattr data pointer
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7549 Comm: syz-executor432 Not tainted 5.0.0-rc6-next-20190218
      #37
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6erspan_newlink+0x66/0x7b0 net/ipv6/ip6_gre.c:2210
        __rtnl_newlink+0x107b/0x16c0 net/core/rtnetlink.c:3176
        rtnl_newlink+0x69/0xa0 net/core/rtnetlink.c:3234
        rtnetlink_rcv_msg+0x465/0xb00 net/core/rtnetlink.c:5192
        netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485
        rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5210
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg+0xdd/0x130 net/socket.c:631
        ___sys_sendmsg+0x806/0x930 net/socket.c:2136
        __sys_sendmsg+0x105/0x1d0 net/socket.c:2174
        __do_sys_sendmsg net/socket.c:2183 [inline]
        __se_sys_sendmsg net/socket.c:2181 [inline]
        __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2181
        do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440159
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffa69156e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440159
      RDX: 0000000000000000 RSI: 0000000020001340 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000001 R09: 00000000004002c8
      R10: 0000000000000011 R11: 0000000000000246 R12: 00000000004019e0
      R13: 0000000000401a70 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      ---[ end trace 09f8a7d13b4faaa1 ]---
      RIP: 0010:ip6erspan_set_version+0x5c/0x350 net/ipv6/ip6_gre.c:1726
      Code: 07 38 d0 7f 08 84 c0 0f 85 9f 02 00 00 49 8d bc 24 b0 00 00 00 c6 43
      54 01 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f
      85 9a 02 00 00 4d 8b ac 24 b0 00 00 00 4d 85 ed 0f
      RSP: 0018:ffff888089ed7168 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8880869d6e58 RCX: 0000000000000000
      RDX: 0000000000000016 RSI: ffffffff862736b4 RDI: 00000000000000b0
      RBP: ffff888089ed7180 R08: 1ffff11010d3adcb R09: ffff8880869d6e58
      R10: ffffed1010d3add5 R11: ffff8880869d6eaf R12: 0000000000000000
      R13: ffffffff8931f8c0 R14: ffffffff862825d0 R15: ffff8880869d6e58
      FS:  0000000000b3d880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000184 CR3: 0000000092cc5000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 4974d5f678ab ("net: ip6_gre: initialize erspan_ver just for erspan tunnels")
      Reported-and-tested-by: syzbot+30191cf1057abd3064af@syzkaller.appspotmail.com
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Reviewed-by: NGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8c5e9ea1
    • C
      xfrm: destroy xfrm_state synchronously on net exit path · bbbe4746
      Cong Wang 提交于
      [ Upstream commit f75a2804da391571563c4b6b29e7797787332673 ]
      
      xfrm_state_put() moves struct xfrm_state to the GC list
      and schedules the GC work to clean it up. On net exit call
      path, xfrm_state_flush() is called to clean up and
      xfrm_flush_gc() is called to wait for the GC work to complete
      before exit.
      
      However, this doesn't work because one of the ->destructor(),
      ipcomp_destroy(), schedules the same GC work again inside
      the GC work. It is hard to wait for such a nested async
      callback. This is also why syzbot still reports the following
      warning:
      
       WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
       ...
        ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
        cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
        process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
        worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
        kthread+0x357/0x430 kernel/kthread.c:246
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      In fact, it is perfectly fine to bypass GC and destroy xfrm_state
      synchronously on net exit call path, because it is in process context
      and doesn't need a work struct to do any blocking work.
      
      This patch introduces xfrm_state_put_sync() which simply bypasses
      GC, and lets its callers to decide whether to use this synchronous
      version. On net exit path, xfrm_state_fini() and
      xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
      blocking, it can use xfrm_state_put_sync() directly too.
      
      Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
      reflect this change.
      
      Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
      Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      bbbe4746
  14. 17 4月, 2019 4 次提交
    • L
      net: ip6_gre: fix possible use-after-free in ip6erspan_rcv · a2ef7723
      Lorenzo Bianconi 提交于
      [ Upstream commit 2a3cabae4536edbcb21d344e7aa8be7a584d2afb ]
      
      erspan_v6 tunnels run __iptunnel_pull_header on received skbs to remove
      erspan header. This can determine a possible use-after-free accessing
      pkt_md pointer in ip6erspan_rcv since the packet will be 'uncloned'
      running pskb_expand_head if it is a cloned gso skb (e.g if the packet has
      been sent though a veth device). Fix it resetting pkt_md pointer after
      __iptunnel_pull_header
      
      Fixes: 1d7e2ed2 ("net: erspan: refactor existing erspan code")
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a2ef7723
    • L
      ipv6: sit: reset ip header pointer in ipip6_rcv · 42f1fa0f
      Lorenzo Bianconi 提交于
      [ Upstream commit bb9bd814ebf04f579be466ba61fc922625508807 ]
      
      ipip6 tunnels run iptunnel_pull_header on received skbs. This can
      determine the following use-after-free accessing iph pointer since
      the packet will be 'uncloned' running pskb_expand_head if it is a
      cloned gso skb (e.g if the packet has been sent though a veth device)
      
      [  706.369655] BUG: KASAN: use-after-free in ipip6_rcv+0x1678/0x16e0 [sit]
      [  706.449056] Read of size 1 at addr ffffe01b6bd855f5 by task ksoftirqd/1/=
      [  706.669494] Hardware name: HPE ProLiant m400 Server/ProLiant m400 Server, BIOS U02 08/19/2016
      [  706.771839] Call trace:
      [  706.801159]  dump_backtrace+0x0/0x2f8
      [  706.845079]  show_stack+0x24/0x30
      [  706.884833]  dump_stack+0xe0/0x11c
      [  706.925629]  print_address_description+0x68/0x260
      [  706.982070]  kasan_report+0x178/0x340
      [  707.025995]  __asan_report_load1_noabort+0x30/0x40
      [  707.083481]  ipip6_rcv+0x1678/0x16e0 [sit]
      [  707.132623]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  707.185940]  ip_local_deliver_finish+0x3b8/0x988
      [  707.241338]  ip_local_deliver+0x144/0x470
      [  707.289436]  ip_rcv_finish+0x43c/0x14b0
      [  707.335447]  ip_rcv+0x628/0x1138
      [  707.374151]  __netif_receive_skb_core+0x1670/0x2600
      [  707.432680]  __netif_receive_skb+0x28/0x190
      [  707.482859]  process_backlog+0x1d0/0x610
      [  707.529913]  net_rx_action+0x37c/0xf68
      [  707.574882]  __do_softirq+0x288/0x1018
      [  707.619852]  run_ksoftirqd+0x70/0xa8
      [  707.662734]  smpboot_thread_fn+0x3a4/0x9e8
      [  707.711875]  kthread+0x2c8/0x350
      [  707.750583]  ret_from_fork+0x10/0x18
      
      [  707.811302] Allocated by task 16982:
      [  707.854182]  kasan_kmalloc.part.1+0x40/0x108
      [  707.905405]  kasan_kmalloc+0xb4/0xc8
      [  707.948291]  kasan_slab_alloc+0x14/0x20
      [  707.994309]  __kmalloc_node_track_caller+0x158/0x5e0
      [  708.053902]  __kmalloc_reserve.isra.8+0x54/0xe0
      [  708.108280]  __alloc_skb+0xd8/0x400
      [  708.150139]  sk_stream_alloc_skb+0xa4/0x638
      [  708.200346]  tcp_sendmsg_locked+0x818/0x2b90
      [  708.251581]  tcp_sendmsg+0x40/0x60
      [  708.292376]  inet_sendmsg+0xf0/0x520
      [  708.335259]  sock_sendmsg+0xac/0xf8
      [  708.377096]  sock_write_iter+0x1c0/0x2c0
      [  708.424154]  new_sync_write+0x358/0x4a8
      [  708.470162]  __vfs_write+0xc4/0xf8
      [  708.510950]  vfs_write+0x12c/0x3d0
      [  708.551739]  ksys_write+0xcc/0x178
      [  708.592533]  __arm64_sys_write+0x70/0xa0
      [  708.639593]  el0_svc_handler+0x13c/0x298
      [  708.686646]  el0_svc+0x8/0xc
      
      [  708.739019] Freed by task 17:
      [  708.774597]  __kasan_slab_free+0x114/0x228
      [  708.823736]  kasan_slab_free+0x10/0x18
      [  708.868703]  kfree+0x100/0x3d8
      [  708.905320]  skb_free_head+0x7c/0x98
      [  708.948204]  skb_release_data+0x320/0x490
      [  708.996301]  pskb_expand_head+0x60c/0x970
      [  709.044399]  __iptunnel_pull_header+0x3b8/0x5d0
      [  709.098770]  ipip6_rcv+0x41c/0x16e0 [sit]
      [  709.146873]  tunnel64_rcv+0xd4/0x200 [tunnel4]
      [  709.200195]  ip_local_deliver_finish+0x3b8/0x988
      [  709.255596]  ip_local_deliver+0x144/0x470
      [  709.303692]  ip_rcv_finish+0x43c/0x14b0
      [  709.349705]  ip_rcv+0x628/0x1138
      [  709.388413]  __netif_receive_skb_core+0x1670/0x2600
      [  709.446943]  __netif_receive_skb+0x28/0x190
      [  709.497120]  process_backlog+0x1d0/0x610
      [  709.544169]  net_rx_action+0x37c/0xf68
      [  709.589131]  __do_softirq+0x288/0x1018
      
      [  709.651938] The buggy address belongs to the object at ffffe01b6bd85580
                      which belongs to the cache kmalloc-1024 of size 1024
      [  709.804356] The buggy address is located 117 bytes inside of
                      1024-byte region [ffffe01b6bd85580, ffffe01b6bd85980)
      [  709.946340] The buggy address belongs to the page:
      [  710.003824] page:ffff7ff806daf600 count:1 mapcount:0 mapping:ffffe01c4001f600 index:0x0
      [  710.099914] flags: 0xfffff8000000100(slab)
      [  710.149059] raw: 0fffff8000000100 dead000000000100 dead000000000200 ffffe01c4001f600
      [  710.242011] raw: 0000000000000000 0000000000380038 00000001ffffffff 0000000000000000
      [  710.334966] page dumped because: kasan: bad access detected
      
      Fix it resetting iph pointer after iptunnel_pull_header
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Tested-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      42f1fa0f
    • J
      ipv6: Fix dangling pointer when ipv6 fragment · ea06796f
      Junwei Hu 提交于
      [ Upstream commit ef0efcd3bd3fd0589732b67fb586ffd3c8705806 ]
      
      At the beginning of ip6_fragment func, the prevhdr pointer is
      obtained in the ip6_find_1stfragopt func.
      However, all the pointers pointing into skb header may change
      when calling skb_checksum_help func with
      skb->ip_summed = CHECKSUM_PARTIAL condition.
      The prevhdr pointe will be dangling if it is not reloaded after
      calling __skb_linearize func in skb_checksum_help func.
      
      Here, I add a variable, nexthdr_offset, to evaluate the offset,
      which does not changes even after calling __skb_linearize func.
      
      Fixes: 405c92f7 ("ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment")
      Signed-off-by: NJunwei Hu <hujunwei4@huawei.com>
      Reported-by: NWenhao Zhang <zhangwenhao8@huawei.com>
      Reported-by: syzbot+e8ce541d095e486074fc@syzkaller.appspotmail.com
      Reviewed-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ea06796f
    • S
      ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type · 8e4b4da3
      Sheena Mira-ato 提交于
      [ Upstream commit b2e54b09a3d29c4db883b920274ca8dca4d9f04d ]
      
      The device type for ip6 tunnels is set to
      ARPHRD_TUNNEL6. However, the ip4ip6_err function
      is expecting the device type of the tunnel to be
      ARPHRD_TUNNEL.  Since the device types do not
      match, the function exits and the ICMP error
      packet is not sent to the originating host. Note
      that the device type for IPv4 tunnels is set to
      ARPHRD_TUNNEL.
      
      Fix is to expect a tunnel device type of
      ARPHRD_TUNNEL6 instead.  Now the tunnel device
      type matches and the ICMP error packet is sent
      to the originating host.
      Signed-off-by: NSheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8e4b4da3
  15. 03 4月, 2019 3 次提交
  16. 24 3月, 2019 1 次提交
    • M
      esp: Skip TX bytes accounting when sending from a request socket · b92eaed3
      Martin Willi 提交于
      [ Upstream commit 09db51241118aeb06e1c8cd393b45879ce099b36 ]
      
      On ESP output, sk_wmem_alloc is incremented for the added padding if a
      socket is associated to the skb. When replying with TCP SYNACKs over
      IPsec, the associated sk is a casted request socket, only. Increasing
      sk_wmem_alloc on a request socket results in a write at an arbitrary
      struct offset. In the best case, this produces the following WARNING:
      
      WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
      refcount_t: addition on 0; use-after-free.
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
      Hardware name: Marvell Armada 380/385 (Device Tree)
      [...]
      [<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
      [<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
      [<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
      [<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
      [<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
      [<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
      [<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
      [<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
      [<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
      [<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
      [<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
      [<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
      [<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
      [...]
      
      The issue triggers only when not using TCP syncookies, as for syncookies
      no socket is associated.
      
      Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
      Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
      Signed-off-by: NMartin Willi <martin@strongswan.org>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b92eaed3
  17. 19 3月, 2019 3 次提交
    • P
      ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink() · 2e4b2aeb
      Paolo Abeni 提交于
      [ Upstream commit bf1dc8bad1d42287164d216d8efb51c5cd381b18 ]
      
      We need a RCU critical section around rt6_info->from deference, and
      proper annotation.
      
      Fixes: 4ed591c8ab44 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2e4b2aeb
    • P
      ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt() · 96dd4ef3
      Paolo Abeni 提交于
      [ Upstream commit 193f3685d0546b0cea20c99894aadb70098e47bf ]
      
      We must access rt6_info->from under RCU read lock: move the
      dereference under such lock, with proper annotation.
      
      v1 -> v2:
       - avoid using multiple, racy, fetch operations for rt->from
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96dd4ef3
    • P
      ipv6: route: purge exception on removal · b9d0cb75
      Paolo Abeni 提交于
      [ Upstream commit f5b51fe804ec2a6edce0f8f6b11ea57283f5857b ]
      
      When a netdevice is unregistered, we flush the relevant exception
      via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route().
      
      Finally, we end-up calling rt6_remove_exception(), where we release
      the relevant dst, while we keep the references to the related fib6_info and
      dev. Such references should be released later when the dst will be
      destroyed.
      
      There are a number of caches that can keep the exception around for an
      unlimited amount of time - namely dst_cache, possibly even socket cache.
      As a result device registration may hang, as demonstrated by this script:
      
      ip netns add cl
      ip netns add rt
      ip netns add srv
      ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1
      
      ip link add name cl_veth type veth peer name cl_rt_veth
      ip link set dev cl_veth netns cl
      ip -n cl link set dev cl_veth up
      ip -n cl addr add dev cl_veth 2001::2/64
      ip -n cl route add default via 2001::1
      
      ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth
      ip -n cl link set tunv6 up
      ip -n cl addr add 2013::2/64 dev tunv6
      
      ip link set dev cl_rt_veth netns rt
      ip -n rt link set dev cl_rt_veth up
      ip -n rt addr add dev cl_rt_veth 2001::1/64
      
      ip link add name rt_srv_veth type veth peer name srv_veth
      ip link set dev srv_veth netns srv
      ip -n srv link set dev srv_veth up
      ip -n srv addr add dev srv_veth 2002::1/64
      ip -n srv route add default via 2002::2
      
      ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth
      ip -n srv link set tunv6 up
      ip -n srv addr add 2013::1/64 dev tunv6
      
      ip link set dev rt_srv_veth netns rt
      ip -n rt link set dev rt_srv_veth up
      ip -n rt addr add dev rt_srv_veth 2002::2/64
      
      ip netns exec srv netserver & sleep 0.1
      ip netns exec cl ping6 -c 4 2013::1
      ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1
      ip -n rt link set dev rt_srv_veth mtu 1400
      wait %2
      
      ip -n cl link del cl_veth
      
      This commit addresses the issue purging all the references held by the
      exception at time, as we currently do for e.g. ipv6 pcpu dst entries.
      
      v1 -> v2:
       - re-order the code to avoid accessing dst and net after dst_dev_put()
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9d0cb75