1. 27 3月, 2018 1 次提交
    • J
      net: sched, fix OOO packets with pfifo_fast · eb82a994
      John Fastabend 提交于
      After the qdisc lock was dropped in pfifo_fast we allow multiple
      enqueue threads and dequeue threads to run in parallel. On the
      enqueue side the skb bit ooo_okay is used to ensure all related
      skbs are enqueued in-order. On the dequeue side though there is
      no similar logic. What we observe is with fewer queues than CPUs
      it is possible to re-order packets when two instances of
      __qdisc_run() are running in parallel. Each thread will dequeue
      a skb and then whichever thread calls the ndo op first will
      be sent on the wire. This doesn't typically happen because
      qdisc_run() is usually triggered by the same core that did the
      enqueue. However, drivers will trigger __netif_schedule()
      when queues are transitioning from stopped to awake using the
      netif_tx_wake_* APIs. When this happens netif_schedule() calls
      qdisc_run() on the same CPU that did the netif_tx_wake_* which
      is usually done in the interrupt completion context. This CPU
      is selected with the irq affinity which is unrelated to the
      enqueue operations.
      
      To resolve this we add a RUNNING bit to the qdisc to ensure
      only a single dequeue per qdisc is running. Enqueue and dequeue
      operations can still run in parallel and also on multi queue
      NICs we can still have a dequeue in-flight per qdisc, which
      is typically per CPU.
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Reported-by: NJakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb82a994
  2. 26 3月, 2018 3 次提交
    • P
      ipv6: the entire IPv6 header chain must fit the first fragment · 10b8a3de
      Paolo Abeni 提交于
      While building ipv6 datagram we currently allow arbitrary large
      extheaders, even beyond pmtu size. The syzbot has found a way
      to exploit the above to trigger the following splat:
      
      kernel BUG at ./include/linux/skbuff.h:2073!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline]
      RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636
      RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293
      RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18
      RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000
      R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6
      R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0
      FS:  0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6_finish_skb include/net/ipv6.h:969 [inline]
        udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073
        udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343
        inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
        __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
        SYSC_sendmmsg net/socket.c:2167 [inline]
        SyS_sendmmsg+0x35/0x60 net/socket.c:2162
        do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4404c9
      RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9
      RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0
      R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000
      Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29
      5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d
      87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe
      RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0
      RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP:
      ffff8801bc18f0f0
      
      As stated by RFC 7112 section 5:
      
         When a host fragments an IPv6 datagram, it MUST include the entire
         IPv6 Header Chain in the First Fragment.
      
      So this patch addresses the issue dropping datagrams with excessive
      extheader length. It also updates the error path to report to the
      calling socket nonnegative pmtu values.
      
      The issue apparently predates git history.
      
      v1 -> v2: cleanup error path, as per Eric's suggestion
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10b8a3de
    • A
      netlink: make sure nladdr has correct size in netlink_connect() · 78802879
      Alexander Potapenko 提交于
      KMSAN reports use of uninitialized memory in the case when |alen| is
      smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
      fully copied from the userspace.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78802879
    • H
      net/ipv4: disable SMC TCP option with SYN Cookies · bc58a1ba
      Hans Wippel 提交于
      Currently, the SMC experimental TCP option in a SYN packet is lost on
      the server side when SYN Cookies are active. However, the corresponding
      SYNACK sent back to the client contains the SMC option. This causes an
      inconsistent view of the SMC capabilities on the client and server.
      
      This patch disables the SMC option in the SYNACK when SYN Cookies are
      active to avoid this issue.
      
      Fixes: 60e2a778 ("tcp: TCP experimental option for SMC")
      Signed-off-by: NHans Wippel <hwippel@linux.vnet.ibm.com>
      Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc58a1ba
  3. 25 3月, 2018 1 次提交
    • S
      netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6} · 32c1733f
      Subash Abhinov Kasiviswanathan 提交于
      skb_header_pointer will copy data into a buffer if data is non linear,
      otherwise it will return a pointer in the linear section of the data.
      nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later
      accesses memory within the size of tcphdr (th->doff) in case of TCP
      packets. This causes a crash when running with KASAN with the following
      call stack -
      
      BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718
      net/netfilter/xt_socket.c:178
      Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971
      CPU: 2 PID: 28971 Comm: syz-executor Tainted: G    B   W  O    4.9.65+ #1
      Call trace:
      [<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76
      [<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
      [<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline]
      [<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51
      [<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248
      [<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline]
      [<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
      [<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
      [<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
      [<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739
      [<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline]
      [<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178
      
      Fix this by copying data into appropriate size headers based on protocol.
      
      Fixes: a583636a ("inet: refactor inet[6]_lookup functions to take skb")
      Signed-off-by: NTejaswi Tanikella <tejaswit@codeaurora.org>
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      32c1733f
  4. 24 3月, 2018 4 次提交
    • L
      batman-adv: fix packet loss for broadcasted DHCP packets to a server · a752c0a4
      Linus Lüssing 提交于
      DHCP connectivity issues can currently occur if the following conditions
      are met:
      
      1) A DHCP packet from a client to a server
      2) This packet has a multicast destination
      3) This destination has a matching entry in the translation table
         (FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03
          for IPv6)
      4) The orig-node determined by TT for the multicast destination
         does not match the orig-node determined by best-gateway-selection
      
      In this case the DHCP packet will be dropped.
      
      The "gateway-out-of-range" check is supposed to only be applied to
      unicasted DHCP packets to a specific DHCP server.
      
      In that case dropping the the unicasted frame forces the client to
      retry via a broadcasted one, but now directed to the new best
      gateway.
      
      A DHCP packet with broadcast/multicast destination is already ensured to
      always be delivered to the best gateway. Dropping a multicasted
      DHCP packet here will only prevent completing DHCP as there is no
      other fallback.
      
      So far, it seems the unicast check was implicitly performed by
      expecting the batadv_transtable_search() to return NULL for multicast
      destinations. However, a multicast address could have always ended up in
      the translation table and in fact is now common.
      
      To fix this potential loss of a DHCP client-to-server packet to a
      multicast address this patch adds an explicit multicast destination
      check to reliably bail out of the gateway-out-of-range check for such
      destinations.
      
      The issue and fix were tested in the following three node setup:
      
      - Line topology, A-B-C
      - A: gateway client, DHCP client
      - B: gateway server, hop-penalty increased: 30->60, DHCP server
      - C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF
      
      Without this patch, A would never transmit its DHCP Discover packet
      due to an always "out-of-range" condition. With this patch,
      a full DHCP handshake between A and B was possible again.
      
      Fixes: be7af5cf ("batman-adv: refactoring gateway handling code")
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      a752c0a4
    • L
      batman-adv: fix multicast-via-unicast transmission with AP isolation · f8fb3419
      Linus Lüssing 提交于
      For multicast frames AP isolation is only supposed to be checked on
      the receiving nodes and never on the originating one.
      
      Furthermore, the isolation or wifi flag bits should only be intepreted
      as such for unicast and never multicast TT entries.
      
      By injecting flags to the multicast TT entry claimed by a single
      target node it was verified in tests that this multicast address
      becomes unreachable, leading to packet loss.
      
      Omitting the "src" parameter to the batadv_transtable_search() call
      successfully skipped the AP isolation check and made the target
      reachable again.
      
      Fixes: 1d8ab8d3 ("batman-adv: Modified forwarding behaviour for multicast packets")
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      f8fb3419
    • E
      ipv6: fix possible deadlock in rt6_age_examine_exception() · 1bfa26ff
      Eric Dumazet 提交于
      syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception()
      
      rt6_age_examine_exception() is called while rt6_exception_lock is held.
      This lock is the lower one in the lock hierarchy, thus we can not
      call dst_neigh_lookup() function, as it can fallback to neigh_create()
      
      We should instead do a pure RCU lookup. As a bonus we avoid
      a pair of atomic operations on neigh refcount.
      
      [1]
      
      WARNING: possible circular locking dependency detected
      4.16.0-rc4+ #277 Not tainted
      
      syz-executor7/4015 is trying to acquire lock:
       (&ndev->lock){++--}, at: [<00000000416dce19>] __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
      
      but task is already holding lock:
       (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&tbl->lock){++-.}:
             __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
             _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
             __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528
             neigh_create include/net/neighbour.h:315 [inline]
             ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228
             dst_neigh_lookup include/net/dst.h:405 [inline]
             rt6_age_examine_exception net/ipv6/route.c:1609 [inline]
             rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645
             fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033
             fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919
             fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845
             fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893
             fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970
             __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986
             fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline]
             fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053
             ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
             inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
             sock_do_ioctl+0xef/0x390 net/socket.c:957
             sock_ioctl+0x36b/0x610 net/socket.c:1081
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #2 (rt6_exception_lock){+.-.}:
             __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
             _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
             spin_lock_bh include/linux/spinlock.h:315 [inline]
             rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367
             fib6_del_route net/ipv6/ip6_fib.c:1677 [inline]
             fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761
             __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980
             ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993
             __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332
             ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline]
             ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200
             inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433
             sock_release+0x8d/0x1e0 net/socket.c:594
             sock_close+0x16/0x20 net/socket.c:1149
             __fput+0x327/0x7e0 fs/file_table.c:209
             ____fput+0x15/0x20 fs/file_table.c:243
             task_work_run+0x199/0x270 kernel/task_work.c:113
             exit_task_work include/linux/task_work.h:22 [inline]
             do_exit+0x9bb/0x1ad0 kernel/exit.c:865
             do_group_exit+0x149/0x400 kernel/exit.c:968
             get_signal+0x73a/0x16d0 kernel/signal.c:2469
             do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
             exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
             prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
             syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
             do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #1 (&(&tb->tb6_lock)->rlock){+.-.}:
             __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
             _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
             spin_lock_bh include/linux/spinlock.h:315 [inline]
             __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007
             ip6_route_add+0x141/0x190 net/ipv6/route.c:2955
             addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359
             fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline]
             addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline]
             addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357
             rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965
             rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641
             netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444
             rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659
             netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
             netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
             netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
             sock_sendmsg_nosec net/socket.c:629 [inline]
             sock_sendmsg+0xca/0x110 net/socket.c:639
             ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
             __sys_sendmsg+0xe5/0x210 net/socket.c:2081
             SYSC_sendmsg net/socket.c:2092 [inline]
             SyS_sendmsg+0x2d/0x50 net/socket.c:2088
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #0 (&ndev->lock){++--}:
             lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
             __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
             _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
             __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
             ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
             pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
             pneigh_ifdown net/core/neighbour.c:695 [inline]
             neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
             rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
             addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
             addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
             inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
             packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
             sock_do_ioctl+0xef/0x390 net/socket.c:957
             sock_ioctl+0x36b/0x610 net/socket.c:1081
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      other info that might help us debug this:
      
      Chain exists of:
        &ndev->lock --> rt6_exception_lock --> &tbl->lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&tbl->lock);
                                     lock(rt6_exception_lock);
                                     lock(&tbl->lock);
        lock(&ndev->lock);
      
       *** DEADLOCK ***
      
      2 locks held by syz-executor7/4015:
       #0:  (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
       #1:  (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292
      
      stack backtrace:
      CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
       check_prev_add kernel/locking/lockdep.c:1863 [inline]
       check_prevs_add kernel/locking/lockdep.c:1976 [inline]
       validate_chain kernel/locking/lockdep.c:2417 [inline]
       __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
       lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
       _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
       __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
       ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
       pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
       pneigh_ifdown net/core/neighbour.c:695 [inline]
       neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
       rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
       addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
       addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
       inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
       packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
       sock_do_ioctl+0xef/0x390 net/socket.c:957
       sock_ioctl+0x36b/0x610 net/socket.c:1081
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: c757faa8 ("ipv6: prepare fib6_age() for exception table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1bfa26ff
    • P
      ip_tunnel: Emit events for post-register MTU changes · f6cc9c05
      Petr Machata 提交于
      For tunnels created with IFLA_MTU, MTU of the netdevice is set by
      rtnl_create_link() (called from rtnl_newlink()) before the device is
      registered. However without IFLA_MTU that's not done.
      
      rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
      via ip_tunnel_newlink() calls register_netdevice(), and that emits
      NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
      MTU of 0.
      
      After ip_tunnel_newlink() corrects the MTU after registering the
      netdevice, but since there's no event, the listeners don't get to know
      about the MTU until something else happens--such as a NETDEV_UP event.
      That's not ideal.
      
      So instead of setting the MTU directly, go through dev_set_mtu(), which
      takes care of distributing the necessary NETDEV_PRECHANGEMTU and
      NETDEV_CHANGEMTU events.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6cc9c05
  5. 23 3月, 2018 3 次提交
    • D
      net/ipv6: Handle onlink flag with multipath routes · 68e2ffde
      David Ahern 提交于
      For multipath routes the ONLINK flag can be specified per nexthop in
      rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
      to consider the ONLINK setting coming from rtnh_flags. Each loop over
      nexthops the config for the sibling route is initialized to the global
      config and then per nexthop settings overlayed. The flag is 'or'ed into
      fib6_config to handle the ONLINK flag coming from either rtm_flags or
      rtnh_flags.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68e2ffde
    • D
      ipv6: sr: fix NULL pointer dereference when setting encap source address · 8936ef76
      David Lebrun 提交于
      When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
      source address of the outer IPv6 header, in case none was specified.
      Using skb->dev can lead to BUG() when it is in an inconsistent state.
      This patch uses the net_device attached to the skb's dst instead.
      
      [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
      [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
      [940807.815725] PGD 0 P4D 0
      [940807.847173] Oops: 0000 [#1] SMP PTI
      [940807.890073] Modules linked in:
      [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
      [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
      [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
      [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
      [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
      [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
      [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
      [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
      [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
      [940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
      [940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
      [940808.939634] Call Trace:
      [940808.970041]  <IRQ>
      [940808.995250]  ? ip6t_do_table+0x265/0x640
      [940809.043341]  seg6_do_srh_encap+0x28f/0x300
      [940809.093516]  ? seg6_do_srh+0x1a0/0x210
      [940809.139528]  seg6_do_srh+0x1a0/0x210
      [940809.183462]  seg6_output+0x28/0x1e0
      [940809.226358]  lwtunnel_output+0x3f/0x70
      [940809.272370]  ip6_xmit+0x2b8/0x530
      [940809.313185]  ? ac6_proc_exit+0x20/0x20
      [940809.359197]  inet6_csk_xmit+0x7d/0xc0
      [940809.404173]  tcp_transmit_skb+0x548/0x9a0
      [940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
      [940809.506603]  ? ip6_default_advmss+0x40/0x40
      [940809.557824]  ? tcp_current_mss+0x24/0x90
      [940809.605925]  tcp_retransmit_skb+0xd/0x80
      [940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
      [940809.719797]  tcp_ack+0xa47/0x1110
      [940809.760612]  tcp_rcv_established+0x13c/0x570
      [940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
      [940809.858879]  tcp_v6_rcv+0xa5c/0xb10
      [940809.901770]  ? seg6_output+0xdd/0x1e0
      [940809.946745]  ip6_input_finish+0xbb/0x460
      [940809.994837]  ip6_input+0x74/0x80
      [940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
      [940810.081663]  ipv6_rcv+0x31c/0x4c0
      ...
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Reported-by: NTom Herbert <tom@quantonium.net>
      Signed-off-by: NDavid Lebrun <dlebrun@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8936ef76
    • D
      ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state · 191f86ca
      David Lebrun 提交于
      The seg6_build_state() function is called with RCU read lock held,
      so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.
      
      [   92.770271] =============================
      [   92.770628] WARNING: suspicious RCU usage
      [   92.770921] 4.16.0-rc4+ #12 Not tainted
      [   92.771277] -----------------------------
      [   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
      [   92.772279]
      [   92.772279] other info that might help us debug this:
      [   92.772279]
      [   92.773067]
      [   92.773067] rcu_scheduler_active = 2, debug_locks = 1
      [   92.773514] 2 locks held by ip/2413:
      [   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
      [   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
      [   92.775065]
      [   92.775065] stack backtrace:
      [   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
      [   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
      [   92.776608] Call Trace:
      [   92.776852]  dump_stack+0x7d/0xbc
      [   92.777130]  __schedule+0x133/0xf00
      [   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
      [   92.777783]  ? __sched_text_start+0x8/0x8
      [   92.778073]  ? rcu_is_watching+0x19/0x30
      [   92.778383]  ? kernel_text_address+0x49/0x60
      [   92.778800]  ? __kernel_text_address+0x9/0x30
      [   92.779241]  ? unwind_get_return_address+0x29/0x40
      [   92.779727]  ? pcpu_alloc+0x102/0x8f0
      [   92.780101]  _cond_resched+0x23/0x50
      [   92.780459]  __mutex_lock+0xbd/0xad0
      [   92.780818]  ? pcpu_alloc+0x102/0x8f0
      [   92.781194]  ? seg6_build_state+0x11d/0x240
      [   92.781611]  ? save_stack+0x9b/0xb0
      [   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
      [   92.782480]  ? seg6_build_state+0x11d/0x240
      [   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
      [   92.783393]  ? ip6_route_info_create+0x687/0x1640
      [   92.783846]  ? ip6_route_add+0x74/0x110
      [   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: NDavid Lebrun <dlebrun@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      191f86ca
  6. 22 3月, 2018 10 次提交
    • P
      netfilter: nf_tables: do not hold reference on netdevice from preparation phase · 90d2723c
      Pablo Neira Ayuso 提交于
      The netfilter netdevice event handler hold the nfnl_lock mutex, this
      avoids races with a device going away while such device is being
      attached to hooks from the netlink control plane. Therefore, either
      control plane bails out with ENOENT or netdevice event path waits until
      the hook that is attached to net_device is registered.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      90d2723c
    • P
      netfilter: nf_tables: cache device name in flowtable object · d92191aa
      Pablo Neira Ayuso 提交于
      Devices going away have to grab the nfnl_lock from the netdev event path
      to avoid races with control plane updates.
      
      However, netlink dumps in netfilter do not hold nfnl_lock mutex. Cache
      the device name into the objects to avoid an use-after-free situation
      for a device that is going away.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d92191aa
    • P
      netfilter: drop template ct when conntrack is skipped. · aebfa52a
      Paolo Abeni 提交于
      The ipv4 nf_ct code currently skips the nf_conntrak_in() call
      for fragmented packets. As a results later matches/target can end
      up manipulating template ct entry instead of 'real' ones.
      
      Exploiting the above, syzbot found a way to trigger the following
      splat:
      
      WARNING: CPU: 1 PID: 4242 at net/netfilter/xt_cluster.c:55
      xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 4242 Comm: syzkaller027971 Not tainted 4.16.0-rc2+ #243
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x24d lib/dump_stack.c:53
        panic+0x1e4/0x41c kernel/panic.c:183
        __warn+0x1dc/0x200 kernel/panic.c:547
        report_bug+0x211/0x2d0 lib/bug.c:184
        fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
        fixup_bug arch/x86/kernel/traps.c:247 [inline]
        do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
        do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
        invalid_op+0x58/0x80 arch/x86/entry/entry_64.S:957
      RIP: 0010:xt_cluster_hash net/netfilter/xt_cluster.c:55 [inline]
      RIP: 0010:xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127
      RSP: 0018:ffff8801d2f6f2d0 EFLAGS: 00010293
      RAX: ffff8801af700540 RBX: 0000000000000000 RCX: ffffffff84a2d1e1
      RDX: 0000000000000000 RSI: ffff8801d2f6f478 RDI: ffff8801cafd336a
      RBP: ffff8801d2f6f2e8 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b03b3d18
      R13: ffff8801cafd3300 R14: dffffc0000000000 R15: ffff8801d2f6f478
        ipt_do_table+0xa91/0x19b0 net/ipv4/netfilter/ip_tables.c:296
        iptable_filter_hook+0x65/0x80 net/ipv4/netfilter/iptable_filter.c:41
        nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
        nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
        nf_hook include/linux/netfilter.h:243 [inline]
        NF_HOOK include/linux/netfilter.h:286 [inline]
        raw_send_hdrinc.isra.17+0xf39/0x1880 net/ipv4/raw.c:432
        raw_sendmsg+0x14cd/0x26b0 net/ipv4/raw.c:669
        inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
        sock_sendmsg_nosec net/socket.c:629 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:639
        SYSC_sendto+0x361/0x5c0 net/socket.c:1748
        SyS_sendto+0x40/0x50 net/socket.c:1716
        do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x441b49
      RSP: 002b:00007ffff5ca8b18 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441b49
      RDX: 0000000000000030 RSI: 0000000020ff7000 RDI: 0000000000000003
      RBP: 00000000006cc018 R08: 000000002066354c R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000403470
      R13: 0000000000403500 R14: 0000000000000000 R15: 0000000000000000
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Instead of adding checks for template ct on every target/match
      manipulating skb->_nfct, simply drop the template ct when skipping
      nf_conntrack_in().
      
      Fixes: 7b4fdf77 ("netfilter: don't track fragmented packets")
      Reported-and-tested-by: syzbot+0346441ae0545cfcea3a@syzkaller.appspotmail.com
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      aebfa52a
    • D
      net/sched: fix idr leak in the error path of tcf_skbmod_init() · f29cdfbe
      Davide Caratti 提交于
      tcf_skbmod_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure skbmod rules
      using the same idr value will systematically fail with -ENOSPC, unless
      the first attempt was done using the 'replace' keyword:
      
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
      on the error path when the idr has been reserved, but not yet inserted.
      Also, don't test 'ovr' in the error path, to avoid a 'replace' failure
      implicitly become a 'delete' that leaks refcount in act_skbmod module:
      
       # rmmod act_skbmod; modprobe act_skbmod
       # tc action add action skbmod swap mac index 100
       # tc action add action skbmod swap mac continue index 100
       RTNETLINK answers: File exists
       We have an error talking to the kernel
       # tc action replace action skbmod swap mac continue index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action skbmod
       #
       # rmmod  act_skbmod
       rmmod: ERROR: Module act_skbmod is in use
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f29cdfbe
    • D
      net/sched: fix idr leak in the error path of tcf_vlan_init() · d7f20015
      Davide Caratti 提交于
      tcf_vlan_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure vlan rules using
      the same idr value will systematically fail with -ENOSPC, unless the first
      attempt was done using the 'replace' keyword.
      
       # tc action add action vlan pop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
      the error path when the idr has been reserved, but not yet inserted. Also,
      don't test 'ovr' in the error path, to avoid a 'replace' failure implicitly
      become a 'delete' that leaks refcount in act_vlan module:
      
       # rmmod act_vlan; modprobe act_vlan
       # tc action add action vlan push id 5 index 100
       # tc action replace action vlan push id 7 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action vlan
       #
       # rmmod act_vlan
       rmmod: ERROR: Module act_vlan is in use
      
      Fixes: 4c5b9d96 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7f20015
    • D
      net/sched: fix idr leak in the error path of __tcf_ipt_init() · 1e46ef17
      Davide Caratti 提交于
      __tcf_ipt_init() can fail after the idr has been successfully reserved.
      When this happens, subsequent attempts to configure xt/ipt rules using
      the same idr value systematically fail with -ENOSPC:
      
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
      when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
      to avoid NULL pointer dereference.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e46ef17
    • D
      net/sched: fix idr leak in the error path of tcp_pedit_init() · 94fa3f92
      Davide Caratti 提交于
      tcf_pedit_init() can fail to allocate 'keys' after the idr has been
      successfully reserved. When this happens, subsequent attempts to configure
      a pedit rule using the same idr value systematically fail with -ENOSPC:
      
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_act_pedit_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94fa3f92
    • D
      net/sched: fix idr leak in the error path of tcf_act_police_init() · 5bf7f818
      Davide Caratti 提交于
      tcf_act_police_init() can fail after the idr has been successfully
      reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
      subsequent attempts to configure a police rule using the same idr value
      systematiclly fail with -ENOSPC:
      
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       ...
      
      Fix this in the error path of tcf_act_police_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bf7f818
    • D
      net/sched: fix idr leak in the error path of tcf_simp_init() · 60e10b3a
      Davide Caratti 提交于
      if the kernel fails to duplicate 'sdata', creation of a new action fails
      with -ENOMEM. However, subsequent attempts to install the same action
      using the same value of 'index' systematically fail with -ENOSPC, and
      that value of 'index' will no more be usable by act_simple, until rmmod /
      insmod of act_simple.ko is done:
      
       # tc actions add action simple sdata hello index 100
       # tc actions list action simple
      
              action order 0: Simple <hello>
               index 100 ref 1 bind 0
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60e10b3a
    • D
      net/sched: fix idr leak on the error path of tcf_bpf_init() · bbc09e78
      Davide Caratti 提交于
      when the following command sequence is entered
      
       # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: Invalid argument
       We have an error talking to the kernel
       # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
      
      act_bpf correctly refuses to install the first TC rule, because 31 is not
      a valid instruction. However, it refuses to install the second TC rule,
      even if the BPF code is correct. Furthermore, it's no more possible to
      install any other rule having the same value of 'index' until act_bpf
      module is unloaded/inserted again. After the idr has been reserved, call
      tcf_idr_release() instead of tcf_idr_cleanup(), to fix this issue.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbc09e78
  7. 21 3月, 2018 2 次提交
  8. 20 3月, 2018 4 次提交
    • F
      netfilter: nf_tables: add missing netlink attrs to policies · 467697d2
      Florian Westphal 提交于
      Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements")
      Fixes: f25ad2e9 ("netfilter: nf_tables: prepare for expressions associated to set elements")
      Fixes: 1a94e38d ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      467697d2
    • A
      devlink: Remove redundant free on error path · 7fe4d6dc
      Arkadi Sharshevsky 提交于
      The current code performs unneeded free. Remove the redundant skb freeing
      during the error path.
      
      Fixes: 1555d204 ("devlink: Support for pipeline debug (dpipe)")
      Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fe4d6dc
    • F
      netfilter: nf_tables: permit second nat hook if colliding hook is going away · ae6153b5
      Florian Westphal 提交于
      Sergei Trofimovich reported that restoring an nft ruleset doesn't work
      anymore unless old rule content is flushed first.
      
      The problem stems from a recent change designed to prevent multiple nat
      hooks at the same hook point locations and nftables transaction model.
      
      A 'flush ruleset' won't take effect until the entire transaction has
      completed.
      
      So, if one has a nft.rules file that contains a 'flush ruleset',
      followed by a nat hook register request, then 'nft -f file' will work,
      but running 'nft -f file' again will fail with -EBUSY.
      
      Reason is that nftables will place the flush/removal requests in the
      transaction list, but it will not act on the removal until after all new
      rules are in place.
      
      The netfilter core will therefore get request to register a new nat
      hook before the old one is removed -- this now fails as the netfilter
      core can't know the existing hook is staged for removal.
      
      To fix this, we can search the transaction log when a hook collision
      is detected.  The collision is okay if
      
       1. there is a delete request pending for the nat hook that is already
          registered.
       2. there is no second add request for a matching nat hook.
          This is required to only apply the exception once.
      
      Fixes: f92b40a8 ("netfilter: core: only allow one nat hook per hook point")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ae6153b5
    • F
      netfilter: nf_tables: meter: pick a set backend that supports updates · 4f2921ca
      Florian Westphal 提交于
      in nftables, 'meter' can be used to instantiate a hash-table at run
      time:
      
      rule add filter forward iif "internal" meter hostacct { ip saddr counter}
      nft list meter ip filter hostacct
      table ip filter {
        meter hostacct {
          type ipv4_addr
          elements = { 192.168.0.1 : counter packets 8 bytes 2672, ..
      
      because elemets get added on the fly, the kernel must chose a set
      backend type that implements the ->update() function, otherwise
      rule insertion fails with EOPNOTSUPP.
      
      Therefore, skip set types that lack ->update, and also
      make sure we do not discard a (bad) candidate when we did yet
      find any candidate at all.  This could happen when userspace prefers
      low memory footprint -- the set implementation currently checked might
      not be a fit at all.  Make sure we pick it anyway (!bops).  In
      case next candidate is a better fix, it will be chosen instead.
      
      But in case nothing else is found we at least have a non-ideal
      match rather than no match at all.
      
      Fixes: 6c03ae21 ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4f2921ca
  9. 18 3月, 2018 8 次提交
    • S
      batman-adv: Fix skbuff rcsum on packet reroute · fc04fdb2
      Sven Eckelmann 提交于
      batadv_check_unicast_ttvn may redirect a packet to itself or another
      originator. This involves rewriting the ttvn and the destination address in
      the batadv unicast header. These field were not yet pulled (with skb rcsum
      update) and thus any change to them also requires a change in the receive
      checksum.
      Reported-by: NMatthias Schiffer <mschiffer@universe-factory.net>
      Fixes: a73105b8 ("batman-adv: improved client announcement mechanism")
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      fc04fdb2
    • S
      batman-adv: Add missing include for EPOLL* constants · 48881ed5
      Sven Eckelmann 提交于
      Fixes: a9a08845 ("vfs: do bulk POLL* -> EPOLL* replacement")
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      48881ed5
    • D
      net/sched: fix NULL dereference on the error path of tcf_skbmod_init() · 2d433610
      Davide Caratti 提交于
      when the following command
      
       # tc action replace action skbmod swap mac index 100
      
      is run for the first time, and tcf_skbmod_init() fails to allocate struct
      tcf_skbmod_params, tcf_skbmod_cleanup() calls kfree_rcu(NULL), thus
      causing the following error:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
       IP: __call_rcu+0x23/0x2b0
       PGD 8000000034057067 P4D 8000000034057067 PUD 74937067 PMD 0
       Oops: 0002 [#1] SMP PTI
       Modules linked in: act_skbmod(E) psample ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec crct10dif_pclmul mbcache jbd2 crc32_pclmul snd_hda_core ghash_clmulni_intel snd_hwdep pcbc snd_seq snd_seq_device snd_pcm aesni_intel snd_timer crypto_simd glue_helper snd cryptd virtio_balloon joydev soundcore pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_net virtio_blk ata_piix libata crc32c_intel virtio_pci serio_raw virtio_ring virtio i2c_core floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_skbmod]
       CPU: 3 PID: 3144 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:__call_rcu+0x23/0x2b0
       RSP: 0018:ffffbd2e403e7798 EFLAGS: 00010246
       RAX: ffffffffc0872080 RBX: ffff981d34bff780 RCX: 00000000ffffffff
       RDX: ffffffff922a5f00 RSI: 0000000000000000 RDI: 0000000000000000
       RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000021f
       R10: 000000003d003000 R11: 0000000000aaaaaa R12: 0000000000000000
       R13: ffffffff922a5f00 R14: 0000000000000001 R15: ffff981d3b698c2c
       FS:  00007f3678292740(0000) GS:ffff981d3fd80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000008 CR3: 000000007c57a006 CR4: 00000000001606e0
       Call Trace:
        __tcf_idr_release+0x79/0xf0
        tcf_skbmod_init+0x1d1/0x210 [act_skbmod]
        tcf_action_init_1+0x2cc/0x430
        tcf_action_init+0xd3/0x1b0
        tc_ctl_action+0x18b/0x240
        rtnetlink_rcv_msg+0x29c/0x310
        ? _cond_resched+0x15/0x30
        ? __kmalloc_node_track_caller+0x1b9/0x270
        ? rtnl_calcit.isra.28+0x100/0x100
        netlink_rcv_skb+0xd2/0x110
        netlink_unicast+0x17c/0x230
        netlink_sendmsg+0x2cd/0x3c0
        sock_sendmsg+0x30/0x40
        ___sys_sendmsg+0x27a/0x290
        ? filemap_map_pages+0x34a/0x3a0
        ? __handle_mm_fault+0xbfd/0xe20
        __sys_sendmsg+0x51/0x90
        do_syscall_64+0x6e/0x1a0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
       RIP: 0033:0x7f36776a3ba0
       RSP: 002b:00007fff4703b618 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fff4703b740 RCX: 00007f36776a3ba0
       RDX: 0000000000000000 RSI: 00007fff4703b690 RDI: 0000000000000003
       RBP: 000000005aaaba36 R08: 0000000000000002 R09: 0000000000000000
       R10: 00007fff4703b0a0 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007fff4703b754 R14: 0000000000000001 R15: 0000000000669f60
       Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
       RIP: __call_rcu+0x23/0x2b0 RSP: ffffbd2e403e7798
       CR2: 0000000000000008
      
      Fix it in tcf_skbmod_cleanup(), ensuring that kfree_rcu(p, ...) is called
      only when p is not NULL.
      
      Fixes: 86da71b5 ("net_sched: Introduce skbmod action")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d433610
    • D
      net/sched: fix NULL dereference in the error path of tcf_sample_init() · 1f110e7c
      Davide Caratti 提交于
      when the following command
      
       # tc action add action sample rate 100 group 100 index 100
      
      is run for the first time, and psample_group_get(100) fails to create a
      new group, tcf_sample_cleanup() calls psample_group_put(NULL), thus
      causing the following error:
      
       BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
       IP: psample_group_put+0x15/0x71 [psample]
       PGD 8000000075775067 P4D 8000000075775067 PUD 7453c067 PMD 0
       Oops: 0002 [#1] SMP PTI
       Modules linked in: act_sample(E) psample ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core mbcache jbd2 crct10dif_pclmul snd_hwdep crc32_pclmul snd_seq ghash_clmulni_intel pcbc snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer glue_helper snd cryptd joydev pcspkr i2c_piix4 soundcore virtio_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_net ata_piix virtio_console virtio_blk libata serio_raw crc32c_intel virtio_pci i2c_core virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_tunnel_key]
       CPU: 2 PID: 5740 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:psample_group_put+0x15/0x71 [psample]
       RSP: 0018:ffffb8a80032f7d0 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000024
       RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffffc06d93c0
       RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
       R10: 00000000bd003000 R11: ffff979fba04aa59 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: ffff979fbba3f22c
       FS:  00007f7638112740(0000) GS:ffff979fbfd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000000000001c CR3: 00000000734ea001 CR4: 00000000001606e0
       Call Trace:
        __tcf_idr_release+0x79/0xf0
        tcf_sample_init+0x125/0x1d0 [act_sample]
        tcf_action_init_1+0x2cc/0x430
        tcf_action_init+0xd3/0x1b0
        tc_ctl_action+0x18b/0x240
        rtnetlink_rcv_msg+0x29c/0x310
        ? _cond_resched+0x15/0x30
        ? __kmalloc_node_track_caller+0x1b9/0x270
        ? rtnl_calcit.isra.28+0x100/0x100
        netlink_rcv_skb+0xd2/0x110
        netlink_unicast+0x17c/0x230
        netlink_sendmsg+0x2cd/0x3c0
        sock_sendmsg+0x30/0x40
        ___sys_sendmsg+0x27a/0x290
        ? filemap_map_pages+0x34a/0x3a0
        ? __handle_mm_fault+0xbfd/0xe20
        __sys_sendmsg+0x51/0x90
        do_syscall_64+0x6e/0x1a0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
       RIP: 0033:0x7f7637523ba0
       RSP: 002b:00007fff0473ef58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fff0473f080 RCX: 00007f7637523ba0
       RDX: 0000000000000000 RSI: 00007fff0473efd0 RDI: 0000000000000003
       RBP: 000000005aaaac80 R08: 0000000000000002 R09: 0000000000000000
       R10: 00007fff0473e9e0 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007fff0473f094 R14: 0000000000000001 R15: 0000000000669f60
       Code: be 02 00 00 00 48 89 df e8 a9 fe ff ff e9 7c ff ff ff 0f 1f 40 00 0f 1f 44 00 00 53 48 89 fb 48 c7 c7 c0 93 6d c0 e8 db 20 8c ef <83> 6b 1c 01 74 10 48 c7 c7 c0 93 6d c0 ff 14 25 e8 83 83 b0 5b
       RIP: psample_group_put+0x15/0x71 [psample] RSP: ffffb8a80032f7d0
       CR2: 000000000000001c
      
      Fix it in tcf_sample_cleanup(), ensuring that calls to psample_group_put(p)
      are done only when p is not NULL.
      
      Fixes: cadb9c9f ("net/sched: act_sample: Fix error path in init")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f110e7c
    • D
      net/sched: fix NULL dereference in the error path of tunnel_key_init() · abdadd3c
      Davide Caratti 提交于
      when the following command
      
       # tc action add action tunnel_key unset index 100
      
      is run for the first time, and tunnel_key_init() fails to allocate struct
      tcf_tunnel_key_params, tunnel_key_release() dereferences NULL pointers.
      This causes the following error:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
       IP: tunnel_key_release+0xd/0x40 [act_tunnel_key]
       PGD 8000000033787067 P4D 8000000033787067 PUD 74646067 PMD 0
       Oops: 0000 [#1] SMP PTI
       Modules linked in: act_tunnel_key(E) act_csum ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul snd_hda_codec_generic ghash_clmulni_intel snd_hda_intel pcbc snd_hda_codec snd_hda_core snd_hwdep snd_seq aesni_intel snd_seq_device crypto_simd glue_helper snd_pcm cryptd joydev snd_timer pcspkr virtio_balloon snd i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_net virtio_blk drm virtio_console crc32c_intel ata_piix serio_raw i2c_core virtio_pci libata virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod
       CPU: 2 PID: 3101 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tunnel_key_release+0xd/0x40 [act_tunnel_key]
       RSP: 0018:ffffba46803b7768 EFLAGS: 00010286
       RAX: ffffffffc09010a0 RBX: 0000000000000000 RCX: 0000000000000024
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff99ee336d7480
       RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044
       R10: 0000000000000220 R11: ffff99ee79d73131 R12: 0000000000000000
       R13: ffff99ee32d67610 R14: ffff99ee7671dc38 R15: 00000000fffffff4
       FS:  00007febcb2cd740(0000) GS:ffff99ee7fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000010 CR3: 000000007c8e4005 CR4: 00000000001606e0
       Call Trace:
        __tcf_idr_release+0x79/0xf0
        tunnel_key_init+0xd9/0x460 [act_tunnel_key]
        tcf_action_init_1+0x2cc/0x430
        tcf_action_init+0xd3/0x1b0
        tc_ctl_action+0x18b/0x240
        rtnetlink_rcv_msg+0x29c/0x310
        ? _cond_resched+0x15/0x30
        ? __kmalloc_node_track_caller+0x1b9/0x270
        ? rtnl_calcit.isra.28+0x100/0x100
        netlink_rcv_skb+0xd2/0x110
        netlink_unicast+0x17c/0x230
        netlink_sendmsg+0x2cd/0x3c0
        sock_sendmsg+0x30/0x40
        ___sys_sendmsg+0x27a/0x290
        __sys_sendmsg+0x51/0x90
        do_syscall_64+0x6e/0x1a0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
       RIP: 0033:0x7febca6deba0
       RSP: 002b:00007ffe7b0dd128 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffe7b0dd250 RCX: 00007febca6deba0
       RDX: 0000000000000000 RSI: 00007ffe7b0dd1a0 RDI: 0000000000000003
       RBP: 000000005aaa90cb R08: 0000000000000002 R09: 0000000000000000
       R10: 00007ffe7b0dcba0 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007ffe7b0dd264 R14: 0000000000000001 R15: 0000000000669f60
       Code: 44 00 00 8b 0d b5 23 00 00 48 8b 87 48 10 00 00 48 8b 3c c8 e9 a5 e5 d8 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 9f b0 00 00 00 <83> 7b 10 01 74 0b 48 89 df 31 f6 5b e9 f2 fa 7f c3 48 8b 7b 18
       RIP: tunnel_key_release+0xd/0x40 [act_tunnel_key] RSP: ffffba46803b7768
       CR2: 0000000000000010
      
      Fix this in tunnel_key_release(), ensuring 'param' is not NULL before
      dereferencing it.
      
      Fixes: d0f6dd8a ("net/sched: Introduce act_tunnel_key")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      abdadd3c
    • D
      net/sched: fix NULL dereference in the error path of tcf_csum_init() · aab378a7
      Davide Caratti 提交于
      when the following command
      
       # tc action add action csum udp continue index 100
      
      is run for the first time, and tcf_csum_init() fails allocating struct
      tcf_csum, tcf_csum_cleanup() calls kfree_rcu(NULL,...). This causes the
      following error:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
       IP: __call_rcu+0x23/0x2b0
       PGD 80000000740b4067 P4D 80000000740b4067 PUD 32e7f067 PMD 0
       Oops: 0002 [#1] SMP PTI
       Modules linked in: act_csum(E) act_vlan ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer aesni_intel crypto_simd glue_helper cryptd snd joydev pcspkr virtio_balloon i2c_piix4 soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console ata_piix crc32c_intel libata virtio_pci serio_raw i2c_core virtio_ring virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_vlan]
       CPU: 2 PID: 5763 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:__call_rcu+0x23/0x2b0
       RSP: 0018:ffffb275803e77c0 EFLAGS: 00010246
       RAX: ffffffffc057b080 RBX: ffff9674bc6f5240 RCX: 00000000ffffffff
       RDX: ffffffff928a5f00 RSI: 0000000000000008 RDI: 0000000000000008
       RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000044
       R10: 0000000000000220 R11: ffff9674b9ab4821 R12: 0000000000000000
       R13: ffffffff928a5f00 R14: 0000000000000000 R15: 0000000000000001
       FS:  00007fa6368d8740(0000) GS:ffff9674bfd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000010 CR3: 0000000073dec001 CR4: 00000000001606e0
       Call Trace:
        __tcf_idr_release+0x79/0xf0
        tcf_csum_init+0xfb/0x180 [act_csum]
        tcf_action_init_1+0x2cc/0x430
        tcf_action_init+0xd3/0x1b0
        tc_ctl_action+0x18b/0x240
        rtnetlink_rcv_msg+0x29c/0x310
        ? _cond_resched+0x15/0x30
        ? __kmalloc_node_track_caller+0x1b9/0x270
        ? rtnl_calcit.isra.28+0x100/0x100
        netlink_rcv_skb+0xd2/0x110
        netlink_unicast+0x17c/0x230
        netlink_sendmsg+0x2cd/0x3c0
        sock_sendmsg+0x30/0x40
        ___sys_sendmsg+0x27a/0x290
        ? filemap_map_pages+0x34a/0x3a0
        ? __handle_mm_fault+0xbfd/0xe20
        __sys_sendmsg+0x51/0x90
        do_syscall_64+0x6e/0x1a0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
       RIP: 0033:0x7fa635ce9ba0
       RSP: 002b:00007ffc185b0fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007ffc185b10f0 RCX: 00007fa635ce9ba0
       RDX: 0000000000000000 RSI: 00007ffc185b1040 RDI: 0000000000000003
       RBP: 000000005aaa85e0 R08: 0000000000000002 R09: 0000000000000000
       R10: 00007ffc185b0a20 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007ffc185b1104 R14: 0000000000000001 R15: 0000000000669f60
       Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
       RIP: __call_rcu+0x23/0x2b0 RSP: ffffb275803e77c0
       CR2: 0000000000000010
      
      fix this in tcf_csum_cleanup(), ensuring that kfree_rcu(param, ...) is
      called only when param is not NULL.
      
      Fixes: 9c5f69bb ("net/sched: act_csum: don't use spinlock in the fast path")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aab378a7
    • D
      net/sched: fix NULL dereference in the error path of tcf_vlan_init() · 1edf8abe
      Davide Caratti 提交于
      when the following command
      
       # tc actions replace action vlan pop index 100
      
      is run for the first time, and tcf_vlan_init() fails allocating struct
      tcf_vlan_params, tcf_vlan_cleanup() calls kfree_rcu(NULL, ...). This causes
      the following error:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
       IP: __call_rcu+0x23/0x2b0
       PGD 80000000760a2067 P4D 80000000760a2067 PUD 742c1067 PMD 0
       Oops: 0002 [#1] SMP PTI
       Modules linked in: act_vlan(E) ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 snd_hda_codec_generic snd_hda_intel mbcache snd_hda_codec jbd2 snd_hda_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer glue_helper snd cryptd joydev soundcore virtio_balloon pcspkr i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_console virtio_blk virtio_net ata_piix crc32c_intel libata virtio_pci i2c_core virtio_ring serio_raw virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_vlan]
       CPU: 3 PID: 3119 Comm: tc Tainted: G            E    4.16.0-rc4.act_vlan.orig+ #403
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:__call_rcu+0x23/0x2b0
       RSP: 0018:ffffaac3005fb798 EFLAGS: 00010246
       RAX: ffffffffc0704080 RBX: ffff97f2b4bbe900 RCX: 00000000ffffffff
       RDX: ffffffffabca5f00 RSI: 0000000000000010 RDI: 0000000000000010
       RBP: 0000000000000010 R08: 0000000000000001 R09: 0000000000000044
       R10: 00000000fd003000 R11: ffff97f2faab5b91 R12: 0000000000000000
       R13: ffffffffabca5f00 R14: ffff97f2fb80202c R15: 00000000fffffff4
       FS:  00007f68f75b4740(0000) GS:ffff97f2ffd80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000018 CR3: 0000000072b52001 CR4: 00000000001606e0
       Call Trace:
        __tcf_idr_release+0x79/0xf0
        tcf_vlan_init+0x168/0x270 [act_vlan]
        tcf_action_init_1+0x2cc/0x430
        tcf_action_init+0xd3/0x1b0
        tc_ctl_action+0x18b/0x240
        rtnetlink_rcv_msg+0x29c/0x310
        ? _cond_resched+0x15/0x30
        ? __kmalloc_node_track_caller+0x1b9/0x270
        ? rtnl_calcit.isra.28+0x100/0x100
        netlink_rcv_skb+0xd2/0x110
        netlink_unicast+0x17c/0x230
        netlink_sendmsg+0x2cd/0x3c0
        sock_sendmsg+0x30/0x40
        ___sys_sendmsg+0x27a/0x290
        ? filemap_map_pages+0x34a/0x3a0
        ? __handle_mm_fault+0xbfd/0xe20
        __sys_sendmsg+0x51/0x90
        do_syscall_64+0x6e/0x1a0
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
       RIP: 0033:0x7f68f69c5ba0
       RSP: 002b:00007fffd79c1118 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fffd79c1240 RCX: 00007f68f69c5ba0
       RDX: 0000000000000000 RSI: 00007fffd79c1190 RDI: 0000000000000003
       RBP: 000000005aaa708e R08: 0000000000000002 R09: 0000000000000000
       R10: 00007fffd79c0ba0 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007fffd79c1254 R14: 0000000000000001 R15: 0000000000669f60
       Code: 5d e9 42 da ff ff 66 90 0f 1f 44 00 00 41 57 41 56 41 55 49 89 d5 41 54 55 48 89 fd 53 48 83 ec 08 40 f6 c7 07 0f 85 19 02 00 00 <48> 89 75 08 48 c7 45 00 00 00 00 00 9c 58 0f 1f 44 00 00 49 89
       RIP: __call_rcu+0x23/0x2b0 RSP: ffffaac3005fb798
       CR2: 0000000000000018
      
      fix this in tcf_vlan_cleanup(), ensuring that kfree_rcu(p, ...) is called
      only when p is not NULL.
      
      Fixes: 4c5b9d96 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NManish Kurup <manish.kurup@verizon.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1edf8abe
    • E
      net: sched: fix uses after free · cce6294c
      Eric Dumazet 提交于
      syzbot reported one use-after-free in pfifo_fast_enqueue() [1]
      
      Issue here is that we can not reuse skb after a successful skb_array_produce()
      since another cpu might have consumed it already.
      
      I believe a similar problem exists in try_bulk_dequeue_skb_slow()
      in case we put an skb into qdisc_enqueue_skb_bad_txq() for lockless qdisc.
      
      [1]
      BUG: KASAN: use-after-free in qdisc_pkt_len include/net/sch_generic.h:610 [inline]
      BUG: KASAN: use-after-free in qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
      BUG: KASAN: use-after-free in pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
      Read of size 4 at addr ffff8801cede37e8 by task syzkaller717588/5543
      
      CPU: 1 PID: 5543 Comm: syzkaller717588 Not tainted 4.16.0-rc4+ #265
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report+0x23c/0x360 mm/kasan/report.c:412
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
       qdisc_pkt_len include/net/sch_generic.h:610 [inline]
       qdisc_qstats_cpu_backlog_inc include/net/sch_generic.h:712 [inline]
       pfifo_fast_enqueue+0x4bc/0x5e0 net/sched/sch_generic.c:639
       __dev_xmit_skb net/core/dev.c:3216 [inline]
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: syzbot+ed43b6903ab968b16f54@syzkaller.appspotmail.com
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc:	Cong Wang <xiyou.wangcong@gmail.com>
      Cc:	Jiri Pirko <jiri@resnulli.us>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cce6294c
  10. 17 3月, 2018 4 次提交