1. 02 Apr, 2018: 3 commits
  2. 31 Mar, 2018: 3 commits
      net/ipv6: Fix route leaking between VRFs · b6cdbc85
      David Ahern committed
      Donald reported that IPv6 route leaking between VRFs is not working.
      The root cause is the strict argument in the call to rt6_lookup when
      validating the nexthop spec.
      
      ip6_route_check_nh validates the gateway and device (if given) of a
      route spec. It in turn could call rt6_lookup (e.g., lookup in a given
      table did not succeed so it falls back to a full lookup) and if so
      sets the strict argument to 1. That means if the egress device is given,
      the route lookup needs to return a result with the same device. This
      strict requirement does not work with VRFs (IPv4 or IPv6) because the
      oif in the flow struct is overridden with the index of the VRF device
      to trigger a match on the l3mdev rule and force the lookup to its table.
      
      The right long term solution is to add an l3mdev index to the flow
      struct such that the oif is not overridden. That solution will not
      backport well, so this patch aims for a simpler solution to relax the
      strict argument if the route spec device is an l3mdev slave. As done
      in other places, use the FLOWI_FLAG_SKIP_NH_OIF to know that the
      RT6_LOOKUP_F_IFACE flag needs to be removed.
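A minimal sketch of the flag relaxation described above. It is not the patch itself: the numeric values of RT6_LOOKUP_F_IFACE and FLOWI_FLAG_SKIP_NH_OIF are illustrative assumptions, and nh_lookup_flags() is a hypothetical helper that only isolates the flag logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real constants come from kernel headers. */
#define RT6_LOOKUP_F_IFACE     0x00000001
#define FLOWI_FLAG_SKIP_NH_OIF 0x04

/* Hypothetical helper: compute the lookup flags used while validating a
 * nexthop. 'strict' plays the role of rt6_lookup()'s strict argument;
 * 'flowi_flags' are the flow flags set by the caller. */
static int nh_lookup_flags(bool strict, unsigned int flowi_flags)
{
    int flags = strict ? RT6_LOOKUP_F_IFACE : 0;

    /* For an l3mdev (VRF) slave the oif was replaced by the VRF index,
     * so an exact device match cannot succeed: drop the requirement. */
    if (flowi_flags & FLOWI_FLAG_SKIP_NH_OIF)
        flags &= ~RT6_LOOKUP_F_IFACE;
    return flags;
}
```

A strict lookup without the skip flag still demands a device match; with the flag set, the same call falls back to a non-strict lookup.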
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ipv6: sr: fix seg6 encap performances with TSO enabled · 5807b22c
      David Lebrun committed
      Enabling TSO can lead to abysmal performance when using seg6 in
      encap mode, such as with the ixgbe driver. This patch adds a call to
      iptunnel_handle_offloads() to remove the encapsulation bit if needed.
      
      Before:
      root@comp4-seg6bpf:~# iperf3 -c fc00::55
      Connecting to host fc00::55, port 5201
      [  4] local fc45::4 port 36592 connected to fc00::55 port 5201
      [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [  4]   0.00-1.00   sec   196 KBytes  1.60 Mbits/sec   47   6.66 KBytes
      [  4]   1.00-2.00   sec   304 KBytes  2.49 Mbits/sec  100   5.33 KBytes
      [  4]   2.00-3.00   sec   284 KBytes  2.32 Mbits/sec   92   5.33 KBytes
      
      After:
      root@comp4-seg6bpf:~# iperf3 -c fc00::55
      Connecting to host fc00::55, port 5201
      [  4] local fc45::4 port 43062 connected to fc00::55 port 5201
      [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [  4]   0.00-1.00   sec  1.03 GBytes  8.89 Gbits/sec    0    743 KBytes
      [  4]   1.00-2.00   sec  1.03 GBytes  8.87 Gbits/sec    0    743 KBytes
      [  4]   2.00-3.00   sec  1.03 GBytes  8.87 Gbits/sec    0    743 KBytes

      Reported-by: Tom Herbert <tom@quantonium.net>
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: Fix untag for vlan packets without ethernet header · ae474573
      Toshiaki Makita committed
      In some situations VLAN packets do not have Ethernet headers. One example
      is packets from tun devices: users can specify the VLAN protocol in the
      tun_pi field instead of an IP protocol, and skb_vlan_untag() attempts to
      untag such packets.
      
      skb_vlan_untag() (more precisely, skb_reorder_vlan_header() called by it)
      did not expect packets without Ethernet headers, so in such a case
      the size argument for memmove() underflowed and triggered a crash.
      
      ====
      BUG: unable to handle kernel paging request at ffff8801cccb8000
      IP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      PGD 9cee067 P4D 9cee067 PUD 1d9401063 PMD 1cccb7063 PTE 2810100028101
      Oops: 000b [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17663 Comm: syz-executor2 Not tainted 4.16.0-rc7+ #368
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      RSP: 0018:ffff8801cc046e28 EFLAGS: 00010287
      RAX: ffff8801ccc244c4 RBX: fffffffffffffffe RCX: fffffffffff6c4c2
      RDX: fffffffffffffffe RSI: ffff8801cccb7ffc RDI: ffff8801cccb8000
      RBP: ffff8801cc046e48 R08: ffff8801ccc244be R09: ffffed0039984899
      R10: 0000000000000001 R11: ffffed0039984898 R12: ffff8801ccc244c4
      R13: ffff8801ccc244c0 R14: ffff8801d96b7c06 R15: ffff8801d96b7b40
      FS:  00007febd562d700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8801cccb8000 CR3: 00000001ccb2f006 CR4: 00000000001606e0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Call Trace:
       memmove include/linux/string.h:360 [inline]
       skb_reorder_vlan_header net/core/skbuff.c:5031 [inline]
       skb_vlan_untag+0x470/0xc40 net/core/skbuff.c:5061
       __netif_receive_skb_core+0x119c/0x3460 net/core/dev.c:4460
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4627
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4701
       netif_receive_skb+0xae/0x390 net/core/dev.c:4725
       tun_rx_batched.isra.50+0x5ee/0x870 drivers/net/tun.c:1555
       tun_get_user+0x299e/0x3c20 drivers/net/tun.c:1962
       tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1990
       call_write_iter include/linux/fs.h:1782 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x454879
      RSP: 002b:00007febd562cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007febd562d6d4 RCX: 0000000000454879
      RDX: 0000000000000157 RSI: 0000000020000180 RDI: 0000000000000014
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000006b0 R14: 00000000006fc120 R15: 0000000000000000
      Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
      RIP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP: ffff8801cc046e28
      CR2: ffff8801cccb8000
      ====
      
      We don't need to copy headers for packets that have no headers preceding
      the VLAN header, so skip memmove() in that case.
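The length arithmetic behind the crash can be sketched in isolation. This assumes that mac_len is measured from the start of the MAC header to skb->data after the 4-byte VLAN header has been pulled, and that only the bytes before the tag (minus the EtherType) are moved; both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_HLEN 4 /* 802.1Q tag: TPID + TCI */
#define ETH_TLEN  2 /* EtherType field */

/* Bytes preceding the tag that must be shifted forward, computed
 * without a guard: wraps around when mac_len < 6. */
static size_t reorder_len_unchecked(size_t mac_len)
{
    return mac_len - VLAN_HLEN - ETH_TLEN;
}

/* Guarded version: when no header precedes the tag (e.g. a tun
 * packet that starts at the VLAN header), there is nothing to move. */
static size_t reorder_len_checked(size_t mac_len)
{
    if (mac_len > VLAN_HLEN + ETH_TLEN)
        return mac_len - VLAN_HLEN - ETH_TLEN;
    return 0;
}
```

For a regular tagged Ethernet frame (mac_len 18) both return 12, the two MAC addresses. For mac_len 4 the unguarded version yields SIZE_MAX - 1, which on a 64-bit machine is the fffffffffffffffe visible in RDX in the oops above.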
      
      Fixes: 4bbb3e0e ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 29 Mar, 2018: 2 commits
  4. 27 Mar, 2018: 6 commits
      net/smc: use announced length in sock_recvmsg() · ab6f6dd1
      Ursula Braun committed
      Not every CLC proposal message needs the maximum buffer length.
      Because MSG_WAITALL is set, receiving with the maximum length would
      keep waiting for more data, so it is important to use the real length
      obtained by peeking when receiving the message.
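A userspace sketch of the peek-then-receive pattern, assuming a hypothetical 4-byte header whose last two bytes carry the total message length in network byte order (this is not the real CLC header layout). Only standard recv() flags are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define HDR_LEN 4 /* hypothetical: 2 bytes type, 2 bytes length (big endian) */

/* Receive one message whose header announces its total length. The
 * header is peeked first; then exactly that many bytes are requested.
 * With MSG_WAITALL, requesting 'buflen' bytes instead would block until
 * the buffer filled, even though the message is shorter. */
static ssize_t recv_announced(int fd, uint8_t *buf, size_t buflen)
{
    uint8_t hdr[HDR_LEN];
    ssize_t n = recv(fd, hdr, sizeof(hdr), MSG_PEEK | MSG_WAITALL);
    if (n != (ssize_t)sizeof(hdr))
        return -1;

    size_t len = ((size_t)hdr[2] << 8) | hdr[3];
    if (len < sizeof(hdr) || len > buflen)
        return -1;
    return recv(fd, buf, len, MSG_WAITALL);
}
```

With an 8-byte message in a 64-byte buffer, the call returns after 8 bytes instead of waiting for 56 more.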
      
      Fixes: d63d271c ("smc: switch to sock_recvmsg()")
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      llc: properly handle dev_queue_xmit() return value · b85ab56c
      Cong Wang committed
      llc_conn_send_pdu() pushes the skb into the write queue and
      calls llc_conn_send_pdus() to flush them out. However, the
      status of dev_queue_xmit() is not returned to the caller,
      in this case llc_conn_state_process().

      llc_conn_state_process() needs to hold the skb regardless of
      success or failure, because it still uses it afterwards;
      therefore we should hold the skb before dev_queue_xmit() when
      that skb is the one being processed by llc_conn_state_process().

      Other callers can simply pass NULL and keep ignoring
      the return value as before.
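The ownership rule can be modeled with a toy reference count. Everything here is hypothetical: struct pkt stands in for struct sk_buff, fake_xmit() for dev_queue_xmit() (which consumes a reference whether or not it succeeds), and send_pdus() only loosely follows the shape described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted packet standing in for struct sk_buff. */
struct pkt {
    int refcnt;
};

static void pkt_hold(struct pkt *p) { p->refcnt++; }

static void pkt_put(struct pkt *p)
{
    if (--p->refcnt == 0)
        free(p);
}

/* Stand-in for a transmit call that consumes one reference on
 * success and on failure alike, and returns a status code. */
static int fake_xmit(struct pkt *p, int fail)
{
    pkt_put(p);
    return fail ? -1 : 0;
}

/* Flush a queue. If 'hold' is one of the packets, take an extra
 * reference first so the caller can keep using it, and return its
 * status; other statuses are ignored. hold == NULL keeps the old
 * fire-and-forget behaviour. 'fail_idx' lets a test inject a failure. */
static int send_pdus(struct pkt **queue, int n, const struct pkt *hold,
                     int fail_idx)
{
    int rc = 0;

    for (int i = 0; i < n; i++) {
        struct pkt *p = queue[i];
        if (p == hold) {
            pkt_hold(p);
            rc = fake_xmit(p, i == fail_idx);
        } else {
            fake_xmit(p, i == fail_idx);
        }
    }
    return rc;
}
```

Even when the held packet's transmit fails, the caller still owns exactly the reference it had before the call.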

      Reported-by: Noam Rathaus <noamr@beyondsecurity.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      strparser: Fix sign of err codes · cd00edc1
      Dave Watson committed
      strp_parser_err is called with a negative code everywhere, which then
      calls abort_parser with a negative code.  strp_msg_timeout calls
      abort_parser directly with a positive code.  Negate ETIMEDOUT
      to match the signedness of the other calls.
      
      The default abort_parser callback, strp_abort_strp, sets
      sk->sk_err to err.  Also negate the error here so sk_err always
      holds a positive value, as the rest of the net code expects.  Currently
      a negative sk_err can result in endless loops, or user code that
      thinks it actually sent/received err bytes.
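The sign convention can be sketched with a hypothetical stand-in for struct sock. The assumption is the usual kernel one: a pending error is reported to userspace as -sk_err (as sock_error() does), so sk_err itself must be positive.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct sock. */
struct fake_sock {
    int sk_err; /* positive errno, as the rest of the net code expects */
};

/* Every caller passes a negative errno (e.g. -ETIMEDOUT); the default
 * abort callback stores its negation so sk_err stays positive. */
static void abort_parser(struct fake_sock *sk, int err)
{
    sk->sk_err = -err;
}

/* A receive path reports a pending socket error as -sk_err; a
 * negative sk_err would come out as a positive "byte count". */
static int recv_result(const struct fake_sock *sk)
{
    return -sk->sk_err;
}
```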
      
      Found while testing net/tls_sw recv path.
      
      Fixes: 43a0c675 ("strparser: Stream parser for messages")
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net sched actions: fix dumping which requires several messages to user space · 734549eb
      Craig Dillabaugh committed
      Fixes a bug in the tcf_dump_walker function that can cause some actions
      to not be reported when dumping a large number of actions. This issue
      became more aggravated when the cookies feature was added. In particular,
      this issue manifests when large cookie values are assigned to the
      actions and when enough actions are created that the resulting table
      must be dumped in multiple batches.
      
      The number of actions returned in each batch is limited by the total
      number of actions and the memory buffer size.  With small cookies
      the numeric limit is reached before the buffer size limit, which avoids
      the code path triggering this bug. When large cookies are used, the buffer
      fills before the numeric limit is reached, and the erroneous code path is hit.
      
      For example, after creating 32 csum actions with the cookie
      aaaabbbbccccdddd:
      
      $ tc actions ls action csum
      total acts 26
      
          action order 0: csum (tcp) action continue
          index 1 ref 1 bind 0
          cookie aaaabbbbccccdddd
      
          .....
      
          action order 25: csum (tcp) action continue
          index 26 ref 1 bind 0
          cookie aaaabbbbccccdddd
      total acts 6
      
          action order 0: csum (tcp) action continue
          index 28 ref 1 bind 0
          cookie aaaabbbbccccdddd
      
          ......
      
          action order 5: csum (tcp) action continue
          index 32 ref 1 bind 0
          cookie aaaabbbbccccdddd
      
      Note that the action with index 27 is omitted from the report.
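A toy model of a batched dump, assuming fixed-size entries and a byte budget per message (the numeric per-batch limit is not modeled). dump_batch() is hypothetical; the point is where the resume cursor must land when an entry does not fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batched dump: 'total' entries of 'entry_size' bytes each,
 * emitted into messages of at most 'cap' bytes. '*cursor' is the index
 * the next batch resumes from; emitted indices are written to 'out'.
 * Returns the number of entries emitted in this batch. */
static int dump_batch(int total, size_t entry_size, size_t cap,
                      int *cursor, int *out)
{
    size_t used = 0;
    int n = 0;
    int i;

    for (i = *cursor; i < total; i++) {
        if (used + entry_size > cap)
            break; /* entry i did not fit: next batch must start AT i */
        used += entry_size;
        out[n++] = i;
    }
    /* Resuming from i + 1 here would silently drop entry i, the same
     * symptom as the missing index 27 in the listing above. */
    *cursor = i;
    return n;
}
```

With 32 entries of 100 bytes and a 2600-byte budget, the first batch carries 26 entries and the second 6, the same split as the output above, and every index appears exactly once.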
      
      Fixes: 4b3550ef ("[NET_SCHED]: Use nla_nest_start/nla_nest_end")
      Signed-off-by: Craig Dillabaugh <cdillaba@mojatatu.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: fix possible out-of-bound read in skb_network_protocol() · 1dfe82eb
      Eric Dumazet committed
      skb mac header is not necessarily set at the time skb_network_protocol()
      is called. Use skb->data instead.
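A userspace sketch of the principle, assuming the Transparent Ethernet Bridging payload starts at 'data': the length check and the read use the same base pointer, so the read cannot leave the validated range. teb_inner_proto() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14 /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Return the inner EtherType (host byte order) of an Ethernet header
 * starting at 'data', or 0 if fewer than ETH_HLEN bytes are available.
 * Reading through a separately tracked MAC-header offset that may not
 * be set yet is what the fix avoids. */
static uint16_t teb_inner_proto(const uint8_t *data, size_t len)
{
    if (len < ETH_HLEN)
        return 0;
    return (uint16_t)((data[12] << 8) | data[13]);
}
```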
      
      BUG: KASAN: slab-out-of-bounds in skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739
      Read of size 2 at addr ffff8801b3097a0b by task syz-executor5/14242
      
      CPU: 1 PID: 14242 Comm: syz-executor5 Not tainted 4.16.0-rc6+ #280
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report+0x23c/0x360 mm/kasan/report.c:412
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:443
       skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739
       harmonize_features net/core/dev.c:2924 [inline]
       netif_skb_features+0x509/0x9b0 net/core/dev.c:3011
       validate_xmit_skb+0x81/0xb00 net/core/dev.c:3084
       validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3142
       packet_direct_xmit+0x117/0x790 net/packet/af_packet.c:256
       packet_snd net/packet/af_packet.c:2944 [inline]
       packet_sendmsg+0x3aed/0x60b0 net/packet/af_packet.c:2969
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:639
       ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
       __sys_sendmsg+0xe5/0x210 net/socket.c:2081
      
      Fixes: 19acc327 ("gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: sched, fix OOO packets with pfifo_fast · eb82a994
      John Fastabend committed
      After the qdisc lock was dropped in pfifo_fast we allow multiple
      enqueue threads and dequeue threads to run in parallel. On the
      enqueue side the skb bit ooo_okay is used to ensure all related
      skbs are enqueued in-order. On the dequeue side though there is
      no similar logic. What we observe is with fewer queues than CPUs
      it is possible to re-order packets when two instances of
      __qdisc_run() are running in parallel. Each thread will dequeue
      an skb, and the skb of whichever thread calls the ndo op first is
      sent on the wire first. This doesn't typically happen because
      qdisc_run() is usually triggered by the same core that did the
      enqueue. However, drivers will trigger __netif_schedule()
      when queues are transitioning from stopped to awake using the
      netif_tx_wake_* APIs. When this happens netif_schedule() calls
      qdisc_run() on the same CPU that did the netif_tx_wake_* which
      is usually done in the interrupt completion context. This CPU
      is selected with the irq affinity which is unrelated to the
      enqueue operations.
      
      To resolve this we add a RUNNING bit to the qdisc to ensure
      only a single dequeue per qdisc is running. Enqueue and dequeue
      operations can still run in parallel and also on multi queue
      NICs we can still have a dequeue in-flight per qdisc, which
      is typically per CPU.
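The single-dequeuer rule can be sketched with C11 atomics. The function names echo the kernel's qdisc_run_begin()/qdisc_run_end(), but this model is hypothetical and represents nothing except the RUNNING flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical qdisc model: only the RUNNING flag is represented. */
struct qdisc_model {
    atomic_flag running;
};

/* Try to become the single dequeuer. Returns false if another thread
 * already owns the qdisc; that thread keeps dequeuing in order. */
static bool qdisc_run_begin(struct qdisc_model *q)
{
    return !atomic_flag_test_and_set(&q->running);
}

static void qdisc_run_end(struct qdisc_model *q)
{
    atomic_flag_clear(&q->running);
}
```

Enqueue paths never touch the flag, so they can still run in parallel with the one active dequeuer.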
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Reported-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 26 Mar, 2018: 3 commits
      ipv6: the entire IPv6 header chain must fit the first fragment · 10b8a3de
      Paolo Abeni committed
      While building an IPv6 datagram we currently allow arbitrarily large
      extension headers, even beyond the PMTU size. syzbot has found a way
      to exploit this to trigger the following splat:
      
      kernel BUG at ./include/linux/skbuff.h:2073!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline]
      RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636
      RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293
      RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18
      RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000
      R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6
      R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0
      FS:  0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6_finish_skb include/net/ipv6.h:969 [inline]
        udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073
        udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343
        inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
        __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
        SYSC_sendmmsg net/socket.c:2167 [inline]
        SyS_sendmmsg+0x35/0x60 net/socket.c:2162
        do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4404c9
      RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9
      RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0
      R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000
      Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29
      5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d
      87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe
      RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0
      RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP:
      ffff8801bc18f0f0
      
      As stated by RFC 7112 section 5:
      
         When a host fragments an IPv6 datagram, it MUST include the entire
         IPv6 Header Chain in the First Fragment.
      
      So this patch addresses the issue by dropping datagrams with excessive
      extension header length. It also updates the error path to report
      nonnegative PMTU values to the calling socket.
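A sketch of the check under simplifying assumptions: the fragment header is ignored, and the value reported on failure (the room left after the extension headers, clamped at zero) is this sketch's own choice. check_first_fragment() is hypothetical.

```c
#include <assert.h>
#include <errno.h>

#define IPV6_HDR_LEN 40

/* Refuse a datagram whose header chain (IPv6 header, extension headers,
 * transport header) cannot fit in the first fragment (RFC 7112,
 * section 5). On failure the PMTU reported back is clamped so it is
 * never negative. */
static int check_first_fragment(int mtu, int exthdr_len, int transhdr_len,
                                int *pmtu_report)
{
    int headersize = IPV6_HDR_LEN + exthdr_len;

    if (headersize + transhdr_len > mtu) {
        int room = mtu - exthdr_len; /* negative for huge exthdrs */
        *pmtu_report = room > 0 ? room : 0;
        return -EMSGSIZE;
    }
    return 0;
}
```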
      
      The issue apparently predates git history.
      
      v1 -> v2: cleanup error path, as per Eric's suggestion
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      netlink: make sure nladdr has correct size in netlink_connect() · 78802879
      Alexander Potapenko committed
      KMSAN reports use of uninitialized memory in the case when |alen| is
      smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
      fully copied from the userspace.
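A userspace sketch of the validation order, using the real struct sockaddr_nl from the Linux UAPI headers; nl_connect_addr() itself is hypothetical. The length is checked against the full structure before any field beyond the family is read.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Validate a caller-supplied netlink address. If only the family field
 * were required, nl_pid and nl_groups could be read from bytes the
 * caller never supplied. */
static int nl_connect_addr(struct sockaddr_nl *dst, const void *uaddr,
                           size_t alen)
{
    if (alen < sizeof(struct sockaddr_nl))
        return -EINVAL;
    memcpy(dst, uaddr, sizeof(*dst));
    if (dst->nl_family != AF_NETLINK)
        return -EINVAL;
    return 0;
}
```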

      Signed-off-by: Alexander Potapenko <glider@google.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net/ipv4: disable SMC TCP option with SYN Cookies · bc58a1ba
      Hans Wippel committed
      Currently, the SMC experimental TCP option in a SYN packet is lost on
      the server side when SYN Cookies are active. However, the corresponding
      SYNACK sent back to the client contains the SMC option. This causes an
      inconsistent view of the SMC capabilities on the client and server.
      
      This patch disables the SMC option in the SYNACK when SYN Cookies are
      active to avoid this issue.
      
      Fixes: 60e2a778 ("tcp: TCP experimental option for SMC")
      Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 25 Mar, 2018: 1 commit
      netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6} · 32c1733f
      Subash Abhinov Kasiviswanathan committed
      skb_header_pointer will copy data into a buffer if the data is non-linear;
      otherwise it will return a pointer into the linear section of the data.
      nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later
      accesses memory within the size of tcphdr (th->doff) in the case of TCP
      packets. This causes a crash when running with KASAN, with the following
      call stack:
      
      BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718
      net/netfilter/xt_socket.c:178
      Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971
      CPU: 2 PID: 28971 Comm: syz-executor Tainted: G    B   W  O    4.9.65+ #1
      Call trace:
      [<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76
      [<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
      [<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline]
      [<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51
      [<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248
      [<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline]
      [<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
      [<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
      [<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
      [<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739
      [<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline]
      [<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178
      
      Fix this by copying data into appropriate size headers based on protocol.
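A userspace model of the fix. header_pointer() is a hypothetical stand-in for skb_header_pointer() that always copies (the real function returns a pointer into linear data without copying); the copy size and the storage follow the protocol, so the TCP data-offset read at byte 12 stays inside what was copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

#define UDP_HDR_LEN 8
#define TCP_HDR_LEN 20 /* minimum TCP header; data offset in byte 12 */

/* Copy 'len' bytes at 'off' into 'buf' and return it, or NULL if the
 * packet is too short. */
static const uint8_t *header_pointer(const uint8_t *pkt, size_t pktlen,
                                     size_t off, size_t len, uint8_t *buf)
{
    if (off > pktlen || len > pktlen - off)
        return NULL;
    memcpy(buf, pkt + off, len);
    return buf;
}

/* Return the transport header length, or -1 if the packet is short.
 * Copying only a UDP-sized header and then reading byte 12 for TCP
 * would read past the 8 bytes actually copied. */
static int l4_hdr_len(const uint8_t *pkt, size_t pktlen, size_t off,
                      int proto)
{
    uint8_t storage[TCP_HDR_LEN]; /* large enough for either header */
    size_t need = proto == IPPROTO_TCP ? TCP_HDR_LEN : UDP_HDR_LEN;
    const uint8_t *h = header_pointer(pkt, pktlen, off, need, storage);

    if (!h)
        return -1;
    if (proto == IPPROTO_TCP)
        return (h[12] >> 4) * 4; /* doff counts 32-bit words */
    return UDP_HDR_LEN;
}
```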
      
      Fixes: a583636a ("inet: refactor inet[6]_lookup functions to take skb")
      Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  7. 24 Mar, 2018: 4 commits
      batman-adv: fix packet loss for broadcasted DHCP packets to a server · a752c0a4
      Linus Lüssing committed
      DHCP connectivity issues can currently occur if the following conditions
      are met:
      
      1) A DHCP packet from a client to a server
      2) This packet has a multicast destination
      3) This destination has a matching entry in the translation table
         (FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03
          for IPv6)
      4) The orig-node determined by TT for the multicast destination
         does not match the orig-node determined by best-gateway-selection
      
      In this case the DHCP packet will be dropped.
      
      The "gateway-out-of-range" check is supposed to only be applied to
      unicasted DHCP packets to a specific DHCP server.
      
      In that case, dropping the unicasted frame forces the client to
      retry via a broadcasted one, but now directed to the new best
      gateway.
      
      A DHCP packet with broadcast/multicast destination is already ensured to
      always be delivered to the best gateway. Dropping a multicasted
      DHCP packet here will only prevent completing DHCP as there is no
      other fallback.
      
      So far, it seems the unicast check was implicitly performed by
      expecting the batadv_transtable_search() to return NULL for multicast
      destinations. However, a multicast address could have always ended up in
      the translation table and in fact is now common.
      
      To fix this potential loss of a DHCP client-to-server packet to a
      multicast address this patch adds an explicit multicast destination
      check to reliably bail out of the gateway-out-of-range check for such
      destinations.
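A sketch of the explicit destination check. is_multicast_mac() applies the same rule as the kernel's is_multicast_ether_addr() (group bit in the first octet, broadcast included); gw_out_of_range_drop() is a hypothetical reduction of the gateway-out-of-range decision to two inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The group bit is the least significant bit of the first octet;
 * the broadcast address has it set too. */
static bool is_multicast_mac(const uint8_t mac[6])
{
    return (mac[0] & 0x01) != 0;
}

/* Should a DHCP client-to-server packet be dropped by the
 * gateway-out-of-range check? Only unicast packets are subject to it;
 * multicast/broadcast ones already reach the best gateway. */
static bool gw_out_of_range_drop(const uint8_t dst[6], bool tt_orig_is_best_gw)
{
    if (is_multicast_mac(dst))
        return false;
    return !tt_orig_is_best_gw;
}
```

Both destinations listed in condition 3 (FF:FF:FF:FF:FF:FF and 33:33:00:01:00:02) bypass the drop regardless of what the translation table returns.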
      
      The issue and fix were tested in the following three node setup:
      
      - Line topology, A-B-C
      - A: gateway client, DHCP client
      - B: gateway server, hop-penalty increased: 30->60, DHCP server
      - C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF
      
      Without this patch, A would never transmit its DHCP Discover packet
      due to an always "out-of-range" condition. With this patch,
      a full DHCP handshake between A and B was possible again.
      
      Fixes: be7af5cf ("batman-adv: refactoring gateway handling code")
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      batman-adv: fix multicast-via-unicast transmission with AP isolation · f8fb3419
      Linus Lüssing committed
      For multicast frames AP isolation is only supposed to be checked on
      the receiving nodes and never on the originating one.
      
      Furthermore, the isolation or wifi flag bits should only be interpreted
      as such for unicast and never for multicast TT entries.
      
      By injecting flags to the multicast TT entry claimed by a single
      target node it was verified in tests that this multicast address
      becomes unreachable, leading to packet loss.
      
      Omitting the "src" parameter to the batadv_transtable_search() call
      successfully skipped the AP isolation check and made the target
      reachable again.
      
      Fixes: 1d8ab8d3 ("batman-adv: Modified forwarding behaviour for multicast packets")
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
      ipv6: fix possible deadlock in rt6_age_examine_exception() · 1bfa26ff
      Eric Dumazet committed
      syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception().

      rt6_age_examine_exception() is called while rt6_exception_lock is held.
      This lock is the lower one in the lock hierarchy, thus we cannot
      call dst_neigh_lookup(), as it can fall back to neigh_create().
      
      We should instead do a pure RCU lookup. As a bonus we avoid
      a pair of atomic operations on neigh refcount.
      
      [1]
      
      WARNING: possible circular locking dependency detected
      4.16.0-rc4+ #277 Not tainted
      
      syz-executor7/4015 is trying to acquire lock:
       (&ndev->lock){++--}, at: [<00000000416dce19>] __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
      
      but task is already holding lock:
       (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&tbl->lock){++-.}:
             __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
             _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
             __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528
             neigh_create include/net/neighbour.h:315 [inline]
             ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228
             dst_neigh_lookup include/net/dst.h:405 [inline]
             rt6_age_examine_exception net/ipv6/route.c:1609 [inline]
             rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645
             fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033
             fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919
             fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845
             fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893
             fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970
             __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986
             fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline]
             fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053
             ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
             inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
             sock_do_ioctl+0xef/0x390 net/socket.c:957
             sock_ioctl+0x36b/0x610 net/socket.c:1081
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #2 (rt6_exception_lock){+.-.}:
             __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
             _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
             spin_lock_bh include/linux/spinlock.h:315 [inline]
             rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367
             fib6_del_route net/ipv6/ip6_fib.c:1677 [inline]
             fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761
             __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980
             ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993
             __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332
             ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline]
             ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200
             inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433
             sock_release+0x8d/0x1e0 net/socket.c:594
             sock_close+0x16/0x20 net/socket.c:1149
             __fput+0x327/0x7e0 fs/file_table.c:209
             ____fput+0x15/0x20 fs/file_table.c:243
             task_work_run+0x199/0x270 kernel/task_work.c:113
             exit_task_work include/linux/task_work.h:22 [inline]
             do_exit+0x9bb/0x1ad0 kernel/exit.c:865
             do_group_exit+0x149/0x400 kernel/exit.c:968
             get_signal+0x73a/0x16d0 kernel/signal.c:2469
             do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
             exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
             prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
             syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
             do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #1 (&(&tb->tb6_lock)->rlock){+.-.}:
             __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
             _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
             spin_lock_bh include/linux/spinlock.h:315 [inline]
             __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007
             ip6_route_add+0x141/0x190 net/ipv6/route.c:2955
             addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359
             fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline]
             addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline]
             addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357
             rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965
             rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641
             netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444
             rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659
             netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
             netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
             netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
             sock_sendmsg_nosec net/socket.c:629 [inline]
             sock_sendmsg+0xca/0x110 net/socket.c:639
             ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
             __sys_sendmsg+0xe5/0x210 net/socket.c:2081
             SYSC_sendmsg net/socket.c:2092 [inline]
             SyS_sendmsg+0x2d/0x50 net/socket.c:2088
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #0 (&ndev->lock){++--}:
             lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
             __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
             _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
             __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
             ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
             pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
             pneigh_ifdown net/core/neighbour.c:695 [inline]
             neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
             rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
             addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
             addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
             inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
             packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
             sock_do_ioctl+0xef/0x390 net/socket.c:957
             sock_ioctl+0x36b/0x610 net/socket.c:1081
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      other info that might help us debug this:
      
      Chain exists of:
        &ndev->lock --> rt6_exception_lock --> &tbl->lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&tbl->lock);
                                     lock(rt6_exception_lock);
                                     lock(&tbl->lock);
        lock(&ndev->lock);
      
       *** DEADLOCK ***
      
      2 locks held by syz-executor7/4015:
       #0:  (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
       #1:  (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292
      
      stack backtrace:
      CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
       check_prev_add kernel/locking/lockdep.c:1863 [inline]
       check_prevs_add kernel/locking/lockdep.c:1976 [inline]
       validate_chain kernel/locking/lockdep.c:2417 [inline]
       __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
       lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
       _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
       __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
       ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
       pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
       pneigh_ifdown net/core/neighbour.c:695 [inline]
       neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
       rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
       addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
       addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
       inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
       packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
       sock_do_ioctl+0xef/0x390 net/socket.c:957
       sock_ioctl+0x36b/0x610 net/socket.c:1081
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: c757faa8 ("ipv6: prepare fib6_age() for exception table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1bfa26ff
    •
      ip_tunnel: Emit events for post-register MTU changes · f6cc9c05
      Petr Machata committed
      For tunnels created with IFLA_MTU, the MTU of the netdevice is set by
      rtnl_create_link() (called from rtnl_newlink()) before the device is
      registered. Without IFLA_MTU, however, that is not done.
      
      rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
      via ip_tunnel_newlink() calls register_netdevice(), and that emits
      NETDEV_REGISTER. Thus any listeners that inspect the netdevice see an
      MTU of 0.
      
      ip_tunnel_newlink() corrects the MTU after registering the netdevice,
      but since no event is emitted, the listeners don't learn about the MTU
      until something else happens, such as a NETDEV_UP event. That's not
      ideal.
      
      So instead of setting the MTU directly, go through dev_set_mtu(), which
      takes care of distributing the necessary NETDEV_PRECHANGEMTU and
      NETDEV_CHANGEMTU events.
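
      A minimal userspace sketch of why going through dev_set_mtu() matters:
      a direct field write is invisible to listeners, while the notifying
      path emits a pre-change and a change event. Everything here
      (model_netdev, model_dev_set_mtu(), the listener table,
      demo_recorder()) is a hypothetical stand-in, not the kernel's
      notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: the event names mirror NETDEV_PRECHANGEMTU and
 * NETDEV_CHANGEMTU, but every type and function here is hypothetical. */
enum model_mtu_event { MODEL_PRECHANGEMTU, MODEL_CHANGEMTU };

struct model_netdev {
	const char *name;
	int mtu;
};

typedef void (*model_listener)(enum model_mtu_event ev,
			       const struct model_netdev *dev);

#define MODEL_MAX_LISTENERS 4
static model_listener model_listeners[MODEL_MAX_LISTENERS];
static size_t model_n_listeners;

static void model_register_listener(model_listener fn)
{
	if (model_n_listeners < MODEL_MAX_LISTENERS)
		model_listeners[model_n_listeners++] = fn;
}

static void model_notify(enum model_mtu_event ev, const struct model_netdev *dev)
{
	for (size_t i = 0; i < model_n_listeners; i++)
		model_listeners[i](ev, dev);
}

/* Old behaviour: write the field directly; no listener hears about it. */
static void model_set_mtu_directly(struct model_netdev *dev, int mtu)
{
	dev->mtu = mtu;
}

/* Fixed behaviour, in the spirit of dev_set_mtu(): announce the change,
 * apply it, then announce that it happened. */
static int model_dev_set_mtu(struct model_netdev *dev, int mtu)
{
	assert(dev != NULL);
	if (mtu <= 0)
		return -1;		/* reject nonsensical values */
	if (mtu == dev->mtu)
		return 0;		/* nothing changes, nothing to announce */
	model_notify(MODEL_PRECHANGEMTU, dev);
	dev->mtu = mtu;
	model_notify(MODEL_CHANGEMTU, dev);
	return 0;
}

/* Demo listener: counts events and remembers the MTU seen on CHANGEMTU. */
static int demo_events;
static int demo_last_mtu;

static void demo_recorder(enum model_mtu_event ev, const struct model_netdev *dev)
{
	demo_events++;
	if (ev == MODEL_CHANGEMTU)
		demo_last_mtu = dev->mtu;
}
```

      Registering demo_recorder() and then calling model_dev_set_mtu() with
      a new value records two events, whereas model_set_mtu_directly()
      records none.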
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6cc9c05
  8. 23 Mar, 2018: 4 commits
    •
      xfrm: Fix transport mode skb control buffer usage. · 9a3fb9fb
      Steffen Klassert committed
      A recent commit introduced a new struct xfrm_trans_cb
      that is used with the sk_buff control buffer. Unfortunately
      it placed the structure in front of the control buffer and
      overlooked that the IPv4/IPv6 control buffer is still needed
      for some layer 4 protocols. As a result the IPv4/IPv6 control
      buffer is overwritten with this structure. Fix this by placing
      the appropriate IPv4/IPv6 header in front of the structure.
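
      A rough userspace model of the control-buffer layout problem,
      assuming a 48-byte cb[] as in struct sk_buff. The model_* structs are
      hypothetical stand-ins for inet_skb_parm/inet6_skb_parm with invented
      sizes; the point is only that the private data must start after the
      largest L3 header, not at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CB_SIZE 48 /* struct sk_buff carries a 48-byte cb[] */

/* Hypothetical stand-ins for inet_skb_parm / inet6_skb_parm (sizes invented). */
struct model_inet_parm {
	int iif;
	unsigned short flags;
	unsigned short frag_max_size;
};

struct model_inet6_parm {
	int iif;
	unsigned short ra, dst0, srcrt, dst1;
};

/* Buggy layout: private data starts at offset 0 and overlays the L3 header. */
struct model_trans_cb_buggy {
	int (*finish)(void *skb);
};

/* Fixed layout: reserve the IPv4/IPv6 header first, private data after it. */
struct model_trans_cb_fixed {
	union {
		struct model_inet_parm h4;
		struct model_inet6_parm h6;
	} header;
	int (*finish)(void *skb);
};

static int model_dummy_finish(void *skb)
{
	(void)skb;
	return 0;
}

/* Store an IPv4 header at the start of cb[], then write the finish callback
 * at finish_off the way transport-mode code would; report whether the
 * header's iif field is still intact. */
static int model_header_survives(size_t finish_off)
{
	unsigned char cb[MODEL_CB_SIZE] = { 0 };
	struct model_inet_parm h4 = { .iif = 7 };
	struct model_inet_parm check;
	int (*fn)(void *) = model_dummy_finish;

	assert(finish_off + sizeof(fn) <= sizeof(cb));
	memcpy(cb, &h4, sizeof(h4));
	memcpy(cb + finish_off, &fn, sizeof(fn));
	memcpy(&check, cb, sizeof(check));
	return check.iif == 7;
}
```

      With the fixed layout, the callback lands after the header union, so
      model_header_survives() reports the IPv4 data as untouched.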
      
      Fixes: acf568ee ("xfrm: Reinject transport-mode packets ...")
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      9a3fb9fb
    •
      net/ipv6: Handle onlink flag with multipath routes · 68e2ffde
      David Ahern committed
      For multipath routes, the ONLINK flag can be specified per nexthop in
      rtnh_flags or globally in rtm_flags. Update ip6_route_multipath_add
      to consider the ONLINK setting coming from rtnh_flags. On each
      iteration over the nexthops, the config for the sibling route is
      initialized to the global config and the per-nexthop settings are then
      overlaid. The flag is OR'ed into fib6_config to handle the ONLINK flag
      coming from either rtm_flags or rtnh_flags.
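
      A small sketch of the per-nexthop overlay, assuming a simplified
      config that carries only a flags word and an ifindex.
      MODEL_RTNH_F_ONLINK mirrors the value of RTNH_F_ONLINK (0x4); the
      structs and model_build_sibling_cfgs() are hypothetical.

```c
#include <assert.h>

#define MODEL_RTNH_F_ONLINK 0x4 /* same value as RTNH_F_ONLINK */

/* Hypothetical, heavily reduced route config and nexthop descriptions. */
struct model_cfg {
	unsigned int flags;
	int ifindex;
};

struct model_nh {
	unsigned int rtnh_flags;
	int ifindex;
};

/* For each nexthop, start again from the global config, overlay the
 * per-nexthop settings, and OR in ONLINK so it can come from either the
 * global flags or the nexthop's own rtnh_flags. */
static void model_build_sibling_cfgs(const struct model_cfg *global,
				     const struct model_nh *nhs, int n,
				     struct model_cfg *out)
{
	assert(global && nhs && out && n >= 0);
	for (int i = 0; i < n; i++) {
		struct model_cfg r_cfg = *global;	/* reset to global config */

		r_cfg.ifindex = nhs[i].ifindex;		/* per-nexthop overlay */
		r_cfg.flags |= nhs[i].rtnh_flags & MODEL_RTNH_F_ONLINK;
		out[i] = r_cfg;
	}
}
```

      Because each iteration restarts from the global config, an ONLINK flag
      set on one nexthop does not leak into the next sibling.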
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68e2ffde
    •
      ipv6: sr: fix NULL pointer dereference when setting encap source address · 8936ef76
      David Lebrun committed
      When using seg6 in encap mode, we call ipv6_dev_get_saddr() to set the
      source address of the outer IPv6 header, in case none was specified.
      Using skb->dev can lead to BUG() when it is in an inconsistent state.
      This patch uses the net_device attached to the skb's dst instead.
      
      [940807.667429] BUG: unable to handle kernel NULL pointer dereference at 000000000000047c
      [940807.762427] IP: ipv6_dev_get_saddr+0x8b/0x1d0
      [940807.815725] PGD 0 P4D 0
      [940807.847173] Oops: 0000 [#1] SMP PTI
      [940807.890073] Modules linked in:
      [940807.927765] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W        4.16.0-rc1-seg6bpf+ #2
      [940808.028988] Hardware name: HP ProLiant DL120 G6/ProLiant DL120 G6, BIOS O26    09/06/2010
      [940808.128128] RIP: 0010:ipv6_dev_get_saddr+0x8b/0x1d0
      [940808.187667] RSP: 0018:ffff88043fd836b0 EFLAGS: 00010206
      [940808.251366] RAX: 0000000000000005 RBX: ffff88042cb1c860 RCX: 00000000000000fe
      [940808.338025] RDX: 00000000000002c0 RSI: ffff88042cb1c860 RDI: 0000000000004500
      [940808.424683] RBP: ffff88043fd83740 R08: 0000000000000000 R09: ffffffffffffffff
      [940808.511342] R10: 0000000000000040 R11: 0000000000000000 R12: ffff88042cb1c850
      [940808.598012] R13: ffffffff8208e380 R14: ffff88042ac8da00 R15: 0000000000000002
      [940808.684675] FS:  0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
      [940808.783036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [940808.852975] CR2: 000000000000047c CR3: 00000004255fe000 CR4: 00000000000006e0
      [940808.939634] Call Trace:
      [940808.970041]  <IRQ>
      [940808.995250]  ? ip6t_do_table+0x265/0x640
      [940809.043341]  seg6_do_srh_encap+0x28f/0x300
      [940809.093516]  ? seg6_do_srh+0x1a0/0x210
      [940809.139528]  seg6_do_srh+0x1a0/0x210
      [940809.183462]  seg6_output+0x28/0x1e0
      [940809.226358]  lwtunnel_output+0x3f/0x70
      [940809.272370]  ip6_xmit+0x2b8/0x530
      [940809.313185]  ? ac6_proc_exit+0x20/0x20
      [940809.359197]  inet6_csk_xmit+0x7d/0xc0
      [940809.404173]  tcp_transmit_skb+0x548/0x9a0
      [940809.453304]  __tcp_retransmit_skb+0x1a8/0x7a0
      [940809.506603]  ? ip6_default_advmss+0x40/0x40
      [940809.557824]  ? tcp_current_mss+0x24/0x90
      [940809.605925]  tcp_retransmit_skb+0xd/0x80
      [940809.654016]  tcp_xmit_retransmit_queue.part.17+0xf9/0x210
      [940809.719797]  tcp_ack+0xa47/0x1110
      [940809.760612]  tcp_rcv_established+0x13c/0x570
      [940809.812865]  tcp_v6_do_rcv+0x151/0x3d0
      [940809.858879]  tcp_v6_rcv+0xa5c/0xb10
      [940809.901770]  ? seg6_output+0xdd/0x1e0
      [940809.946745]  ip6_input_finish+0xbb/0x460
      [940809.994837]  ip6_input+0x74/0x80
      [940810.034612]  ? ip6_rcv_finish+0xb0/0xb0
      [940810.081663]  ipv6_rcv+0x31c/0x4c0
      ...
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Reported-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8936ef76
    •
      ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state · 191f86ca
      David Lebrun committed
      The seg6_build_state() function is called with RCU read lock held,
      so we cannot use GFP_KERNEL. This patch uses GFP_ATOMIC instead.
      
      [   92.770271] =============================
      [   92.770628] WARNING: suspicious RCU usage
      [   92.770921] 4.16.0-rc4+ #12 Not tainted
      [   92.771277] -----------------------------
      [   92.771585] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
      [   92.772279]
      [   92.772279] other info that might help us debug this:
      [   92.772279]
      [   92.773067]
      [   92.773067] rcu_scheduler_active = 2, debug_locks = 1
      [   92.773514] 2 locks held by ip/2413:
      [   92.773765]  #0:  (rtnl_mutex){+.+.}, at: [<00000000e5461720>] rtnetlink_rcv_msg+0x441/0x4d0
      [   92.774377]  #1:  (rcu_read_lock){....}, at: [<00000000df4f161e>] lwtunnel_build_state+0x59/0x210
      [   92.775065]
      [   92.775065] stack backtrace:
      [   92.775371] CPU: 0 PID: 2413 Comm: ip Not tainted 4.16.0-rc4+ #12
      [   92.775791] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014
      [   92.776608] Call Trace:
      [   92.776852]  dump_stack+0x7d/0xbc
      [   92.777130]  __schedule+0x133/0xf00
      [   92.777393]  ? unwind_get_return_address_ptr+0x50/0x50
      [   92.777783]  ? __sched_text_start+0x8/0x8
      [   92.778073]  ? rcu_is_watching+0x19/0x30
      [   92.778383]  ? kernel_text_address+0x49/0x60
      [   92.778800]  ? __kernel_text_address+0x9/0x30
      [   92.779241]  ? unwind_get_return_address+0x29/0x40
      [   92.779727]  ? pcpu_alloc+0x102/0x8f0
      [   92.780101]  _cond_resched+0x23/0x50
      [   92.780459]  __mutex_lock+0xbd/0xad0
      [   92.780818]  ? pcpu_alloc+0x102/0x8f0
      [   92.781194]  ? seg6_build_state+0x11d/0x240
      [   92.781611]  ? save_stack+0x9b/0xb0
      [   92.781965]  ? __ww_mutex_wakeup_for_backoff+0xf0/0xf0
      [   92.782480]  ? seg6_build_state+0x11d/0x240
      [   92.782925]  ? lwtunnel_build_state+0x1bd/0x210
      [   92.783393]  ? ip6_route_info_create+0x687/0x1640
      [   92.783846]  ? ip6_route_add+0x74/0x110
      [   92.784236]  ? inet6_rtm_newroute+0x8a/0xd0
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      191f86ca
  9. 22 Mar, 2018: 10 commits
    •
      netfilter: nf_tables: do not hold reference on netdevice from preparation phase · 90d2723c
      Pablo Neira Ayuso committed
      The netfilter netdevice event handler holds the nfnl_lock mutex; this
      avoids races with a device going away while that device is being
      attached to hooks from the netlink control plane. Therefore, either the
      control plane bails out with ENOENT or the netdevice event path waits
      until the hook that is attached to the net_device is registered.
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      90d2723c
    •
      netfilter: nf_tables: cache device name in flowtable object · d92191aa
      Pablo Neira Ayuso committed
      Devices going away have to grab the nfnl_lock from the netdev event path
      to avoid races with control plane updates.
      
      However, netlink dumps in netfilter do not hold the nfnl_lock mutex.
      Cache the device name in the objects to avoid a use-after-free
      situation for a device that is going away.
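
      A sketch of the caching idea under simplified assumptions: the hook
      keeps its own copy of the device name, so a dump path that runs
      without nfnl_lock never has to dereference a device that may already
      be gone. MODEL_IFNAMSIZ mirrors IFNAMSIZ (16); the struct layout and
      helper names are hypothetical and do not match struct nft_flowtable.

```c
#include <assert.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16 /* same value as IFNAMSIZ */

struct model_dev {
	char name[MODEL_IFNAMSIZ];
};

/* Hypothetical hook object: the pointer is only valid under the lock, the
 * cached name stays valid for as long as the hook itself exists. */
struct model_flowtable_hook {
	struct model_dev *dev;
	char devname[MODEL_IFNAMSIZ];
};

static void model_hook_attach(struct model_flowtable_hook *h, struct model_dev *dev)
{
	assert(h && dev);
	h->dev = dev;
	strncpy(h->devname, dev->name, MODEL_IFNAMSIZ - 1);
	h->devname[MODEL_IFNAMSIZ - 1] = '\0';	/* always NUL-terminated */
}

/* Event path (runs with the lock held): detach the device and, to mimic the
 * memory being reused, scribble over the old device's name. */
static void model_device_going_away(struct model_flowtable_hook *h, struct model_dev *dev)
{
	h->dev = NULL;
	memset(dev->name, 'X', MODEL_IFNAMSIZ - 1);
	dev->name[MODEL_IFNAMSIZ - 1] = '\0';
}

/* Dump path (no lock): read only the cached copy, never h->dev. */
static const char *model_hook_dump_name(const struct model_flowtable_hook *h)
{
	return h->devname;
}
```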
      
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d92191aa
    •
      netfilter: drop template ct when conntrack is skipped. · aebfa52a
      Paolo Abeni committed
      The IPv4 nf_ct code currently skips the nf_conntrack_in() call
      for fragmented packets. As a result, later matches/targets can end
      up manipulating the template ct entry instead of the 'real' one.
      
      Exploiting the above, syzbot found a way to trigger the following
      splat:
      
      WARNING: CPU: 1 PID: 4242 at net/netfilter/xt_cluster.c:55
      xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 4242 Comm: syzkaller027971 Not tainted 4.16.0-rc2+ #243
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x194/0x24d lib/dump_stack.c:53
        panic+0x1e4/0x41c kernel/panic.c:183
        __warn+0x1dc/0x200 kernel/panic.c:547
        report_bug+0x211/0x2d0 lib/bug.c:184
        fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
        fixup_bug arch/x86/kernel/traps.c:247 [inline]
        do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
        do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
        invalid_op+0x58/0x80 arch/x86/entry/entry_64.S:957
      RIP: 0010:xt_cluster_hash net/netfilter/xt_cluster.c:55 [inline]
      RIP: 0010:xt_cluster_mt+0x6c1/0x840 net/netfilter/xt_cluster.c:127
      RSP: 0018:ffff8801d2f6f2d0 EFLAGS: 00010293
      RAX: ffff8801af700540 RBX: 0000000000000000 RCX: ffffffff84a2d1e1
      RDX: 0000000000000000 RSI: ffff8801d2f6f478 RDI: ffff8801cafd336a
      RBP: ffff8801d2f6f2e8 R08: 0000000000000000 R09: 0000000000000001
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801b03b3d18
      R13: ffff8801cafd3300 R14: dffffc0000000000 R15: ffff8801d2f6f478
        ipt_do_table+0xa91/0x19b0 net/ipv4/netfilter/ip_tables.c:296
        iptable_filter_hook+0x65/0x80 net/ipv4/netfilter/iptable_filter.c:41
        nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline]
        nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483
        nf_hook include/linux/netfilter.h:243 [inline]
        NF_HOOK include/linux/netfilter.h:286 [inline]
        raw_send_hdrinc.isra.17+0xf39/0x1880 net/ipv4/raw.c:432
        raw_sendmsg+0x14cd/0x26b0 net/ipv4/raw.c:669
        inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
        sock_sendmsg_nosec net/socket.c:629 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:639
        SYSC_sendto+0x361/0x5c0 net/socket.c:1748
        SyS_sendto+0x40/0x50 net/socket.c:1716
        do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x441b49
      RSP: 002b:00007ffff5ca8b18 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000441b49
      RDX: 0000000000000030 RSI: 0000000020ff7000 RDI: 0000000000000003
      RBP: 00000000006cc018 R08: 000000002066354c R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000216 R12: 0000000000403470
      R13: 0000000000403500 R14: 0000000000000000 R15: 0000000000000000
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Instead of adding checks for template ct on every target/match
      manipulating skb->_nfct, simply drop the template ct when skipping
      nf_conntrack_in().
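
      A hypothetical refcount model of the fix: when the fragment path skips
      conntrack, any template attached to the packet is put and detached,
      so later matches and targets see no ct at all rather than the
      template. model_ct and model_skb are invented stand-ins for struct
      nf_conn and skb->_nfct; normal tracking is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nf_conn and the skb's _nfct pointer. */
struct model_ct {
	int refcnt;
	bool is_template;
};

struct model_skb {
	struct model_ct *nfct;
	bool is_fragment;
};

static void model_ct_put(struct model_ct *ct)
{
	assert(ct && ct->refcnt > 0);
	ct->refcnt--;
}

/* Fragment path modeled on the fix: conntrack is skipped, so drop an attached
 * template instead of letting later matches/targets operate on it.
 * Regular tracking of non-fragments is out of scope and left untouched. */
static void model_conntrack_in_hook(struct model_skb *skb)
{
	if (!skb->is_fragment)
		return;
	if (skb->nfct && skb->nfct->is_template) {
		model_ct_put(skb->nfct);
		skb->nfct = NULL;
	}
}
```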
      
      Fixes: 7b4fdf77 ("netfilter: don't track fragmented packets")
      Reported-and-tested-by: syzbot+0346441ae0545cfcea3a@syzkaller.appspotmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      aebfa52a
    •
      net/sched: fix idr leak in the error path of tcf_skbmod_init() · f29cdfbe
      Davide Caratti committed
      tcf_skbmod_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure skbmod rules
      using the same idr value will systematically fail with -ENOSPC, unless
      the first attempt was done using the 'replace' keyword:
      
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action skbmod swap mac index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
      on the error path when the idr has been reserved but not yet inserted.
      Also, don't test 'ovr' in the error path, to prevent a 'replace' failure
      from implicitly becoming a 'delete' that leaks a refcount in the
      act_skbmod module:
      
       # rmmod act_skbmod; modprobe act_skbmod
       # tc action add action skbmod swap mac index 100
       # tc action add action skbmod swap mac continue index 100
       RTNETLINK answers: File exists
       We have an error talking to the kernel
       # tc action replace action skbmod swap mac continue index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action skbmod
       #
       # rmmod  act_skbmod
       rmmod: ERROR: Module act_skbmod is in use
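
      The leak can be modeled with a tiny index table, assuming three slot
      states (free, reserved, in use). The two error-path helpers contrast
      freeing only the action with also dropping the reservation, roughly
      the distinction these patches draw between tcf_idr_cleanup() and
      tcf_idr_release(); all model_* names are hypothetical and the real
      act_api code differs in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_MAX_INDEX 256

enum model_slot { MODEL_SLOT_FREE = 0, MODEL_SLOT_RESERVED, MODEL_SLOT_USED };

struct model_action {
	int index;
};

/* Hypothetical index table; zero-initialised means every slot is free. */
struct model_idr {
	enum model_slot state[MODEL_MAX_INDEX];
	struct model_action *ptr[MODEL_MAX_INDEX];
};

/* Reserve 'index' and allocate an action for it (not yet visible). */
static int model_idr_create(struct model_idr *idr, int index,
			    struct model_action **out)
{
	struct model_action *a;

	if (index < 0 || index >= MODEL_MAX_INDEX)
		return -EINVAL;
	if (idr->state[index] != MODEL_SLOT_FREE)
		return -ENOSPC;		/* taken, or a leaked reservation */
	a = calloc(1, sizeof(*a));
	if (!a)
		return -ENOMEM;
	a->index = index;
	idr->state[index] = MODEL_SLOT_RESERVED;
	*out = a;
	return 0;
}

/* Publish a fully initialised action in its reserved slot. */
static void model_idr_insert(struct model_idr *idr, struct model_action *a)
{
	assert(idr->state[a->index] == MODEL_SLOT_RESERVED);
	idr->ptr[a->index] = a;
	idr->state[a->index] = MODEL_SLOT_USED;
}

/* Leaky error path: frees the action but leaves the slot reserved. */
static void model_idr_cleanup(struct model_idr *idr, struct model_action *a)
{
	(void)idr;
	free(a);
}

/* Fixed error path: frees the action and drops its reservation. */
static void model_idr_release(struct model_idr *idr, struct model_action *a)
{
	idr->state[a->index] = MODEL_SLOT_FREE;
	idr->ptr[a->index] = NULL;
	free(a);
}
```

      With the leaky path, a second model_idr_create() for index 100 returns
      -ENOSPC, matching the "No space left on device" replies shown above;
      with model_idr_release() the index can be reserved again.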
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f29cdfbe
    •
      net/sched: fix idr leak in the error path of tcf_vlan_init() · d7f20015
      Davide Caratti committed
      tcf_vlan_init() can fail after the idr has been successfully reserved.
      When this happens, every subsequent attempt to configure vlan rules using
      the same idr value will systematically fail with -ENOSPC, unless the first
      attempt was done using the 'replace' keyword.
      
       # tc action add action vlan pop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action vlan pop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
      the error path when the idr has been reserved but not yet inserted. Also,
      don't test 'ovr' in the error path, to prevent a 'replace' failure from
      implicitly becoming a 'delete' that leaks a refcount in the act_vlan module:
      
       # rmmod act_vlan; modprobe act_vlan
       # tc action add action vlan push id 5 index 100
       # tc action replace action vlan push id 7 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action list action vlan
       #
       # rmmod act_vlan
       rmmod: ERROR: Module act_vlan is in use
      
      Fixes: 4c5b9d96 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7f20015
    •
      net/sched: fix idr leak in the error path of __tcf_ipt_init() · 1e46ef17
      Davide Caratti committed
      __tcf_ipt_init() can fail after the idr has been successfully reserved.
      When this happens, subsequent attempts to configure xt/ipt rules using
      the same idr value systematically fail with -ENOSPC:
      
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       Command "(null)" is unknown, try "tc actions help".
       # tc action add action xt -j LOG --log-prefix test1 index 100
       tablename: mangle hook: NF_IP_POST_ROUTING
               target:  LOG level warning prefix "test1" index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
      when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
      to avoid NULL pointer dereference.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e46ef17
    •
      net/sched: fix idr leak in the error path of tcp_pedit_init() · 94fa3f92
      Davide Caratti committed
      tcf_pedit_init() can fail to allocate 'keys' after the idr has been
      successfully reserved. When this happens, subsequent attempts to configure
      a pedit rule using the same idr value systematically fail with -ENOSPC:
      
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action pedit munge ip ttl set 63 index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_act_pedit_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94fa3f92
    •
      net/sched: fix idr leak in the error path of tcf_act_police_init() · 5bf7f818
      Davide Caratti committed
      tcf_act_police_init() can fail after the idr has been successfully
      reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
      subsequent attempts to configure a police rule using the same idr value
      systematically fail with -ENOSPC:
      
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc action add action police rate 1000 burst 1000 drop index 100
       RTNETLINK answers: No space left on device
       ...
      
      Fix this in the error path of tcf_act_police_init(), calling
      tcf_idr_release() in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5bf7f818
    •
      net/sched: fix idr leak in the error path of tcf_simp_init() · 60e10b3a
      Davide Caratti committed
      If the kernel fails to duplicate 'sdata', creation of a new action fails
      with -ENOMEM. However, subsequent attempts to install the same action
      using the same value of 'index' systematically fail with -ENOSPC, and
      that value of 'index' is no longer usable by act_simple until
      act_simple.ko is removed and reinserted (rmmod / insmod):
      
       # tc actions add action simple sdata hello index 100
       # tc actions list action simple
      
              action order 0: Simple <hello>
               index 100 ref 1 bind 0
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: Cannot allocate memory
       We have an error talking to the kernel
       # tc actions flush action simple
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # tc actions add action simple sdata hello index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       ...
      
      Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
      in place of tcf_idr_cleanup().
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60e10b3a
    •
      net/sched: fix idr leak on the error path of tcf_bpf_init() · bbc09e78
      Davide Caratti committed
      When the following command sequence is entered:
      
       # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: Invalid argument
       We have an error talking to the kernel
       # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
      
      act_bpf correctly refuses to install the first TC rule, because 31 is not
      a valid instruction. However, it also refuses to install the second TC
      rule, even though its BPF code is correct. Furthermore, it is no longer
      possible to install any other rule with the same value of 'index' until
      the act_bpf module is unloaded and inserted again. After the idr has been
      reserved, call tcf_idr_release() instead of tcf_idr_cleanup() to fix
      this issue.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbc09e78
  10. 21 Mar, 2018: 2 commits
  11. 20 Mar, 2018: 2 commits