1. 08 1月, 2021 5 次提交
    • I
      nexthop: Unlink nexthop group entry in error path · 7b01e53e
      Ido Schimmel 提交于
      In case of error, remove the nexthop group entry from the list to which
      it was previously added.
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NPetr Machata <petrm@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      7b01e53e
    • I
      nexthop: Fix off-by-one error in error path · 07e61a97
      Ido Schimmel 提交于
      A reference was not taken for the current nexthop entry, so do not try
      to put it in the error path.
      
      Fixes: 430a0491 ("nexthop: Add support for nexthop groups")
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NPetr Machata <petrm@nvidia.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      07e61a97
    • F
      net: ip: always refragment ip defragmented packets · bb4cc1a1
      Florian Westphal 提交于
      Conntrack reassembly records the largest fragment size seen in IPCB.
      However, when this gets forwarded/transmitted, fragmentation will only
      be forced if one of the fragmented packets had the DF bit set.
      
      In that case, a flag in IPCB will force fragmentation even if the
      MTU is large enough.
      
      This should work fine, but this breaks with ip tunnels.
      Consider client that sends a UDP datagram of size X to another host.
      
      The client fragments the datagram, so two packets, of size y and z, are
      sent. DF bit is not set on any of these packets.
      
      Middlebox netfilter reassembles those packets back to single size-X
      packet, before routing decision.
      
      packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit
      isn't set.  At output time, ip refragmentation is skipped as well
      because x is still smaller than the mtu of the output device.
      
      If ttransmit device is an ip tunnel, the packet size increases to
      x+overhead.
      
      Also, tunnel might be configured to force DF bit on outer header.
      
      In this case, packet will be dropped (exceeds MTU) and an ICMP error is
      generated back to sender.
      
      But sender already respects the announced MTU, all the packets that
      it sent did fit the announced mtu.
      
      Force refragmentation as per original sizes unconditionally so ip tunnel
      will encapsulate the fragments instead.
      
      The only other solution I see is to place ip refragmentation in
      the ip_tunnel code to handle this case.
      
      Fixes: d6b915e2 ("ip_fragment: don't forward defragmented DF packet")
      Reported-by: NChristian Perle <christian.perle@secunet.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      bb4cc1a1
    • F
      net: fix pmtu check in nopmtudisc mode · 50c66167
      Florian Westphal 提交于
      For some reason ip_tunnel insist on setting the DF bit anyway when the
      inner header has the DF bit set, EVEN if the tunnel was configured with
      'nopmtudisc'.
      
      This means that the script added in the previous commit
      cannot be made to work by adding the 'nopmtudisc' flag to the
      ip tunnel configuration. Doing so breaks connectivity even for the
      without-conntrack/netfilter scenario.
      
      When nopmtudisc is set, the tunnel will skip the mtu check, so no
      icmp error is sent to client. Then, because inner header has DF set,
      the outer header gets added with DF bit set as well.
      
      IP stack then sends an error to itself because the packet exceeds
      the device MTU.
      
      Fixes: 23a3647b ("ip_tunnels: Use skb-len to PMTU check.")
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      50c66167
    • S
      net: ipv6: fib: flush exceptions when purging route · d8f5c296
      Sean Tranchetti 提交于
      Route removal is handled by two code paths. The main removal path is via
      fib6_del_route() which will handle purging any PMTU exceptions from the
      cache, removing all per-cpu copies of the DST entry used by the route, and
      releasing the fib6_info struct.
      
      The second removal location is during fib6_add_rt2node() during a route
      replacement operation. This path also calls fib6_purge_rt() to handle
      cleaning up the per-cpu copies of the DST entries and releasing the
      fib6_info associated with the older route, but it does not flush any PMTU
      exceptions that the older route had. Since the older route is removed from
      the tree during the replacement, we lose any way of accessing it again.
      
      As these lingering DSTs and the fib6_info struct are holding references to
      the underlying netdevice struct as well, unregistering that device from the
      kernel can never complete.
      
      Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache")
      Signed-off-by: NSean Tranchetti <stranche@codeaurora.org>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      d8f5c296
  2. 06 1月, 2021 3 次提交
  3. 05 1月, 2021 1 次提交
  4. 29 12月, 2020 11 次提交
    • C
      erspan: fix version 1 check in gre_parse_header() · 085c7c4e
      Cong Wang 提交于
      Both version 0 and version 1 use ETH_P_ERSPAN, but version 0 does not
      have an erspan header. So the check in gre_parse_header() is wrong,
      we have to distinguish version 1 from version 0.
      
      We can just check the gre header length like is_erspan_type1().
      
      Fixes: cb73ee40 ("net: ip_gre: use erspan key field for tunnel lookup")
      Reported-by: syzbot+f583ce3d4ddf9836b27a@syzkaller.appspotmail.com
      Cc: William Tu <u9012063@gmail.com>
      Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NCong Wang <cong.wang@bytedance.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      085c7c4e
    • R
      net: sched: prevent invalid Scell_log shift count · bd1248f1
      Randy Dunlap 提交于
      Check Scell_log shift size in red_check_params() and modify all callers
      of red_check_params() to pass Scell_log.
      
      This prevents a shift out-of-bounds as detected by UBSAN:
        UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22
        shift exponent 72 is too large for 32-bit type 'int'
      
      Fixes: 8afa10cb ("net_sched: red: Avoid illegal values")
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com
      Cc: Nogah Frankel <nogahf@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: netdev@vger.kernel.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd1248f1
    • W
      net: neighbor: fix a crash caused by mod zero · a533b70a
      weichenchen 提交于
      pneigh_enqueue() tries to obtain a random delay by mod
      NEIGH_VAR(p, PROXY_DELAY). However, NEIGH_VAR(p, PROXY_DELAY)
      migth be zero at that point because someone could write zero
      to /proc/sys/net/ipv4/neigh/[device]/proxy_delay after the
      callers check it.
      
      This patch uses prandom_u32_max() to get a random delay instead
      which avoids potential division by zero.
      Signed-off-by: Nweichenchen <weichen.chen@linux.alibaba.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a533b70a
    • G
      ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst() · 21fdca22
      Guillaume Nault 提交于
      RT_TOS() only clears one of the ECN bits. Therefore, when
      fib_compute_spec_dst() resorts to a fib lookup, it can return
      different results depending on the value of the second ECN bit.
      
      For example, ECT(0) and ECT(1) packets could be treated differently.
      
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev lo up
        $ ip -netns ns1 link set dev lo up
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
      
        $ ip -netns ns0 address add 192.0.2.10/24 dev veth01
        $ ip -netns ns1 address add 192.0.2.11/24 dev veth10
      
        $ ip -netns ns1 address add 192.0.2.21/32 dev lo
        $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21
        $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0
      
      With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21
      (ping uses -Q to set all TOS and ECN bits):
      
        $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255
        [...]
        64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms
      
      But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11
      because the "tos 4" route isn't matched:
      
        $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
        [...]
        64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms
      
      After this patch the ECN bits don't affect the result anymore:
      
        $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255
        [...]
        64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms
      
      Fixes: 35ebf65e ("ipv4: Create and use fib_compute_spec_dst() helper.")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21fdca22
    • D
      net: mptcp: cap forward allocation to 1M · e7579d5d
      Davide Caratti 提交于
      the following syzkaller reproducer:
      
       r0 = socket$inet_mptcp(0x2, 0x1, 0x106)
       bind$inet(r0, &(0x7f0000000080)={0x2, 0x4e24, @multicast2}, 0x10)
       connect$inet(r0, &(0x7f0000000480)={0x2, 0x4e24, @local}, 0x10)
       sendto$inet(r0, &(0x7f0000000100)="f6", 0xffffffe7, 0xc000, 0x0, 0x0)
      
      systematically triggers the following warning:
      
       WARNING: CPU: 2 PID: 8618 at net/core/stream.c:208 sk_stream_kill_queues+0x3fa/0x580
       Modules linked in:
       CPU: 2 PID: 8618 Comm: syz-executor Not tainted 5.10.0+ #334
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/04
       RIP: 0010:sk_stream_kill_queues+0x3fa/0x580
       Code: df 48 c1 ea 03 0f b6 04 02 84 c0 74 04 3c 03 7e 40 8b ab 20 02 00 00 e9 64 ff ff ff e8 df f0 81 2
       RSP: 0018:ffffc9000290fcb0 EFLAGS: 00010293
       RAX: ffff888011cb8000 RBX: 0000000000000000 RCX: ffffffff86eecf0e
       RDX: 0000000000000000 RSI: ffffffff86eecf6a RDI: 0000000000000005
       RBP: 0000000000000e28 R08: ffff888011cb8000 R09: fffffbfff1f48139
       R10: ffffffff8fa409c7 R11: fffffbfff1f48138 R12: ffff8880215e6220
       R13: ffffffff8fa409c0 R14: ffffc9000290fd30 R15: 1ffff92000521fa2
       FS:  00007f41c78f4800(0000) GS:ffff88802d000000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f95c803d088 CR3: 0000000025ed2000 CR4: 00000000000006f0
       Call Trace:
        __mptcp_destroy_sock+0x4f5/0x8e0
         mptcp_close+0x5e2/0x7f0
        inet_release+0x12b/0x270
        __sock_release+0xc8/0x270
        sock_close+0x18/0x20
        __fput+0x272/0x8e0
        task_work_run+0xe0/0x1a0
        exit_to_user_mode_prepare+0x1df/0x200
        syscall_exit_to_user_mode+0x19/0x50
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      userspace programs provide arbitrarily high values of 'len' in sendmsg():
      this is causing integer overflow of 'amount'. Cap forward allocation to 1
      megabyte: higher values are not really useful.
      Suggested-by: NPaolo Abeni <pabeni@redhat.com>
      Fixes: e93da928 ("mptcp: implement wmem reservation")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Link: https://lore.kernel.org/r/3334d00d8b2faecafdfab9aa593efcbf61442756.1608584474.git.dcaratti@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e7579d5d
    • A
      net-sysfs: take the rtnl lock when accessing xps_rxqs_map and num_tc · 4ae2bb81
      Antoine Tenart 提交于
      Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be
      protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
      see an actual bug being triggered, but let's be safe here and take the
      rtnl lock while accessing the map in sysfs.
      
      Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
      Signed-off-by: NAntoine Tenart <atenart@kernel.org>
      Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4ae2bb81
    • A
      net-sysfs: take the rtnl lock when storing xps_rxqs · 2d57b4f1
      Antoine Tenart 提交于
      Two race conditions can be triggered when storing xps rxqs, resulting in
      various oops and invalid memory accesses:
      
      1. Calling netdev_set_num_tc while netif_set_xps_queue:
      
         - netif_set_xps_queue uses dev->tc_num as one of the parameters to
           compute the size of new_dev_maps when allocating it. dev->tc_num is
           also used to access the map, and the compiler may generate code to
           retrieve this field multiple times in the function.
      
         - netdev_set_num_tc sets dev->tc_num.
      
         If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
         is set to a higher value through netdev_set_num_tc, later accesses to
         new_dev_maps in netif_set_xps_queue could lead to accessing memory
         outside of new_dev_maps; triggering an oops.
      
      2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
      
         2.1. netdev_set_num_tc starts by resetting the xps queues,
              dev->tc_num isn't updated yet.
      
         2.2. netif_set_xps_queue is called, setting up the map with the
              *old* dev->num_tc.
      
         2.3. netdev_set_num_tc updates dev->tc_num.
      
         2.4. Later accesses to the map lead to out of bound accesses and
              oops.
      
         A similar issue can be found with netdev_reset_tc.
      
      One way of triggering this is to set an iface up (for which the driver
      uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
      xps_rxqs in a concurrent thread. With the right timing an oops is
      triggered.
      
      Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
      and netdev_reset_tc should be mutually exclusive. We do that by taking
      the rtnl lock in xps_rxqs_store.
      
      Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue")
      Signed-off-by: NAntoine Tenart <atenart@kernel.org>
      Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      2d57b4f1
    • A
      net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc · fb250385
      Antoine Tenart 提交于
      Accesses to dev->xps_cpus_map (when using dev->num_tc) should be
      protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't
      see an actual bug being triggered, but let's be safe here and take the
      rtnl lock while accessing the map in sysfs.
      
      Fixes: 184c449f ("net: Add support for XPS with QoS via traffic classes")
      Signed-off-by: NAntoine Tenart <atenart@kernel.org>
      Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      fb250385
    • A
      net-sysfs: take the rtnl lock when storing xps_cpus · 1ad58225
      Antoine Tenart 提交于
      Two race conditions can be triggered when storing xps cpus, resulting in
      various oops and invalid memory accesses:
      
      1. Calling netdev_set_num_tc while netif_set_xps_queue:
      
         - netif_set_xps_queue uses dev->tc_num as one of the parameters to
           compute the size of new_dev_maps when allocating it. dev->tc_num is
           also used to access the map, and the compiler may generate code to
           retrieve this field multiple times in the function.
      
         - netdev_set_num_tc sets dev->tc_num.
      
         If new_dev_maps is allocated using dev->tc_num and then dev->tc_num
         is set to a higher value through netdev_set_num_tc, later accesses to
         new_dev_maps in netif_set_xps_queue could lead to accessing memory
         outside of new_dev_maps; triggering an oops.
      
      2. Calling netif_set_xps_queue while netdev_set_num_tc is running:
      
         2.1. netdev_set_num_tc starts by resetting the xps queues,
              dev->tc_num isn't updated yet.
      
         2.2. netif_set_xps_queue is called, setting up the map with the
              *old* dev->num_tc.
      
         2.3. netdev_set_num_tc updates dev->tc_num.
      
         2.4. Later accesses to the map lead to out of bound accesses and
              oops.
      
         A similar issue can be found with netdev_reset_tc.
      
      One way of triggering this is to set an iface up (for which the driver
      uses netdev_set_num_tc in the open path, such as bnx2x) and writing to
      xps_cpus in a concurrent thread. With the right timing an oops is
      triggered.
      
      Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc
      and netdev_reset_tc should be mutually exclusive. We do that by taking
      the rtnl lock in xps_cpus_store.
      
      Fixes: 184c449f ("net: Add support for XPS with QoS via traffic classes")
      Signed-off-by: NAntoine Tenart <atenart@kernel.org>
      Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1ad58225
    • I
      libceph: align session_key and con_secret to 16 bytes · f5f2c9a0
      Ilya Dryomov 提交于
      crypto_shash_setkey() and crypto_aead_setkey() will do a (small)
      GFP_ATOMIC allocation to align the key if it isn't suitably aligned.
      It's not a big deal, but at the same time easy to avoid.
      
      The actual alignment requirement is dynamic, queryable with
      crypto_shash_alignmask() and crypto_aead_alignmask(), but shouldn't
      be stricter than 16 bytes for our algorithms.
      
      Fixes: cd1a677c ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      f5f2c9a0
    • I
      libceph: fix auth_signature buffer allocation in secure mode · ad32fe88
      Ilya Dryomov 提交于
      auth_signature frame is 68 bytes in plain mode and 96 bytes in
      secure mode but we are requesting 68 bytes in both modes.  By luck,
      this doesn't actually result in any invalid memory accesses because
      the allocation is satisfied out of kmalloc-96 slab and so exactly
      96 bytes are allocated, but KASAN rightfully complains.
      
      Fixes: cd1a677c ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
      Reported-by: NLuis Henriques <lhenriques@suse.de>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      ad32fe88
  5. 28 12月, 2020 2 次提交
    • P
      netfilter: nftables: add set expression flags · b4e70d8d
      Pablo Neira Ayuso 提交于
      The set flag NFT_SET_EXPR provides a hint to the kernel that userspace
      supports for multiple expressions per set element. In the same
      direction, NFT_DYNSET_F_EXPR specifies that dynset expression defines
      multiple expressions per set element.
      
      This allows new userspace software with old kernels to bail out with
      EOPNOTSUPP. This update is similar to ef516e86 ("netfilter:
      nf_tables: reintroduce the NFT_SET_CONCAT flag"). The NFT_SET_EXPR flag
      needs to be set on when the NFTA_SET_EXPRESSIONS attribute is specified.
      The NFT_SET_EXPR flag is not set on with NFTA_SET_EXPR to retain
      backward compatibility in old userspace binaries.
      
      Fixes: 48b0ae04 ("netfilter: nftables: netlink support for several set element expressions")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b4e70d8d
    • P
      netfilter: nft_dynset: report EOPNOTSUPP on missing set feature · 95cd4bca
      Pablo Neira Ayuso 提交于
      If userspace requests a feature which is not available the original set
      definition, then bail out with EOPNOTSUPP. If userspace sends
      unsupported dynset flags (new feature not supported by this kernel),
      then report EOPNOTSUPP to userspace. EINVAL should be only used to
      report malformed netlink messages from userspace.
      
      Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      95cd4bca
  6. 27 12月, 2020 1 次提交
  7. 24 12月, 2020 2 次提交
  8. 19 12月, 2020 2 次提交
  9. 18 12月, 2020 11 次提交
  10. 17 12月, 2020 2 次提交
    • G
      mptcp: clear use_ack and use_map when dropping other suboptions · 3ae32c07
      Geliang Tang 提交于
      This patch cleared use_ack and use_map when dropping other suboptions to
      fix the following syzkaller BUG:
      
      [   15.223006] BUG: unable to handle page fault for address: 0000000000223b10
      [   15.223700] #PF: supervisor read access in kernel mode
      [   15.224209] #PF: error_code(0x0000) - not-present page
      [   15.224724] PGD b8d5067 P4D b8d5067 PUD c0a5067 PMD 0
      [   15.225237] Oops: 0000 [#1] SMP
      [   15.225556] CPU: 0 PID: 7747 Comm: syz-executor Not tainted 5.10.0-rc6+ #24
      [   15.226281] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   15.227292] RIP: 0010:skb_release_data+0x89/0x1e0
      [   15.227816] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
      [   15.229669] RSP: 0018:ffffc900019c7c08 EFLAGS: 00010293
      [   15.230188] RAX: ffff88800daad900 RBX: 0000000000223b08 RCX: 0000000000000006
      [   15.230895] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6dc700
      [   15.231593] RBP: ffff88807f71a4c0 R08: 0000000000000001 R09: 0000000000000001
      [   15.232299] R10: ffffc900019c7c18 R11: 0000000000000000 R12: ffff88807f71a4f0
      [   15.233007] R13: 0000000000000000 R14: ffff88807f6dc700 R15: 0000000000000002
      [   15.233714] FS:  00007f65d9b5f700(0000) GS:ffff88807c400000(0000) knlGS:0000000000000000
      [   15.234509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   15.235081] CR2: 0000000000223b10 CR3: 000000000b883000 CR4: 00000000000006f0
      [   15.235788] Call Trace:
      [   15.236042]  skb_release_all+0x28/0x30
      [   15.236419]  __kfree_skb+0x11/0x20
      [   15.236768]  tcp_data_queue+0x270/0x1240
      [   15.237161]  ? tcp_urg+0x50/0x2a0
      [   15.237496]  tcp_rcv_established+0x39a/0x890
      [   15.237997]  ? mark_held_locks+0x49/0x70
      [   15.238467]  tcp_v4_do_rcv+0xb9/0x270
      [   15.238915]  __release_sock+0x8a/0x160
      [   15.239365]  release_sock+0x32/0xd0
      [   15.239793]  __inet_stream_connect+0x1d2/0x400
      [   15.240313]  ? do_wait_intr_irq+0x80/0x80
      [   15.240791]  inet_stream_connect+0x36/0x50
      [   15.241275]  mptcp_stream_connect+0x69/0x1b0
      [   15.241787]  __sys_connect+0x122/0x140
      [   15.242236]  ? syscall_enter_from_user_mode+0x17/0x50
      [   15.242836]  ? lockdep_hardirqs_on_prepare+0xd4/0x170
      [   15.243436]  __x64_sys_connect+0x1a/0x20
      [   15.243924]  do_syscall_64+0x33/0x40
      [   15.244313]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   15.244821] RIP: 0033:0x7f65d946e469
      [   15.245183] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
      [   15.247019] RSP: 002b:00007f65d9b5eda8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      [   15.247770] RAX: ffffffffffffffda RBX: 000000000049bf00 RCX: 00007f65d946e469
      [   15.248471] RDX: 0000000000000010 RSI: 00000000200000c0 RDI: 0000000000000005
      [   15.249205] RBP: 000000000049bf00 R08: 0000000000000000 R09: 0000000000000000
      [   15.249908] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049bf0c
      [   15.250603] R13: 00007fffe8a25cef R14: 00007f65d9b3f000 R15: 0000000000000003
      [   15.251312] Modules linked in:
      [   15.251626] CR2: 0000000000223b10
      [   15.251965] BUG: kernel NULL pointer dereference, address: 0000000000000048
      [   15.252005] ---[ end trace f5c51fe19123c773 ]---
      [   15.252822] #PF: supervisor read access in kernel mode
      [   15.252823] #PF: error_code(0x0000) - not-present page
      [   15.252825] PGD c6c6067 P4D c6c6067 PUD c0d8067
      [   15.253294] RIP: 0010:skb_release_data+0x89/0x1e0
      [   15.253910] PMD 0
      [   15.253914] Oops: 0000 [#2] SMP
      [   15.253917] CPU: 1 PID: 7746 Comm: syz-executor Tainted: G      D           5.10.0-rc6+ #24
      [   15.253920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   15.254435] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
      [   15.254899] RIP: 0010:skb_release_data+0x89/0x1e0
      [   15.254902] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34
      [   15.254905] RSP: 0018:ffffc900019bfc08 EFLAGS: 00010293
      [   15.255376] RSP: 0018:ffffc900019c7c08 EFLAGS: 00010293
      [   15.255580]
      [   15.255583] RAX: ffff888004a7ac80 RBX: 0000000000000040 RCX: 0000000000000000
      [   15.255912]
      [   15.256724] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6ddd00
      [   15.257620] RAX: ffff88800daad900 RBX: 0000000000223b08 RCX: 0000000000000006
      [   15.259817] RBP: ffff88800e9006c0 R08: 0000000000000000 R09: 0000000000000000
      [   15.259818] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800e9006f0
      [   15.259820] R13: 0000000000000000 R14: ffff88807f6ddd00 R15: 0000000000000002
      [   15.259822] FS:  00007fae4a60a700(0000) GS:ffff88807c500000(0000) knlGS:0000000000000000
      [   15.259826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   15.260296] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6dc700
      [   15.262514] CR2: 0000000000000048 CR3: 000000000b89c000 CR4: 00000000000006e0
      [   15.262515] Call Trace:
      [   15.262519]  skb_release_all+0x28/0x30
      [   15.262523]  __kfree_skb+0x11/0x20
      [   15.263054] RBP: ffff88807f71a4c0 R08: 0000000000000001 R09: 0000000000000001
      [   15.263680]  tcp_data_queue+0x270/0x1240
      [   15.263843] R10: ffffc900019c7c18 R11: 0000000000000000 R12: ffff88807f71a4f0
      [   15.264693]  ? tcp_urg+0x50/0x2a0
      [   15.264856] R13: 0000000000000000 R14: ffff88807f6dc700 R15: 0000000000000002
      [   15.265720]  tcp_rcv_established+0x39a/0x890
      [   15.266438] FS:  00007f65d9b5f700(0000) GS:ffff88807c400000(0000) knlGS:0000000000000000
      [   15.267283]  ? __schedule+0x3fa/0x880
      [   15.267287]  tcp_v4_do_rcv+0xb9/0x270
      [   15.267290]  __release_sock+0x8a/0x160
      [   15.268049] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   15.268788]  release_sock+0x32/0xd0
      [   15.268791]  __inet_stream_connect+0x1d2/0x400
      [   15.268795]  ? do_wait_intr_irq+0x80/0x80
      [   15.269593] CR2: 0000000000223b10 CR3: 000000000b883000 CR4: 00000000000006f0
      [   15.270246]  inet_stream_connect+0x36/0x50
      [   15.270250]  mptcp_stream_connect+0x69/0x1b0
      [   15.270253]  __sys_connect+0x122/0x140
      [   15.271097] Kernel panic - not syncing: Fatal exception
      [   15.271820]  ? syscall_enter_from_user_mode+0x17/0x50
      [   15.283542]  ? lockdep_hardirqs_on_prepare+0xd4/0x170
      [   15.284275]  __x64_sys_connect+0x1a/0x20
      [   15.284853]  do_syscall_64+0x33/0x40
      [   15.285369]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   15.286105] RIP: 0033:0x7fae49f19469
      [   15.286638] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
      [   15.289295] RSP: 002b:00007fae4a609da8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      [   15.290375] RAX: ffffffffffffffda RBX: 000000000049bf00 RCX: 00007fae49f19469
      [   15.291403] RDX: 0000000000000010 RSI: 00000000200000c0 RDI: 0000000000000005
      [   15.292437] RBP: 000000000049bf00 R08: 0000000000000000 R09: 0000000000000000
      [   15.293456] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049bf0c
      [   15.294473] R13: 00007fff0004b6bf R14: 00007fae4a5ea000 R15: 0000000000000003
      [   15.295492] Modules linked in:
      [   15.295944] CR2: 0000000000000048
      [   15.296567] Kernel Offset: disabled
      [   15.296941] ---[ end Kernel panic - not syncing: Fatal exception ]---
      Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Fixes: 84dfe367 (mptcp: send out dedicated ADD_ADDR packet)
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Link: https://lore.kernel.org/r/ccca4e8f01457a1b495c5d612ed16c5f7a585706.1608010058.git.geliangtang@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      3ae32c07
    • K
      net/smc: fix access to parent of an ib device · 995433b7
      Karsten Graul 提交于
      The parent of an ib device is used to retrieve the PCI device
      attributes. It turns out that there are possible cases when an ib device
      has no parent set in the device structure, which may lead to page
      faults when trying to access this memory.
      Fix that by checking the parent pointer and consolidate the pci device
      specific processing in a new function.
      
      Fixes: a3db10ef ("net/smc: Add support for obtaining SMCR device list")
      Reported-by: syzbot+600fef7c414ee7e2d71b@syzkaller.appspotmail.com
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20201215091058.49354-2-kgraul@linux.ibm.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      995433b7