1. 15 5月, 2023 1 次提交
    • D
      net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() · c83b4938
      Dong Chenchen 提交于
      As the call trace shows, skb_panic was caused by wrong skb->mac_header
      in nsh_gso_segment():
      
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      CPU: 3 PID: 2737 Comm: syz Not tainted 6.3.0-next-20230505 #1
      RIP: 0010:skb_panic+0xda/0xe0
      call Trace:
       skb_push+0x91/0xa0
       nsh_gso_segment+0x4f3/0x570
       skb_mac_gso_segment+0x19e/0x270
       __skb_gso_segment+0x1e8/0x3c0
       validate_xmit_skb+0x452/0x890
       validate_xmit_skb_list+0x99/0xd0
       sch_direct_xmit+0x294/0x7c0
       __dev_queue_xmit+0x16f0/0x1d70
       packet_xmit+0x185/0x210
       packet_snd+0xc15/0x1170
       packet_sendmsg+0x7b/0xa0
       sock_sendmsg+0x14f/0x160
      
      The root cause is:
      nsh_gso_segment() use skb->network_header - nhoff to reset mac_header
      in skb_gso_error_unwind() if inner-layer protocol gso fails.
      However, skb->network_header may be reset by inner-layer protocol
      gso function e.g. mpls_gso_segment. skb->mac_header reset by the
      inaccurate network_header will be larger than skb headroom.
      
      nsh_gso_segment
          nhoff = skb->network_header - skb->mac_header;
          __skb_pull(skb,nsh_len)
          skb_mac_gso_segment
              mpls_gso_segment
                  skb_reset_network_header(skb);//skb->network_header+=nsh_len
                  return -EINVAL;
          skb_gso_error_unwind
              skb_push(skb, nsh_len);
              skb->mac_header = skb->network_header - nhoff;
              // skb->mac_header > skb->headroom, cause skb_push panic
      
      Use correct mac_offset to restore mac_header and get rid of nhoff.
      
      Fixes: c411ed85 ("nsh: add GSO support")
      Reported-by: syzbot+632b5d9964208bfef8c0@syzkaller.appspotmail.com
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDong Chenchen <dongchenchen2@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c83b4938
  2. 14 5月, 2023 9 次提交
  3. 13 5月, 2023 1 次提交
  4. 12 5月, 2023 15 次提交
  5. 11 5月, 2023 10 次提交
    • L
      Merge tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 6e27831b
      Linus Torvalds 提交于
      Pull networking fixes from Paolo Abeni:
       "Including fixes from netfilter.
      
        Current release - regressions:
      
         - mtk_eth_soc: fix NULL pointer dereference
      
        Previous releases - regressions:
      
         - core:
            - skb_partial_csum_set() fix against transport header magic value
            - fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
            - annotate sk->sk_err write from do_recvmmsg()
            - add vlan_get_protocol_and_depth() helper
      
         - netlink: annotate accesses to nlk->cb_running
      
         - netfilter: always release netdev hooks from notifier
      
        Previous releases - always broken:
      
         - core: deal with most data-races in sk_wait_event()
      
         - netfilter: fix possible bug_on with enable_hooks=1
      
         - eth: bonding: fix send_peer_notif overflow
      
         - eth: xpcs: fix incorrect number of interfaces
      
         - eth: ipvlan: fix out-of-bounds caused by unclear skb->cb
      
         - eth: stmmac: Initialize MAC_ONEUS_TIC_COUNTER register"
      
      * tag 'net-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
        af_unix: Fix data races around sk->sk_shutdown.
        af_unix: Fix a data race of sk->sk_receive_queue->qlen.
        net: datagram: fix data-races in datagram_poll()
        net: mscc: ocelot: fix stat counter register values
        ipvlan:Fix out-of-bounds caused by unclear skb->cb
        docs: networking: fix x25-iface.rst heading & index order
        gve: Remove the code of clearing PBA bit
        tcp: add annotations around sk->sk_shutdown accesses
        net: add vlan_get_protocol_and_depth() helper
        net: pcs: xpcs: fix incorrect number of interfaces
        net: deal with most data-races in sk_wait_event()
        net: annotate sk->sk_err write from do_recvmmsg()
        netlink: annotate accesses to nlk->cb_running
        kselftest: bonding: add num_grat_arp test
        selftests: forwarding: lib: add netns support for tc rule handle stats get
        Documentation: bonding: fix the doc of peer_notif_delay
        bonding: fix send_peer_notif overflow
        net: ethernet: mtk_eth_soc: fix NULL pointer dereference
        selftests: nft_flowtable.sh: check ingress/egress chain too
        selftests: nft_flowtable.sh: monitor result file sizes
        ...
      6e27831b
    • L
      Merge tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 691e1eee
      Linus Torvalds 提交于
      Pull media fixes from Mauro Carvalho Chehab:
      
       - fix some unused-variable warning in mtk-mdp3
      
       - ignore unused suspend operations in nxp
      
       - some driver fixes in rcar-vin
      
      * tag 'media/v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: platform: mtk-mdp3: work around unused-variable warning
        media: nxp: ignore unused suspend operations
        media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
        media: rcar-vin: Fix NV12 size alignment
        media: rcar-vin: Gen3 can not scale NV12
      691e1eee
    • J
      Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · cceac926
      Jakub Kicinski 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix UAF when releasing netnamespace, from Florian Westphal.
      
      2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
         from Florian Westphal.
      
      3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
      
      4) Extend nft_flowtable.sh selftest to cover integration with
         ingress/egress hooks, from Florian Westphal.
      
      * tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        selftests: nft_flowtable.sh: check ingress/egress chain too
        selftests: nft_flowtable.sh: monitor result file sizes
        selftests: nft_flowtable.sh: wait for specific nc pids
        selftests: nft_flowtable.sh: no need for ps -x option
        selftests: nft_flowtable.sh: use /proc for pid checking
        netfilter: conntrack: fix possible bug_on with enable_hooks=1
        netfilter: nf_tables: always release netdev hooks from notifier
      ====================
      
      Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      cceac926
    • J
      Merge branch 'af_unix-fix-two-data-races-reported-by-kcsan' · 33dcee99
      Jakub Kicinski 提交于
      Kuniyuki Iwashima says:
      
      ====================
      af_unix: Fix two data races reported by KCSAN.
      
      KCSAN reported data races around these two fields for AF_UNIX sockets.
      
        * sk->sk_receive_queue->qlen
        * sk->sk_shutdown
      
      Let's annotate them properly.
      ====================
      
      Link: https://lore.kernel.org/r/20230510003456.42357-1-kuniyu@amazon.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      33dcee99
    • K
      af_unix: Fix data races around sk->sk_shutdown. · e1d09c2c
      Kuniyuki Iwashima 提交于
      KCSAN found a data race around sk->sk_shutdown where unix_release_sock()
      and unix_shutdown() update it under unix_state_lock(), OTOH unix_poll()
      and unix_dgram_poll() read it locklessly.
      
      We need to annotate the writes and reads with WRITE_ONCE() and READ_ONCE().
      
      BUG: KCSAN: data-race in unix_poll / unix_release_sock
      
      write to 0xffff88800d0f8aec of 1 bytes by task 264 on cpu 0:
       unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631
       unix_release+0x59/0x80 net/unix/af_unix.c:1042
       __sock_release+0x7d/0x170 net/socket.c:653
       sock_close+0x19/0x30 net/socket.c:1397
       __fput+0x179/0x5e0 fs/file_table.c:321
       ____fput+0x15/0x20 fs/file_table.c:349
       task_work_run+0x116/0x1a0 kernel/task_work.c:179
       resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
       exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204
       __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
       syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297
       do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      read to 0xffff88800d0f8aec of 1 bytes by task 222 on cpu 1:
       unix_poll+0xa3/0x2a0 net/unix/af_unix.c:3170
       sock_poll+0xcf/0x2b0 net/socket.c:1385
       vfs_poll include/linux/poll.h:88 [inline]
       ep_item_poll.isra.0+0x78/0xc0 fs/eventpoll.c:855
       ep_send_events fs/eventpoll.c:1694 [inline]
       ep_poll fs/eventpoll.c:1823 [inline]
       do_epoll_wait+0x6c4/0xea0 fs/eventpoll.c:2258
       __do_sys_epoll_wait fs/eventpoll.c:2270 [inline]
       __se_sys_epoll_wait fs/eventpoll.c:2265 [inline]
       __x64_sys_epoll_wait+0xcc/0x190 fs/eventpoll.c:2265
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      value changed: 0x00 -> 0x03
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 222 Comm: dbus-broker Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      
      Fixes: 3c73419c ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NMichal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e1d09c2c
    • K
      af_unix: Fix a data race of sk->sk_receive_queue->qlen. · 679ed006
      Kuniyuki Iwashima 提交于
      KCSAN found a data race of sk->sk_receive_queue->qlen where recvmsg()
      updates qlen under the queue lock and sendmsg() checks qlen under
      unix_state_sock(), not the queue lock, so the reader side needs
      READ_ONCE().
      
      BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_wait_for_peer
      
      write (marked) to 0xffff888019fe7c68 of 4 bytes by task 49792 on cpu 0:
       __skb_unlink include/linux/skbuff.h:2347 [inline]
       __skb_try_recv_from_queue+0x3de/0x470 net/core/datagram.c:197
       __skb_try_recv_datagram+0xf7/0x390 net/core/datagram.c:263
       __unix_dgram_recvmsg+0x109/0x8a0 net/unix/af_unix.c:2452
       unix_dgram_recvmsg+0x94/0xa0 net/unix/af_unix.c:2549
       sock_recvmsg_nosec net/socket.c:1019 [inline]
       ____sys_recvmsg+0x3a3/0x3b0 net/socket.c:2720
       ___sys_recvmsg+0xc8/0x150 net/socket.c:2764
       do_recvmmsg+0x182/0x560 net/socket.c:2858
       __sys_recvmmsg net/socket.c:2937 [inline]
       __do_sys_recvmmsg net/socket.c:2960 [inline]
       __se_sys_recvmmsg net/socket.c:2953 [inline]
       __x64_sys_recvmmsg+0x153/0x170 net/socket.c:2953
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      read to 0xffff888019fe7c68 of 4 bytes by task 49793 on cpu 1:
       skb_queue_len include/linux/skbuff.h:2127 [inline]
       unix_recvq_full net/unix/af_unix.c:229 [inline]
       unix_wait_for_peer+0x154/0x1a0 net/unix/af_unix.c:1445
       unix_dgram_sendmsg+0x13bc/0x14b0 net/unix/af_unix.c:2048
       sock_sendmsg_nosec net/socket.c:724 [inline]
       sock_sendmsg+0x148/0x160 net/socket.c:747
       ____sys_sendmsg+0x20e/0x620 net/socket.c:2503
       ___sys_sendmsg+0xc6/0x140 net/socket.c:2557
       __sys_sendmmsg+0x11d/0x370 net/socket.c:2643
       __do_sys_sendmmsg net/socket.c:2672 [inline]
       __se_sys_sendmmsg net/socket.c:2669 [inline]
       __x64_sys_sendmmsg+0x58/0x70 net/socket.c:2669
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      value changed: 0x0000000b -> 0x00000001
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 49793 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NMichal Kubiak <michal.kubiak@intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      679ed006
    • E
      net: datagram: fix data-races in datagram_poll() · 5bca1d08
      Eric Dumazet 提交于
      datagram_poll() runs locklessly, we should add READ_ONCE()
      annotations while reading sk->sk_err, sk->sk_shutdown and sk->sk_state.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230509173131.3263780-1-edumazet@google.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      5bca1d08
    • L
      MAINTAINERS: re-sort all entries and fields · 80e62bc8
      Linus Torvalds 提交于
      It's been a few years since we've sorted this thing, and the end result
      is that we've added MAINTAINERS entries in the wrong order, and a number
      of entries have their fields in non-canonical order too.
      
      So roll this boulder up the hill one more time by re-running
      
         ./scripts/parse-maintainers.pl --order
      
      on it.
      
      This file ends up being fairly painful for merge conflicts even
      normally, since unlike almost all other kernel files it's one of those
      "everybody touches the same thing", and re-ordering all entries is only
      going to make that worse.  But the alternative is to never do it at all,
      and just let it all rot..
      
      The rc2 week is likely the quietest and least painful time to do this.
      Requested-by: NRandy Dunlap <rdunlap@infradead.org>
      Requested-by: Joe Perches <joe@perches.com>	# "Please use --order"
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80e62bc8
    • L
      Merge tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · d295b66a
      Linus Torvalds 提交于
      Pull inotify fix from Jan Kara:
       "A fix for possibly reporting invalid watch descriptor with inotify
        event"
      
      * tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        inotify: Avoid reporting event with invalid wd
      d295b66a
    • L
      Merge tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 2a78769d
      Linus Torvalds 提交于
      Pull gfs2 fix from Andreas Gruenbacher:
      
       - Fix a NULL pointer dereference when mounting corrupted filesystems
      
      * tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
        gfs2: Don't deref jdesc in evict
      2a78769d
  6. 10 5月, 2023 4 次提交
    • B
      gfs2: Don't deref jdesc in evict · 504a10d9
      Bob Peterson 提交于
      On corrupt gfs2 file systems the evict code can try to reference the
      journal descriptor structure, jdesc, after it has been freed and set to
      NULL. The sequence of events is:
      
      init_journal()
      ...
      fail_jindex:
         gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
            if (gfs2_holder_initialized(&ji_gh))
               gfs2_glock_dq_uninit(&ji_gh);
      fail:
         iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
            evict()
               gfs2_evict_inode()
                  evict_linked_inode()
                     ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
      <------references the now freed/zeroed sd_jdesc pointer.
      
      The call to gfs2_trans_begin is done because the truncate_inode_pages
      call can cause gfs2 events that require a transaction, such as removing
      journaled data (jdata) blocks from the journal.
      
      This patch fixes the problem by adding a check for sdp->sd_jdesc to
      function gfs2_evict_inode. In theory, this should only happen to corrupt
      gfs2 file systems, when gfs2 detects the problem, reports it, then tries
      to evict all the system inodes it has read in up to that point.
      Reported-by: NYang Lan <lanyang0908@gmail.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      504a10d9
    • L
      Merge tag 'platform-drivers-x86-v6.4-2' of... · ad2fd53a
      Linus Torvalds 提交于
      Merge tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Hans de Goede:
       "Nothing special to report just various small fixes:
      
         - thinkpad_acpi: Fix profile (performance/bal/low-power) regression
           on T490
      
         - misc other small fixes / hw-id additions"
      
      * tag 'platform-drivers-x86-v6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/mellanox: fix potential race in mlxbf-tmfifo driver
        platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
        platform/x86: touchscreen_dmi: Add upside-down quirk for GDIX1002 ts on the Juno Tablet
        platform/x86: thinkpad_acpi: Add profile force ability
        platform/x86: thinkpad_acpi: Fix platform profiles on T490
        platform/x86: hp-wmi: add micmute to hp_wmi_keymap struct
        platform/x86/intel-uncore-freq: Return error on write frequency
        platform/x86: intel_scu_pcidrv: Add back PCI ID for Medfield
      ad2fd53a
    • C
      net: mscc: ocelot: fix stat counter register values · cdc2e28e
      Colin Foster 提交于
      Commit d4c36765 ("net: mscc: ocelot: keep ocelot_stat_layout by reg
      address, not offset") organized the stats counters for Ocelot chips, namely
      the VSC7512 and VSC7514. A few of the counter offsets were incorrect, and
      were caught by this warning:
      
      WARNING: CPU: 0 PID: 24 at drivers/net/ethernet/mscc/ocelot_stats.c:909
      ocelot_stats_init+0x1fc/0x2d8
      reg 0x5000078 had address 0x220 but reg 0x5000079 has address 0x214,
      bulking broken!
      
      Fix these register offsets.
      
      Fixes: d4c36765 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset")
      Signed-off-by: NColin Foster <colin.foster@in-advantage.com>
      Reviewed-by: NSimon Horman <simon.horman@corigine.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdc2e28e
    • T
      ipvlan:Fix out-of-bounds caused by unclear skb->cb · 90cbed52
      t.feng 提交于
      If skb enqueue the qdisc, fq_skb_cb(skb)->time_to_send is changed which
      is actually skb->cb, and IPCB(skb_in)->opt will be used in
      __ip_options_echo. It is possible that memcpy is out of bounds and lead
      to stack overflow.
      We should clear skb->cb before ip_local_out or ip6_local_out.
      
      v2:
      1. clean the stack info
      2. use IPCB/IP6CB instead of skb->cb
      
      crash on stable-5.10(reproduce in kasan kernel).
      Stack info:
      [ 2203.651571] BUG: KASAN: stack-out-of-bounds in
      __ip_options_echo+0x589/0x800
      [ 2203.653327] Write of size 4 at addr ffff88811a388f27 by task
      swapper/3/0
      [ 2203.655460] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted
      5.10.0-60.18.0.50.h856.kasan.eulerosv2r11.x86_64 #1
      [ 2203.655466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
      [ 2203.655475] Call Trace:
      [ 2203.655481]  <IRQ>
      [ 2203.655501]  dump_stack+0x9c/0xd3
      [ 2203.655514]  print_address_description.constprop.0+0x19/0x170
      [ 2203.655530]  __kasan_report.cold+0x6c/0x84
      [ 2203.655586]  kasan_report+0x3a/0x50
      [ 2203.655594]  check_memory_region+0xfd/0x1f0
      [ 2203.655601]  memcpy+0x39/0x60
      [ 2203.655608]  __ip_options_echo+0x589/0x800
      [ 2203.655654]  __icmp_send+0x59a/0x960
      [ 2203.655755]  nf_send_unreach+0x129/0x3d0 [nf_reject_ipv4]
      [ 2203.655763]  reject_tg+0x77/0x1bf [ipt_REJECT]
      [ 2203.655772]  ipt_do_table+0x691/0xa40 [ip_tables]
      [ 2203.655821]  nf_hook_slow+0x69/0x100
      [ 2203.655828]  __ip_local_out+0x21e/0x2b0
      [ 2203.655857]  ip_local_out+0x28/0x90
      [ 2203.655868]  ipvlan_process_v4_outbound+0x21e/0x260 [ipvlan]
      [ 2203.655931]  ipvlan_xmit_mode_l3+0x3bd/0x400 [ipvlan]
      [ 2203.655967]  ipvlan_queue_xmit+0xb3/0x190 [ipvlan]
      [ 2203.655977]  ipvlan_start_xmit+0x2e/0xb0 [ipvlan]
      [ 2203.655984]  xmit_one.constprop.0+0xe1/0x280
      [ 2203.655992]  dev_hard_start_xmit+0x62/0x100
      [ 2203.656000]  sch_direct_xmit+0x215/0x640
      [ 2203.656028]  __qdisc_run+0x153/0x1f0
      [ 2203.656069]  __dev_queue_xmit+0x77f/0x1030
      [ 2203.656173]  ip_finish_output2+0x59b/0xc20
      [ 2203.656244]  __ip_finish_output.part.0+0x318/0x3d0
      [ 2203.656312]  ip_finish_output+0x168/0x190
      [ 2203.656320]  ip_output+0x12d/0x220
      [ 2203.656357]  __ip_queue_xmit+0x392/0x880
      [ 2203.656380]  __tcp_transmit_skb+0x1088/0x11c0
      [ 2203.656436]  __tcp_retransmit_skb+0x475/0xa30
      [ 2203.656505]  tcp_retransmit_skb+0x2d/0x190
      [ 2203.656512]  tcp_retransmit_timer+0x3af/0x9a0
      [ 2203.656519]  tcp_write_timer_handler+0x3ba/0x510
      [ 2203.656529]  tcp_write_timer+0x55/0x180
      [ 2203.656542]  call_timer_fn+0x3f/0x1d0
      [ 2203.656555]  expire_timers+0x160/0x200
      [ 2203.656562]  run_timer_softirq+0x1f4/0x480
      [ 2203.656606]  __do_softirq+0xfd/0x402
      [ 2203.656613]  asm_call_irq_on_stack+0x12/0x20
      [ 2203.656617]  </IRQ>
      [ 2203.656623]  do_softirq_own_stack+0x37/0x50
      [ 2203.656631]  irq_exit_rcu+0x134/0x1a0
      [ 2203.656639]  sysvec_apic_timer_interrupt+0x36/0x80
      [ 2203.656646]  asm_sysvec_apic_timer_interrupt+0x12/0x20
      [ 2203.656654] RIP: 0010:default_idle+0x13/0x20
      [ 2203.656663] Code: 89 f0 5d 41 5c 41 5d 41 5e c3 cc cc cc cc cc cc cc
      cc cc cc cc cc cc 0f 1f 44 00 00 0f 1f 44 00 00 0f 00 2d 9f 32 57 00 fb
      f4 <c3> cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 be 08
      [ 2203.656668] RSP: 0018:ffff88810036fe78 EFLAGS: 00000256
      [ 2203.656676] RAX: ffffffffaf2a87f0 RBX: ffff888100360000 RCX:
      ffffffffaf290191
      [ 2203.656681] RDX: 0000000000098b5e RSI: 0000000000000004 RDI:
      ffff88811a3c4f60
      [ 2203.656686] RBP: 0000000000000000 R08: 0000000000000001 R09:
      ffff88811a3c4f63
      [ 2203.656690] R10: ffffed10234789ec R11: 0000000000000001 R12:
      0000000000000003
      [ 2203.656695] R13: ffff888100360000 R14: 0000000000000000 R15:
      0000000000000000
      [ 2203.656729]  default_idle_call+0x5a/0x150
      [ 2203.656735]  cpuidle_idle_call+0x1c6/0x220
      [ 2203.656780]  do_idle+0xab/0x100
      [ 2203.656786]  cpu_startup_entry+0x19/0x20
      [ 2203.656793]  secondary_startup_64_no_verify+0xc2/0xcb
      
      [ 2203.657409] The buggy address belongs to the page:
      [ 2203.658648] page:0000000027a9842f refcount:1 mapcount:0
      mapping:0000000000000000 index:0x0 pfn:0x11a388
      [ 2203.658665] flags:
      0x17ffffc0001000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
      [ 2203.658675] raw: 0017ffffc0001000 ffffea000468e208 ffffea000468e208
      0000000000000000
      [ 2203.658682] raw: 0000000000000000 0000000000000000 00000001ffffffff
      0000000000000000
      [ 2203.658686] page dumped because: kasan: bad access detected
      
      To reproduce(ipvlan with IPVLAN_MODE_L3):
      Env setting:
      =======================================================
      modprobe ipvlan ipvlan_default_mode=1
      sysctl net.ipv4.conf.eth0.forwarding=1
      iptables -t nat -A POSTROUTING -s 20.0.0.0/255.255.255.0 -o eth0 -j
      MASQUERADE
      ip link add gw link eth0 type ipvlan
      ip -4 addr add 20.0.0.254/24 dev gw
      ip netns add net1
      ip link add ipv1 link eth0 type ipvlan
      ip link set ipv1 netns net1
      ip netns exec net1 ip link set ipv1 up
      ip netns exec net1 ip -4 addr add 20.0.0.4/24 dev ipv1
      ip netns exec net1 route add default gw 20.0.0.254
      ip netns exec net1 tc qdisc add dev ipv1 root netem loss 10%
      ifconfig gw up
      iptables -t filter -A OUTPUT -p tcp --dport 8888 -j REJECT --reject-with
      icmp-port-unreachable
      =======================================================
      And then excute the shell(curl any address of eth0 can reach):
      
      for((i=1;i<=100000;i++))
      do
              ip netns exec net1 curl x.x.x.x:8888
      done
      =======================================================
      
      Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
      Signed-off-by: N"t.feng" <fengtao40@huawei.com>
      Suggested-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90cbed52