1. 05 3月, 2021 7 次提交
  2. 04 3月, 2021 3 次提交
    • Z
      rtnetlink: using dev_base_seq from target net · a9ecb0cb
      zhang kai 提交于
      Signed-off-by: Nzhang kai <zhangkaiheb@126.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9ecb0cb
    • J
      net: 9p: advance iov on empty read · d65614a0
      Jisheng Zhang 提交于
      I met below warning when cating a small size(about 80bytes) txt file
      on 9pfs(msize=2097152 is passed to 9p mount option), the reason is we
      miss iov_iter_advance() if the read count is 0 for zerocopy case, so
      we didn't truncate the pipe, then iov_iter_pipe() thinks the pipe is
      full. Fix it by removing the exception for 0 to ensure to call
      iov_iter_advance() even on empty read for zerocopy case.
      
      [    8.279568] WARNING: CPU: 0 PID: 39 at lib/iov_iter.c:1203 iov_iter_pipe+0x31/0x40
      [    8.280028] Modules linked in:
      [    8.280561] CPU: 0 PID: 39 Comm: cat Not tainted 5.11.0+ #6
      [    8.281260] RIP: 0010:iov_iter_pipe+0x31/0x40
      [    8.281974] Code: 2b 42 54 39 42 5c 76 22 c7 07 20 00 00 00 48 89 57 18 8b 42 50 48 c7 47 08 b
      [    8.283169] RSP: 0018:ffff888000cbbd80 EFLAGS: 00000246
      [    8.283512] RAX: 0000000000000010 RBX: ffff888000117d00 RCX: 0000000000000000
      [    8.283876] RDX: ffff88800031d600 RSI: 0000000000000000 RDI: ffff888000cbbd90
      [    8.284244] RBP: ffff888000cbbe38 R08: 0000000000000000 R09: ffff8880008d2058
      [    8.284605] R10: 0000000000000002 R11: ffff888000375510 R12: 0000000000000050
      [    8.284964] R13: ffff888000cbbe80 R14: 0000000000000050 R15: ffff88800031d600
      [    8.285439] FS:  00007f24fd8af600(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
      [    8.285844] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    8.286150] CR2: 00007f24fd7d7b90 CR3: 0000000000c97000 CR4: 00000000000406b0
      [    8.286710] Call Trace:
      [    8.288279]  generic_file_splice_read+0x31/0x1a0
      [    8.289273]  ? do_splice_to+0x2f/0x90
      [    8.289511]  splice_direct_to_actor+0xcc/0x220
      [    8.289788]  ? pipe_to_sendpage+0xa0/0xa0
      [    8.290052]  do_splice_direct+0x8b/0xd0
      [    8.290314]  do_sendfile+0x1ad/0x470
      [    8.290576]  do_syscall_64+0x2d/0x40
      [    8.290818]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    8.291409] RIP: 0033:0x7f24fd7dca0a
      [    8.292511] Code: c3 0f 1f 80 00 00 00 00 4c 89 d2 4c 89 c6 e9 bd fd ff ff 0f 1f 44 00 00 31 8
      [    8.293360] RSP: 002b:00007ffc20932818 EFLAGS: 00000206 ORIG_RAX: 0000000000000028
      [    8.293800] RAX: ffffffffffffffda RBX: 0000000001000000 RCX: 00007f24fd7dca0a
      [    8.294153] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
      [    8.294504] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
      [    8.294867] R10: 0000000001000000 R11: 0000000000000206 R12: 0000000000000003
      [    8.295217] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
      [    8.295782] ---[ end trace 63317af81b3ca24b ]---
      Signed-off-by: NJisheng Zhang <Jisheng.Zhang@synaptics.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d65614a0
    • M
      net: l2tp: reduce log level of messages in receive path, add counter instead · 3e59e885
      Matthias Schiffer 提交于
      Commit 5ee759cd ("l2tp: use standard API for warning log messages")
      changed a number of warnings about invalid packets in the receive path
      so that they are always shown, instead of only when a special L2TP debug
      flag is set. Even with rate limiting these warnings can easily cause
      significant log spam - potentially triggered by a malicious party
      sending invalid packets on purpose.
      
      In addition these warnings were noticed by projects like Tunneldigger [1],
      which uses L2TP for its data path, but implements its own control
      protocol (which is sufficiently different from L2TP data packets that it
      would always be passed up to userspace even with future extensions of
      L2TP).
      
      Some of the warnings were already redundant, as l2tp_stats has a counter
      for these packets. This commit adds one additional counter for invalid
      packets that are passed up to userspace. Packets with unknown session are
      not counted as invalid, as there is nothing wrong with the format of
      these packets.
      
      With the additional counter, all of these messages are either redundant
      or benign, so we reduce them to pr_debug_ratelimited().
      
      [1] https://github.com/wlanslovenija/tunneldigger/issues/160
      
      Fixes: 5ee759cd ("l2tp: use standard API for warning log messages")
      Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e59e885
  3. 02 3月, 2021 7 次提交
    • E
      tcp: add sanity tests to TCP_QUEUE_SEQ · 8811f4a9
      Eric Dumazet 提交于
      Qingyu Li reported a syzkaller bug where the repro
      changes RCV SEQ _after_ restoring data in the receive queue.
      
      mprotect(0x4aa000, 12288, PROT_READ)    = 0
      mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
      mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
      mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
      socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
      connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
      sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
      setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
      setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
      recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)
      
      syslog shows:
      [  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
      [  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0
      
      This should not be allowed. TCP_QUEUE_SEQ should only be used
      when queues are empty.
      
      This patch fixes this case, and the tx path as well.
      
      Fixes: ee995283 ("tcp: Initial repair mode")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005Reported-by: NQingyu Li <ieatmuttonchuan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8811f4a9
    • D
      net: dsa: tag_mtk: fix 802.1ad VLAN egress · 9200f515
      DENG Qingfang 提交于
      A different TPID bit is used for 802.1ad VLAN frames.
      Reported-by: NIlario Gelmetti <iochesonome@gmail.com>
      Fixes: f0af3431 ("net: dsa: mediatek: combine MediaTek tag with VLAN tag")
      Signed-off-by: NDENG Qingfang <dqfext@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9200f515
    • W
      net: expand textsearch ts_state to fit skb_seq_state · b228c9b0
      Willem de Bruijn 提交于
      The referenced commit expands the skb_seq_state used by
      skb_find_text with a 4B frag_off field, growing it to 48B.
      
      This exceeds container ts_state->cb, causing a stack corruption:
      
      [   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
      is corrupted in: skb_find_text+0xc5/0xd0
      [   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
      [   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-2 04/01/2014
      [   73.260078] Call Trace:
      [   73.264677]  dump_stack+0x57/0x6a
      [   73.267866]  panic+0xf6/0x2b7
      [   73.270578]  ? skb_find_text+0xc5/0xd0
      [   73.273964]  __stack_chk_fail+0x10/0x10
      [   73.277491]  skb_find_text+0xc5/0xd0
      [   73.280727]  string_mt+0x1f/0x30
      [   73.283639]  ipt_do_table+0x214/0x410
      
      The struct is passed between skb_find_text and its callbacks
      skb_prepare_seq_read, skb_seq_read and skb_abort_seq read through
      the textsearch interface using TS_SKB_CB.
      
      I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
      skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.
      
      skb->cb was increased from 40B to 48B after ts_state was introduced,
      in commit 3e3850e9 ("[NETFILTER]: Fix xfrm lookup in
      ip_route_me_harder/ip6_route_me_harder").
      
      Increase ts_state.cb[] to 48 to fit the struct.
      
      Also add a BUILD_BUG_ON to avoid a repeat.
      
      The alternative is to directly add a dependency from textsearch onto
      linux/skbuff.h, but I think the intent is textsearch to have no such
      dependencies on its callers.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
      Fixes: 97550f6f ("net: compound page support in skb_seq_read")
      Reported-by: NKris Karas <bugs-a17@moonlit-rail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b228c9b0
    • Y
      inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold · 8bd2a055
      Yejune Deng 提交于
      In inet_initpeers(), struct inet_peer on IA32 uses 128 bytes in nowdays.
      Get rid of the cascade and use div64_ul() and clamp_val() calculate that
      will not need to be adjusted in the future as suggested by Eric Dumazet.
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NYejune Deng <yejune.deng@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8bd2a055
    • P
      net/qrtr: fix __netdev_alloc_skb call · 093b036a
      Pavel Skripkin 提交于
      syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER.
      It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(),
      which tries to allocate skb. Since the value comes from the untrusted source
      there is no need to raise a warning in __alloc_pages_nodemask().
      
      [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
      Call Trace:
       __alloc_pages include/linux/gfp.h:511 [inline]
       __alloc_pages_node include/linux/gfp.h:524 [inline]
       alloc_pages_node include/linux/gfp.h:538 [inline]
       kmalloc_large_node+0x60/0x110 mm/slub.c:3999
       __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
       __kmalloc_reserve net/core/skbuff.c:150 [inline]
       __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
       __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
       netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
       qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
       qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write+0x426/0x650 fs/read_write.c:518
       vfs_write+0x791/0xa30 fs/read_write.c:605
       ksys_write+0x12d/0x250 fs/read_write.c:658
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
      Signed-off-by: NPavel Skripkin <paskripkin@gmail.com>
      Acked-by: NAlexander Lobakin <alobakin@pm.me>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      093b036a
    • J
      net: always use icmp{,v6}_ndo_send from ndo_start_xmit · 4372339e
      Jason A. Donenfeld 提交于
      There were a few remaining tunnel drivers that didn't receive the prior
      conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
      memory corrution (see ee576c47 ("net: icmp: pass zeroed opts from
      icmp{,v6}_ndo_send before sending") for details), there's even more
      imperative to have these all converted. So this commit goes through the
      remaining cases that I could find and does a boring translation to the
      ndo variety.
      
      The Fixes: line below is the merge that originally added icmp{,v6}_
      ndo_send and converted the first batch of icmp{,v6}_send users. The
      rationale then for the change applies equally to this patch. It's just
      that these drivers were left out of the initial conversion because these
      network devices are hiding in net/ rather than in drivers/net/.
      
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Fixes: 803381f9 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4372339e
    • D
      net: dsa: tag_rtl4_a: fix egress tags · 9eb8bc59
      DENG Qingfang 提交于
      Commit 86dd9868 has several issues, but was accepted too soon
      before anyone could take a look.
      
      - Double free. dsa_slave_xmit() will free the skb if the xmit function
        returns NULL, but the skb is already freed by eth_skb_pad(). Use
        __skb_put_padto() to avoid that.
      - Unnecessary allocation. It has been done by DSA core since commit
        a3b0b647.
      - A u16 pointer points to skb data. It should be __be16 for network
        byte order.
      - Typo in comments. "numer" -> "number".
      
      Fixes: 86dd9868 ("net: dsa: tag_rtl4_a: Support also egress tags")
      Signed-off-by: NDENG Qingfang <dqfext@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9eb8bc59
  4. 01 3月, 2021 2 次提交
    • D
      net: Fix gro aggregation for udp encaps with zero csum · 89e5c58f
      Daniel Borkmann 提交于
      We noticed a GRO issue for UDP-based encaps such as vxlan/geneve when the
      csum for the UDP header itself is 0. In that case, GRO aggregation does
      not take place on the phys dev, but instead is deferred to the vxlan/geneve
      driver (see trace below).
      
      The reason is essentially that GRO aggregation bails out in udp_gro_receive()
      for such case when drivers marked the skb with CHECKSUM_UNNECESSARY (ice, i40e,
      others) where for non-zero csums 2abb7cdc ("udp: Add support for doing
      checksum unnecessary conversion") promotes those skbs to CHECKSUM_COMPLETE
      and napi context has csum_valid set. This is however not the case for zero
      UDP csum (here: csum_cnt is still 0 and csum_valid continues to be false).
      
      At the same time 57c67ff4 ("udp: additional GRO support") added matches
      on !uh->check ^ !uh2->check as part to determine candidates for aggregation,
      so it certainly is expected to handle zero csums in udp_gro_receive(). The
      purpose of the check added via 662880f4 ("net: Allow GRO to use and set
      levels of checksum unnecessary") seems to catch bad csum and stop aggregation
      right away.
      
      One way to fix aggregation in the zero case is to only perform the !csum_valid
      check in udp_gro_receive() if uh->check is infact non-zero.
      
      Before:
      
        [...]
        swapper     0 [008]   731.946506: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100400 len=1500   (1)
        swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100200 len=1500
        swapper     0 [008]   731.946507: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101100 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101700 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101b00 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100600 len=1500
        swapper     0 [008]   731.946508: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100f00 len=1500
        swapper     0 [008]   731.946509: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100a00 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100500 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100700 len=1500
        swapper     0 [008]   731.946516: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101d00 len=1500   (2)
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101000 len=1500
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101c00 len=1500
        swapper     0 [008]   731.946517: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101400 len=1500
        swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100e00 len=1500
        swapper     0 [008]   731.946518: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497101600 len=1500
        swapper     0 [008]   731.946521: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff966497100800 len=774
        swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497100400 len=14032 (1)
        swapper     0 [008]   731.946530: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff966497101d00 len=9112  (2)
        [...]
      
        # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
        MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
         87380  16384  16384    20.01    13129.24
      
      After:
      
        [...]
        swapper     0 [026]   521.862641: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479000 len=11286 (1)
        swapper     0 [026]   521.862643: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479000 len=11236 (1)
        swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d478500 len=2898  (2)
        swapper     0 [026]   521.862650: net:netif_receive_skb: dev=enp10s0f0  skbaddr=0xffff93ab0d479f00 len=8490  (3)
        swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d478500 len=2848  (2)
        swapper     0 [026]   521.862653: net:netif_receive_skb: dev=test_vxlan skbaddr=0xffff93ab0d479f00 len=8440  (3)
        [...]
      
        # netperf -H 10.55.10.4 -t TCP_STREAM -l 20
        MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.55.10.4 () port 0 AF_INET : demo
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
         87380  16384  16384    20.01    24576.53
      
      Fixes: 57c67ff4 ("udp: additional GRO support")
      Fixes: 662880f4 ("net: Allow GRO to use and set levels of checksum unnecessary")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20210226212248.8300-1-daniel@iogearbox.netSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      89e5c58f
    • Y
      ethtool: fix the check logic of at least one channel for RX/TX · a4fc088a
      Yinjun Zhang 提交于
      The command "ethtool -L <intf> combined 0" may clean the RX/TX channel
      count and skip the error path, since the attrs
      tb[ETHTOOL_A_CHANNELS_RX_COUNT] and tb[ETHTOOL_A_CHANNELS_TX_COUNT]
      are NULL in this case when recent ethtool is used.
      
      Tested using ethtool v5.10.
      
      Fixes: 7be92514 ("ethtool: check if there is at least one channel for TX/RX in the core")
      Signed-off-by: NYinjun Zhang <yinjun.zhang@corigine.com>
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Signed-off-by: NLouis Peens <louis.peens@netronome.com>
      Link: https://lore.kernel.org/r/20210225125102.23989-1-simon.horman@netronome.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      a4fc088a
  5. 27 2月, 2021 2 次提交
  6. 26 2月, 2021 2 次提交
  7. 25 2月, 2021 1 次提交
    • O
      net: introduce CAN specific pointer in the struct net_device · 4e096a18
      Oleksij Rempel 提交于
      Since 20dd3850 ("can: Speed up CAN frame receiption by using
      ml_priv") the CAN framework uses per device specific data in the AF_CAN
      protocol. For this purpose the struct net_device->ml_priv is used. Later
      the ml_priv usage in CAN was extended for other users, one of them being
      CAN_J1939.
      
      Later in the kernel ml_priv was converted to an union, used by other
      drivers. E.g. the tun driver started storing it's stats pointer.
      
      Since tun devices can claim to be a CAN device, CAN specific protocols
      will wrongly interpret this pointer, which will cause system crashes.
      Mostly this issue is visible in the CAN_J1939 stack.
      
      To fix this issue, we request a dedicated CAN pointer within the
      net_device struct.
      
      Reported-by: syzbot+5138c4dd15a0401bec7b@syzkaller.appspotmail.com
      Fixes: 20dd3850 ("can: Speed up CAN frame receiption by using ml_priv")
      Fixes: ffd956ee ("can: introduce CAN midlayer private and allocate it automatically")
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Fixes: 497a5757 ("tun: switch to net core provided statistics counters")
      Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20210223070127.4538-1-o.rempel@pengutronix.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      4e096a18
  8. 24 2月, 2021 3 次提交
    • T
      net: qrtr: Fix memory leak in qrtr_tun_open · fc0494ea
      Takeshi Misawa 提交于
      If qrtr_endpoint_register() failed, tun is leaked.
      Fix this, by freeing tun in error path.
      
      syzbot report:
      BUG: memory leak
      unreferenced object 0xffff88811848d680 (size 64):
        comm "syz-executor684", pid 10171, jiffies 4294951561 (age 26.070s)
        hex dump (first 32 bytes):
          80 dd 0a 84 ff ff ff ff 00 00 00 00 00 00 00 00  ................
          90 d6 48 18 81 88 ff ff 90 d6 48 18 81 88 ff ff  ..H.......H.....
        backtrace:
          [<0000000018992a50>] kmalloc include/linux/slab.h:552 [inline]
          [<0000000018992a50>] kzalloc include/linux/slab.h:682 [inline]
          [<0000000018992a50>] qrtr_tun_open+0x22/0x90 net/qrtr/tun.c:35
          [<0000000003a453ef>] misc_open+0x19c/0x1e0 drivers/char/misc.c:141
          [<00000000dec38ac8>] chrdev_open+0x10d/0x340 fs/char_dev.c:414
          [<0000000079094996>] do_dentry_open+0x1e6/0x620 fs/open.c:817
          [<000000004096d290>] do_open fs/namei.c:3252 [inline]
          [<000000004096d290>] path_openat+0x74a/0x1b00 fs/namei.c:3369
          [<00000000b8e64241>] do_filp_open+0xa0/0x190 fs/namei.c:3396
          [<00000000a3299422>] do_sys_openat2+0xed/0x230 fs/open.c:1172
          [<000000002c1bdcef>] do_sys_open fs/open.c:1188 [inline]
          [<000000002c1bdcef>] __do_sys_openat fs/open.c:1204 [inline]
          [<000000002c1bdcef>] __se_sys_openat fs/open.c:1199 [inline]
          [<000000002c1bdcef>] __x64_sys_openat+0x7f/0xe0 fs/open.c:1199
          [<00000000f3a5728f>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          [<000000004b38b7ec>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 28fb4e59 ("net: qrtr: Expose tunneling endpoint to user space")
      Reported-by: syzbot+5d6e4af21385f5cfc56a@syzkaller.appspotmail.com
      Signed-off-by: NTakeshi Misawa <jeliantsurux@gmail.com>
      Link: https://lore.kernel.org/r/20210221234427.GA2140@DESKTOPSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      fc0494ea
    • W
      net/sched: cls_flower: validate ct_state for invalid and reply flags · 3aed8b63
      wenxu 提交于
      Add invalid and reply flags validate in the fl_validate_ct_state.
      This makes the checking complete if compared to ovs'
      validate_ct_state().
      Signed-off-by: Nwenxu <wenxu@ucloud.cn>
      Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Link: https://lore.kernel.org/r/1614064315-364-1-git-send-email-wenxu@ucloud.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      3aed8b63
    • J
      net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending · ee576c47
      Jason A. Donenfeld 提交于
      The icmp{,v6}_send functions make all sorts of use of skb->cb, casting
      it with IPCB or IP6CB, assuming the skb to have come directly from the
      inet layer. But when the packet comes from the ndo layer, especially
      when forwarded, there's no telling what might be in skb->cb at that
      point. As a result, the icmp sending code risks reading bogus memory
      contents, which can result in nasty stack overflows such as this one
      reported by a user:
      
          panic+0x108/0x2ea
          __stack_chk_fail+0x14/0x20
          __icmp_send+0x5bd/0x5c0
          icmp_ndo_send+0x148/0x160
      
      In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read
      from it. The optlen parameter there is of particular note, as it can
      induce writes beyond bounds. There are quite a few ways that can happen
      in __ip_options_echo. For example:
      
          // sptr/skb are attacker-controlled skb bytes
          sptr = skb_network_header(skb);
          // dptr/dopt points to stack memory allocated by __icmp_send
          dptr = dopt->__data;
          // sopt is the corrupt skb->cb in question
          if (sopt->rr) {
              optlen  = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data
              soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data
      	// this now writes potentially attacker-controlled data, over
      	// flowing the stack:
              memcpy(dptr, sptr+sopt->rr, optlen);
          }
      
      In the icmpv6_send case, the story is similar, but not as dire, as only
      IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is
      worse than the iif case, but it is passed to ipv6_find_tlv, which does
      a bit of bounds checking on the value.
      
      This is easy to simulate by doing a `memset(skb->cb, 0x41,
      sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by
      good fortune and the rarity of icmp sending from that context that we've
      avoided reports like this until now. For example, in KASAN:
      
          BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0
          Write of size 38 at addr ffff888006f1f80e by task ping/89
          CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5
          Call Trace:
           dump_stack+0x9a/0xcc
           print_address_description.constprop.0+0x1a/0x160
           __kasan_report.cold+0x20/0x38
           kasan_report+0x32/0x40
           check_memory_region+0x145/0x1a0
           memcpy+0x39/0x60
           __ip_options_echo+0xa0e/0x12b0
           __icmp_send+0x744/0x1700
      
      Actually, out of the 4 drivers that do this, only gtp zeroed the cb for
      the v4 case, while the rest did not. So this commit actually removes the
      gtp-specific zeroing, while putting the code where it belongs in the
      shared infrastructure of icmp{,v6}_ndo_send.
      
      This commit fixes the issue by passing an empty IPCB or IP6CB along to
      the functions that actually do the work. For the icmp_send, this was
      already trivial, thanks to __icmp_send providing the plumbing function.
      For icmpv6_send, this required a tiny bit of refactoring to make it
      behave like the v4 case, after which it was straight forward.
      
      Fixes: a2b78e9b ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs")
      Reported-by: NSinYu <liuxyon@gmail.com>
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      ee576c47
  9. 23 2月, 2021 5 次提交
  10. 17 2月, 2021 8 次提交
    • L
      net: dsa: tag_rtl4_a: Support also egress tags · 86dd9868
      Linus Walleij 提交于
      Support also transmitting frames using the custom "8899 A"
      4 byte tag.
      
      Qingfang came up with the solution: we need to pad the
      ethernet frame to 60 bytes using eth_skb_pad(), then the
      switch will happily accept frames with custom tags.
      
      Cc: Mauri Sandberg <sandberg@mailfence.com>
      Reported-by: NDENG Qingfang <dqfext@gmail.com>
      Fixes: efd7fe68 ("net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tag")
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86dd9868
    • V
      net: sched: fix police ext initialization · 396d7f23
      Vlad Buslov 提交于
      When police action is created by cls API tcf_exts_validate() first
      conditional that calls tcf_action_init_1() directly, the action idr is not
      updated according to latest changes in action API that require caller to
      commit newly created action to idr with tcf_idr_insert_many(). This results
      such action not being accessible through act API and causes crash reported
      by syzbot:
      
      ==================================================================
      BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
      BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
      BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline]
      BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
      Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204
      
      CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       __kasan_report mm/kasan/report.c:400 [inline]
       kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
       check_memory_region_inline mm/kasan/generic.c:179 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
       __tcf_idr_release net/sched/act_api.c:178 [inline]
       tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
       tc_action_net_exit include/net/act_api.h:151 [inline]
       police_exit_net+0x168/0x360 net/sched/act_police.c:390
       ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
       cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
       process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
       kthread+0x3b1/0x4a0 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      ==================================================================
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G    B             5.11.0-rc7-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       panic+0x306/0x73d kernel/panic.c:231
       end_report+0x58/0x5e mm/kasan/report.c:100
       __kasan_report mm/kasan/report.c:403 [inline]
       kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413
       check_memory_region_inline mm/kasan/generic.c:179 [inline]
       check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
       instrument_atomic_read include/linux/instrumented.h:71 [inline]
       atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
       __tcf_idr_release net/sched/act_api.c:178 [inline]
       tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
       tc_action_net_exit include/net/act_api.h:151 [inline]
       police_exit_net+0x168/0x360 net/sched/act_police.c:390
       ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
       cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
       process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
       worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
       kthread+0x3b1/0x4a0 kernel/kthread.c:292
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
      Kernel Offset: disabled
      
      Fix the issue by calling tcf_idr_insert_many() after successful action
      initialization.
      
      Fixes: 0fedc63f ("net_sched: commit action insertions together")
      Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      396d7f23
    • H
      net: dsa: felix: Add support for MRP · a026c50b
      Horatiu Vultur 提交于
      Implement functions 'port_mrp_add', 'port_mrp_del',
      'port_mrp_add_ring_role' and 'port_mrp_del_ring_role' to call the mrp
      functions from ocelot.
      
      Also all MRP frames that arrive to CPU on queue number OCELOT_MRP_CPUQ
      will be forward by the SW.
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a026c50b
    • H
      net: dsa: add MRP support · c595c433
      Horatiu Vultur 提交于
      Add support for offloading MRP in HW. Currently implement the switchdev
      calls 'SWITCHDEV_OBJ_ID_MRP', 'SWITCHDEV_OBJ_ID_RING_ROLE_MRP',
      to allow to create MRP instances and to set the role of these instances.
      
      Add DSA_NOTIFIER_MRP_ADD/DEL and DSA_NOTIFIER_MRP_ADD/DEL_RING_ROLE
      which calls to .port_mrp_add/del and .port_mrp_add/del_ring_role in the
      DSA driver for the switch.
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c595c433
    • H
      bridge: mrp: Update br_mrp to use new return values of br_mrp_switchdev · cd605d45
      Horatiu Vultur 提交于
      Check the return values of the br_mrp_switchdev function.
      In case of:
      - BR_MRP_NONE, return the error to userspace,
      - BR_MRP_SW, continue with SW implementation,
      - BR_MRP_HW, continue without SW implementation,
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd605d45
    • H
      bridge: mrp: Extend br_mrp_switchdev to detect better the errors · 1a3ddb0b
      Horatiu Vultur 提交于
      This patch extends the br_mrp_switchdev functions to be able to have a
      better understanding what cause the issue and if the SW needs to be used
      as a backup.
      
      There are the following cases:
      - when the code is compiled without CONFIG_NET_SWITCHDEV. In this case
        return success so the SW can continue with the protocol. Depending
        on the function, it returns 0 or BR_MRP_SW.
      - when code is compiled with CONFIG_NET_SWITCHDEV and the driver doesn't
        implement any MRP callbacks. In this case the HW can't run MRP so it
        just returns -EOPNOTSUPP. So the SW will stop further to configure the
        node.
      - when code is compiled with CONFIG_NET_SWITCHDEV and the driver fully
        supports any MRP functionality. In this case the SW doesn't need to do
        anything. The functions will return 0 or BR_MRP_HW.
      - when code is compiled with CONFIG_NET_SWITCHDEV and the HW can't run
        completely the protocol but it can help the SW to run it. For
        example, the HW can't support completely MRM role(can't detect when it
        stops receiving MRP Test frames) but it can redirect these frames to
        CPU. In this case it is possible to have a SW fallback. The SW will
        try initially to call the driver with sw_backup set to false, meaning
        that the HW should implement completely the role. If the driver returns
        -EOPNOTSUPP, the SW will try again with sw_backup set to false,
        meaning that the SW will detect when it stops receiving the frames but
        it needs HW support to redirect the frames to CPU. In case the driver
        returns 0 then the SW will continue to configure the node accordingly.
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a3ddb0b
    • H
      bridge: mrp: Add 'enum br_mrp_hw_support' · e1bd99d0
      Horatiu Vultur 提交于
      Add the enum br_mrp_hw_support that is used by the br_mrp_switchdev
      functions to allow the SW to detect the cases where HW can't implement
      the functionality or when SW is used as a backup.
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1bd99d0
    • C
      SUNRPC: Further clean up svc_tcp_sendmsg() · 4d12b727
      Chuck Lever 提交于
      Clean up: The msghdr is no longer needed in the caller.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      4d12b727