1. 30 11月, 2022 1 次提交
  2. 12 10月, 2022 1 次提交
    • J
      treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Jason A. Donenfeld 提交于
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NYury Norov <yury.norov@gmail.com>
      Reviewed-by: NKP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: NJakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      81895a65
  3. 21 9月, 2022 1 次提交
    • Z
      net/af_packet: registration process optimization in packet_init() · 63b7c2eb
      Ziyang Xuan 提交于
      Now, register_pernet_subsys() and register_netdevice_notifier() are both
      after sock_register(). It can create PF_PACKET socket and process socket
      once sock_register() successfully. It is possible PF_PACKET socket is
      creating but register_pernet_subsys() and register_netdevice_notifier()
      are not registered yet. Thus net->packet.sklist_lock and net->packet.sklist
      will be accessed without initialization that is done in packet_net_init().
      Although this is a low probability scenario.
      
      Move register_pernet_subsys() and register_netdevice_notifier() to the
      front in packet_init(). Correspondingly, adjust the unregister process
      in packet_exit().
      Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63b7c2eb
  4. 23 8月, 2022 1 次提交
  5. 29 7月, 2022 1 次提交
  6. 10 6月, 2022 1 次提交
  7. 03 6月, 2022 1 次提交
    • E
      net/af_packet: make sure to pull mac header · e9d3f809
      Eric Dumazet 提交于
      GSO assumes skb->head contains link layer headers.
      
      tun device in some case can provide base 14 bytes,
      regardless of VLAN being used or not.
      
      After blamed commit, we can end up setting a network
      header offset of 18+, we better pull the missing
      bytes to avoid a posible crash in GSO.
      
      syzbot report was:
      kernel BUG at include/linux/skbuff.h:2699!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 3601 Comm: syz-executor210 Not tainted 5.18.0-syzkaller-11338-g2c5ca23f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__skb_pull include/linux/skbuff.h:2699 [inline]
      RIP: 0010:skb_mac_gso_segment+0x48f/0x530 net/core/gro.c:136
      Code: 00 48 c7 c7 00 96 d4 8a c6 05 cb d3 45 06 01 e8 26 bb d0 01 e9 2f fd ff ff 49 c7 c4 ea ff ff ff e9 f1 fe ff ff e8 91 84 19 fa <0f> 0b 48 89 df e8 97 44 66 fa e9 7f fd ff ff e8 ad 44 66 fa e9 48
      RSP: 0018:ffffc90002e2f4b8 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000000000
      RDX: ffff88805bb58000 RSI: ffffffff8760ed0f RDI: 0000000000000004
      RBP: 0000000000005dbc R08: 0000000000000004 R09: 0000000000000fe0
      R10: 0000000000000fe4 R11: 0000000000000000 R12: 0000000000000fe0
      R13: ffff88807194d780 R14: 1ffff920005c5e9b R15: 0000000000000012
      FS:  000055555730f300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000200015c0 CR3: 0000000071ff8000 CR4: 0000000000350ee0
      Call Trace:
       <TASK>
       __skb_gso_segment+0x327/0x6e0 net/core/dev.c:3411
       skb_gso_segment include/linux/netdevice.h:4749 [inline]
       validate_xmit_skb+0x6bc/0xf10 net/core/dev.c:3669
       validate_xmit_skb_list+0xbc/0x120 net/core/dev.c:3719
       sch_direct_xmit+0x3d1/0xbe0 net/sched/sch_generic.c:327
       __dev_xmit_skb net/core/dev.c:3815 [inline]
       __dev_queue_xmit+0x14a1/0x3a00 net/core/dev.c:4219
       packet_snd net/packet/af_packet.c:3071 [inline]
       packet_sendmsg+0x21cb/0x5550 net/packet/af_packet.c:3102
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:734
       ____sys_sendmsg+0x6eb/0x810 net/socket.c:2492
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2546
       __sys_sendmsg net/socket.c:2575 [inline]
       __do_sys_sendmsg net/socket.c:2584 [inline]
       __se_sys_sendmsg net/socket.c:2582 [inline]
       __x64_sys_sendmsg+0x132/0x220 net/socket.c:2582
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f4b95da06c9
      Code: 28 c3 e8 4a 15 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffd7defc4c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007ffd7defc4f0 RCX: 00007f4b95da06c9
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 0000000000000003 R08: bb1414ac00000050 R09: bb1414ac00000050
      R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd7defc4e0 R14: 00007ffd7defc4d8 R15: 00007ffd7defc4d4
       </TASK>
      
      Fixes: dfed913e ("net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e9d3f809
  8. 29 4月, 2022 1 次提交
  9. 26 4月, 2022 1 次提交
    • H
      net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO · dfed913e
      Hangbin Liu 提交于
      Currently, the kernel drops GSO VLAN tagged packet if it's created with
      socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr.
      
      The reason is AF_PACKET doesn't adjust the skb network header if there is
      a VLAN tag. Then after virtio_net_hdr_set_proto() called, the skb->protocol
      will be set to ETH_P_IP/IPv6. And in later inet/ipv6_gso_segment() the skb
      is dropped as network header position is invalid.
      
      Let's handle VLAN packets by adjusting network header position in
      packet_parse_headers(). The adjustment is safe and does not affect the
      later xmit as tap device also did that.
      
      In packet_snd(), packet_parse_headers() need to be moved before calling
      virtio_net_hdr_set_proto(), so we can set correct skb->protocol and
      network header first.
      
      There is no need to update tpacket_snd() as it calls packet_parse_headers()
      in tpacket_fill_skb(), which is already before calling virtio_net_hdr_*
      functions.
      
      skb->no_fcs setting is also moved upper to make all skb settings together
      and keep consistency with function packet_sendmsg_spkt().
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20220425014502.985464-1-liuhangbin@gmail.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      dfed913e
  10. 15 4月, 2022 1 次提交
  11. 06 4月, 2022 1 次提交
    • O
      net: remove noblock parameter from skb_recv_datagram() · f4b41f06
      Oliver Hartkopp 提交于
      skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
      merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'
      
      As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags'
      into 'flags' and 'noblock' with finally obsolete bit operations like this:
      
      skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);
      
      And this is not even done consistently with the 'flags' parameter.
      
      This patch removes the obsolete and costly splitting into two parameters
      and only performs bit operations when really needed on the caller side.
      
      One missing conversion thankfully reported by kernel test robot. I missed
      to enable kunit tests to build the mctp code.
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4b41f06
  12. 15 3月, 2022 1 次提交
    • E
      net/packet: fix slab-out-of-bounds access in packet_recvmsg() · c700525f
      Eric Dumazet 提交于
      syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH
      and mmap operations, tpacket_rcv() is queueing skbs with
      garbage in skb->cb[], triggering a too big copy [1]
      
      Presumably, users of af_packet using mmap() already gets correct
      metadata from the mapped buffer, we can simply make sure
      to clear 12 bytes that might be copied to user space later.
      
      BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline]
      BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
      Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631
      
      CPU: 0 PID: 3631 Comm: syz-executor233 Not tainted 5.17.0-rc7-syzkaller-02396-g0b366069 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xf/0x336 mm/kasan/report.c:255
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
       check_region_inline mm/kasan/generic.c:183 [inline]
       kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
       memcpy+0x39/0x60 mm/kasan/shadow.c:66
       memcpy include/linux/fortify-string.h:225 [inline]
       packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
       sock_recvmsg_nosec net/socket.c:948 [inline]
       sock_recvmsg net/socket.c:966 [inline]
       sock_recvmsg net/socket.c:962 [inline]
       ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632
       ___sys_recvmsg+0x127/0x200 net/socket.c:2674
       __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7fdfd5954c29
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffcf8e71e48 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdfd5954c29
      RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcf8e71e60
      R13: 00000000000f4240 R14: 000000000000c1ff R15: 00007ffcf8e71e54
       </TASK>
      
      addr ffffc9000385fb78 is located in stack of task syz-executor233/3631 at offset 32 in frame:
       ____sys_recvmsg+0x0/0x600 include/linux/uio.h:246
      
      this frame has 1 object:
       [32, 160) 'addr'
      
      Memory state around the buggy address:
       ffffc9000385fa80: 00 04 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00
       ffffc9000385fb00: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00
      >ffffc9000385fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3
                                                                      ^
       ffffc9000385fc00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1
       ffffc9000385fc80: f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00
      ==================================================================
      
      Fixes: 0fb375fb ("[AF_PACKET]: Allow for > 8 byte hardware addresses.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220312232958.3535620-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      c700525f
  13. 03 3月, 2022 1 次提交
    • M
      net: Handle delivery_time in skb->tstamp during network tapping with af_packet · 27942a15
      Martin KaFai Lau 提交于
      A latter patch will set the skb->mono_delivery_time to flag the skb->tstamp
      is used as the mono delivery_time (EDT) instead of the (rcv) timestamp.
      skb_clear_tstamp() will then keep this delivery_time during forwarding.
      
      This patch is to make the network tapping (with af_packet) to handle
      the delivery_time stored in skb->tstamp.
      
      Regardless of tapping at the ingress or egress,  the tapped skb is
      received by the af_packet socket, so it is ingress to the af_packet
      socket and it expects the (rcv) timestamp.
      
      When tapping at egress, dev_queue_xmit_nit() is used.  It has already
      expected skb->tstamp may have delivery_time,  so it does
      skb_clone()+net_timestamp_set() to ensure the cloned skb has
      the (rcv) timestamp before passing to the af_packet sk.
      This patch only adds to clear the skb->mono_delivery_time
      bit in net_timestamp_set().
      
      When tapping at ingress, it currently expects the skb->tstamp is either 0
      or the (rcv) timestamp.  Meaning, the tapping at ingress path
      has already expected the skb->tstamp could be 0 and it will get
      the (rcv) timestamp by ktime_get_real() when needed.
      
      There are two cases for tapping at ingress:
      
      One case is af_packet queues the skb to its sk_receive_queue.
      The skb is either not shared or new clone created.  The newly
      added skb_clear_delivery_time() is called to clear the
      delivery_time (if any) and set the (rcv) timestamp if
      needed before the skb is queued to the sk_receive_queue.
      
      Another case, the ingress skb is directly copied to the rx_ring
      and tpacket_get_timestamp() is used to get the (rcv) timestamp.
      The newly added skb_tstamp() is used in tpacket_get_timestamp()
      to check the skb->mono_delivery_time bit before returning skb->tstamp.
      As mentioned earlier, the tapping@ingress has already expected
      the skb may not have the (rcv) timestamp (because no sk has asked
      for it) and has handled this case by directly calling ktime_get_real().
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27942a15
  14. 02 2月, 2022 1 次提交
    • E
      af_packet: fix data-race in packet_setsockopt / packet_setsockopt · e42e70ad
      Eric Dumazet 提交于
      When packet_setsockopt( PACKET_FANOUT_DATA ) reads po->fanout,
      no lock is held, meaning that another thread can change po->fanout.
      
      Given that po->fanout can only be set once during the socket lifetime
      (it is only cleared from fanout_release()), we can use
      READ_ONCE()/WRITE_ONCE() to document the race.
      
      BUG: KCSAN: data-race in packet_setsockopt / packet_setsockopt
      
      write to 0xffff88813ae8e300 of 8 bytes by task 14653 on cpu 0:
       fanout_add net/packet/af_packet.c:1791 [inline]
       packet_setsockopt+0x22fe/0x24a0 net/packet/af_packet.c:3931
       __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff88813ae8e300 of 8 bytes by task 14654 on cpu 1:
       packet_setsockopt+0x691/0x24a0 net/packet/af_packet.c:3935
       __sys_setsockopt+0x209/0x2a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0x62/0x70 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000000000000000 -> 0xffff888106f8c000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 14654 Comm: syz-executor.3 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 47dceb8e ("packet: add classic BPF fanout mode")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220201022358.330621-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      e42e70ad
  15. 20 1月, 2022 1 次提交
    • C
      net: fix information leakage in /proc/net/ptype · 47934e06
      Congyu Liu 提交于
      In one net namespace, after creating a packet socket without binding
      it to a device, users in other net namespaces can observe the new
      `packet_type` added by this packet socket by reading `/proc/net/ptype`
      file. This is minor information leakage as packet socket is
      namespace aware.
      
      Add a net pointer in `packet_type` to keep the net namespace of
      of corresponding packet socket. In `ptype_seq_show`, this net pointer
      must be checked when it is not NULL.
      
      Fixes: 2feb27db ("[NETNS]: Minor information leak via /proc/net/ptype file.")
      Signed-off-by: NCongyu Liu <liu3101@purdue.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47934e06
  16. 08 1月, 2022 1 次提交
  17. 30 12月, 2021 1 次提交
  18. 16 12月, 2021 1 次提交
  19. 15 12月, 2021 1 次提交
  20. 16 11月, 2021 1 次提交
  21. 15 10月, 2021 1 次提交
  22. 10 9月, 2021 1 次提交
  23. 05 8月, 2021 1 次提交
  24. 30 6月, 2021 1 次提交
  25. 17 6月, 2021 2 次提交
    • E
      net/packet: annotate accesses to po->ifindex · e032f7c9
      Eric Dumazet 提交于
      Like prior patch, we need to annotate lockless accesses to po->ifindex
      For instance, packet_getname() is reading po->ifindex (twice) while
      another thread is able to change po->ifindex.
      
      KCSAN reported:
      
      BUG: KCSAN: data-race in packet_do_bind / packet_getname
      
      write to 0xffff888143ce3cbc of 4 bytes by task 25573 on cpu 1:
       packet_do_bind+0x420/0x7e0 net/packet/af_packet.c:3191
       packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
       __sys_bind+0x200/0x290 net/socket.c:1637
       __do_sys_bind net/socket.c:1648 [inline]
       __se_sys_bind net/socket.c:1646 [inline]
       __x64_sys_bind+0x3d/0x50 net/socket.c:1646
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888143ce3cbc of 4 bytes by task 25578 on cpu 0:
       packet_getname+0x5b/0x1a0 net/packet/af_packet.c:3525
       __sys_getsockname+0x10e/0x1a0 net/socket.c:1887
       __do_sys_getsockname net/socket.c:1902 [inline]
       __se_sys_getsockname net/socket.c:1899 [inline]
       __x64_sys_getsockname+0x3e/0x50 net/socket.c:1899
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0x00000001
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 25578 Comm: syz-executor.5 Not tainted 5.13.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e032f7c9
    • E
      net/packet: annotate accesses to po->bind · c7d2ef5d
      Eric Dumazet 提交于
      tpacket_snd(), packet_snd(), packet_getname() and packet_seq_show()
      can read po->num without holding a lock. This means other threads
      can change po->num at the same time.
      
      KCSAN complained about this known fact [1]
      Add READ_ONCE()/WRITE_ONCE() to address the issue.
      
      [1] BUG: KCSAN: data-race in packet_do_bind / packet_sendmsg
      
      write to 0xffff888131a0dcc0 of 2 bytes by task 24714 on cpu 0:
       packet_do_bind+0x3ab/0x7e0 net/packet/af_packet.c:3181
       packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
       __sys_bind+0x200/0x290 net/socket.c:1637
       __do_sys_bind net/socket.c:1648 [inline]
       __se_sys_bind net/socket.c:1646 [inline]
       __x64_sys_bind+0x3d/0x50 net/socket.c:1646
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888131a0dcc0 of 2 bytes by task 24719 on cpu 1:
       packet_snd net/packet/af_packet.c:2899 [inline]
       packet_sendmsg+0x317/0x3570 net/packet/af_packet.c:3040
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2433
       __do_sys_sendmsg net/socket.c:2442 [inline]
       __se_sys_sendmsg net/socket.c:2440 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2440
       do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000 -> 0x1200
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 24719 Comm: syz-executor.5 Not tainted 5.13.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d2ef5d
  26. 11 6月, 2021 1 次提交
  27. 18 5月, 2021 2 次提交
  28. 13 5月, 2021 1 次提交
  29. 15 4月, 2021 1 次提交
  30. 25 3月, 2021 1 次提交
  31. 07 2月, 2021 1 次提交
  32. 19 12月, 2020 1 次提交
  33. 15 12月, 2020 1 次提交
    • Y
      net: fix proc_fs init handling in af_packet and tls · a268e0f2
      Yonatan Linik 提交于
      proc_fs was used, in af_packet, without a surrounding #ifdef,
      although there is no hard dependency on proc_fs.
      That caused the initialization of the af_packet module to fail
      when CONFIG_PROC_FS=n.
      
      Specifically, proc_create_net() was used in af_packet.c,
      and when it fails, packet_net_init() returns -ENOMEM.
      It will always fail when the kernel is compiled without proc_fs,
      because, proc_create_net() for example always returns NULL.
      
      The calling order that starts in af_packet.c is as follows:
      packet_init()
      register_pernet_subsys()
      register_pernet_operations()
      __register_pernet_operations()
      ops_init()
      ops->init() (packet_net_ops.init=packet_net_init())
      proc_create_net()
      
      It worked in the past because register_pernet_subsys()'s return value
      wasn't checked before this Commit 36096f2f ("packet: Fix error path in
      packet_init.").
      It always returned an error, but was not checked before, so everything
      was working even when CONFIG_PROC_FS=n.
      
      The fix here is simply to add the necessary #ifdef.
      
      This also fixes a similar error in tls_proc.c, that was found by Jakub
      Kicinski.
      
      Fixes: d26b698d ("net/tls: add skeleton of MIB statistics")
      Fixes: 36096f2f ("packet: Fix error path in packet_init")
      Signed-off-by: NYonatan Linik <yonatanlinik@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a268e0f2
  34. 24 11月, 2020 2 次提交
  35. 10 11月, 2020 1 次提交
  36. 20 9月, 2020 1 次提交
  37. 18 9月, 2020 1 次提交
    • X
      net/packet: Fix a comment about mac_header · b79a80bd
      Xie He 提交于
      1. Change all "dev->hard_header" to "dev->header_ops"
      
      2. On receiving incoming frames when header_ops == NULL:
      
      The comment only says what is wrong, but doesn't say what is right.
      This patch changes the comment to make it clear what is right.
      
      3. On transmitting and receiving outgoing frames when header_ops == NULL:
      
      The comment explains that the LL header will be later added by the driver.
      
      However, I think it's better to simply say that the LL header is invisible
      to us. This phrasing is better from a software engineering perspective,
      because this makes it clear that what happens in the driver should be
      hidden from us and we should not care about what happens internally in the
      driver.
      
      4. On resuming the LL header (for RAW frames) when header_ops == NULL:
      
      The comment says we are "unlikely" to restore the LL header.
      
      However, we should say that we are "unable" to restore it.
      It's not possible (rather than not likely) to restore it, because:
      
      1) There is no way for us to restore because the LL header internally
      processed by the driver should be invisible to us.
      
      2) In function packet_rcv and tpacket_rcv, the code only tries to restore
      the LL header when header_ops != NULL.
      
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Signed-off-by: NXie He <xie.he.0141@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b79a80bd