1. 06 Apr, 2022 1 commit
    • net: remove noblock parameter from skb_recv_datagram() · f4b41f06
      Committed by Oliver Hartkopp
      skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
      merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'
      
      As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags'
      into 'flags' and 'noblock' with finally obsolete bit operations like this:
      
      skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);
      
      And this is not even done consistently with the 'flags' parameter.
      
      This patch removes the obsolete and costly splitting into two parameters
      and only performs bit operations when really needed on the caller side.
      
      One missing conversion was thankfully reported by the kernel test robot;
      I had missed enabling the kunit tests, which is needed to build the mctp code.
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
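The flag arithmetic described in this commit message can be checked in userspace. The sketch below is not kernel code: `merge_old()`, `caller_old()` and `caller_new()` are hypothetical names that only model the quoted expressions, using `MSG_DONTWAIT` and `MSG_PEEK` from `<sys/socket.h>`.

```c
#include <sys/socket.h>   /* MSG_DONTWAIT, MSG_PEEK */

/* Models the merge that used to happen inside the callee:
 * flags | (noblock ? MSG_DONTWAIT : 0). Hypothetical helper. */
int merge_old(int flags, int noblock)
{
    return flags | (noblock ? MSG_DONTWAIT : 0);
}

/* Models a typical old caller: split 'flags' in two, only for the
 * callee to merge the two halves again. Hypothetical helper. */
int caller_old(int flags)
{
    return merge_old(flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT);
}

/* Models the call shape after the patch: 'flags' passes through as is. */
int caller_new(int flags)
{
    return flags;
}
```

For any `flags` value both call shapes yield the same effective flags, which is why the split and re-merge could be dropped.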
  2. 27 Jan, 2022 1 commit
    • ipv4: raw: lock the socket in raw_bind() · 153a0d18
      Committed by Eric Dumazet
      For some reason, raw_bind() forgot to lock the socket.
      
      BUG: KCSAN: data-race in __ip4_datagram_connect / raw_bind
      
      write to 0xffff8881170d4308 of 4 bytes by task 5466 on cpu 0:
       raw_bind+0x1b0/0x250 net/ipv4/raw.c:739
       inet_bind+0x56/0xa0 net/ipv4/af_inet.c:443
       __sys_bind+0x14b/0x1b0 net/socket.c:1697
       __do_sys_bind net/socket.c:1708 [inline]
       __se_sys_bind net/socket.c:1706 [inline]
       __x64_sys_bind+0x3d/0x50 net/socket.c:1706
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff8881170d4308 of 4 bytes by task 5468 on cpu 1:
       __ip4_datagram_connect+0xb7/0x7b0 net/ipv4/datagram.c:39
       ip4_datagram_connect+0x2a/0x40 net/ipv4/datagram.c:89
       inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
       __sys_connect_file net/socket.c:1900 [inline]
       __sys_connect+0x197/0x1b0 net/socket.c:1917
       __do_sys_connect net/socket.c:1927 [inline]
       __se_sys_connect net/socket.c:1924 [inline]
       __x64_sys_connect+0x3d/0x50 net/socket.c:1924
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00000000 -> 0x0003007f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 5468 Comm: syz-executor.5 Not tainted 5.17.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Jan, 2022 1 commit
  4. 18 Nov, 2021 1 commit
  5. 16 Nov, 2021 1 commit
  6. 14 Sep, 2021 1 commit
  7. 13 Sep, 2021 1 commit
  8. 30 Jun, 2021 1 commit
  9. 24 Nov, 2020 1 commit
  10. 27 Aug, 2020 1 commit
  11. 25 Aug, 2020 2 commits
  12. 21 Aug, 2020 1 commit
    • csum_partial_copy_nocheck(): drop the last argument · cc44c17b
      Committed by Al Viro
      It's always 0.  Note that we theoretically could use ~0U as well -
      result will be the same modulo 0xffff, _if_ the damn thing did the
      right thing for any value of initial sum; later we'll make use of
      that when convenient.
      
      However, unlike csum_and_copy_..._user(), there are instances that
      did not work for arbitrary initial sums; c6x is one such.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
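The remark that an initial sum of 0 and of ~0U give the same result modulo 0xffff can be illustrated with a portable ones'-complement sum. This is a minimal userspace sketch, not the kernel's csum_partial_copy_nocheck(): `csum_words()` and `fold16()` are hypothetical helpers, written under the assumption that the routine handles any initial sum correctly.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit partial sum to 16 bits with end-around carry
 * (no final inversion). Hypothetical helper. */
uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones'-complement sum of n 16-bit words, starting from 'init'.
 * Hypothetical helper. */
uint32_t csum_words(const uint16_t *w, size_t n, uint32_t init)
{
    uint64_t acc = init;
    size_t i;

    for (i = 0; i < n; i++)
        acc += w[i];
    /* Fold 64 bits down to 32, adding the carries back in. */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}
```

Since both 2^16 and 2^32 are congruent to 1 modulo 0xffff, each fold preserves the residue, and ~0U itself is a multiple of 0xffff. The two starting values therefore differ at most by 0 versus 0xffff in the folded result.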
  13. 25 Jul, 2020 1 commit
  14. 20 Jul, 2020 1 commit
  15. 12 Mar, 2020 1 commit
  16. 02 Oct, 2019 1 commit
    • netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Committed by Florian Westphal
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  17. 14 Sep, 2019 1 commit
    • ip: support SO_MARK cmsg · c6af0c22
      Committed by Willem de Bruijn
      Enable setting skb->mark for UDP and RAW sockets using cmsg.
      
      This is analogous to existing support for TOS, TTL, txtime, etc.
      
      Packet sockets already support this as of commit c7d39e32
      ("packet: support per-packet fwmark for af_packet sendmsg").
      
      Similar to other fields, implement by
      1. initialize the sockcm_cookie.mark from socket option sk_mark
      2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl
      3. initialize inet_cork.mark from sockcm_cookie.mark
      4. initialize each (usually just one) skb->mark from inet_cork.mark
      
      Step 1 is handled in one location for most protocols by ipcm_init_sk
      as of commit 35178206 ("ipv4: ipcm_cookie initializers").
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 26 Jun, 2019 1 commit
  19. 31 May, 2019 1 commit
  20. 20 May, 2019 1 commit
  21. 09 May, 2019 1 commit
  22. 18 Dec, 2018 1 commit
  23. 28 Nov, 2018 1 commit
  24. 08 Nov, 2018 2 commits
  25. 03 Oct, 2018 1 commit
  26. 07 Jul, 2018 2 commits
    • ip: remove tx_flags from ipcm_cookie and use same logic for v4 and v6 · 678ca42d
      Committed by Willem de Bruijn
      skb_shinfo(skb)->tx_flags is derived from sk->sk_tsflags, possibly
      after modification by __sock_cmsg_send, by calling sock_tx_timestamp.
      
      The IPv4 and IPv6 paths do this conversion differently. In IPv4, the
      individual protocols that support tx timestamps call this function
      and store the result in ipc.tx_flags. In IPv6, sock_tx_timestamp is
      called in __ip6_append_data.
      
      There is no need to store both tx_flags and ts_flags in the cookie
      as one is derived from the other. Convert when setting up the cork
      and remove the redundant field. This is similar to IPv6, except that
      the conversion happens only once per datagram, in ip(6)_setup_cork.
      
      Also change __ip6_append_data to match __ip_append_data. Only update
      tskey if timestamping is enabled with OPT_ID. The SOCK_.. test is
      redundant: only valid protocols can have non-zero cork->tx_flags.
      
      After this change the IPv4 and IPv6 logic is the same.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: ipcm_cookie initializers · 35178206
      Committed by Willem de Bruijn
      Initialize the cookie in one location to reduce code duplication and
      avoid bugs from inconsistent initialization, such as that fixed in
      commit 9887cba1 ("ip: limit use of gso_size to udp").
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 04 Jul, 2018 1 commit
  28. 16 May, 2018 2 commits
  29. 28 Mar, 2018 1 commit
  30. 27 Mar, 2018 1 commit
  31. 23 Mar, 2018 1 commit
  32. 13 Feb, 2018 1 commit
    • net: Convert pernet_subsys, registered from inet_init() · f84c6821
      Committed by Kirill Tkhai
      arp_net_ops just adds/removes a /proc entry.
      
      devinet_ops allocates and frees duplicate of init_net tables
      and (un)registers sysctl entries.
      
      fib_net_ops allocates and frees pernet tables, creates/destroys
      netlink socket and (un)initializes /proc entries. Foreign
      pernet_operations do not touch them.
      
      ip_rt_proc_ops only modifies pernet /proc entries.
      
      xfrm_net_ops creates/destroys /proc entries, allocates/frees
      pernet statistics, hashes and tables, and (un)initializes
      sysctl files. These are not touched by foreign pernet_operations.
      
      xfrm4_net_ops allocates/frees private pernet memory, and
      configures sysctls.
      
      sysctl_route_ops creates/destroys sysctls.
      
      rt_genid_ops only initializes fields of just allocated net.
      
      ipv4_inetpeer_ops allocates/frees net private memory.
      
      igmp_net_ops just creates/destroys /proc files and a socket
      that no one else is interested in.
      
      tcp_sk_ops seems to be safe, because tcp_sk_init() does not
      depend on any other pernet_operations modifications. Iteration
      over hash table in inet_twsk_purge() is made under RCU lock,
      and it's safe to iterate the table this way. Removing from
      the table happens from inet_twsk_deschedule_put(), but this
      function is safe without any external locks, as it's synchronized
      inside itself. There are many examples; it's used in different
      contexts. So, it's safe to leave tcp_sk_exit_batch() unlocked.
      
      tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.
      
      udplite4_net_ops only creates/destroys pernet /proc file.
      
      icmp_sk_ops creates percpu sockets, not touched by foreign
      pernet_operations.
      
      ipmr_net_ops creates/destroys pernet fib tables, (un)registers
      fib rules and /proc files. This seems to be safe to execute
      in parallel with foreign pernet_operations.
      
      af_inet_ops just sets up default parameters of newly created net.
      
      ipv4_mib_ops creates and destroys pernet percpu statistics.
      
      raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
      and ip_proc_ops only create/destroy pernet /proc files.
      
      ip4_frags_ops creates and destroys sysctl file.
      
      So, it's safe to make the pernet_operations async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 26 Jan, 2018 1 commit
    • net/ipv4: Allow send to local broadcast from a socket bound to a VRF · 9515a2e0
      Committed by David Ahern
      Message sends to the local broadcast address (255.255.255.255) require
      uc_index or sk_bound_dev_if to be set to an egress device. However,
      responses are only received if the socket is bound to the device. This
      is overly constraining for processes running in an L3 domain. This
      patch allows a socket bound to the VRF device to send to the local
      broadcast address by using IP_UNICAST_IF to set the egress interface
      with packet receipt handled by the VRF binding.
      
      Similar to IP_MULTICAST_IF, relax the constraint on setting
      IP_UNICAST_IF if a socket is bound to an L3 master device. In this
      case allow uc_index to be set to an enslaved if sk_bound_dev_if is
      an L3 master device and is the master device for the ifindex.
      
      In udp and raw sendmsg, allow uc_index to override the oif if
      uc_index master device is oif (i.e., the oif is an L3 master and the
      index is an L3 slave).
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
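On the userspace side, the egress interface referred to above as uc_index is selected with the `IP_UNICAST_IF` socket option. The sketch below assumes Linux, where the option value is the interface index in network byte order; `set_unicast_if()` is a hypothetical wrapper name.

```c
#include <arpa/inet.h>    /* htonl */
#include <netinet/in.h>   /* IPPROTO_IP, IP_UNICAST_IF */
#include <stdint.h>
#include <sys/socket.h>   /* setsockopt */

/* Ask the kernel to use 'ifindex' as the egress interface for this
 * socket. Returns 0 on success, -1 with errno set on failure.
 * Hypothetical wrapper. */
int set_unicast_if(int fd, unsigned int ifindex)
{
    /* The option expects the index in network byte order. */
    uint32_t idx = htonl(ifindex);

    return setsockopt(fd, IPPROTO_IP, IP_UNICAST_IF, &idx, sizeof(idx));
}
```

In the scenario from the commit message, a process would first bind the socket to the VRF device (for example with `SO_BINDTODEVICE`) and then call this wrapper with the index of an interface enslaved to that VRF, obtained for instance from if_nametoindex().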
  34. 17 Jan, 2018 1 commit
    • net: delete /proc THIS_MODULE references · 96890d62
      Committed by Alexey Dobriyan
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 16 Jan, 2018 1 commit
    • ip: Define usercopy region in IP proto slab cache · 8c2bc895
      Committed by David Windsor
      The ICMP filters for IPv4 and IPv6 raw sockets need to be copied to/from
      userspace. In support of usercopy hardening, this patch defines a region
      in the struct proto slab cache in which userspace copy operations are
      allowed.
      
      example usage trace:
      
          net/ipv4/raw.c:
              raw_seticmpfilter(...):
                  ...
                  copy_from_user(&raw_sk(sk)->filter, ..., optlen)
      
              raw_geticmpfilter(...):
                  ...
                  copy_to_user(..., &raw_sk(sk)->filter, len)
      
          net/ipv6/raw.c:
              rawv6_seticmpfilter(...):
                  ...
                  copy_from_user(&raw6_sk(sk)->filter, ..., optlen)
      
              rawv6_geticmpfilter(...):
                  ...
                  copy_to_user(..., &raw6_sk(sk)->filter, len)
      
      This region is known as the slab cache's usercopy region. Slab caches
      can now check that each dynamically sized copy operation involving
      cache-managed memory falls entirely within the slab's usercopy region.
      
      This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
      whitelisting code in the last public patch of grsecurity/PaX based on my
      understanding of the code. Changes or omissions from the original code are
      mine and don't reflect the original grsecurity/PaX code.
      Signed-off-by: David Windsor <dave@nullcore.net>
      [kees: split from network patch, provide usage trace]
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
  36. 10 Jan, 2018 1 commit
    • net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg() · 20b50d79
      Committed by Nicolai Stange
      Commit 8f659a03 ("net: ipv4: fix for a race condition in
      raw_sendmsg") fixed the issue of possibly inconsistent ->hdrincl handling
      due to concurrent updates by reading this bit-field member into a local
      variable and using the thus stabilized value in subsequent tests.
      
      However, the aforementioned commit also adds the (correct) comment that
      
        /* hdrincl should be READ_ONCE(inet->hdrincl)
         * but READ_ONCE() doesn't work with bit fields
         */
      
      because as it stands, the compiler is free to shortcut or even eliminate
      the local variable at its will.
      
      Note that I have not seen anything like this happening in reality and thus,
      the concern is a theoretical one.
      
      However, in order to be on the safe side, emulate a READ_ONCE() on the
      bit-field by doing it on the local 'hdrincl' variable itself:
      
      	int hdrincl = inet->hdrincl;
      	hdrincl = READ_ONCE(hdrincl);
      
      This breaks the chain in the sense that the compiler is not allowed
      to replace subsequent reads from hdrincl with reloads from inet->hdrincl.
      
      Fixes: 8f659a03 ("net: ipv4: fix for a race condition in raw_sendmsg")
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
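The two-step pattern quoted in this commit message can be reproduced outside the kernel. The sketch below is a userspace approximation: `READ_ONCE_SKETCH()` is a stand-in for the kernel's READ_ONCE() built on a volatile access (using the GNU `__typeof__` extension), and `struct inet_like` and `snapshot_hdrincl()` are hypothetical names.

```c
/* Stand-in for READ_ONCE(): force a single volatile load of an lvalue.
 * It takes the operand's address, which is why it cannot be applied
 * to a bit-field member directly. Assumption, not the kernel macro. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical structure with a one-bit flag, standing in for the
 * socket structure that holds ->hdrincl. */
struct inet_like {
    unsigned int hdrincl : 1;
    unsigned int other : 7;
};

/* Copy the bit-field into a local, then read the local through the
 * volatile stand-in, mirroring the two statements in the commit. */
int snapshot_hdrincl(const struct inet_like *inet)
{
    int hdrincl = inet->hdrincl;

    hdrincl = READ_ONCE_SKETCH(hdrincl);
    return hdrincl;
}
```

Every later test in the caller would then use the returned snapshot rather than going back to the bit-field, which is the property the commit is after.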