  1. 10 Sep, 2022 1 commit
  2. 18 Jun, 2022 1 commit
  3. 17 Jun, 2022 1 commit
  4. 06 May, 2022 1 commit
    • ping: fix address binding wrt vrf · e1a7ac6f
      Nicolas Dichtel committed
      When ping_group_range is updated, 'ping' uses the DGRAM ICMP socket,
      instead of an IP raw socket. In this case, 'ping' is unable to bind its
      socket to a local address owned by a vrflite.
      
      Before the patch:
      $ sysctl -w net.ipv4.ping_group_range='0  2147483647'
      $ ip link add blue type vrf table 10
      $ ip link add foo type dummy
      $ ip link set foo master blue
      $ ip link set foo up
      $ ip addr add 192.168.1.1/24 dev foo
      $ ip addr add 2001::1/64 dev foo
      $ ip vrf exec blue ping -c1 -I 192.168.1.1 192.168.1.2
      ping: bind: Cannot assign requested address
      $ ip vrf exec blue ping6 -c1 -I 2001::1 2001::2
      ping6: bind icmp socket: Cannot assign requested address
      
      CC: stable@vger.kernel.org
      Fixes: 1b69c6d0 ("net: Introduce L3 Master device abstraction")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
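The failing step in the reproduction above is the bind() call on the datagram ICMP socket. A minimal user-space sketch of that call sequence follows; the function name `ping_socket_bind` is hypothetical and is not taken from iputils or the kernel.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open an unprivileged ICMP "ping" socket and bind it
 * to a local IPv4 address. Returns 0 on success or a negative errno value. */
int ping_socket_bind(const char *local_addr)
{
    struct sockaddr_in sin;
    int fd, rc = 0;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    if (inet_pton(AF_INET, local_addr, &sin.sin_addr) != 1)
        return -EINVAL;

    /* Permitted only if the caller's group is inside ping_group_range. */
    fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    if (fd < 0)
        return -errno;

    /* This is the call that reported "Cannot assign requested address"
     * (EADDRNOTAVAIL) for an address owned by a VRF before the patch. */
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        rc = -errno;

    close(fd);
    return rc;
}
```

Run under `ip vrf exec blue` with 192.168.1.1 as the argument, this would be expected to exercise the same kernel path as `ping -I 192.168.1.1`; the result depends on the local sysctl and interface setup.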
  5. 30 Apr, 2022 1 commit
  6. 12 Apr, 2022 1 commit
    • net: remove noblock parameter from recvmsg() entities · ec095263
      Oliver Hartkopp committed
      The internal recvmsg() functions have two parameters 'flags' and 'noblock'
      that were merged inside skb_recv_datagram(). As a follow up patch to commit
      f4b41f06 ("net: remove noblock parameter from skb_recv_datagram()")
      this patch removes the separate 'noblock' parameter for recvmsg().
      
      Analogous to the referenced patch for skb_recv_datagram(), the 'flags' and
      'noblock' parameters are unnecessarily split up, e.g. with
      
      err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                                 flags & ~MSG_DONTWAIT, &addr_len);
      
      or in
      
      err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
                            sk, msg, size, flags & MSG_DONTWAIT,
                            flags & ~MSG_DONTWAIT, &addr_len);
      
      instead of simply passing only flags everywhere and checking for MSG_DONTWAIT
      where needed (to preserve the formerly separate no(n)block condition).
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
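A small sketch of the resulting convention, where a receive path takes only 'flags' and derives the non-blocking condition itself. `recv_would_block` and its `queue_empty` parameter are hypothetical names for illustration, not kernel functions.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a protocol receive path after the change:
 * only 'flags' is passed in, and the non-blocking condition is derived
 * from MSG_DONTWAIT at the point where it is needed. */
bool recv_would_block(int flags, bool queue_empty)
{
    bool nonblocking = (flags & MSG_DONTWAIT) != 0;

    /* A blocking receive waits only when nothing is queued. */
    return queue_empty && !nonblocking;
}
```

The caller no longer masks MSG_DONTWAIT out of 'flags' or passes a second argument; other bits such as MSG_PEEK pass through untouched.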
  7. 11 Apr, 2022 2 commits
    • net: icmp: add skb drop reasons to icmp protocol · b384c95a
      Menglong Dong committed
      Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with
      kfree_skb_reason().
      
      To report the reason for an skb drop after ICMP message handling,
      we change the return type of 'handler()' in 'struct icmp_control' from
      'bool' to 'enum skb_drop_reason'. This changes the meaning of the
      return value: 'false' used to mean failure, but 'SKB_NOT_DROPPED_YET' now
      means success. Therefore, every 'handler' and every call to one needs to be
      updated. The following 'handler' functions are involved:
      
      icmp_unreach()
      icmp_redirect()
      icmp_echo()
      icmp_timestamp()
      icmp_discard()
      
      The following new drop reasons are added:
      
      SKB_DROP_REASON_ICMP_CSUM
      SKB_DROP_REASON_INVALID_PROTO
      
      The reason 'INVALID_PROTO' is introduced for the case where the packet
      doesn't follow RFC 1122 and is dropped. This is not a common case, and
      I believe we can locate the problem from the data in the packet. For now,
      'INVALID_PROTO' is used for ICMP broadcasts with wrong types.
      
      Maybe there should be a document file for these reasons, for example one
      listing all the cases that cause the 'UNHANDLED_PROTO' and 'INVALID_PROTO'
      drop reasons, so that users can locate their problems according to the
      document.
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
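The change of handler return type can be illustrated with a simplified model. Everything below is a hypothetical sketch: the enum, its values, `struct icmp_packet`, and `icmp_receive` are stand-ins invented for illustration and do not reproduce the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified counterpart of 'enum skb_drop_reason'.
 * The names echo the commit message; the values are assumptions. */
enum drop_reason {
    NOT_DROPPED_YET = 0,   /* success: the packet was consumed */
    DROP_ICMP_CSUM,        /* checksum verification failed */
    DROP_INVALID_PROTO,    /* packet violates the protocol rules */
};

/* Hypothetical packet summary used only by this sketch. */
struct icmp_packet {
    int csum_ok;
    int is_broadcast;
    int type_ok_for_broadcast;
};

/* Handlers used to return bool; they now return a drop reason. */
typedef enum drop_reason (*icmp_handler_t)(const struct icmp_packet *);

enum drop_reason echo_handler(const struct icmp_packet *pkt)
{
    (void)pkt;
    return NOT_DROPPED_YET;
}

/* The receive path returns the first reason that applies, otherwise
 * whatever the handler reports. */
enum drop_reason icmp_receive(const struct icmp_packet *pkt,
                              icmp_handler_t handler)
{
    if (!pkt->csum_ok)
        return DROP_ICMP_CSUM;
    if (pkt->is_broadcast && !pkt->type_ok_for_broadcast)
        return DROP_INVALID_PROTO;
    return handler(pkt);
}
```

Note that a caller written for the old 'bool' convention would misread the result, since the success value here is zero; that is why every call site has to be updated together with the handlers.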
    • net: icmp: introduce __ping_queue_rcv_skb() to report drop reasons · 41a95a00
      Menglong Dong committed
      To avoid changing the return value of ping_queue_rcv_skb(),
      introduce the function __ping_queue_rcv_skb(), which reports
      the reason for an skb drop as its return value, as Paolo suggested.
      
      Meanwhile, make ping_queue_rcv_skb() a simple call to
      __ping_queue_rcv_skb().
      
      The kfree_skb() and sock_queue_rcv_skb() used in ping_queue_rcv_skb()
      are replaced with kfree_skb_reason() and sock_queue_rcv_skb_reason()
      now.
      Reviewed-by: Hao Peng <flyingpeng@tencent.com>
      Reviewed-by: Jiang Biao <benbjiang@tencent.com>
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
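The wrapper pattern described here, an inner function that reports a reason and an outer function that keeps the old return contract, might look roughly like this. The queue type, the enum, and both function names are hypothetical and chosen only for the sketch.

```c
/* Hypothetical receive queue with a fixed capacity. */
struct rx_queue {
    int len;
    int cap;
};

/* Hypothetical reasons; the values are assumptions for this sketch. */
enum queue_drop_reason {
    QUEUE_NOT_DROPPED = 0,
    QUEUE_DROP_RCVBUF_FULL,
};

/* Inner helper: reports why the packet was dropped, if it was. */
enum queue_drop_reason queue_rcv_reason(struct rx_queue *q)
{
    if (q->len >= q->cap)
        return QUEUE_DROP_RCVBUF_FULL;
    q->len++;
    return QUEUE_NOT_DROPPED;
}

/* Outer function keeps its historical contract for existing callers:
 * 0 on success, -1 when the packet was dropped. */
int queue_rcv(struct rx_queue *q)
{
    return queue_rcv_reason(q) != QUEUE_NOT_DROPPED ? -1 : 0;
}
```

Existing callers keep using the outer function unchanged, while new code that wants the drop reason calls the inner one directly.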
  8. 06 Apr, 2022 1 commit
    • net: remove noblock parameter from skb_recv_datagram() · f4b41f06
      Oliver Hartkopp committed
      skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
      merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'.
      
      As 'flags' may already contain MSG_DONTWAIT, most callers split 'flags'
      into 'flags' and 'noblock' with ultimately redundant bit operations like this:
      
      skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);
      
      And this is not even done consistently with the 'flags' parameter.
      
      This patch removes the obsolete and costly splitting into two parameters
      and only performs bit operations when really needed on the caller side.
      
      One missing conversion was thankfully reported by the kernel test robot. I
      forgot to enable kunit tests to build the mctp code.
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
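Why the split is redundant can be checked directly: splitting MSG_DONTWAIT out of 'flags' on the caller side and merging it back on the callee side yields the original value. The helper names below are hypothetical; only the two expressions are taken from the commit message.

```c
#include <sys/socket.h>

/* Old callee side: the merge quoted in the commit message. */
int merge_flags(int flags, int noblock)
{
    return flags | (noblock ? MSG_DONTWAIT : 0);
}

/* Old caller side followed by the old callee side. */
int split_then_merge(int flags)
{
    int rest = flags & ~MSG_DONTWAIT;    /* 'flags' argument */
    int noblock = flags & MSG_DONTWAIT;  /* 'noblock' argument */

    return merge_flags(rest, noblock);
}
```

Since `split_then_merge(flags)` equals `flags` for every input, passing 'flags' through unchanged gives the same behavior with fewer bit operations.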
  9. 25 Feb, 2022 1 commit
  10. 17 Feb, 2022 1 commit
    • ping: fix the dif and sdif check in ping_lookup · 35a79e64
      Xin Long committed
      When 'ping' changes to use PING socket instead of RAW socket by:
      
         # sysctl -w net.ipv4.ping_group_range="0 100"
      
      There is another regression when matching sk_bound_dev_if against
      dif: the RAW socket uses inet_iif() while the PING socket lookup
      uses skb->dev->ifindex, so the commands below fail:
      
        # ip link add dummy0 type dummy
        # ip link set dummy0 up
        # ip addr add 192.168.111.1/24 dev dummy0
        # ping -I dummy0 192.168.111.1 -c1
      
      The issue was also reported on:
      
        https://github.com/iputils/iputils/issues/104
      
      But it was fixed in iputils in the wrong way, by not binding to the device
      when the destination IP is on the device, which causes some kselftests
      to fail, as Jianlin noticed.
      
      This patch uses inet(6)_iif and inet(6)_sdif to get dif and
      sdif for the PING socket, keeping it consistent with the RAW socket.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 24 Jan, 2022 1 commit
    • ping: fix the sk_bound_dev_if match in ping_lookup · 2afc3b5a
      Xin Long committed
      When 'ping' changes to use PING socket instead of RAW socket by:
      
         # sysctl -w net.ipv4.ping_group_range="0 100"
      
      the selftest 'router_broadcast.sh' fails, as a command such as
      
        # ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b
      
      can't receive the response skb on the PING socket. This is caused by a mismatch
      of sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket,
      as dif is vrf-h1 if dif's master was set to vrf-h1.
      
      This patch fixes the regression by also checking sk_bound_dev_if
      against sdif, so that packets can still be received even if the socket
      is bound not to the vrf device but to the real iif.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
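The device match described in this commit can be modelled as a single predicate. This is a sketch under assumptions: `bound_dev_matches` is a hypothetical name, and the interface indexes in the usage note are made up; it follows the commit's wording, in which dif is the vrf device and sdif is the real incoming interface.

```c
#include <stdbool.h>

/* Hypothetical model of the lookup check: a socket matches when it is
 * not bound to any device, or when its bound device equals either dif
 * or sdif. Before the fix only dif was compared. */
bool bound_dev_matches(int bound_dev_if, int dif, int sdif)
{
    if (bound_dev_if == 0)
        return true;
    return bound_dev_if == dif || bound_dev_if == sdif;
}
```

For example, with vrf-h1 at index 10 and veth0 at index 3, a socket bound to veth0 sees dif 10 and sdif 3: comparing against dif alone rejects it, while the extra sdif comparison accepts it.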
  12. 07 Jan, 2022 1 commit
    • net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() · 91a760b2
      Menglong Dong committed
      The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in
      __inet_bind() is not handled properly. When the return value
      is non-zero, __inet_bind() sets inet_saddr and inet_rcv_saddr to 0 and
      exits:
      
      	err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
      	if (err) {
      		inet->inet_saddr = inet->inet_rcv_saddr = 0;
      		goto out_release_sock;
      	}
      
      Let's take UDP as an example and see what happens. A UDP
      socket is added to 'udp_prot.h.udp_table->hash' and
      'udp_prot.h.udp_table->hash2' after sk->sk_prot->get_port()
      succeeds. If 'inet->inet_rcv_saddr' is specified here,
      then 'sk' will be in an 'hslot2' of 'hash2' that it doesn't belong
      to (because inet_saddr is changed to 0), and received UDP packets
      will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
      specified here, the sock will work fine and can receive packets
      properly, which is weird, as the 'bind()' has already failed.
      
      To undo the get_port() operation, introduce the 'put_port' field
      for 'struct proto'. For TCP, it is inet_put_port(); for UDP,
      it is udp_lib_unhash(); for ICMP, it is
      ping_unhash().
      
      Therefore, after sys_bind() fails because of
      BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), the socket is unbound, which
      means that it can then be bound to another port.
      Signed-off-by: Menglong Dong <imagedong@tencent.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220106132022.3470772-2-imagedong@tencent.com
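The rollback idea, pairing get_port() with a put_port() that undoes it when a later step fails, can be sketched with a toy model. All types and functions below (`sock_model`, `proto_model`, `model_bind`, and so on) are hypothetical; only the 'get_port' and 'put_port' field names come from the commit message, and `post_bind_err` stands in for the BPF hook's return value.

```c
#include <stdbool.h>
#include <stddef.h>

struct sock_model;

/* Hypothetical counterpart of 'struct proto' with the new undo hook. */
struct proto_model {
    int (*get_port)(struct sock_model *sk, unsigned short port);
    void (*put_port)(struct sock_model *sk);
};

struct sock_model {
    const struct proto_model *prot;
    unsigned short port;
    bool hashed;          /* true while the socket sits in a lookup table */
    unsigned int saddr;
};

static int model_get_port(struct sock_model *sk, unsigned short port)
{
    sk->port = port;
    sk->hashed = true;
    return 0;
}

static void model_put_port(struct sock_model *sk)
{
    sk->hashed = false;
    sk->port = 0;
}

const struct proto_model model_proto = { model_get_port, model_put_port };

/* Bind, then run the post-bind hook; on hook failure clear the address
 * and also undo get_port(), so the socket is left fully unbound. */
int model_bind(struct sock_model *sk, unsigned int addr,
               unsigned short port, int post_bind_err)
{
    int err;

    sk->saddr = addr;
    err = sk->prot->get_port(sk, port);
    if (err) {
        sk->saddr = 0;
        return err;
    }
    if (post_bind_err) {
        sk->saddr = 0;
        if (sk->prot->put_port != NULL)
            sk->prot->put_port(sk);
        return post_bind_err;
    }
    return 0;
}
```

Without the put_port() call, the model would return an error while leaving `hashed` set, which is the inconsistent state the commit describes; with it, a second bind to a different port can succeed.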
  13. 18 Nov, 2021 1 commit
  14. 14 Sep, 2021 1 commit
  15. 13 Sep, 2021 1 commit
  16. 30 Jun, 2021 1 commit
  17. 11 Jun, 2021 1 commit
  18. 31 Mar, 2021 1 commit
  19. 24 Nov, 2020 1 commit
  20. 01 Sep, 2020 1 commit
  21. 25 Aug, 2020 2 commits
  22. 08 Jul, 2020 1 commit
  23. 14 Sep, 2019 1 commit
    • ip: support SO_MARK cmsg · c6af0c22
      Willem de Bruijn committed
      Enable setting skb->mark for UDP and RAW sockets using cmsg.
      
      This is analogous to existing support for TOS, TTL, txtime, etc.
      
      Packet sockets already support this as of commit c7d39e32
      ("packet: support per-packet fwmark for af_packet sendmsg").
      
      Similar to other fields, implement by
      1. initialize the sockcm_cookie.mark from socket option sk_mark
      2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl
      3. initialize inet_cork.mark from sockcm_cookie.mark
      4. initialize each (usually just one) skb->mark from inet_cork.mark
      
      Step 1 is handled in one location for most protocols by ipcm_init_sk
      as of commit 35178206 ("ipv4: ipcm_cookie initializers").
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 31 May, 2019 1 commit
  25. 20 May, 2019 1 commit
  26. 03 Oct, 2018 1 commit
  27. 02 Aug, 2018 1 commit
  28. 07 Jul, 2018 2 commits
    • ip: remove tx_flags from ipcm_cookie and use same logic for v4 and v6 · 678ca42d
      Willem de Bruijn committed
      skb_shinfo(skb)->tx_flags is derived from sk->sk_tsflags, possibly
      after modification by __sock_cmsg_send, by calling sock_tx_timestamp.
      
      The IPv4 and IPv6 paths do this conversion differently. In IPv4, the
      individual protocols that support tx timestamps call this function
      and store the result in ipc.tx_flags. In IPv6, sock_tx_timestamp is
      called in __ip6_append_data.
      
      There is no need to store both tx_flags and ts_flags in the cookie
      as one is derived from the other. Convert when setting up the cork
      and remove the redundant field. This is similar to IPv6, except that
      the conversion happens only once per datagram, in ip(6)_setup_cork.
      
      Also change __ip6_append_data to match __ip_append_data. Only update
      tskey if timestamping is enabled with OPT_ID. The SOCK_.. test is
      redundant: only valid protocols can have non-zero cork->tx_flags.
      
      After this change the IPv4 and IPv6 logic is the same.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: ipcm_cookie initializers · 35178206
      Willem de Bruijn committed
      Initialize the cookie in one location to reduce code duplication and
      avoid bugs from inconsistent initialization, such as that fixed in
      commit 9887cba1 ("ip: limit use of gso_size to udp").
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 04 Jul, 2018 1 commit
  30. 16 May, 2018 2 commits
  31. 12 May, 2018 1 commit
  32. 28 Mar, 2018 1 commit
  33. 27 Mar, 2018 1 commit
  34. 13 Feb, 2018 1 commit
    • net: Convert pernet_subsys, registered from inet_init() · f84c6821
      Kirill Tkhai committed
      arp_net_ops just adds/removes a /proc entry.
      
      devinet_ops allocates and frees duplicate of init_net tables
      and (un)registers sysctl entries.
      
      fib_net_ops allocates and frees pernet tables, creates/destroys
      netlink socket and (un)initializes /proc entries. Foreign
      pernet_operations do not touch them.
      
      ip_rt_proc_ops only modifies pernet /proc entries.
      
      xfrm_net_ops creates/destroys /proc entries, allocates/frees
      pernet statistics, hashes and tables, and (un)initializes
      sysctl files. These are not touched by foreign pernet_operations.
      
      xfrm4_net_ops allocates/frees private pernet memory, and
      configures sysctls.
      
      sysctl_route_ops creates/destroys sysctls.
      
      rt_genid_ops only initializes fields of just allocated net.
      
      ipv4_inetpeer_ops allocates/frees net private memory.
      
      igmp_net_ops just creates/destroys /proc files and a socket
      that no one else is interested in.
      
      tcp_sk_ops seems to be safe, because tcp_sk_init() does not
      depend on any other pernet_operations modifications. Iteration
      over the hash table in inet_twsk_purge() is done under the RCU lock,
      and it's safe to iterate the table this way. Removal from
      the table happens in inet_twsk_deschedule_put(), but this
      function is safe without any external locks, as it synchronizes
      internally. There are many examples of its use in different
      contexts. So, it's safe to leave tcp_sk_exit_batch() unlocked.
      
      tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.
      
      udplite4_net_ops only creates/destroys pernet /proc file.
      
      icmp_sk_ops creates percpu sockets, not touched by foreign
      pernet_operations.
      
      ipmr_net_ops creates/destroys pernet fib tables, (un)registers
      fib rules and /proc files. This seems to be safe to execute
      in parallel with foreign pernet_operations.
      
      af_inet_ops just sets up default parameters of newly created net.
      
      ipv4_mib_ops creates and destroys pernet percpu statistics.
      
      raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
      and ip_proc_ops only create/destroy pernet /proc files.
      
      ip4_frags_ops creates and destroys sysctl file.
      
      So, it's safe to make the pernet_operations async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 01 Jul, 2017 1 commit
  36. 25 Mar, 2017 1 commit