  1. 14 Nov, 2020 2 commits
  2. 13 Nov, 2020 1 commit
    • bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP · 8e4597c6
      Martin KaFai Lau committed
      This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use
      the bpf_sk_storage_(get|delete) helper, so those tracing programs
      can access the sk's bpf_local_storage and the later selftest
      will show some examples.
      
      The bpf_sk_storage is currently used in bpf-tcp-cc, tc,
      cg sockops, etc., which run either in softirq or
      task context.
      
      This patch adds bpf_sk_storage_get_tracing_proto and
      bpf_sk_storage_delete_tracing_proto.  They check at
      runtime that the helpers are called only when serving
      softirq or running in a task context.  That should enable
      most common tracing use cases on sk.
      
      At load time, the new tracing_allowed() function
      ensures that a tracing prog using the bpf_sk_storage_(get|delete)
      helper is not tracing any bpf_sk_storage*() function itself.
      The sk is passed as "void *" when calling into bpf_local_storage.
      
      This patch only allows tracing a kernel function.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20201112211313.2587383-1-kafai@fb.com
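The two checks described in this commit message can be illustrated with a small user-space sketch. It is not the kernel code: the function names, the enum and the idea of passing the attach target as a plain string are assumptions made for illustration. The load-time check refuses any attach target whose name starts with "bpf_sk_storage", and the runtime check accepts only task and softirq context.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for the load-time check: a tracing program that
 * uses the sk storage helpers must not attach to a bpf_sk_storage*()
 * function itself.
 */
static bool tracing_allowed_sketch(const char *attach_func_name)
{
	static const char prefix[] = "bpf_sk_storage";

	return strncmp(attach_func_name, prefix, strlen(prefix)) != 0;
}

/* Hypothetical execution contexts, used only by this sketch. */
enum ctx_sketch { CTX_TASK, CTX_SOFTIRQ, CTX_HARDIRQ, CTX_NMI };

/*
 * Hypothetical stand-in for the runtime check: the helpers may run only
 * when serving softirq or in task context.
 */
static bool helper_context_ok_sketch(enum ctx_sketch ctx)
{
	return ctx == CTX_TASK || ctx == CTX_SOFTIRQ;
}
```

With these definitions, `tracing_allowed_sketch("bpf_sk_storage_get")` is false while `tracing_allowed_sketch("tcp_sendmsg")` is true.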
  3. 11 Nov, 2020 4 commits
  4. 10 Nov, 2020 1 commit
  5. 08 Nov, 2020 1 commit
  6. 07 Nov, 2020 4 commits
    • nexthop: Pass extack to register_nexthop_notifier() · ce7e9c8a
      Ido Schimmel committed
      This will be used by the next patch which extends the function to replay
      all the existing nexthops to the notifier block being registered.
      
      Device drivers will be able to pass extack to the function since it is
      passed to them upon reload from devlink.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nexthop: Emit a notification when a nexthop is added · 732d167b
      Ido Schimmel committed
      Emit a notification in the nexthop notification chain when a new nexthop
      is added (not replaced). The nexthop can either be a new group or a
      single nexthop.
      
      The notification is sent after the nexthop is inserted into the
      red-black tree, as listeners might need to call back into the nexthop
      code with the nexthop ID in order to mark the nexthop as offloaded.
      
      A 'REPLACE' notification is emitted instead of 'ADD' as the distinction
      between the two is not important for in-kernel listeners. In case the
      listener is not familiar with the encoded nexthop ID, it can simply
      treat it as a new one. This is also consistent with the route offload
      API.
      
      Changes since RFC:
      * Reword commit message
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
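The point about 'REPLACE' covering both cases can be sketched from the listener's side. The table, the struct and the function below are hypothetical and not part of the nexthop code. They only show a listener that updates an entry when it knows the nexthop ID and otherwise treats the notification as a new nexthop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NH_TABLE_SIZE 8 /* hypothetical listener capacity */

/* Hypothetical per-listener record of a nexthop. */
struct nh_entry_sketch {
	bool used;
	uint32_t id;
	int ifindex; /* hypothetical payload carried by the notification */
};

static struct nh_entry_sketch nh_table[NH_TABLE_SIZE];

/*
 * Handle a 'REPLACE' notification: update the entry if the ID is known,
 * otherwise add it as a new one.
 * Returns 1 if added, 0 if updated, -1 if the table is full.
 */
static int nh_replace_sketch(uint32_t id, int ifindex)
{
	struct nh_entry_sketch *free_slot = NULL;

	for (size_t i = 0; i < NH_TABLE_SIZE; i++) {
		if (nh_table[i].used && nh_table[i].id == id) {
			nh_table[i].ifindex = ifindex;
			return 0;
		}
		if (!nh_table[i].used && !free_slot)
			free_slot = &nh_table[i];
	}
	if (!free_slot)
		return -1;

	free_slot->used = true;
	free_slot->id = id;
	free_slot->ifindex = ifindex;
	return 1;
}
```

Because the listener resolves "new or existing" on its own, a separate 'ADD' event would not give it any extra information in this model.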
    • nexthop: Allow setting "offload" and "trap" indications on nexthops · e95f2592
      Ido Schimmel committed
      Add a function that can be called by device drivers to set "offload" or
      "trap" indication on nexthops following nexthop notifications.
      
      Changes since RFC:
      * s/nexthop_hw_flags_set/nexthop_set_hw_flags/
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nexthop: Add nexthop notification data structures · 1c9cac65
      Ido Schimmel committed
      Add data structures that will be used for nexthop replace and delete
      notifications in the previously introduced nexthop notification chain.
      
      New data structures are added instead of passing the existing nexthop
      code structures directly for several reasons.
      
      First, the existing structures encode a lot of bookkeeping information
      which is irrelevant for listeners of the notification chain.
      
      Second, the existing structures can be changed without worrying about
      introducing regressions in listeners since they are not accessed
      directly by them.
      
      Third, listeners of the notification chain do not need to each parse the
      relatively complex nexthop code structures. The new structures pass the
      required information in a simplified way.
      
      Note that a single 'has_encap' bit is added instead of the actual
      encapsulation information since current listeners do not support such
      nexthops.
      
      Changes since RFC:
      * s/is_encap/has_encap/
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 06 Nov, 2020 6 commits
  8. 05 Nov, 2020 1 commit
    • tcp: propagate MPTCP skb extensions on xmit splits · 5a369ca6
      Paolo Abeni committed
      When the TCP stack splits a packet on the write queue, the tail
      half currently loses the associated skb extensions, and will not
      carry the DSM on the wire.
      
      The above does not cause functional problems and is allowed by
      the RFC, but interacts badly with GRO and RX coalescing, as possible
      candidates for aggregation will carry different TCP options.
      
      This change tries to improve the MPTCP behavior, propagating the
      skb extensions on split.
      
      Additionally, we must prevent the MPTCP stack from updating the
      mapping after the split occurs: that will both violate the RFC and
      fool the reader.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  9. 04 Nov, 2020 1 commit
  10. 03 Nov, 2020 1 commit
  11. 31 Oct, 2020 7 commits
    • netfilter: nf_reject: add reject skbuff creation helpers · fa538f7c
      Jose M. Guisado Gomez committed
      Add reject skbuff creation helper functions to the ipv4/6 nf_reject
      infrastructure. Use these functions for the reject verdict in the
      bridge family.
      
      They can be reused by all families that support reject and
      will not inject the reject packet through ip local out.
      Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • sctp: add the error cause for new encapsulation port restart · e38d86b3
      Xin Long committed
      This patch adds the function that makes the abort chunk with
      the error cause for new encapsulation port restart, defined
      in Section 4.4 of draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
      
      v1->v2:
        - no change.
      v2->v3:
        - no need to call htons() when setting nep.cur_port/new_port.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sctp: add udphdr to overhead when udp_port is set · f1bfe8b5
      Xin Long committed
      sctp_mtu_payload() is for calculating the frag size before making
      chunks from a msg. So we should only add udphdr size to overhead
      when udp socks are listening, as only then sctp can handle the
      incoming sctp over udp packets and outgoing sctp over udp packets
      will be possible.
      
      Note that we can't do this according to transport->encap_port, as
      different transports may be set to different values, and the
      chunks are made before the transport is chosen, so we would not be
      able to meet everything rfc6951#section-5.6 recommends.
      
      v1->v2:
        - Add udp_port for sctp_sock to avoid a potential race issue, it
          will be used in xmit path in the next patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
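The overhead rule in this commit message comes down to simple arithmetic, sketched below. This is a simplified model and not sctp_mtu_payload() itself: the function name and parameters are hypothetical, and only the IP header, the 12-byte SCTP common header and the 8-byte UDP header are counted.

```c
#include <stdint.h>

#define UDP_HDR_LEN_SKETCH  8u  /* size of a UDP header */
#define SCTP_HDR_LEN_SKETCH 12u /* size of the SCTP common header */

/*
 * Hypothetical helper: bytes left for chunks in one packet.
 * udp_port is the socket's UDP listening port; 0 means that no UDP socket
 * is listening, so no UDP header is reserved.
 */
static uint32_t frag_payload_sketch(uint32_t pmtu, uint32_t ip_hdr_len,
				    uint16_t udp_port)
{
	uint32_t overhead = ip_hdr_len + SCTP_HDR_LEN_SKETCH;

	if (udp_port)
		overhead += UDP_HDR_LEN_SKETCH;

	return pmtu > overhead ? pmtu - overhead : 0;
}
```

For a 1500-byte path MTU and a 20-byte IPv4 header this gives 1468 bytes without a UDP port and 1460 bytes with one. The decision depends only on the socket-wide udp_port, which is known before a transport is chosen.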
    • sctp: allow changing transport encap_port by peer packets · a1dd2cf2
      Xin Long committed
      As rfc6951#section-5.4 says:
      
        "After finding the SCTP association (which
         includes checking the verification tag), the UDP source port MUST be
         stored as the encapsulation port for the destination address the SCTP
         packet is received from (see Section 5.1).
      
         When a non-encapsulated SCTP packet is received by the SCTP stack,
         the encapsulation of outgoing packets belonging to the same
         association and the corresponding destination address MUST be
         disabled."
      
      transport encap_port should be updated by a validated incoming packet's
      udp src port.
      
      We save the udp src port in sctp_input_cb->encap_port, and then update
      the transport in three places:
      
        1. right after vtag is verified, which is required by RFC, and this
           allows the existing transports to be updated by the chunks that
           can only be processed on an asoc.
      
        2. right before processing the 'init' where the transports are added,
           and this allows building a sctp over udp connection by client with
           the server not knowing the remote encap port.
      
        3. when processing ootb_pkt and creating the temporary transport for
           the reply pkt.
      
      Note that sctp_input_cb->header is removed, as it's not used any more
      in sctp.
      
      v1->v2:
        - Change encap_port as __be16 for sctp_input_cb.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
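The quoted RFC rule can be modeled in a few lines. The struct and function below are hypothetical and ignore byte order (the commit stores the port as __be16). The sketch assumes that a UDP source port of 0 stands for a packet that arrived without encapsulation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified transport: encap_port 0 means "send plain SCTP". */
struct transport_sketch {
	uint16_t encap_port;
};

/*
 * Apply the rfc6951#section-5.4 rule to one incoming packet.
 * udp_src_port is the UDP source port of the packet, or 0 when the packet
 * was not encapsulated (an assumption of this sketch).
 * Nothing changes unless the verification tag has already been checked.
 */
static void update_encap_port_sketch(struct transport_sketch *t,
				     bool vtag_verified,
				     uint16_t udp_src_port)
{
	if (!vtag_verified)
		return;

	t->encap_port = udp_src_port;
}
```

A validated encapsulated packet therefore moves the transport to the peer's current UDP port, and a validated plain SCTP packet switches encapsulation off for that destination.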
    • sctp: add encap_port for netns sock asoc and transport · e8a3001c
      Xin Long committed
      encap_port is added per netns, sock, asoc and transport, and each
      one's encap_port inherits from the one before it by default.
      The transport's encap_port value mostly decides whether a
      packet goes out udp encapsulated or not.
      
      This patch also allows users to set netns' encap_port by sysctl.
      
      v1->v2:
        - Change to define encap_port as __be16 for sctp_sock, asoc and
          transport.
      v2->v3:
        - No change.
      v3->v4:
        - Add 'encap_port' entry in ip-sysctl.rst.
      v4->v5:
        - Improve the description of encap_port in ip-sysctl.rst.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
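The default inheritance chain can be pictured as a copy made when each object is created. All types and function names below are hypothetical, and the sketch assumes the value is copied once at creation time, so later changes to a parent do not reach objects that already exist.

```c
#include <stdint.h>

/* Hypothetical, stripped-down versions of the four levels. */
struct netns_sketch { uint16_t encap_port; };
struct sock_sketch  { uint16_t encap_port; };
struct asoc_sketch  { uint16_t encap_port; };
struct tp_sketch    { uint16_t encap_port; };

/* Each level takes its default from the level above it. */
static void sock_init_sketch(struct sock_sketch *sk,
			     const struct netns_sketch *net)
{
	sk->encap_port = net->encap_port;
}

static void asoc_init_sketch(struct asoc_sketch *asoc,
			     const struct sock_sketch *sk)
{
	asoc->encap_port = sk->encap_port;
}

static void tp_init_sketch(struct tp_sketch *tp,
			   const struct asoc_sketch *asoc)
{
	tp->encap_port = asoc->encap_port;
}
```

In this model, setting the netns value (the one the commit exposes through sysctl) to 9899 before creating a socket leads to a transport that also uses 9899, unless a lower level overrides it.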
    • sctp: create udp6 sock and set its encap_rcv · 9d6ba260
      Xin Long committed
      This patch adds the udp6 sock part in sctp_udp_sock_start/stop().
      udp_conf.use_udp6_rx_checksums is set to true, as rfc6951#section-5.3
      says:
      
         "The SCTP checksum MUST be computed for IPv4 and IPv6, and the UDP
          checksum SHOULD be computed for IPv4 and IPv6"
      
      v1->v2:
        - Add pr_err() when creating the udp v6 sock fails.
        - Add #if IS_ENABLED(CONFIG_IPV6) not to create v6 sock when ipv6 is
          disabled.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sctp: create udp4 sock and add its encap_rcv · 965ae444
      Xin Long committed
      This patch adds the functions to create/release the udp4 sock,
      and sets the sock's encap_rcv to process the incoming udp encap
      sctp packets. In sctp_udp_rcv(), all we need to do is fix the
      transport header for sctp_rcv(), which then implements this part
      of rfc6951#section-5.4:
      
        "When an encapsulated packet is received, the UDP header is removed.
         Then, the generic lookup is performed, as done by an SCTP stack
         whenever a packet is received, to find the association for the
         received SCTP packet"
      
      Note that these functions will be called in the last patch of
      this patchset when enabling this feature.
      
      v1->v2:
        - Add pr_err() when creating the udp v4 sock fails.
      v2->v3:
        - Add 'select NET_UDP_TUNNEL' in sctp Kconfig.
      v3->v4:
        - No change.
      v4->v5:
        - Change to set udp_port to 0 by default.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
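"Fixing the transport header" amounts to stepping over the 8-byte UDP header so that the regular SCTP receive path sees an ordinary SCTP packet. The function below is a hypothetical user-space illustration that works on a byte buffer. The kernel works on an skb instead, which is not modeled here.

```c
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN_SKETCH 8 /* size of a UDP header */

/*
 * Hypothetical helper: given the UDP datagram (header plus payload),
 * return a pointer to the encapsulated SCTP packet and store its length
 * in *sctp_len. Returns NULL if the buffer cannot hold a UDP header.
 */
static const uint8_t *strip_udp_sketch(const uint8_t *pkt, size_t len,
				       size_t *sctp_len)
{
	if (len < UDP_HDR_LEN_SKETCH)
		return NULL;

	*sctp_len = len - UDP_HDR_LEN_SKETCH;
	return pkt + UDP_HDR_LEN_SKETCH;
}
```

After this step the association lookup is the same as for a packet that arrived without encapsulation, as the quoted RFC text describes.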
  12. 30 Oct, 2020 2 commits
  13. 29 Oct, 2020 1 commit
    • xsk: Fix possible memory leak at socket close · e5e1a4bc
      Magnus Karlsson committed
      Fix a possible memory leak at xsk socket close that is caused by the
      refcounting of the umem object being wrong. The reference count of the
      umem was decremented only after the pool had been freed. Note that if
      the buffer pool is destroyed, it is important that the umem is
      destroyed after the pool, otherwise the umem would disappear while the
      driver is still running. And as the buffer pool needs to be destroyed
      in a work queue, the umem is also (if its refcount reaches zero)
      destroyed after the buffer pool in that same work queue.
      
      What was missing is that the refcount also needs to be decremented
      when the pool is not freed and when the pool has not even been
      created. The first case happens when the refcount of the pool is
      higher than 1, i.e. it is still being used by some other socket using
      the same device and queue id. In this case, it is safe to decrement
      the refcount of the umem outside of the work queue as the umem will
      never be freed because the refcount of the umem is always greater than
      or equal to the refcount of the buffer pool. The second case is if the
      buffer pool has not been created yet, i.e. the socket was closed
      before it was bound but after the umem was created. In this case, it
      is safe to destroy the umem outside of the work queue, since there is
      no pool that can use it by definition.
      
      Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
      Reported-by: syzbot+eb71df123dc2be2c1456@syzkaller.appspotmail.com
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/1603801921-2712-1-git-send-email-magnus.karlsson@gmail.com
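The three situations in this commit message (last user of the pool, pool still shared, pool never created) can be modeled with plain counters. This is a toy model with hypothetical types and names. It assumes every socket holds one umem reference and, once bound, one pool reference, and it does not model the work queue that the kernel uses to defer destruction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted objects. */
struct umem_sketch {
	int refs;
	bool destroyed;
};

struct pool_sketch {
	int refs;
	bool destroyed;
};

static void umem_put_sketch(struct umem_sketch *umem)
{
	if (--umem->refs == 0)
		umem->destroyed = true;
}

/* Model of socket close. pool is NULL when the socket was never bound. */
static void socket_close_sketch(struct pool_sketch *pool,
				struct umem_sketch *umem)
{
	if (!pool) {
		/* No pool was created, so nothing can still use the umem. */
		umem_put_sketch(umem);
		return;
	}
	if (--pool->refs > 0) {
		/* Another socket still uses the pool and holds its own
		 * umem reference, so this drop cannot free the umem. */
		umem_put_sketch(umem);
		return;
	}
	/* Last user: destroy the pool first, then drop the umem reference. */
	pool->destroyed = true;
	umem_put_sketch(umem);
}
```

In every branch the closing socket gives up its umem reference. The leak described above corresponds to skipping that step in the first two branches.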
  14. 20 Oct, 2020 1 commit
  15. 16 Oct, 2020 1 commit
    • net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info · d086a1c6
      Leon Romanovsky committed
      The access of tcf_tunnel_info() produces the following splat, so fix it
      by dereferencing the tcf_tunnel_key_params pointer with a marker that
      the internal tcfa_lock is held.
      
       =============================
       WARNING: suspicious RCU usage
       5.9.0+ #1 Not tainted
       -----------------------------
       include/net/tc_act/tc_tunnel_key.h:59 suspicious rcu_dereference_protected() usage!
       other info that might help us debug this:
      
       rcu_scheduler_active = 2, debug_locks = 1
       1 lock held by tc/34839:
        #0: ffff88828572c2a0 (&p->tcfa_lock){+...}-{2:2}, at: tc_setup_flow_action+0xb3/0x48b5
       stack backtrace:
       CPU: 1 PID: 34839 Comm: tc Not tainted 5.9.0+ #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x9a/0xd0
        tc_setup_flow_action+0x14cb/0x48b5
        fl_hw_replace_filter+0x347/0x690 [cls_flower]
        fl_change+0x2bad/0x4875 [cls_flower]
        tc_new_tfilter+0xf6f/0x1ba0
        rtnetlink_rcv_msg+0x5f2/0x870
        netlink_rcv_skb+0x124/0x350
        netlink_unicast+0x433/0x700
        netlink_sendmsg+0x6f1/0xbd0
        sock_sendmsg+0xb0/0xe0
        ____sys_sendmsg+0x4fa/0x6d0
        ___sys_sendmsg+0x12e/0x1b0
        __sys_sendmsg+0xa4/0x120
        do_syscall_64+0x2d/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f1f8cd4fe57
       Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
       RSP: 002b:00007ffdc1e193b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f8cd4fe57
       RDX: 0000000000000000 RSI: 00007ffdc1e19420 RDI: 0000000000000003
       RBP: 000000005f85aafa R08: 0000000000000001 R09: 00007ffdc1e1936c
       R10: 000000000040522d R11: 0000000000000246 R12: 0000000000000001
       R13: 0000000000000000 R14: 00007ffdc1e1d6f0 R15: 0000000000482420
      
      Fixes: 3ebaf6da ("net: sched: Do not assume RTNL is held in tunnel key action helpers")
      Fixes: 7a472814 ("net: sched: lock action when translating it to flow_action infra")
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  16. 15 Oct, 2020 2 commits
  17. 14 Oct, 2020 1 commit
    • netfilter: nf_log: missing vlan offload tag and proto · 0d9826bc
      Pablo Neira Ayuso committed
      Dump vlan tag and proto for the usual vlan offload case if the
      NF_LOG_MACDECODE flag is set. Without this information the logging is
      misleading as there is no reference to the VLAN header.
      
      [12716.993704] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0800 SRC=192.168.10.2 DST=172.217.168.163 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=2548 DF PROTO=TCP SPT=55848 DPT=80 WINDOW=501 RES=0x00 ACK FIN URGP=0
      [12721.157643] test: IN=veth0 OUT= MACSRC=86:6c:92:ea:d6:73 MACDST=0e:3b:eb:86:73:76 VPROTO=8100 VID=10 MACPROTO=0806 ARP HTYPE=1 PTYPE=0x0800 OPCODE=2 MACSRC=86:6c:92:ea:d6:73 IPSRC=192.168.10.2 MACDST=0e:3b:eb:86:73:76 IPDST=192.168.10.1
      
      Fixes: 83e96d44 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
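The VPROTO and VID fields in the sample log lines can be reproduced with a short formatting routine. The function name and signature are hypothetical. The sketch assumes that the protocol is given in host byte order and that the VLAN ID is the low 12 bits of the tag control information.

```c
#include <stdint.h>
#include <stdio.h>

#define VLAN_VID_MASK_SKETCH 0x0fffu /* low 12 bits of the TCI hold the VLAN ID */

/*
 * Hypothetical formatter for the two fields shown in the log lines above.
 * Returns the value of snprintf().
 */
static int format_vlan_sketch(char *buf, size_t len, uint16_t vlan_proto,
			      uint16_t vlan_tci)
{
	return snprintf(buf, len, "VPROTO=%04x VID=%u",
			(unsigned int)vlan_proto,
			(unsigned int)(vlan_tci & VLAN_VID_MASK_SKETCH));
}
```

Calling it with protocol 0x8100 and a tag whose VLAN ID is 10 yields `VPROTO=8100 VID=10`, the same fields that appear in both sample lines.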
  18. 12 Oct, 2020 2 commits
  19. 10 Oct, 2020 1 commit
    • netlink: export policy in extended ACK · 44f3625b
      Johannes Berg committed
      Add a new attribute NLMSGERR_ATTR_POLICY to the extended ACK
      to advertise the policy, e.g. if an attribute was out of range,
      you'll know the range that's permissible.
      
      Add new NL_SET_ERR_MSG_ATTR_POL() and NL_SET_BAD_ATTR_POLICY()
      macros to set this, since realistically it's only useful to do
      this when the bad attribute (offset) is also returned.
      
      Use it in lib/nlattr.c which practically does all the policy
      validation.
      
      v2:
       - add and use netlink_policy_dump_attr_size_estimate()
      v3:
       - remove redundant break
      v4:
       - really remove redundant break ... sorry
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
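The idea of returning the policy together with the error can be sketched with a range check. The types and the function below are hypothetical and are not the kernel's struct nla_policy or struct netlink_ext_ack. They only show a validator that records which policy was violated, so that the caller can report the permissible range.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical integer range policy. */
struct range_policy_sketch {
	int min;
	int max;
};

/* Hypothetical extended ACK: an error message plus the violated policy. */
struct extack_sketch {
	const char *msg;
	const struct range_policy_sketch *policy;
};

/*
 * Validate value against pol. On failure, record the message and the
 * policy in extack and return -ERANGE; on success leave extack untouched.
 */
static int validate_range_sketch(int value,
				 const struct range_policy_sketch *pol,
				 struct extack_sketch *extack)
{
	if (value >= pol->min && value <= pol->max)
		return 0;

	extack->msg = "integer out of range";
	extack->policy = pol;
	return -ERANGE;
}
```

A caller that sees `extack.policy` set can report the permitted range to the user, for example "value must be between 1 and 10", instead of a bare error code.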