- 10 9月, 2020 3 次提交
-
-
由 Parav Pandit 提交于
A devlink eswitch port may represent PCI PF/VF ports of a controller. A controller either located on same system or it can be an external controller located in host where such NIC is plugged in. Add the ability for driver to specify if a port is for external controller. Use such flag in the mlx5_core driver. An example of an external controller having VF1 of PF0 belong to controller 1. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "ens2f0pf0vf1", "flavour": "pcivf", "pfnum": 0, "vfnum": 1, "external": true, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } } Signed-off-by: NParav Pandit <parav@nvidia.com> Reviewed-by: NJiri Pirko <jiri@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parav Pandit 提交于
To add more fields to the PCI PF and VF port attributes, follow standard structure comment format. Signed-off-by: NParav Pandit <parav@nvidia.com> Reviewed-by: NJiri Pirko <jiri@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parav Pandit 提交于
Add comment block for physical, PF and VF port attributes. Signed-off-by: NParav Pandit <parav@nvidia.com> Reviewed-by: NJiri Pirko <jiri@nvidia.com> Reviewed-by: NRoi Dayan <roid@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 9月, 2020 1 次提交
-
-
由 Jose M. Guisado Gomez 提交于
Enables storing userdata for nft_object. Initially this will store an optional comment but can be extended in the future as needed. Adds new attribute NFTA_OBJ_USERDATA to nft_object. Signed-off-by: NJose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 01 9月, 2020 15 次提交
-
-
由 Yaroslav Bolyukin 提交于
This dependency was added because ipv6_find_hdr was in iptables specific code but is no longer required Fixes: f8f62675 ("ipv6: Move ipv6_find_hdr() out of Netfilter code.") Fixes: 63dca2c0 ("ipvs: Fix faulty IPv6 extension header handling in IPVS") Signed-off-by: NYaroslav Bolyukin <iam@lach.pw> Acked-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Miaohe Lin 提交于
The arg exact_dif is not used anymore, remove it. inet_exact_dif_match() is no longer needed after the above is removed, so remove it too. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Miaohe Lin 提交于
This is a pure codestyle cleanup patch. No functional change intended. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 wenxu 提交于
Add ipv6_fragment to ipv6_stub to avoid calling netfilter when access ip6_fragment. Signed-off-by: Nwenxu <wenxu@ucloud.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Magnus Karlsson 提交于
Add support to share a umem between queue ids on the same device. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same queue id and device, and you shared one set of fill and completion rings. However, note that when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket before you do the bind with the XDP_SHARED_UMEM flag. This so that the single-producer single-consumer semantics can be upheld. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-12-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Test for dma_need_sync earlier to increase performance. xsk_buff_dma_sync_for_cpu() takes an xdp_buff as parameter and from that the xsk_buff_pool reference is dug out. Perf shows that this dereference causes a lot of cache misses. But as the buffer pool is now sent down to the driver at zero-copy initialization time, we might as well use this pointer directly, instead of going via the xsk_buff and we can do so already in xsk_buff_dma_sync_for_cpu() instead of in xp_dma_sync_for_cpu. This gets rid of these cache misses. Throughput increases with 3% for the xdpsock l2fwd sample application on my machine. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-11-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Rearrange the xdp_sock, xdp_umem and xsk_buff_pool structures so that they get smaller and align better to the cache lines. In the previous commits of this patch set, these structs have been reordered with the focus on functionality and simplicity, not performance. This patch improves throughput performance by around 3%. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-10-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Enable the sharing of dma mappings by moving them out from the buffer pool. Instead we put each dma mapped umem region in a list in the umem structure. If dma has already been mapped for this umem and device, it is not mapped again and the existing dma mappings are reused. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-9-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Replicate the addrs pointer in the buffer pool to the umem. This mapping will be the same for all buffer pools sharing the same umem. In the buffer pool we leave the addrs pointer for performance reasons. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-8-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Move the xsk_tx_list and the xsk_tx_list_lock from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one xsk_tx_list per device and queue id, so it should be located in the buffer pool. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-7-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one buffer pool per dev and queue id, so these variables should belong to the buffer pool, not the umem. Need_wakeup is also something that is set on a per napi level, so there is usually one per device and queue id. So move this to the buffer pool too. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-6-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Move the fill and completion rings from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queue ids. In this case, we need one fill and completion ring per queue id. As the buffer pool is per queue id and napi id this is a natural place for it and one umem struture can be shared between these buffer pools. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-5-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Create and free the buffer pool independently from the umem. Move these operations that are performed on the buffer pool from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also erradicate the back pointer from the umem to the buffer pool as this will not work when we introduce the possibility to have multiple buffer pools per umem. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly it is about replacing the umem name from the function names with xsk_buff and also have them take the a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h. This so that it will be clearer what they do and also for consistency. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
-
由 Magnus Karlsson 提交于
Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
-
- 29 8月, 2020 1 次提交
-
-
由 Jose M. Guisado Gomez 提交于
Enables storing userdata for nft_table. Field udata points to user data and udlen store its length. Adds new attribute flag NFTA_TABLE_USERDATA Signed-off-by: NJose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 27 8月, 2020 2 次提交
-
-
由 John Crispin 提交于
We want to reuse the functions and structs for other counters such as BSS color change. Rename them to more generic names. Signed-off-by: NJohn Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200811080107.3615705-2-john@phrozen.orgSigned-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Miles Hu 提交于
This patch adds the nl80211 structs, definitions, policies and parsing code required to pass fixed HE rate, GI and LTF settings. Signed-off-by: NMiles Hu <milehu@codeaurora.org> Signed-off-by: NJohn Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20200804081630.2013619-1-john@phrozen.org [fix comment] Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
- 26 8月, 2020 6 次提交
-
-
由 Eric Dumazet 提交于
This is no longer used, SCTP now uses a private helper. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 KP Singh 提交于
Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used in LSM programs. These helpers are not used for tracing programs (currently) as their usage is tied to the life-cycle of the object and should only be used where the owning object won't be freed (when the owning object is passed as an argument to the LSM hook). Thus, they are safer to use in LSM hooks than tracing. Usage of local storage in tracing programs will probably follow a per function based whitelist approach. Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock, it, leads to a compilation warning for LSM programs, it's also updated to accept a void * pointer instead. Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-7-kpsingh@chromium.org
-
由 KP Singh 提交于
A purely mechanical change: bpf_sk_storage.c = bpf_sk_storage.c + bpf_local_storage.c bpf_sk_storage.h = bpf_sk_storage.h + bpf_local_storage.h Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-5-kpsingh@chromium.org
-
由 KP Singh 提交于
Refactor the functionality in bpf_sk_storage.c so that concept of storage linked to kernel objects can be extended to other objects like inode, task_struct etc. Each new local storage will still be a separate map and provide its own set of helpers. This allows for future object specific extensions and still share a lot of the underlying implementation. This includes the changes suggested by Martin in: https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/ adding new map operations to support bpf_local_storage maps: * storages for different kernel objects to optionally have different memory charging strategy (map_local_storage_charge, map_local_storage_uncharge) * Functionality to extract the storage pointer from a pointer to the owning object (map_owner_storage_ptr) Co-developed-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200825182919.1118197-4-kpsingh@chromium.org
-
由 KP Singh 提交于
Provide the a ability to define local storage caches on a per-object type basis. The caches and caching indices for different objects should not be inter-mixed as suggested in: https://lore.kernel.org/bpf/20200630193441.kdwnkestulg5erii@kafai-mbp.dhcp.thefacebook.com/ "Caching a sk-storage at idx=0 of a sk should not stop an inode-storage to be cached at the same idx of a inode." Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-3-kpsingh@chromium.org
-
由 KP Singh 提交于
A purely mechanical change to split the renaming from the actual generalization. Flags/consts: SK_STORAGE_CREATE_FLAG_MASK BPF_LOCAL_STORAGE_CREATE_FLAG_MASK BPF_SK_STORAGE_CACHE_SIZE BPF_LOCAL_STORAGE_CACHE_SIZE MAX_VALUE_SIZE BPF_LOCAL_STORAGE_MAX_VALUE_SIZE Structs: bucket bpf_local_storage_map_bucket bpf_sk_storage_map bpf_local_storage_map bpf_sk_storage_data bpf_local_storage_data bpf_sk_storage_elem bpf_local_storage_elem bpf_sk_storage bpf_local_storage The "sk" member in bpf_local_storage is also updated to "owner" in preparation for changing the type to void * in a subsequent patch. Functions: selem_linked_to_sk selem_linked_to_storage selem_alloc bpf_selem_alloc __selem_unlink_sk bpf_selem_unlink_storage_nolock __selem_link_sk bpf_selem_link_storage_nolock selem_unlink_sk __bpf_selem_unlink_storage sk_storage_update bpf_local_storage_update __sk_storage_lookup bpf_local_storage_lookup bpf_sk_storage_map_free bpf_local_storage_map_free bpf_sk_storage_map_alloc bpf_local_storage_map_alloc bpf_sk_storage_map_alloc_check bpf_local_storage_map_alloc_check bpf_sk_storage_map_check_btf bpf_local_storage_map_check_btf Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200825182919.1118197-2-kpsingh@chromium.org
-
- 25 8月, 2020 7 次提交
-
-
由 Martin KaFai Lau 提交于
This patch is adapted from Eric's patch in an earlier discussion [1]. The TCP_SAVE_SYN currently only stores the network header and tcp header. This patch allows it to optionally store the mac header also if the setsockopt's optval is 2. It requires one more bit for the "save_syn" bit field in tcp_sock. This patch achieves this by moving the syn_smc bit next to the is_mptcp. The syn_smc is currently used with the TCP experimental option. Since syn_smc is only used when CONFIG_SMC is enabled, this patch also puts the "IS_ENABLED(CONFIG_SMC)" around it like the is_mptcp did with "IS_ENABLED(CONFIG_MPTCP)". The mac_hdrlen is also stored in the "struct saved_syn" to allow a quick offset from the bpf prog if it chooses to start getting from the network header or the tcp header. [1]: https://lore.kernel.org/netdev/CANn89iLJNWh6bkH7DNhy_kmcAexuUCccqERqe7z2QsvPhGrYPQ@mail.gmail.com/Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190123.2886935-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
[ Note: The TCP changes here is mainly to implement the bpf pieces into the bpf_skops_*() functions introduced in the earlier patches. ] The earlier effort in BPF-TCP-CC allows the TCP Congestion Control algorithm to be written in BPF. It opens up opportunities to allow a faster turnaround time in testing/releasing new congestion control ideas to production environment. The same flexibility can be extended to writing TCP header option. It is not uncommon that people want to test new TCP header option to improve the TCP performance. Another use case is for data-center that has a more controlled environment and has more flexibility in putting header options for internal only use. For example, we want to test the idea in putting maximum delay ACK in TCP header option which is similar to a draft RFC proposal [1]. This patch introduces the necessary BPF API and use them in the TCP stack to allow BPF_PROG_TYPE_SOCK_OPS program to parse and write TCP header options. It currently supports most of the TCP packet except RST. Supported TCP header option: ─────────────────────────── This patch allows the bpf-prog to write any option kind. Different bpf-progs can write its own option by calling the new helper bpf_store_hdr_opt(). The helper will ensure there is no duplicated option in the header. By allowing bpf-prog to write any option kind, this gives a lot of flexibility to the bpf-prog. Different bpf-prog can write its own option kind. It could also allow the bpf-prog to support a recently standardized option on an older kernel. Sockops Callback Flags: ────────────────────── The bpf program will only be called to parse/write tcp header option if the following newly added callback flags are enabled in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG A few words on the PARSE CB flags. When the above PARSE CB flags are turned on, the bpf-prog will be called on packets received at a sk that has at least reached the ESTABLISHED state. The parsing of the SYN-SYNACK-ACK will be discussed in the "3 Way HandShake" section. The default is off for all of the above new CB flags, i.e. the bpf prog will not be called to parse or write bpf hdr option. There are details comment on these new cb flags in the UAPI bpf.h. sock_ops->skb_data and bpf_load_hdr_opt() ───────────────────────────────────────── sock_ops->skb_data and sock_ops->skb_data_end covers the whole TCP header and its options. They are read only. The new bpf_load_hdr_opt() helps to read a particular option "kind" from the skb_data. Please refer to the comment in UAPI bpf.h. It has details on what skb_data contains under different sock_ops->op. 3 Way HandShake ─────────────── The bpf-prog can learn if it is sending SYN or SYNACK by reading the sock_ops->skb_tcp_flags. * Passive side When writing SYNACK (i.e. sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB), the received SYN skb will be available to the bpf prog. The bpf prog can use the SYN skb (which may carry the header option sent from the remote bpf prog) to decide what bpf header option should be written to the outgoing SYNACK skb. The SYN packet can be obtained by getsockopt(TCP_BPF_SYN*). More on this later. Also, the bpf prog can learn if it is in syncookie mode (by checking sock_ops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE). The bpf prog can store the received SYN pkt by using the existing bpf_setsockopt(TCP_SAVE_SYN). The example in a later patch does it. [ Note that the fullsock here is a listen sk, bpf_sk_storage is not very useful here since the listen sk will be shared by many concurrent connection requests. Extending bpf_sk_storage support to request_sock will add weight to the minisock and it is not necessary better than storing the whole ~100 bytes SYN pkt. ] When the connection is established, the bpf prog will be called in the existing PASSIVE_ESTABLISHED_CB callback. At that time, the bpf prog can get the header option from the saved syn and then apply the needed operation to the newly established socket. The later patch will use the max delay ack specified in the SYN header and set the RTO of this newly established connection as an example. The received ACK (that concludes the 3WHS) will also be available to the bpf prog during PASSIVE_ESTABLISHED_CB through the sock_ops->skb_data. It could be useful in syncookie scenario. More on this later. There is an existing getsockopt "TCP_SAVED_SYN" to return the whole saved syn pkt which includes the IP[46] header and the TCP header. A few "TCP_BPF_SYN*" getsockopt has been added to allow specifying where to start getting from, e.g. starting from TCP header, or from IP[46] header. The new getsockopt(TCP_BPF_SYN*) will also know where it can get the SYN's packet from: - (a) the just received syn (available when the bpf prog is writing SYNACK) and it is the only way to get SYN during syncookie mode. or - (b) the saved syn (available in PASSIVE_ESTABLISHED_CB and also other existing CB). The bpf prog does not need to know where the SYN pkt is coming from. The getsockopt(TCP_BPF_SYN*) will hide this details. Similarly, a flags "BPF_LOAD_HDR_OPT_TCP_SYN" is also added to bpf_load_hdr_opt() to read a particular header option from the SYN packet. * Fastopen Fastopen should work the same as the regular non fastopen case. This is a test in a later patch. * Syncookie For syncookie, the later example patch asks the active side's bpf prog to resend the header options in ACK. The server can use bpf_load_hdr_opt() to look at the options in this received ACK during PASSIVE_ESTABLISHED_CB. * Active side The bpf prog will get a chance to write the bpf header option in the SYN packet during WRITE_HDR_OPT_CB. The received SYNACK pkt will also be available to the bpf prog during the existing ACTIVE_ESTABLISHED_CB callback through the sock_ops->skb_data and bpf_load_hdr_opt(). * Turn off header CB flags after 3WHS If the bpf prog does not need to write/parse header options beyond the 3WHS, the bpf prog can clear the bpf_sock_ops_cb_flags to avoid being called for header options. Or the bpf-prog can select to leave the UNKNOWN_HDR_OPT_CB_FLAG on so that the kernel will only call it when there is option that the kernel cannot handle. [1]: draft-wang-tcpm-low-latency-opt-00 https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190104.2885895-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
The bpf prog needs to parse the SYN header to learn what options have been sent by the peer's bpf-prog before writing its options into SYNACK. This patch adds a "syn_skb" arg to tcp_make_synack() and send_synack(). This syn_skb will eventually be made available (as read-only) to the bpf prog. This will be the only SYN packet available to the bpf prog during syncookie. For other regular cases, the bpf prog can also use the saved_syn. When writing options, the bpf prog will first be called to tell the kernel its required number of bytes. It is done by the new bpf_skops_hdr_opt_len(). The bpf prog will only be called when the new BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG is set in tp->bpf_sock_ops_cb_flags. When the bpf prog returns, the kernel will know how many bytes are needed and then update the "*remaining" arg accordingly. 4 byte alignment will be included in the "*remaining" before this function returns. The 4 byte aligned number of bytes will also be stored into the opts->bpf_opt_len. "bpf_opt_len" is a newly added member to the struct tcp_out_options. Then the new bpf_skops_write_hdr_opt() will call the bpf prog to write the header options. The bpf prog is only called if it has reserved spaces before (opts->bpf_opt_len > 0). The bpf prog is the last one getting a chance to reserve header space and writing the header option. These two functions are half implemented to highlight the changes in TCP stack. The actual codes preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190052.2885316-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
In tcp_init_transfer(), it currently calls the bpf prog to give it a chance to handle the just "ESTABLISHED" event (e.g. do setsockopt on the newly established sk). Right now, it is done by calling the general purpose tcp_call_bpf(). In the later patch, it also needs to pass the just-received skb which concludes the 3 way handshake. E.g. the SYNACK received at the active side. The bpf prog can then learn some specific header options written by the peer's bpf-prog and potentially do setsockopt on the newly established sk. Thus, instead of reusing the general purpose tcp_call_bpf(), a new function bpf_skops_established() is added to allow passing the "skb" to the bpf prog. The actual skb passing from bpf_skops_established() to the bpf prog will happen together in a later patch which has the necessary bpf pieces. A "skb" arg is also added to tcp_init_transfer() such that it can then be passed to bpf_skops_established(). Calling the new bpf_skops_established() instead of tcp_call_bpf() should be a noop in this patch. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190039.2884750-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
This patch adds bpf_setsockopt(TCP_BPF_RTO_MIN) to allow bpf prog to set the min rto of a connection. It could be used together with the earlier patch which has added bpf_setsockopt(TCP_BPF_DELACK_MAX). A later selftest patch will communicate the max delay ack in a bpf tcp header option and then the receiving side can use bpf_setsockopt(TCP_BPF_RTO_MIN) to set a shorter rto. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190027.2884170-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
This change is mostly from an internal patch and adapts it from sysctl config to the bpf_setsockopt setup. The bpf_prog can set the max delay ack by using bpf_setsockopt(TCP_BPF_DELACK_MAX). This max delay ack can be communicated to its peer through bpf header option. The receiving peer can then use this max delay ack and set a potentially lower rto by using bpf_setsockopt(TCP_BPF_RTO_MIN) which will be introduced in the next patch. Another later selftest patch will also use it like the above to show how to write and parse bpf tcp header option. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190021.2884000-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
The TCP_SAVE_SYN has both the network header and tcp header. The total length of the saved syn packet is currently stored in the first 4 bytes (u32) of an array and the actual packet data is stored after that. A later patch will add a bpf helper that allows to get the tcp header alone from the saved syn without the network header. It will be more convenient to have a direct offset to a specific header instead of re-parsing it. This requires to separately store the network hdrlen. The total header length (i.e. network + tcp) is still needed for the current usage in getsockopt. Although this total length can be obtained by looking into the tcphdr and then get the (th->doff << 2), this patch chooses to directly store the tcp hdrlen in the second four bytes of this newly created "struct saved_syn". By using a new struct, it can give a readable name to each individual header length. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190014.2883694-1-kafai@fb.com
-
- 24 8月, 2020 1 次提交
-
-
由 Tobias Klauser 提交于
Commit 32927393 ("sysctl: pass kernel pointers to ->proc_handler") changed ndisc_ifinfo_sysctl_change to take a kernel pointer. Adjust its prototype in net/ndisc.h as well to fix the following sparse warning: net/ipv6/ndisc.c:1838:5: error: symbol 'ndisc_ifinfo_sysctl_change' redeclared with different type (incompatible argument 3 (different address spaces)): net/ipv6/ndisc.c:1838:5: int extern [addressable] [signed] [toplevel] ndisc_ifinfo_sysctl_change( ... ) net/ipv6/ndisc.c: note: in included file (through include/net/ipv6.h): ./include/net/ndisc.h:496:5: note: previously declared as: ./include/net/ndisc.h:496:5: int extern [addressable] [signed] [toplevel] ndisc_ifinfo_sysctl_change( ... ) net/ipv6/ndisc.c: note: in included file (through include/net/ip6_route.h): Fixes: 32927393 ("sysctl: pass kernel pointers to ->proc_handler") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 8月, 2020 1 次提交
-
-
由 Florian Westphal 提交于
Following bug was reported via irc: nft list ruleset set knock_candidates_ipv4 { type ipv4_addr . inet_service size 65535 elements = { 127.0.0.1 . 123, 127.0.0.1 . 123 } } .. udp dport 123 add @knock_candidates_ipv4 { ip saddr . 123 } udp dport 123 add @knock_candidates_ipv4 { ip saddr . udp dport } It should not have been possible to add a duplicate set entry. After some debugging it turned out that the problem is the immediate value (123) in the second-to-last rule. Concatenations use 32bit registers, i.e. the elements are 8 bytes each, not 6 and it turns out the kernel inserted inet firewall @knock_candidates_ipv4 element 0100007f ffff7b00 : 0 [end] element 0100007f 00007b00 : 0 [end] Note the non-zero upper bits of the first element. It turns out that nft_immediate doesn't zero the destination register, but this is needed when the length isn't a multiple of 4. Furthermore, the zeroing in nft_payload is broken. We can't use [len / 4] = 0 -- if len is a multiple of 4, index is off by one. Skip zeroing in this case and use a conditional instead of (len -1) / 4. Fixes: 49499c3e ("netfilter: nf_tables: switch registers to 32 bit addressing") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 21 8月, 2020 1 次提交
-
-
由 David Howells 提交于
Fix rxrpc_kernel_get_srtt() to indicate the validity of the returned smoothed RTT. If we haven't had any valid samples yet, the SRTT isn't useful. Fixes: c410bf01 ("rxrpc: Fix the excessive initial retransmission timeout") Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
- 19 8月, 2020 2 次提交
-
-
由 Xin Long 提交于
This patch is to do 3 things for ipv6_dev_find(): As David A. noticed, - rt6_lookup() is not really needed. Different from __ip_dev_find(), ipv6_dev_find() doesn't have a compatibility problem, so remove it. As Hideaki suggested, - "valid" (non-tentative) check for the address is also needed. ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which will traverse the address hash list, but it's heavy to be called inside ipv6_dev_find(). This patch is to reuse the code of ipv6_chk_addr_and_flags() for ipv6_dev_find(). - dev parameter is passed into ipv6_dev_find(), as link-local addresses from user space has sin6_scope_id set and the dev lookup needs it. Fixes: 81f6cb31 ("ipv6: add ipv6_dev_find()") Suggested-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Reported-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Johannes Berg 提交于
Add range validation for NLA_BINARY, allowing validation of any combination of combination minimum or maximum lengths, using the existing NLA_POLICY_RANGE()/NLA_POLICY_FULL_RANGE() macros, just like for integers where the value is checked. Also make NLA_POLICY_EXACT_LEN(), NLA_POLICY_EXACT_LEN_WARN() and NLA_POLICY_MIN_LEN() special cases of this, removing the old types NLA_EXACT_LEN and NLA_MIN_LEN. This allows us to save some code where both minimum and maximum lengths are requires, currently the policy only allows maximum (NLA_BINARY), minimum (NLA_MIN_LEN) or exact (NLA_EXACT_LEN), so a range of lengths cannot be accepted and must be checked by the code that consumes the attributes later. Also, this allows advertising the correct ranges in the policy export to userspace. Here, NLA_MIN_LEN and NLA_EXACT_LEN already were special cases of NLA_BINARY with min and min/max length respectively. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-