- 07 1月, 2022 5 次提交
-
-
由 Wang Hai 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Tcp compression is used to reduce the amount of data transmitted between multiple machines, which can increase the transmission capacity. The local tcp connection is a single machine transfer, so there is no meaning to use tcp compression. Ignore it by default. Enable by sysctl: echo 1 > /proc/net/ipv4/tcp_compression_local Signed-off-by: NWang Hai <wanghai38@huawei.com> Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NWang Yufen <wangyufen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NLu Wei <luwei32@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang Yufen 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Only enable compression for give server ports, this means we will check either dport when send SYN or sport when send SYN-ACK. Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NWang Yufen <wangyufen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NLu Wei <luwei32@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wei Yongjun 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Add sysctl interface for enable/disable tcp compression by ports. Example: $ echo 4000 > /proc/sys/net/ipv4/tcp_compression_ports will enable port 4000 for tcp compression $ echo 4000,5000 > /proc/sys/net/ipv4/tcp_compression_ports will enable both port 4000 and 5000 for tcp compression $ echo > /proc/sys/net/ipv4/tcp_compression_ports will disable tcp compression. Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NWang Yufen <wangyufen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NLu Wei <luwei32@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang Yufen 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- When establishing a tcp connection or closing it, the tcp compression needs to be initialized or cleaned up at the same time. Add dummy init and cleanup hook for tcp compression. It will be implemented later. Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NWang Yufen <wangyufen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NLu Wei <luwei32@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang Yufen 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Add new tcp COMP option to SYN and SYN-ACK when tcp COMP is enabled. connection compress payload only when both side support it. Signed-off-by: NWang Yufen <wangyufen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NLu Wei <luwei32@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 31 12月, 2021 1 次提交
-
-
由 Wang Hai 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4HE7P?from=project-issue CVE: NA -------- Reserve some fields beforehand for net bpf framework related structures prone to change. --------- Signed-off-by: NWang Hai <wanghai38@huawei.com> Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com> Reviewed-by: NYue Haibing <yuehaibing@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 06 12月, 2021 1 次提交
-
-
由 Eric Dumazet 提交于
stable inclusion from stable-5.10.80 commit a342cb4772f44915b1e6688b101a41c01f9e71aa bugzilla: 185821 https://gitee.com/openeuler/kernel/issues/I4L7CG Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a342cb4772f44915b1e6688b101a41c01f9e71aa -------------------------------- [ Upstream commit 19757ceb ] Use of percpu_counter structure to track count of orphaned sockets is causing problems on modern hosts with 256 cpus or more. Stefan Bach reported a serious spinlock contention in real workloads, that I was able to reproduce with a netfilter rule dropping incoming FIN packets. 53.56% server [kernel.kallsyms] [k] queued_spin_lock_slowpath | ---queued_spin_lock_slowpath | --53.51%--_raw_spin_lock_irqsave | --53.51%--__percpu_counter_sum tcp_check_oom | |--39.03%--__tcp_close | tcp_close | inet_release | inet6_release | sock_close | __fput | ____fput | task_work_run | exit_to_usermode_loop | do_syscall_64 | entry_SYSCALL_64_after_hwframe | __GI___libc_close | --14.48%--tcp_out_of_resources tcp_write_timeout tcp_retransmit_timer tcp_write_timer_handler tcp_write_timer call_timer_fn expire_timers __run_timers run_timer_softirq __softirqentry_text_start As explained in commit cf86a086 ("net/dst: use a smaller percpu_counter batch for dst entries accounting"), default batch size is too big for the default value of tcp_max_orphans (262144). But even if we reduce batch sizes, there would still be cases where the estimated count of orphans is beyond the limit, and where tcp_too_many_orphans() has to call the expensive percpu_counter_sum_positive(). One solution is to use plain per-cpu counters, and have a timer to periodically refresh this cache. Updating this cache every 100ms seems about right, tcp pressure state is not radically changing over shorter periods. percpu_counter was nice 15 years ago while hosts had less than 16 cpus, not anymore by current standards. v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m, reported by kernel test robot <lkp@intel.com> Remove unused socket argument from tcp_too_many_orphans() Fixes: dd24c001 ("net: Use a percpu_counter for orphan_count") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NStefan Bach <sfb@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 15 10月, 2021 1 次提交
-
-
由 Paolo Abeni 提交于
stable inclusion from stable-5.10.53 commit ea66fcb296058e0d41afa770f6900b335f8738a5 bugzilla: 175574 https://gitee.com/openeuler/kernel/issues/I4DTUX Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ea66fcb296058e0d41afa770f6900b335f8738a5 -------------------------------- commit 71158bb1 upstream. The MPTCP receive path is hooked only into the TCP slow-path. The DSS presence allows plain MPTCP traffic to hit that consistently. Since commit e1ff9e82 ("net: mptcp: improve fallback to TCP"), when an MPTCP socket falls back to TCP, it can hit the TCP receive fast-path, and delay or stop triggering the event notification. Address the issue explicitly disabling the header prediction for MPTCP sockets. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/200 Fixes: e1ff9e82 ("net: mptcp: improve fallback to TCP") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 09 4月, 2021 1 次提交
-
-
由 Eric Dumazet 提交于
stable inclusion from stable-5.10.20 commit 8e81baeb83a36c7ddeec6298b30b408d31c72aeb bugzilla: 50608 -------------------------------- [ Upstream commit f969dc5a ] While commit 24adbc16 ("tcp: fix SO_RCVLOWAT hangs with fat skbs") fixed an issue vs too small sk_rcvbuf for given sk_rcvlowat constraint, it missed to address issue caused by memory pressure. 1) If we are under memory pressure and socket receive queue is empty. First incoming packet is allowed to be queued, after commit 76dfa608 ("tcp: allow one skb to be received per socket under memory pressure") But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat is bigger than skb length. 2) Then, when next packet comes, it is dropped, and we directly call sk->sk_data_ready(). 3) If application is using poll(), tcp_poll() will then use tcp_stream_is_readable() and decide the socket receive queue is not yet filled, so nothing will happen. Even when sender retransmits packets, phases 2) & 3) repeat and flow is effectively frozen, until memory pressure is off. Fix is to consider tcp_under_memory_pressure() to take care of global memory pressure or memcg pressure. Fixes: 24adbc16 ("tcp: fix SO_RCVLOWAT hangs with fat skbs") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NArjun Roy <arjunroy@google.com> Suggested-by: NWei Wang <weiwan@google.com> Reviewed-by: NWei Wang <weiwan@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 09 2月, 2021 2 次提交
-
-
由 Pengcheng Yang 提交于
stable inclusion from stable-5.10.13 commit a9cd144eb74505420ce73047bed2bd5dca572d50 bugzilla: 47995 -------------------------------- commit 62d9f1a6 upstream. Upon receiving a cumulative ACK that changes the congestion state from Disorder to Open, the TLP timer is not set. If the sender is app-limited, it can only wait for the RTO timer to expire and retransmit. The reason for this is that the TLP timer is set before the congestion state changes in tcp_ack(), so we delay the time point of calling tcp_set_xmit_timer() until after tcp_fastretrans_alert() returns and remove the FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer is set. This commit has two additional benefits: 1) Make sure to reset RTO according to RFC6298 when receiving ACK, to avoid spurious RTO caused by RTO timer early expires. 2) Reduce the xmit timer reschedule once per ACK when the RACK reorder timer is set. Fixes: df92c839 ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Link: https://lore.kernel.org/netdev/1611311242-6675-1-git-send-email-yangpc@wangsu.comSigned-off-by: NPengcheng Yang <yangpc@wangsu.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/1611464834-23030-1-git-send-email-yangpc@wangsu.comSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Enke Chen 提交于
stable inclusion from stable-5.10.13 commit 011c3d9427da318472d1e6f7de518cc288039927 bugzilla: 47995 -------------------------------- commit 344db93a upstream. The TCP_USER_TIMEOUT is checked by the 0-window probe timer. As the timer has backoff with a max interval of about two minutes, the actual timeout for TCP_USER_TIMEOUT can be off by up to two minutes. In this patch the TCP_USER_TIMEOUT is made more accurate by taking it into account when computing the timer value for the 0-window probes. This patch is similar to and builds on top of the one that made TCP_USER_TIMEOUT accurate for RTOs in commit b701a99e ("tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy"). Fixes: 9721e709 ("tcp: simplify window probe aborting on USER_TIMEOUT") Signed-off-by: NEnke Chen <enchen@paloaltonetworks.com> Reviewed-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210122191306.GA99540@localhost.localdomainSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 03 10月, 2020 1 次提交
-
-
由 Martin KaFai Lau 提交于
The commit 0813a841 ("bpf: tcp: Allow bpf prog to write and parse TCP header option") unnecessarily introduced bpf_skops_init_child() which limited the child sk from inheriting all bpf_sock_ops_cb_flags of the listen sk. That breaks existing user expectation. This patch removes the bpf_skops_init_child() and just allows sock_copy() to do its job to copy everything from listen sk to the child sk. Fixes: 0813a841 ("bpf: tcp: Allow bpf prog to write and parse TCP header option") Reported-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201002013448.2542025-1-kafai@fb.com
-
- 15 9月, 2020 1 次提交
-
-
由 Paolo Abeni 提交于
That is needed to let the subflows announce promptly when new space is available in the receive buffer. tcp_cleanup_rbuf() is currently a static function, drop the scope modifier and add a declaration in the TCP header. Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 9月, 2020 1 次提交
-
-
由 Neal Cardwell 提交于
Now that the previous patches ensure that all call sites for tcp_set_congestion_control() want to initialize congestion control, we can simplify tcp_set_congestion_control() by removing the reinit argument and the code to support it. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYuchung Cheng <ycheng@google.com> Acked-by: NKevin Yang <yyd@google.com> Cc: Lawrence Brakmo <brakmo@fb.com>
-
- 01 9月, 2020 1 次提交
-
-
由 Miaohe Lin 提交于
The arg exact_dif is not used anymore, remove it. inet_exact_dif_match() is no longer needed after the above is removed, so remove it too. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 8月, 2020 4 次提交
-
-
由 Martin KaFai Lau 提交于
[ Note: The TCP changes here is mainly to implement the bpf pieces into the bpf_skops_*() functions introduced in the earlier patches. ] The earlier effort in BPF-TCP-CC allows the TCP Congestion Control algorithm to be written in BPF. It opens up opportunities to allow a faster turnaround time in testing/releasing new congestion control ideas to production environment. The same flexibility can be extended to writing TCP header option. It is not uncommon that people want to test new TCP header option to improve the TCP performance. Another use case is for data-center that has a more controlled environment and has more flexibility in putting header options for internal only use. For example, we want to test the idea in putting maximum delay ACK in TCP header option which is similar to a draft RFC proposal [1]. This patch introduces the necessary BPF API and use them in the TCP stack to allow BPF_PROG_TYPE_SOCK_OPS program to parse and write TCP header options. It currently supports most of the TCP packet except RST. Supported TCP header option: ─────────────────────────── This patch allows the bpf-prog to write any option kind. Different bpf-progs can write its own option by calling the new helper bpf_store_hdr_opt(). The helper will ensure there is no duplicated option in the header. By allowing bpf-prog to write any option kind, this gives a lot of flexibility to the bpf-prog. Different bpf-prog can write its own option kind. It could also allow the bpf-prog to support a recently standardized option on an older kernel. Sockops Callback Flags: ────────────────────── The bpf program will only be called to parse/write tcp header option if the following newly added callback flags are enabled in tp->bpf_sock_ops_cb_flags: BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG A few words on the PARSE CB flags. When the above PARSE CB flags are turned on, the bpf-prog will be called on packets received at a sk that has at least reached the ESTABLISHED state. The parsing of the SYN-SYNACK-ACK will be discussed in the "3 Way HandShake" section. The default is off for all of the above new CB flags, i.e. the bpf prog will not be called to parse or write bpf hdr option. There are details comment on these new cb flags in the UAPI bpf.h. sock_ops->skb_data and bpf_load_hdr_opt() ───────────────────────────────────────── sock_ops->skb_data and sock_ops->skb_data_end covers the whole TCP header and its options. They are read only. The new bpf_load_hdr_opt() helps to read a particular option "kind" from the skb_data. Please refer to the comment in UAPI bpf.h. It has details on what skb_data contains under different sock_ops->op. 3 Way HandShake ─────────────── The bpf-prog can learn if it is sending SYN or SYNACK by reading the sock_ops->skb_tcp_flags. * Passive side When writing SYNACK (i.e. sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB), the received SYN skb will be available to the bpf prog. The bpf prog can use the SYN skb (which may carry the header option sent from the remote bpf prog) to decide what bpf header option should be written to the outgoing SYNACK skb. The SYN packet can be obtained by getsockopt(TCP_BPF_SYN*). More on this later. Also, the bpf prog can learn if it is in syncookie mode (by checking sock_ops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE). The bpf prog can store the received SYN pkt by using the existing bpf_setsockopt(TCP_SAVE_SYN). The example in a later patch does it. [ Note that the fullsock here is a listen sk, bpf_sk_storage is not very useful here since the listen sk will be shared by many concurrent connection requests. Extending bpf_sk_storage support to request_sock will add weight to the minisock and it is not necessary better than storing the whole ~100 bytes SYN pkt. ] When the connection is established, the bpf prog will be called in the existing PASSIVE_ESTABLISHED_CB callback. At that time, the bpf prog can get the header option from the saved syn and then apply the needed operation to the newly established socket. The later patch will use the max delay ack specified in the SYN header and set the RTO of this newly established connection as an example. The received ACK (that concludes the 3WHS) will also be available to the bpf prog during PASSIVE_ESTABLISHED_CB through the sock_ops->skb_data. It could be useful in syncookie scenario. More on this later. There is an existing getsockopt "TCP_SAVED_SYN" to return the whole saved syn pkt which includes the IP[46] header and the TCP header. A few "TCP_BPF_SYN*" getsockopt has been added to allow specifying where to start getting from, e.g. starting from TCP header, or from IP[46] header. The new getsockopt(TCP_BPF_SYN*) will also know where it can get the SYN's packet from: - (a) the just received syn (available when the bpf prog is writing SYNACK) and it is the only way to get SYN during syncookie mode. or - (b) the saved syn (available in PASSIVE_ESTABLISHED_CB and also other existing CB). The bpf prog does not need to know where the SYN pkt is coming from. The getsockopt(TCP_BPF_SYN*) will hide this details. Similarly, a flags "BPF_LOAD_HDR_OPT_TCP_SYN" is also added to bpf_load_hdr_opt() to read a particular header option from the SYN packet. * Fastopen Fastopen should work the same as the regular non fastopen case. This is a test in a later patch. * Syncookie For syncookie, the later example patch asks the active side's bpf prog to resend the header options in ACK. The server can use bpf_load_hdr_opt() to look at the options in this received ACK during PASSIVE_ESTABLISHED_CB. * Active side The bpf prog will get a chance to write the bpf header option in the SYN packet during WRITE_HDR_OPT_CB. The received SYNACK pkt will also be available to the bpf prog during the existing ACTIVE_ESTABLISHED_CB callback through the sock_ops->skb_data and bpf_load_hdr_opt(). * Turn off header CB flags after 3WHS If the bpf prog does not need to write/parse header options beyond the 3WHS, the bpf prog can clear the bpf_sock_ops_cb_flags to avoid being called for header options. Or the bpf-prog can select to leave the UNKNOWN_HDR_OPT_CB_FLAG on so that the kernel will only call it when there is option that the kernel cannot handle. [1]: draft-wang-tcpm-low-latency-opt-00 https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200820190104.2885895-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
The bpf prog needs to parse the SYN header to learn what options have been sent by the peer's bpf-prog before writing its options into SYNACK. This patch adds a "syn_skb" arg to tcp_make_synack() and send_synack(). This syn_skb will eventually be made available (as read-only) to the bpf prog. This will be the only SYN packet available to the bpf prog during syncookie. For other regular cases, the bpf prog can also use the saved_syn. When writing options, the bpf prog will first be called to tell the kernel its required number of bytes. It is done by the new bpf_skops_hdr_opt_len(). The bpf prog will only be called when the new BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG is set in tp->bpf_sock_ops_cb_flags. When the bpf prog returns, the kernel will know how many bytes are needed and then update the "*remaining" arg accordingly. 4 byte alignment will be included in the "*remaining" before this function returns. The 4 byte aligned number of bytes will also be stored into the opts->bpf_opt_len. "bpf_opt_len" is a newly added member to the struct tcp_out_options. Then the new bpf_skops_write_hdr_opt() will call the bpf prog to write the header options. The bpf prog is only called if it has reserved spaces before (opts->bpf_opt_len > 0). The bpf prog is the last one getting a chance to reserve header space and writing the header option. These two functions are half implemented to highlight the changes in TCP stack. The actual codes preparing the bpf running context and invoking the bpf prog will be added in the later patch with other necessary bpf pieces. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20200820190052.2885316-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
In tcp_init_transfer(), it currently calls the bpf prog to give it a chance to handle the just "ESTABLISHED" event (e.g. do setsockopt on the newly established sk). Right now, it is done by calling the general purpose tcp_call_bpf(). In the later patch, it also needs to pass the just-received skb which concludes the 3 way handshake. E.g. the SYNACK received at the active side. The bpf prog can then learn some specific header options written by the peer's bpf-prog and potentially do setsockopt on the newly established sk. Thus, instead of reusing the general purpose tcp_call_bpf(), a new function bpf_skops_established() is added to allow passing the "skb" to the bpf prog. The actual skb passing from bpf_skops_established() to the bpf prog will happen together in a later patch which has the necessary bpf pieces. A "skb" arg is also added to tcp_init_transfer() such that it can then be passed to bpf_skops_established(). Calling the new bpf_skops_established() instead of tcp_call_bpf() should be a noop in this patch. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190039.2884750-1-kafai@fb.com
-
由 Martin KaFai Lau 提交于
This patch adds bpf_setsockopt(TCP_BPF_RTO_MIN) to allow bpf prog to set the min rto of a connection. It could be used together with the earlier patch which has added bpf_setsockopt(TCP_BPF_DELACK_MAX). A later selftest patch will communicate the max delay ack in a bpf tcp header option and then the receiving side can use bpf_setsockopt(TCP_BPF_RTO_MIN) to set a shorter rto. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200820190027.2884170-1-kafai@fb.com
-
- 11 8月, 2020 1 次提交
-
-
由 Jason Baron 提交于
When TFO keys are read back on big endian systems either via the global sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values don't match what was written. For example, on s390x: # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key 02000000-01000000-04000000-03000000 Instead of: # cat /proc/sys/net/ipv4/tcp_fastopen_key 00000001-00000002-00000003-00000004 Fix this by converting to the correct endianness on read. This was reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net selftest on s390x, which depends on the read value matching what was written. I've confirmed that the test now passes on big and little endian systems. Signed-off-by: NJason Baron <jbaron@akamai.com> Fixes: 438ac880 ("net: fastopen: robustness and endianness fixes for SipHash") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Eric Dumazet <edumazet@google.com> Reported-and-tested-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 8月, 2020 1 次提交
-
-
由 Florian Westphal 提交于
If SYN packet contains MP_CAPABLE option, keep it enabled. Syncokie validation and cookie-based socket creation is changed to instantiate an mptcp request sockets if the ACK contains an MPTCP connection request. Rather than extend both cookie_v4/6_check, add a common helper to create the (mp)tcp request socket. Suggested-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Hellwig 提交于
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Handle the few cases that need special treatment in-line using in_compat_syscall(). This also removes all the now unused compat_{get,set}sockopt methods. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 6月, 2020 1 次提交
-
-
由 Yonghong Song 提交于
A new field bpf_seq_afinfo is added to tcp_iter_state to provide bpf tcp iterator afinfo. There are two reasons on why we did this. First, the current way to get afinfo from PDE_DATA does not work for bpf iterator as its seq_file inode does not conform to /proc/net/{tcp,tcp6} inode structures. More specifically, anonymous bpf iterator will use an anonymous inode which is shared in the system and we cannot change inode private data structure at all. Second, bpf iterator for tcp/tcp6 wants to traverse all tcp and tcp6 sockets in one pass and bpf program can control whether they want to skip one sk_family or not. Having a different afinfo with family AF_UNSPEC make it easier to understand in the code. This patch does not change /proc/net/{tcp,tcp6} behavior as the bpf_seq_afinfo will be NULL for these two proc files. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200623230804.3987829-1-yhs@fb.com
-
- 24 6月, 2020 4 次提交
-
-
由 Eric Dumazet 提交于
This patch removes following (C=1 W=1) warnings for CONFIG_RETPOLINE=y : net/ipv4/tcp_offload.c:306:16: warning: symbol 'tcp4_gro_receive' was not declared. Should it be static? net/ipv4/tcp_offload.c:306:17: warning: no previous prototype for 'tcp4_gro_receive' [-Wmissing-prototypes] net/ipv4/tcp_offload.c:319:29: warning: symbol 'tcp4_gro_complete' was not declared. Should it be static? net/ipv4/tcp_offload.c:319:29: warning: no previous prototype for 'tcp4_gro_complete' [-Wmissing-prototypes] CHECK net/ipv6/tcpv6_offload.c net/ipv6/tcpv6_offload.c:16:16: warning: symbol 'tcp6_gro_receive' was not declared. Should it be static? net/ipv6/tcpv6_offload.c:29:29: warning: symbol 'tcp6_gro_complete' was not declared. Should it be static? CC net/ipv6/tcpv6_offload.o net/ipv6/tcpv6_offload.c:16:17: warning: no previous prototype for 'tcp6_gro_receive' [-Wmissing-prototypes] 16 | struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) | ^~~~~~~~~~~~~~~~ net/ipv6/tcpv6_offload.c:29:29: warning: no previous prototype for 'tcp6_gro_complete' [-Wmissing-prototypes] 29 | INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) | ^~~~~~~~~~~~~~~~~ Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Declare ipv4_specific once, in tcp.h were it belongs. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ipv6_specific should be declared in tcp include files, not mptcp. This removes the following warning : CHECK net/ipv6/tcp_ipv6.c net/ipv6/tcp_ipv6.c:78:42: warning: symbol 'ipv6_specific' was not declared. Should it be static? Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Remove these errors: net/ipv6/tcp_ipv6.c:1550:29: warning: symbol 'tcp_v6_rcv' was not declared. Should it be static? net/ipv6/tcp_ipv6.c:1770:30: warning: symbol 'tcp_v6_early_demux' was not declared. Should it be static? net/ipv6/tcp_ipv6.c:1550:29: warning: no previous prototype for 'tcp_v6_rcv' [-Wmissing-prototypes] 1550 | INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb) | ^~~~~~~~~~ net/ipv6/tcp_ipv6.c:1770:30: warning: no previous prototype for 'tcp_v6_early_demux' [-Wmissing-prototypes] 1770 | INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb) | ^~~~~~~~~~~~~~~~~~ Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 6月, 2020 2 次提交
-
-
由 Eric Dumazet 提交于
Mitigate RETPOLINE costs in __tcp_transmit_skb() by using INDIRECT_CALL_INET() wrapper. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Mitigate RETPOLINE costs in __tcp_transmit_skb() by using INDIRECT_CALL_INET() wrapper. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 5月, 2020 1 次提交
-
-
由 Florian Westphal 提交于
As of commit 98fa6271 ("tcp: refactor setting the initial congestion window") this is called only from tcp_input.c, so it can be static. Signed-off-by: NFlorian Westphal <fw@strlen.de> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 5月, 2020 1 次提交
-
-
由 Eric Dumazet 提交于
Make tcp_ld_RTO_revert() helper available to IPv6, and implement RFC 6069 : Quoting this RFC : 3. Connectivity Disruption Indication For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of the ICMP destination unreachable message of code 0 (net unreachable) and of code 1 (host unreachable) is the ICMPv6 destination unreachable message of code 0 (no route to destination) [RFC4443]. As with IPv4, a router should generate an ICMPv6 destination unreachable message of code 0 in response to a packet that cannot be delivered to its destination address because it lacks a matching entry in its routing table. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 5月, 2020 1 次提交
-
-
由 Eric Dumazet 提交于
We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100% overhead in tcp_set_rcvlowat() This works well when skb->len/skb->truesize ratio is bigger than 0.5 But if we receive packets with small MSS, we can end up in a situation where not enough bytes are available in the receive queue to satisfy RCVLOWAT setting. As our sk_rcvbuf limit is hit, we send zero windows in ACK packets, preventing remote peer from sending more data. Even autotuning does not help, because it only triggers at the time user process drains the queue. If no EPOLLIN is generated, this can not happen. Note poll() has a similar issue, after commit c7004482 ("tcp: Respect SO_RCVLOWAT in tcp_poll().") Fixes: 03f45c88 ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 5月, 2020 1 次提交
-
-
由 Eric Biggers 提交于
<linux/cryptohash.h> sounds very generic and important, like it's the header to include if you're doing cryptographic hashing in the kernel. But actually it only includes the library implementation of the SHA-1 compression function (not even the full SHA-1). This should basically never be used anymore; SHA-1 is no longer considered secure, and there are much better ways to do cryptographic hashing in the kernel. Most files that include this header don't actually need it. So in preparation for removing it, remove all these unneeded includes of it. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 07 5月, 2020 2 次提交
-
-
由 Maciej Żenczykowski 提交于
it doesn't actually exist... Test: builds and 'git grep tcp_default_init_rwnd' comes up empty Signed-off-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
With the addition of horizon feature to sch_fq, we noticed some suboptimal behavior of extremely low pacing rate TCP flows, especially when TCP is not aware of a drop happening in lower stacks. Back in commit 3f80e08f ("tcp: add tcp_reset_xmit_timer() helper"), tcp_pacing_delay() was added to estimate an extra delay to add to standard rto timers. This patch removes the skb argument from this helper and tcp_reset_xmit_timer() because it makes more sense to simply consider the time at which next packet is allowed to be sent, instead of the time of whatever packet has been sent. This avoids arming RTO timer too soon and removes spurious horizon drops. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2020 1 次提交
-
-
由 Cambda Zhu 提交于
This patch changes the behavior of TCP_LINGER2 about its limit. The sysctl_tcp_fin_timeout used to be the limit of TCP_LINGER2 but now it's only the default value. A new macro named TCP_FIN_TIMEOUT_MAX is added as the limit of TCP_LINGER2, which is 2 minutes. Since TCP_LINGER2 used sysctl_tcp_fin_timeout as the default value and the limit in the past, the system administrator cannot set the default value for most of sockets and let some sockets have a greater timeout. It might be a mistake that let the sysctl to be the limit of the TCP_LINGER2. Maybe we can add a new sysctl to set the max of TCP_LINGER2, but FIN-WAIT-2 timeout is usually no need to be too long and 2 minutes are legal considering TCP specs. Changes in v3: - Remove the new socket option and change the TCP_LINGER2 behavior so that the timeout can be set to value between sysctl_tcp_fin_timeout and 2 minutes. Changes in v2: - Add int overflow check for the new socket option. Changes in v1: - Add a new socket option to set timeout greater than sysctl_tcp_fin_timeout. Signed-off-by: NCambda Zhu <cambda@linux.alibaba.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 4月, 2020 1 次提交
-
-
由 Eric Dumazet 提交于
TCP stack is dumb in how it cooks its output packets. Depending on MAX_HEADER value, we might chose a bad ending point for the headers. If we align the end of TCP headers to cache line boundary, we make sure to always use the smallest number of cache lines, which always help. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 3月, 2020 1 次提交
-
-
由 YueHaibing 提交于
After commit f747632b ("bpf: sockmap: Move generic sockmap hooks from BPF TCP"), tcp_bpf_recvmsg() is not used out of tcp_bpf.c, so make it static and remove it from tcp.h. Also move it to BPF_STREAM_PARSER #ifdef to fix unused function warnings. Fixes: f747632b ("bpf: sockmap: Move generic sockmap hooks from BPF TCP") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200320023426.60684-3-yuehaibing@huawei.com
-