openeuler / Kernel

28 Oct, 2017 · 7 commits

tcp: Namespace-ify sysctl_tcp_min_tso_segs · 26e9596e

Committed by Eric Dumazet on Oct 27, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_challenge_ack_limit · b530b681

Committed by Eric Dumazet on Oct 27, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_limit_output_bytes · 9184d8bb

Committed by Eric Dumazet on Oct 27, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_workaround_signed_windows · ceef9ab6

Committed by Eric Dumazet on Oct 27, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_tso_win_divisor · d06a9904

Committed by Eric Dumazet on Oct 27, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_moderate_rcvbuf · 4540c0cf

Committed by Eric Dumazet on Oct 27, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_nometrics_save · ec36e416

Committed by Eric Dumazet on Oct 27, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Oct, 2017 · 15 commits

tcp: Namespace-ify sysctl_tcp_frto · af9b69a7

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_adv_win_scale · 94f0893e

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_app_win · 0c12654a

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_dsack · 6496f6bd

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_max_reordering · c6e21803

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove stale sysctl_tcp_reordering · 773d4bb9

Committed by Eric Dumazet on Oct 26, 2017

This extern is no longer used.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_fack · 0bc65a28

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_abort_on_overflow · 65c9410c

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_rfc1337 · 625357aa

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_stdurg · 3f4c7c6f

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_retrans_collapse · e0a1e5b5

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_slow_start_after_idle · b510f0d2

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_thin_linear_timeouts · 2c04ac8a

Committed by Eric Dumazet on Oct 26, 2017

Note that sysctl_tcp_thin_dupack was not used, so I deleted it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_recovery · e20223f1

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Namespace-ify sysctl_tcp_early_retrans · 2ae21cf5

Committed by Eric Dumazet on Oct 26, 2017

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Oct, 2017 · 1 commit

tcp: TCP experimental option for SMC · 60e2a778

Committed by Ursula Braun on Oct 25, 2017

The SMC protocol [1] relies on the use of a new TCP experimental
option [2, 3]. With this option, SMC capabilities are exchanged
between peers during the TCP three-way handshake. This patch adds
support for this experimental option to TCP.

References:
[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609
[2] Shared Use of TCP Experimental Options RFC 6994:
    https://tools.ietf.org/rfc/rfc6994.txt
[3] IANA ExID SMCR:
    http://www.iana.org/assignments/tcp-parameters/tcp-parameters.xhtml#tcp-exids

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Oct, 2017 · 1 commit

tcp: Configure TFO without cookie per socket and/or per route · 71c02379

Committed by Christoph Paasch on Oct 23, 2017

We already allow enabling TFO without a cookie by using the
fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (or
TFO_CLIENT_NO_COOKIE).
This is safe to do in certain environments where we know that there
isn't a malicious host (e.g., data centers) or when the
application protocol already provides an authentication mechanism in the
first flight of data.

A server however might be providing multiple services or talking to both
sides (public Internet and data center). So, this server would want to
enable cookie-less TFO for certain services and/or for connections that
go to the data center.

This patch exposes a socket option and a per-route attribute to enable such
fine-grained configurations.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Oct, 2017 · 2 commits

tcp: socket option to set TCP fast open key · 1fba70e5

Committed by Yuchung Cheng on Oct 18, 2017

New socket option TCP_FASTOPEN_KEY to allow different keys per
listener.  The listener by default uses the global key until the
socket option is set.  The key is 16 bytes of binary data. This
option has no effect on regular non-listener TCP sockets.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
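
As a rough illustration of the socket option described above, a listener could install its own key along these lines. This is only a sketch: the helper name set_listener_tfo_key() is hypothetical, and the fallback value 33 for TCP_FASTOPEN_KEY is assumed from the Linux UAPI header for systems whose libc headers do not define it yet.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_FASTOPEN_KEY
#define TCP_FASTOPEN_KEY 33	/* assumed value from the Linux UAPI header */
#endif

/*
 * Hypothetical helper: install a 16-byte Fast Open key on one listener.
 * Returns whatever setsockopt() returns: 0 on success, -1 on error
 * (for example on kernels that predate the option).
 */
static int set_listener_tfo_key(int listen_fd, const unsigned char key[16])
{
	return setsockopt(listen_fd, IPPROTO_TCP, TCP_FASTOPEN_KEY, key, 16);
}
```

Until such a call succeeds, the listener keeps using the global key, as the commit message states.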

bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region · 34f79502

Committed by John Fastabend on Oct 18, 2017

SK_SKB BPF programs are run from the socket/tcp context but early in
the stack before much of the TCP metadata is needed in tcp_skb_cb. So
we can use some unused fields to place BPF metadata needed for SK_SKB
programs when implementing the redirect function.

This allows us to drop the preempt disable logic. It does however
require an API change so sk_redirect_map() has been updated to
additionally provide ctx_ptr to skb. Note, we do however continue to
disable/enable preemption around actual BPF program running to account
for map updates.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Oct, 2017 · 1 commit

tcp: remove obsolete helpers · 437d2762

Committed by Eric Dumazet on Oct 11, 2017

Remove three inline helpers that are no longer needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Oct, 2017 · 1 commit

tcp: fix tcp_unlink_write_queue() · 4a269818

Committed by Eric Dumazet on Oct 11, 2017

Yury reported a crash with this signature:

[  554.034021] [<ffff80003ccd5a58>] 0xffff80003ccd5a58
[  554.034156] [<ffff00000888fd34>] skb_release_all+0x14/0x30
[  554.034288] [<ffff00000888fd64>] __kfree_skb+0x14/0x28
[  554.034409] [<ffff0000088ece6c>] tcp_sendmsg_locked+0x4dc/0xcc8
[  554.034541] [<ffff0000088ed68c>] tcp_sendmsg+0x34/0x58
[  554.034659] [<ffff000008919fd4>] inet_sendmsg+0x2c/0xf8
[  554.034783] [<ffff0000088842e8>] sock_sendmsg+0x18/0x30
[  554.034928] [<ffff0000088861fc>] SyS_sendto+0x84/0xf8

The problem is that skb->destructor contains garbage, because
I accidentally removed tcp_skb_tsorted_anchor_cleanup()
from tcp_unlink_write_queue().

This would trigger with a write(fd, <invalid_memory>, len) attempt,
and we will add to packetdrill this capability to avoid future
regressions.

Fixes: 75c119af ("tcp: implement rb-tree based retransmit queue")
Reported-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Oct, 2017 · 2 commits

tcp: implement rb-tree based retransmit queue · 75c119af

Committed by Eric Dumazet on Oct 05, 2017

Using a linear list to store all skbs in write queue has been okay
for quite a while: O(N) is not too bad when N < 500.

Things get messy when N is on the order of 100,000: modern TCP stacks
want 10Gbit+ of throughput even with 200 ms RTT flows.

40 ns per cache line miss means a full scan can use 4 ms,
blowing away CPU caches.

SACK processing often can use various hints to avoid parsing
whole retransmit queue. But with high packet losses and/or high
reordering, hints no longer work.

Sender has to process thousands of unfriendly SACK, accumulating
a huge socket backlog, burning a cpu and massively dropping packets.

Using an rb-tree for retransmit queue has been avoided for years
because it added complexity and overhead, but now is the time
to be more resistant and say no to quadratic behavior.

1) RTX queue is no longer part of the write queue : already sent skbs
are stored in one rb-tree.

2) Since reaching the head of write queue no longer needs
sk->sk_send_head, we added a union of sk_send_head and tcp_rtx_queue

Tested:

 On receiver :
 netem on ingress : delay 150ms 200us loss 1
 GRO disabled to force stress and SACK storms.

for f in `seq 1 10`
do
 ./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT|tail -1
done | awk '{print $0} {sum += $0} END {printf "%7u\n",sum}'

Before patch :

323.87
351.48
339.59
338.62
306.72
204.07
304.93
291.88
202.47
176.88
   2840

After patch:

1700.83
2207.98
2070.17
1544.26
2114.76
2124.89
1693.14
1080.91
2216.82
1299.94
  18053

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
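
The figures quoted in the commit message above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are made up, and the 1500-byte packet size is an assumption, since the message does not state one.

```c
#include <stdint.h>

/*
 * Packets needed in flight to fill a path: bandwidth-delay product
 * divided by the packet size (pkt_bytes is an assumed value).
 */
static uint64_t packets_in_flight(uint64_t bits_per_sec, uint64_t rtt_ms,
				  uint64_t pkt_bytes)
{
	uint64_t bdp_bytes = bits_per_sec / 8 * rtt_ms / 1000;

	return bdp_bytes / pkt_bytes;
}

/* Cost of walking a linear queue when every skb costs one cache miss. */
static uint64_t full_scan_ns(uint64_t nr_skbs, uint64_t miss_ns)
{
	return nr_skbs * miss_ns;
}

/* Smallest depth d such that 2^d >= n: lookup steps in a balanced tree. */
static unsigned int ceil_log2(uint64_t n)
{
	unsigned int depth = 0;
	uint64_t cap = 1;

	while (cap < n) {
		cap <<= 1;
		depth++;
	}
	return depth;
}
```

With these helpers, packets_in_flight(10000000000ULL, 200, 1500) gives 166666, which matches "the order of 100,000", and full_scan_ns(100000, 40) gives 4000000 ns, the 4 ms quoted above. ceil_log2(100000) is 17, so a perfectly balanced tree needs about 17 steps per lookup; a red-black tree can be up to roughly twice as deep, which is still far below 100,000.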

tcp: uninline tcp_write_queue_purge() · ac3f09ba

Committed by Eric Dumazet on Oct 05, 2017

Since the upcoming rtx rbtree will add some extra code,
it is time to not inline this fat function anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Oct, 2017 · 2 commits

tcp: new list for sent but unacked skbs for RACK recovery · e2080072

Committed by Eric Dumazet on Oct 04, 2017

This patch adds a new queue (list) that tracks the sent but not yet
acked or SACKed skbs for a TCP connection. The list is chronologically
ordered by skb->skb_mstamp (the head is the oldest sent skb).

This list will be used to optimize TCP Rack recovery, which checks
an skb's timestamp to judge if it has been lost and needs to be
retransmitted. Since TCP write queue is ordered by sequence instead
of sent time, RACK has to scan over the write queue to catch all
eligible packets to detect lost retransmission, and iterates through
SACKed skbs repeatedly.

Special care for rare events:
1. TCP repair fakes skb transmission so the send queue needs to be adjusted
2. SACK reneging would require re-inserting SACKed skbs into the
   send queue. For now I believe it's not worth the complexity to
   make RACK work perfectly on SACK reneging, so we do nothing here.
3. Fast Open: currently for non-TFO, send-queue correctly queues
   the pure SYN packet. For TFO which queues a pure SYN and
   then a data packet, send-queue only queues the data packet but
   not the pure SYN due to the structure of TFO code. This is okay
   because the SYN receiver would never respond with a SACK on a
   missing SYN (i.e. SYN is never fast-retransmitted by SACK/RACK).

In order to not grow sk_buff, we use a union for the new list and
_skb_refdst/destructor fields. This is a bit complicated because
we need to make sure _skb_refdst and destructor are properly zeroed
before skb is cloned/copied at transmit, and before being freed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
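
The benefit of a list ordered by send time can be sketched as follows. This is a simplified model, not the kernel's RACK logic: struct sent_pkt and mark_lost_time_sorted() are hypothetical names, and the loss rule used here (age greater than a fixed timeout) is an assumption chosen to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb on the time-sorted list. */
struct sent_pkt {
	uint32_t seq;		/* first sequence number */
	uint64_t sent_us;	/* transmit timestamp, microseconds */
	int lost;		/* set once the packet is deemed lost */
};

/*
 * Mark packets whose age exceeds timeout_us as lost. pkts[] must be
 * ordered by sent_us, oldest first, so the walk can stop at the first
 * packet that is still young enough: everything after it was sent later.
 * Returns the number of entries visited.
 */
static size_t mark_lost_time_sorted(struct sent_pkt *pkts, size_t n,
				    uint64_t now_us, uint64_t timeout_us,
				    size_t *nr_lost)
{
	size_t visited = 0;

	*nr_lost = 0;
	for (size_t i = 0; i < n; i++) {
		visited++;
		if (now_us - pkts[i].sent_us <= timeout_us)
			break;
		pkts[i].lost = 1;
		(*nr_lost)++;
	}
	return visited;
}
```

On a queue ordered by sequence number the same check would have to visit every entry, because a retransmitted low sequence number can carry a recent timestamp.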

tcp: uniform the set up of sockets after successful connection · 27204aaa

Committed by Wei Wang on Oct 04, 2017

Currently in the TCP code, the initialization sequence for cached
metrics, congestion control, BPF, etc, after successful connection
is very inconsistent. This introduces inconsistent behavior and is
prone to bugs. The current call sequence is as follows:

(1) for active case (tcp_finish_connect() case):
        tcp_mtup_init(sk);
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_init_metrics(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_init_buffer_space(sk);

(2) for passive case (tcp_rcv_state_process() TCP_SYN_RECV case):
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_mtup_init(sk);
        tcp_init_buffer_space(sk);
        tcp_init_metrics(sk);

(3) for TFO passive case (tcp_fastopen_create_child()):
        inet_csk(child)->icsk_af_ops->rebuild_header(child);
        tcp_init_congestion_control(child);
        tcp_mtup_init(child);
        tcp_init_metrics(child);
        tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
        tcp_init_buffer_space(child);

This commit makes the above functions use the following uniform sequence:
        tcp_mtup_init(sk);
        icsk->icsk_af_ops->rebuild_header(sk);
        tcp_init_metrics(sk);
        tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE/PASSIVE_ESTABLISHED_CB);
        tcp_init_congestion_control(sk);
        tcp_init_buffer_space(sk);
This sequence is the same as the (1) active case. We pick this sequence
because this order correctly allows BPF to override the settings
including congestion control module and initial cwnd, etc from
the route, and then allows the CC module to see those settings.

Suggested-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
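
Why the order matters can be shown with a toy model. Everything below is hypothetical: struct conn_sketch and the *_sketch functions are simplified stand-ins for tcp_init_metrics(), tcp_call_bpf() and tcp_init_congestion_control(), and only the initial cwnd is modelled.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified connection state. */
struct conn_sketch {
	uint32_t route_init_cwnd;	/* value cached on the route, 0 = unset */
	uint32_t init_cwnd;		/* value the socket will start with */
	uint32_t cc_seen_cwnd;		/* what the CC module observed at init */
};

/* Stand-in for tcp_init_metrics(): apply settings from the route. */
static void init_metrics_sketch(struct conn_sketch *c)
{
	if (c->route_init_cwnd)
		c->init_cwnd = c->route_init_cwnd;
}

/* Stand-in for tcp_call_bpf(): a BPF program may override the setting. */
static void call_bpf_sketch(struct conn_sketch *c, uint32_t bpf_cwnd)
{
	if (bpf_cwnd)
		c->init_cwnd = bpf_cwnd;
}

/* Stand-in for tcp_init_congestion_control(): CC reads the final value. */
static void init_cc_sketch(struct conn_sketch *c)
{
	c->cc_seen_cwnd = c->init_cwnd;
}

/* Unified order: route metrics, then BPF override, then CC init. */
static void finish_connect_sketch(struct conn_sketch *c, uint32_t bpf_cwnd)
{
	init_metrics_sketch(c);
	call_bpf_sketch(c, bpf_cwnd);
	init_cc_sketch(c);
}
```

If the CC init step ran before the other two, as in the old passive paths, cc_seen_cwnd would hold the stale default instead of the value chosen by the route or by BPF.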

02 Oct, 2017 · 3 commits

ipv4: Namespaceify tcp_fastopen_key knob · 43713848

Committed by Haishuang Yan on Sep 27, 2017

Applications in different namespaces might require a different
tcp_fastopen_key, independent of the host.

David Miller pointed out there is a leak without releasing the context
of tcp_fastopen_key during netns teardown. So add the release action in
exit_batch path.

Tested:
1. Container namespace:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
2817fff2-f803cf97-eadfd1f3-78c0992b

cookie key in tcp syn packets:
Fast Open Cookie
    Kind: TCP Fast Open Cookie (34)
    Length: 10
    Fast Open Cookie: 1e5dd82a8c492ca9

2. Host:
# cat /proc/sys/net/ipv4/tcp_fastopen_key:
107d7c5f-68eb2ac7-02fb06e6-ed341702

cookie key in tcp syn packets:
Fast Open Cookie
    Kind: TCP Fast Open Cookie (34)
    Length: 10
    Fast Open Cookie: e213c02bf0afbc8a

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Remove the 'publish' logic in tcp_fastopen_init_key_once · dd000598

Committed by Haishuang Yan on Sep 27, 2017

The 'publish' logic is not necessary after commit dfea2aa6 ("tcp:
Do not call tcp_fastopen_reset_cipher from interrupt context"), because
tcp_fastopen_cookie_gen would no longer call tcp_fastopen_init_key_once.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Namespaceify tcp_fastopen knob · e1cfcbe8

Committed by Haishuang Yan on Sep 27, 2017

Applications in different namespaces might need to enable the TCP Fast
Open feature independently of the host.

This patch series continues making more of the TCP Fast Open related
sysctl knobs per net-namespace.

Reported-by: Luca BRUNO <lucab@debian.org>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
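
The general shape of such a "namespaceify" change can be sketched in plain C. The structures below are simplified, hypothetical stand-ins for the kernel's per-namespace state, and tfo_flag_set() is an illustrative helper rather than a kernel function; the two flag values are assumed to match the kernel's TFO_CLIENT_ENABLE and TFO_SERVER_ENABLE.

```c
/* Assumed to match the kernel's Fast Open sysctl flag bits. */
#define TFO_CLIENT_ENABLE 1
#define TFO_SERVER_ENABLE 2

/* Simplified stand-ins for per-namespace IPv4 state. */
struct netns_ipv4_sketch {
	int sysctl_tcp_fastopen;	/* formerly a single global variable */
};

struct net_sketch {
	struct netns_ipv4_sketch ipv4;
};

/* A socket carries a pointer to the namespace it lives in. */
struct nsock_sketch {
	struct net_sketch *net;
};

/*
 * Before the change every caller read one global knob. Afterwards the
 * lookup goes through the socket's own namespace, so a container and
 * the host can be configured differently.
 */
static int tfo_flag_set(const struct nsock_sketch *sk, int flag)
{
	return (sk->net->ipv4.sysctl_tcp_fastopen & flag) != 0;
}
```

Two sockets in different namespaces can then see different settings for the same knob, which is what the per-namespace sysctl commits in this log aim for.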

01 Oct, 2017 · 1 commit

IPv4: early demux can return an error code · 7487449c

Committed by Paolo Abeni on Sep 28, 2017

Currently no error is emitted, but this infrastructure will
be used by the next patch to allow source address validation
for mcast sockets.
Since early demux can do a route lookup and an ipv4 route
lookup can return an error code, this is consistent with the
current ipv4 route infrastructure.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Sep, 2017 · 1 commit

net: sk_buff rbnode reorg · bffa72cf

Committed by Eric Dumazet on Sep 19, 2017

skb->rbnode shares space with skb->next, skb->prev and skb->tstamp

Current uses (TCP receive ofo queue and netem) need to save/restore
tstamp, while skb->dev is either NULL (TCP) or a constant for a given
queue (netem).

Since we plan using an RB tree for TCP retransmit queue to speedup SACK
processing with large BDP, this patch exchanges skb->dev and
skb->tstamp.

This saves some overhead in both TCP and netem.

v2: removes the swtstamp field from struct tcp_skb_cb

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Sep, 2017 · 1 commit

tcp: remove two unused functions · 4c712441

Committed by Yuchung Cheng on Sep 18, 2017

Remove tcp_may_send_now and tcp_snd_test, which are no longer used.

Fixes: 840a3cbe ("tcp: remove forward retransmit feature")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Aug, 2017 · 2 commits

tcp: Revert "tcp: remove header prediction" · 31770e34

Committed by Florian Westphal on Aug 30, 2017

This reverts commit 45f119bf.

Eric Dumazet says:
  We found at Google a significant regression caused by
  45f119bf tcp: remove header prediction

  In typical RPC  (TCP_RR), when a TCP socket receives data, we now call
  tcp_ack() while we used to not call it.

  This touches enough cache lines to cause a slowdown.

So the problem does not seem to be the HP removal itself but the tcp_ack()
call.  Therefore, it might be possible to remove HP after all, provided
one finds a way to elide tcp_ack() for most cases.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Revert "tcp: remove CA_ACK_SLOWPATH" · c1d2b4c3

Committed by Florian Westphal on Aug 30, 2017

This change was a followup to the header prediction removal,
so first revert this as a prerequisite to backing out the HP removal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>