- 14 7月, 2018 2 次提交
-
-
由 Yuchung Cheng 提交于
After fixing the way DCTCP tracking delayed ACKs, the delayed-ACK related callbacks are no longer needed Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yuchung Cheng 提交于
Previously, when a data segment was sent an ACK was piggybacked on the data segment without generating a CA_EVENT_NON_DELAYED_ACK event to notify congestion control modules. So the DCTCP ca->delayed_ack_reserved flag could incorrectly stay set when in fact there were no delayed ACKs being reserved. This could result in sending a special ECN notification ACK that carries an older ACK sequence, when in fact there was no need for such an ACK. DCTCP keeps track of the delayed ACK status with its own separate state ca->delayed_ack_reserved. Previously it may accidentally cancel the delayed ACK without updating this field upon sending a special ACK that carries a older ACK sequence. This inconsistency would lead to DCTCP receiver never acknowledging the latest data until the sender times out and retry in some cases. Packetdrill script (provided by Larry Brakmo) 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 setsockopt(3, SOL_TCP, TCP_CONGESTION, "dctcp", 5) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < [ect0] SEW 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> 0.100 > SE. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> 0.110 < [ect0] . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 0.200 < [ect0] . 1:1001(1000) ack 1 win 257 0.200 > [ect01] . 1:1(0) ack 1001 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 1:2(1) ack 1001 0.200 < [ect0] . 1001:2001(1000) ack 2 win 257 0.200 write(4, ..., 1) = 1 0.200 > [ect01] P. 2:3(1) ack 2001 0.200 < [ect0] . 2001:3001(1000) ack 3 win 257 0.200 < [ect0] . 3001:4001(1000) ack 3 win 257 0.200 > [ect01] . 3:3(0) ack 4001 0.210 < [ce] P. 4001:4501(500) ack 3 win 257 +0.001 read(4, ..., 4500) = 4500 +0 write(4, ..., 1) = 1 +0 > [ect01] PE. 3:4(1) ack 4501 +0.010 < [ect0] W. 4501:5501(1000) ack 4 win 257 // Previously the ACK sequence below would be 4501, causing a long RTO +0.040~+0.045 > [ect01] . 4:4(0) ack 5501 // delayed ack +0.311 < [ect0] . 5501:6501(1000) ack 4 win 257 // More data +0 > [ect01] . 4:4(0) ack 6501 // now acks everything +0.500 < F. 9501:9501(0) ack 4 win 257 Reported-by: NLarry Brakmo <brakmo@fb.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 7月, 2018 2 次提交
-
-
由 Stefan Baranoff 提交于
Under rare conditions where repair code may be used it is possible that window probes are either unnecessary or undesired. If the user knows that window probes are not wanted or needed this change allows them to skip sending them when a socket comes out of repair. Signed-off-by: NStefan Baranoff <sbaranoff@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stefan Baranoff 提交于
This patch fixes a bug where the sequence numbers of a socket created using TCP repair functionality are lower than set after connect is called. This occurs when the repair socket overlaps with a TIME-WAIT socket and triggers the re-use code. The amount lower is equal to the number of times that a particular IP/port set is re-used and then put back into TIME-WAIT. Re-using the first time the sequence number is 1 lower, closing that socket and then re-opening (with repair) a new socket with the same addresses/ports puts the sequence number 2 lower than set via setsockopt. The third time is 3 lower, etc. I have not tested what the limit of this acrewal is, if any. The fix is, if a socket is in repair mode, to respect the already set sequence number and timestamp when it would have already re-used the TIME-WAIT socket. Signed-off-by: NStefan Baranoff <sbaranoff@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 7月, 2018 4 次提交
-
-
由 Eric Dumazet 提交于
tcp_zerocopy_receive() relies on tcp_inq() to limit number of bytes requested by user. syzbot found that after tcp_disconnect(), tcp_inq() was returning a stale value (number of bytes in queue before the disconnect). Note that after this patch, ioctl(fd, SIOCINQ, &val) is also fixed and returns 0, so this might be a candidate for all known linux kernels. While we are at this, we probably also should clear urg_data to avoid other syzkaller reports after it discovers how to deal with urgent data. syzkaller repro : socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3 bind(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("224.0.0.1")}, 16) = 0 connect(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 send(3, ..., 4096, 0) = 4096 connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 128) = 0 getsockopt(3, SOL_TCP, TCP_ZEROCOPY_RECEIVE, ..., [16]) = 0 // CRASH Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> -
由 Paolo Abeni 提交于
Setting the low threshold to 0 has no effect on frags allocation, we need to clear high_thresh instead. The code was pre-existent to commit 648700f7 ("inet: frags: use rhashtables for reassembly units"), but before the above, such assignment had a different role: prevent concurrent eviction from the worker and the netns cleanup helper. Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lorenzo Colitti 提交于
When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first frees it by calling inet_csk_reqsk_queue_drop_and_and_put in tcp_abort, and then frees it again by calling sock_gen_put. Since tcp_abort only has one caller, and all the other codepaths in tcp_abort don't free the socket, just remove the free in that function. Cc: David Ahern <dsa@cumulusnetworks.com> Tested: passes Android sock_diag_test.py, which exercises this codepath Fixes: d7226c7a ("net: diag: Fix refcnt leak in error path destroying socket") Signed-off-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com> Tested-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Xin reported that icmp replies may not use the address on the device the echo request is received if the destination address is broadcast. Instead a route lookup is done without considering VRF context. Fix by setting oif in flow struct to the master device if it is enslaved. That directs the lookup to the VRF table. If the device is not enslaved, oif is still 0 so no affect. Fixes: cd2fbe1b ("net: Use VRF device index for lookups on RX") Reported-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2018 2 次提交
-
-
由 Máté Eckl 提交于
This patch fixes a silent out-of-bound read possibility that was present because of the misuse of this function. Mostly it was called with a struct udphdr *hp which had only the udphdr part linearized by the skb_header_pointer, however nf_tproxy_get_sock_v{4,6} uses it as a tcphdr pointer, so some reads for tcp specific attributes may be invalid. Fixes: a583636a ("inet: refactor inet[6]_lookup functions to take skb") Signed-off-by: NMáté Eckl <ecklm94@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Tyler Hicks 提交于
The low and high values of the net.ipv4.ping_group_range sysctl were being silently forced to the default disabled state when a write to the sysctl contained GIDs that didn't map to the associated user namespace. Confusingly, the sysctl's write operation would return success and then a subsequent read of the sysctl would indicate that the low and high values are the overflowgid. This patch changes the behavior by clearly returning an error when the sysctl write operation receives a GID range that doesn't map to the associated user namespace. In such a situation, the previous value of the sysctl is preserved and that range will be returned in a subsequent read of the sysctl. Signed-off-by: NTyler Hicks <tyhicks@canonical.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 7月, 2018 1 次提交
-
-
由 Florian Westphal 提交于
nft_compat relies on xt_request_find_match to increment refcount of the module that provides the match/target. The (builtin) icmp matches did't set the module owner so it was possible to rmmod ip(6)tables while icmp extensions were still in use. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 02 7月, 2018 1 次提交
-
-
由 Sabrina Dubroca 提交于
Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Reviewed-by: NStefano Brivio <sbrivio@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2018 1 次提交
-
-
由 Ilpo Järvinen 提交于
If SACK is not enabled and the first cumulative ACK after the RTO retransmission covers more than the retransmitted skb, a spurious FRTO undo will trigger (assuming FRTO is enabled for that RTO). The reason is that any non-retransmitted segment acknowledged will set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is no indication that it would have been delivered for real (the scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK case so the check for that bit won't help like it does with SACK). Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo in tcp_process_loss. We need to use more strict condition for non-SACK case and check that none of the cumulatively ACKed segments were retransmitted to prove that progress is due to original transmissions. Only then keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in non-SACK case. (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS to better indicate its purpose but to keep this change minimal, it will be done in another patch). Besides burstiness and congestion control violations, this problem can result in RTO loop: When the loss recovery is prematurely undoed, only new data will be transmitted (if available) and the next retransmission can occur only after a new RTO which in case of multiple losses (that are not for consecutive packets) requires one RTO per loss to recover. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 6月, 2018 1 次提交
-
-
由 Yuchung Cheng 提交于
Fast Open key could be stored in different endian based on the CPU. Previously hosts in different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing it as little endian to keep same API for LE hosts. Reported-by: NDaniele Iamartino <danielei@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 6月, 2018 1 次提交
-
-
由 Linus Torvalds 提交于
The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 6月, 2018 1 次提交
-
-
由 Eric Dumazet 提交于
Larry Brakmo proposal ( https://patchwork.ozlabs.org/patch/935233/ tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink about our recent patch removing ~16 quick acks after ECN events. tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent, but in the case the sender cwnd was lowered to 1, we do not want to have a delayed ack for the next packet we will receive. Fixes: 522040ea ("tcp: do not aggressively quick ack after ECN events") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NNeal Cardwell <ncardwell@google.com> Cc: Lawrence Brakmo <brakmo@fb.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 6月, 2018 2 次提交
-
-
由 Willem de Bruijn 提交于
The ipcm(6)_cookie field gso_size is set only in the udp path. The ip layer copies this to cork only if sk_type is SOCK_DGRAM. This check proved too permissive. Ping and l2tp sockets have the same type. Limit to sockets of type SOCK_DGRAM and protocol IPPROTO_UDP to exclude ping sockets. v1 -> v2 - remove irrelevant whitespace changes Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT") Reported-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Similar to 69678bcd ("udp: fix SO_BINDTODEVICE"), TCP socket lookups need to fail if dev_match is not true. Currently, a packet to a given port can match a socket bound to device when it should not. In the VRF case, this causes the lookup to hit a VRF socket and not a global socket resulting in a response trying to go through the VRF when it should not. Fixes: 3fa6f616 ("net: ipv4: add second dif to inet socket lookups") Fixes: 4297a0ef ("net: ipv6: add second dif to inet6 socket lookups") Reported-by: NLou Berger <lberger@labn.net> Diagnosed-by: NRenato Westphal <renato@opensourcerouting.org> Tested-by: NRenato Westphal <renato@opensourcerouting.org> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 6月, 2018 1 次提交
-
-
由 Frank van der Linden 提交于
commit 079096f1 ("tcp/dccp: install syn_recv requests into ehash table") introduced an optimization for the handling of child sockets created for a new TCP connection. But this optimization passes any data associated with the last ACK of the connection handshake up the stack without verifying its checksum, because it calls tcp_child_process(), which in turn calls tcp_rcv_state_process() directly. These lower-level processing functions do not do any checksum verification. Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to fix this. Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: NFrank van der Linden <fllinden@amazon.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NBalbir Singh <bsingharora@gmail.com> Reviewed-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 6月, 2018 2 次提交
-
-
由 Kees Cook 提交于
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org> -
由 Kees Cook 提交于
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 12 6月, 2018 1 次提交
-
-
由 David Miller 提交于
This is not necessary. skb_gro_receive() will never change what 'head' points to. In it's original implementation (see commit 71d93b39 ("net: Add skb_gro_receive")), it did: ==================== + *head = nskb; + nskb->next = p->next; + p->next = NULL; ==================== This sequence was removed in commit 58025e46 ("net: gro: remove obsolete code from skb_gro_receive()") Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NEric Dumazet <edumazet@google.com>
-
- 11 6月, 2018 1 次提交
-
-
由 Soheil Hassas Yeganeh 提交于
The user-provided value to setsockopt(SO_RCVLOWAT) can be larger than the maximum possible receive buffer. Such values mute POLLIN signals on the socket which can stall progress on the socket. Limit the user-provided value to half of the maximum receive buffer, i.e., half of sk_rcvbuf when the receive buffer size is set by the user, or otherwise half of sysctl_tcp_rmem[2]. Fixes: d1361840 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning") Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 6月, 2018 1 次提交
-
-
由 Paolo Abeni 提交于
After commit 6b229cf7 ("udp: add batching to udp_rmem_release()") the sk_rmem_alloc field does not measure exactly anymore the receive queue length, because we batch the rmem release. The issue is really apparent only after commit 0d4a6608 ("udp: do rmem bulk free even if the rx sk queue is empty"): the user space can easily check for an empty socket with not-0 queue length reported by the 'ss' tool or the procfs interface. We need to use a custom UDP helper to report the correct queue length, taking into account the forward allocation deficit. Reported-by: trevor.francis@46labs.com Fixes: 6b229cf7 ("UDP: add batching to udp_rmem_release()") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 6月, 2018 2 次提交
-
-
由 Florian Westphal 提交于
syzbot reports following splat: BUG: KMSAN: uninit-value in ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 ebt_stp_mt_check+0x24b/0x450 net/bridge/netfilter/ebt_stp.c:162 xt_check_match+0x1438/0x1650 net/netfilter/x_tables.c:506 ebt_check_match net/bridge/netfilter/ebtables.c:372 [inline] ebt_check_entry net/bridge/netfilter/ebtables.c:702 [inline] The uninitialised access is xt_mtchk_param->nft_compat ... which should be set to 0. Fix it by zeroing the struct beforehand, same for tgchk. ip(6)tables targetinfo uses c99-style initialiser, so no change needed there. Reported-by: syzbot+da4494182233c23a5fcf@syzkaller.appspotmail.com Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Sultan Alsawaf 提交于
By passing a limit of 2 bytes to strncat, strncat is limited to writing fewer bytes than what it's supposed to append to the name here. Since the bounds are checked on the line above this, just remove the string bounds checks entirely since they're unneeded. Signed-off-by: NSultan Alsawaf <sultanxda@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 6月, 2018 2 次提交
-
-
由 Eric Dumazet 提交于
Before using nla_get_u32(), better make sure the attribute is of the proper size. Code recently was changed, but bug has been there from beginning of git. BUG: KMSAN: uninit-value in rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 CPU: 1 PID: 14139 Comm: syz-executor6 Not tainted 4.17.0-rc5+ #103 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686 rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 fib_dump_info+0xc42/0x2190 net/ipv4/fib_semantics.c:1361 rtmsg_fib+0x65f/0x8c0 net/ipv4/fib_semantics.c:419 fib_table_insert+0x2314/0x2b50 net/ipv4/fib_trie.c:1287 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x455a09 RSP: 002b:00007faae5fd8c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007faae5fd96d4 RCX: 0000000000455a09 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:529 fib_convert_metrics net/ipv4/fib_semantics.c:1056 [inline] fib_create_info+0x2d46/0x9dc0 net/ipv4/fib_semantics.c:1150 fib_table_insert+0x3e4/0x2b50 net/ipv4/fib_trie.c:1146 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a919525a ("net: Move fib_convert_metrics to metrics file") Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
commit 0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table") refactored ipmr_new_table, so that it now returns NULL when mr_table_alloc fails. Unfortunately, all callers of ipmr_new_table expect an ERR_PTR. This can result in NULL deref, for example when ipmr_rules_exit calls ipmr_free_table with NULL net->ipv4.mrt in the !CONFIG_IP_MROUTE_MULTIPLE_TABLES version. This patch makes mr_table_alloc return errors, and changes ip6mr_new_table and its callers to return/expect error pointers as well. It also removes the version of mr_table_alloc defined under !CONFIG_IP_MROUTE_COMMON, since it is never used. Fixes: 0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 6月, 2018 4 次提交
-
-
由 Arnd Bergmann 提交于
It is now possible to enable the libified nf_tproxy modules without also enabling NETFILTER_XT_TARGET_TPROXY, which throws off the ifdef logic in the udp core code: net/ipv6/netfilter/nf_tproxy_ipv6.o: In function `nf_tproxy_get_sock_v6': nf_tproxy_ipv6.c:(.text+0x1a8): undefined reference to `udp6_lib_lookup' net/ipv4/netfilter/nf_tproxy_ipv4.o: In function `nf_tproxy_get_sock_v4': nf_tproxy_ipv4.c:(.text+0x3d0): undefined reference to `udp4_lib_lookup' We can actually simplify the conditions now to provide the two functions exactly when they are needed. Fixes: 45ca4e0c ("netfilter: Libify xt_TPROXY") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NMáté Eckl <ecklm94@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maciej Żenczykowski 提交于
Tested: 'git grep tw_timeout' comes up empty and it builds :-) Signed-off-by: NMaciej Żenczykowski <maze@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yousuk Seung 提交于
Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock* instead of tcp_sock* to clean up type casts. This is a pure refactor patch. Signed-off-by: NYousuk Seung <ysseung@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maciej Żenczykowski 提交于
This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean to an integer. It now takes the values 0, 1 and 2, where 0 and 1 behave as before, while 2 enables timewait socket reuse only for sockets that we can prove are loopback connections: ie. bound to 'lo' interface or where one of source or destination IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1. This enables quicker reuse of ephemeral ports for loopback connections - where tcp_tw_reuse is 100% safe from a protocol perspective (this assumes no artificially induced packet loss on 'lo'). This also makes estblishing many loopback connections *much* faster (allocating ports out of the first half of the ephemeral port range is significantly faster, then allocating from the second half) Without this change in a 32K ephemeral port space my sample program (it just establishes and closes [::1]:ephemeral -> [::1]:server_port connections in a tight loop) fails after 32765 connections in 24 seconds. With it enabled 50000 connections only take 4.7 seconds. This is particularly problematic for IPv6 where we only have one local address and cannot play tricks with varying source IP from 127.0.0.0/8 pool. Signed-off-by: NMaciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Wei Wang <weiwan@google.com> Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1 Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 6月, 2018 1 次提交
-
-
由 Máté Eckl 提交于
The extracted functions will likely be usefull to implement tproxy support in nf_tables. Extrancted functions: - nf_tproxy_sk_is_transparent - nf_tproxy_laddr4 - nf_tproxy_handle_time_wait4 - nf_tproxy_get_sock_v4 - nf_tproxy_laddr6 - nf_tproxy_handle_time_wait6 - nf_tproxy_get_sock_v6 (nf_)tproxy_handle_time_wait6 also needed some refactor as its current implementation was xtables-specific. Signed-off-by: NMáté Eckl <ecklm94@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 02 6月, 2018 1 次提交
-
-
由 Nicolas Dichtel 提交于
After commit f6cc9c05, the following conf is broken (note that the default loopback mtu is 65536, ie IP_MAX_MTU + 1): $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo add tunnel "gre0" failed: Invalid argument $ ip l a type dummy $ ip l s dummy1 up $ ip l s dummy1 mtu 65535 $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1 add tunnel "gre0" failed: Invalid argument dev_set_mtu() doesn't allow to set a mtu which is too large. First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove the magic value 0xFFF8 and use IP_MAX_MTU instead. 0xFFF8 seems to be there for ages, I don't know why this value was used. With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU: $ ip l s dummy1 mtu 66000 After that patch, it's also possible to bind an ip tunnel on that kind of interface. CC: Petr Machata <petrm@mellanox.com> CC: Ido Schimmel <idosch@mellanox.com> Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a Fixes: f6cc9c05 ("ip_tunnel: Emit events for post-register MTU changes") Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 6月, 2018 1 次提交
-
-
由 Yafang Shao 提交于
This is additional to the commit ea1627c2 ("tcp: minor optimizations around tcp_hdr() usage"). At this point, skb->data is same with tcp_hdr() as tcp header has not been pulled yet. So use the less expensive one to get the tcp header. Remove the third parameter of tcp_rcv_established() and put it into the function body. Furthermore, the local variables are listed as a reverse christmas tree :) Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 5月, 2018 3 次提交
-
-
由 David Ahern 提交于
Add support for IFA_RT_PRIORITY to ipv4 addresses. If the metric is changed on an existing address then the new route is inserted before removing the old one. Since the metric is one of the route keys, the prefix route can not be replaced. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Yongjun 提交于
Fixes the following sparse warnings: net/ipv4/bpfilter/sockopt.c:13:5: warning: symbol 'bpfilter_mbox_request' was not declared. Should it be static? Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Instead of using extra modules for these, turn the config options into an implicit dependency that adds masq feature to the protocol specific nf_nat module. before: text data bss dec hex filename 2001 860 4 2865 b31 net/ipv4/netfilter/nf_nat_masquerade_ipv4.ko 5579 780 2 6361 18d9 net/ipv4/netfilter/nf_nat_ipv4.ko 2860 836 8 3704 e78 net/ipv6/netfilter/nf_nat_masquerade_ipv6.ko 6648 780 2 7430 1d06 net/ipv6/netfilter/nf_nat_ipv6.ko after: text data bss dec hex filename 7245 872 8 8125 1fbd net/ipv4/netfilter/nf_nat_ipv4.ko 9165 848 12 10025 2729 net/ipv6/netfilter/nf_nat_ipv6.ko Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 28 5月, 2018 1 次提交
-
-
由 Andrey Ignatov 提交于
In addition to already existing BPF hooks for sys_bind and sys_connect, the patch provides new hooks for sys_sendmsg. It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that provides access to socket itlself (properties like family, type, protocol) and user-passed `struct sockaddr *` so that BPF program can override destination IP and port for system calls such as sendto(2) or sendmsg(2) and/or assign source IP to the socket. The hooks are implemented as two new attach types: `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and UDPv6 correspondingly. UDPv4 and UDPv6 separate attach types for same reason as sys_bind and sys_connect hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound. The difference with already existing hooks is sys_sendmsg are implemented only for unconnected UDP. For TCP it doesn't make sense to change user-provided `struct sockaddr *` at sendto(2)/sendmsg(2) time since socket either was already connected and has source/destination set or wasn't connected and call to sendto(2)/sendmsg(2) would lead to ENOTCONN anyway. Connected UDP is already handled by sys_connect hooks that can override source/destination at connect time and use fast-path later, i.e. these hooks don't affect UDP fast-path. Rewriting source IP is implemented differently than that in sys_connect hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to just bind socket to desired local IP address since source IP can be set on per-packet basis by using ancillary data (cmsg(3)). So no matter if socket is bound or not, source IP has to be rewritten on every call to sys_sendmsg. To do so two new fields are added to UAPI `struct bpf_sock_addr`; * `msg_src_ip4` to set source IPv4 for UDPv4; * `msg_src_ip6` to set source IPv6 for UDPv6. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 26 5月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-