- 21 3月, 2016 5 次提交
-
-
由 Jesse Gross 提交于
If a packet is either locally encapsulated or processed through GRO it is marked with the offloads that it requires. However, when it is decapsulated these tunnel offload indications are not removed. This means that if we receive an encapsulated TCP packet, aggregate it with GRO, decapsulate, and retransmit the resulting frame on a NIC that does not support encapsulation, we won't be able to take advantage of hardware offloads even though it is just a simple TCP packet at this point. This fixes the problem by stripping off encapsulation offload indications when packets are decapsulated. The performance impacts of this bug are significant. In a test where a Geneve encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated, and bridged to a VM performance is improved by 60% (5Gbps->8Gbps) as a result of avoiding unnecessary segmentation at the VM tap interface. Reported-by: NRamu Ramamurthy <sramamur@linux.vnet.ibm.com> Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE") Signed-off-by: NJesse Gross <jesse@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
If the user supply a different fragmentation point or if there is a network header that cause it to not be aligned, force it to be aligned. Fragmentation point at a value that is not aligned is not optimal. It causes extra padding to be used and has just no pros. v2: - Make use of the new WORD_TRUNC macro Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare MTU can sometimes return values that are not aligned, like for loopback, which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment will cause the last non-aligned bytes to never be used and can cause issues with congestion control. So it's better to just consider a lower MTU and keep congestion control calcs saner as they are based on PMTU. Same applies to icmp frag needed messages, which is also fixed by this patch. One other effect of this is the inability to send MTU-sized packet without queueing or fragmentation and without hitting Nagle. As the check performed at sctp_packet_can_append_data(): if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead) /* Enough data queued to fill a packet */ return SCTP_XMIT_OK; with the above example of MTU, if there are no other messages queued, one cannot send a packet that just fits one packet (65532 bytes) and without causing DATA chunk fragmentation or a delay. v2: - Added WORD_TRUNC macro Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
flowi6_tos of struct flowi6 is unused in IPv6, therefore dumping tos on that tracepoint will also give incorrect information wrt traffic class. If we want to fix it, we need to extract it via ip6_tclass(flp->flowlabel). While for the same test case I get a count of 0 non-zero tos values before the change, they now start to show up after the change: # ./perf record -e fib6:fib6_table_lookup -a sleep 10 # ./perf script | grep -v "tos 0" | wc -l 60 Since there's no user in the kernel tree anymore of flowi6_tos, remove the define to avoid any future confusion on this. Fixes: b811580d ("net: IPv6 fib lookup tracepoint") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Jiri mentioned that flowi6_tos of struct flowi6 is never used/read anywhere. In fact, rest of the kernel uses the flowi6's flowlabel, where the traffic class _and_ the flowlabel (aka flowinfo) is encoded. For example, for policy routing, fib6_rule_match() uses ip6_tclass() that is applied on the flowlabel member for matching on tclass. Similar fix is needed for geneve, where flowi6_tos is set as well. Installing a v6 blackhole rule that f.e. matches on tos is now working with vxlan. Fixes: 1400615d ("vxlan: allow setting ipv6 traffic class") Reported-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
bond_get_stats() can be called from rtnetlink (with RTNL held) or from /proc/net/dev seq handler (with RCU held) The logic added in commit 5f0c5f73 ("bonding: make global bonding stats more reliable") kind of assumed only one cpu could run there. If multiple threads are reading /proc/net/dev, stats can be really messed up after a while. A second problem is that some fields are 32bit, so we need to properly handle the wrap around problem. Given that RTNL is not always held, we need to use bond_for_each_slave_rcu(). Fixes: 5f0c5f73 ("bonding: make global bonding stats more reliable") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Andy Gospodarek <gospo@cumulusnetworks.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX for this, which makes the code a bit more generic and allows to remove BPF_TUNLEN_MAX from eBPF code. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
We can just add a small helper dst_tclassid() for retrieving the dst->tclassid value. It makes the code a bit better in that we can get rid of the ifdef from filter.c by moving this into the header. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 3月, 2016 4 次提交
-
-
由 Vivien Didelot 提交于
netdev_upper_dev_unlink() which notifies NETDEV_CHANGEUPPER, returns void, as well as del_nbp(). So there's no advantage to catch an eventual error from the port_bridge_leave routine at the DSA level. Make this routine void for the DSA layer and its existing drivers. Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
Rename DSA port_join_bridge and port_leave_bridge routines to respectively port_bridge_join and port_bridge_leave in order to respect an implicit Port::Bridge namespace. Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
Per RFC4898, they count segments sent/received containing a positive length data segment (that includes retransmission segments carrying data). Unlike tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments carrying no data (e.g. pure ack). The patch also updates the segs_in in tcp_fastopen_add_skb() so that segs_in >= data_segs_in property is kept. Together with retransmission data, tcpi_data_segs_out gives a better signal on the rxmit rate. v6: Rebase on the latest net-next v5: Eric pointed out that checking skb->len is still needed in tcp_fastopen_add_skb() because skb can carry a FIN without data. Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in() helper is used. Comment is added to the fastopen case to explain why segs_in has to be reset and tcp_segs_in() has to be called before __skb_pull(). v4: Add comment to the changes in tcp_fastopen_add_skb() and also add remark on this case in the commit message. v3: Add const modifier to the skb parameter in tcp_segs_in() v2: Rework based on recent fix by Eric: commit a9d99ce2 ("tcp: fix tcpi_segs_in after connection establishment") Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Cc: Eric Dumazet <edumazet@google.com> Cc: Marcelo Ricardo Leitner <mleitner@redhat.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gregory CLEMENT 提交于
This basic implementation allows to share code between driver using hardware buffer management. As the code is hardware agnostic, there is few helpers, most of the optimization brought by the an HW BM has to be done at driver level. Tested-by: NSebastian Careba <nitroshift@yahoo.com> Signed-off-by: NGregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 3月, 2016 3 次提交
-
-
由 Alexander Duyck 提交于
This patch updates csum_ipv6_magic so that it correctly recognizes that protocol is a unsigned 8 bit value. This will allow us to better understand what limitations may or may not be present in how we handle the data. For example there are a number of places that call htonl on the protocol value. This is likely not necessary and can be replaced with a multiplication by ntohl(1) which will be converted to a shift by the compiler. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
Currently sctp_sendmsg() triggers some calls that will allocate memory with GFP_ATOMIC even when not necessary. In the case of sctp_packet_transmit it will allocate a linear skb that will be used to construct the packet and this may cause sends to fail due to ENOMEM more often than anticipated specially with big MTUs. This patch thus allows it to inherit gfp flags from upper calls so that it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or similar. All others, like retransmits or flushes started from BH, are still allocated using GFP_ATOMIC. In netperf tests this didn't result in any performance drawbacks when memory is not too fragmented and made it trigger ENOMEM way less often. Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
The code for csum_block_add was doing a funky byteswap to swap the even and odd bytes of the checksum if the offset was odd. Instead of doing this we can save ourselves some trouble and just shift by 8 as this should have the same effect in terms of the final checksum value and only requires one instruction. In addition we can update csum_block_sub to just use csum_block_add with a inverse value for csum2. This way we follow the same code path as csum_block_add without having to duplicate it. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 3月, 2016 3 次提交
-
-
由 Daniel Borkmann 提交于
This work adds support for setting the IPv6 flow label for vxlan per device and through collect metadata (ip_tunnel_key) frontends. The vxlan dst cache does not need any special considerations here, for the cases where caches can be used, the label is static per cache. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label from call sites. Currently, there's no such option and it's always set to zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so that flow-based tunnels via collect metadata frontends can make use of it. vxlan and geneve will be converted to add flow label support separately. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Cast pointer to unsigned long instead of u64, to fix compilation warning on 32 bit arch, spotted by 0day build. Fixes: 5b33f488 ("net/flower: Introduce hardware offload support") Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 3月, 2016 7 次提交
-
-
由 Amir Vadai 提交于
Enable device drivers to query the action, if and only if is a mark action and what value to use for marking. Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Introduce the macros tc_no_actions and tc_for_each_action to make code clearer. Extracted struct tc_action out of the ifdef to make calls to is_tcf_gact_shot() and similar functions valid, even when it is a nop. Acked-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Suggested-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Will be used in a following patch to query if a key is being used, and what it's value in the target object. Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
This patch is based on a patch made by John Fastabend. It adds support for offloading cls_flower. when NETIF_F_HW_TC is on: flags = 0 => Rule will be processed twice - by hardware, and if still relevant, by software. flags = SKIP_HW => Rull will be processed by software only If hardware fail/not capabale to apply the rule, operation will NOT fail. Filter will be processed by SW only. Acked-by: NJiri Pirko <jiri@mellanox.com> Suggested-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
The stub helper functions for the newly added kcm_proc_init/exit interfaces are defined as 'static' in a header file, which leads to build warnings for each file that includes them without calling them: include/net/kcm.h:183:12: error: 'kcm_proc_init' defined but not used [-Werror=unused-function] include/net/kcm.h:184:13: error: 'kcm_proc_exit' defined but not used [-Werror=unused-function] This marks the two functions as 'static inline' instead, which avoids the warnings and is obviously what was meant here. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Fixes: cd6e111b ("kcm: Add statistics and proc interfaces") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Johan Hedberg 提交于
Introduce a limited privacy mode indicated by value 0x02 to the mgmt Set Privacy command. With value 0x02 the kernel will use privacy mode with a resolvable private address. In case the controller is bondable and discoverable the identity address will be used. Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
由 Alexander Aring 提交于
This patch removes the swap pointer and memmove functionality. Instead we use the well known put/get unaligned access with specific byte order handling. Signed-off-by: NAlexander Aring <aar@pengutronix.de> Suggested-by: NMarc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
- 10 3月, 2016 5 次提交
-
-
由 Tom Herbert 提交于
This patch adds receive timeout for message assembly on the attached TCP sockets. The timeout is set when a new messages is started and the whole message has not been received by TCP (not in the receive queue). If the completely message is subsequently received the timer is cancelled, if the timer expires the RX side is aborted. The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is set on a TCP socket (i.e. set by get sockopt before attaching a TCP socket to KCM. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Message assembly is performed on the TCP socket. This is logically equivalent of an application that performs a peek on the socket to find out how much memory is needed for a receive buffer. The receive socket buffer also provides the maximum message size which is checked. The receive algorithm is something like: 1) Receive the first skbuf for a message (or skbufs if multiple are needed to determine message length). 2) Check the message length against the number of bytes in the TCP receive queue (tcp_inq()). - If all the bytes of the message are in the queue (incluing the skbuf received), then proceed with message assembly (it should complete with the tcp_read_sock) - Else, mark the psock with the number of bytes needed to complete the message. 3) In TCP data ready function, if the psock indicates that we are waiting for the rest of the bytes of a messages, check the number of queued bytes against that. - If there are still not enough bytes for the message, just return - Else, clear the waiting bytes and proceed to receive the skbufs. The message should now be received in one tcp_read_sock Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch adds various counters for KCM. These include counters for messages and bytes received or sent, as well as counters for number of attached/unattached TCP sockets and other error or edge events. The statistics are exposed via a proc interface. /proc/net/kcm provides statistics per KCM socket and per psock (attached TCP sockets). /proc/net/kcm_stats provides aggregate statistics. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This module implements the Kernel Connection Multiplexor. Kernel Connection Multiplexor (KCM) is a facility that provides a message based interface over TCP for generic application protocols. With KCM an application can efficiently send and receive application protocol messages over TCP using datagram sockets. For more information see the included Documentation/networking/kcm.txt Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Create a common kernel function to get the number of bytes available on a TCP socket. This is based on code in INQ getsockopt and we now call the function for that getsockopt. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 3月, 2016 6 次提交
-
-
由 Daniel Borkmann 提交于
Helpers like ip_tunnel_info_opts_{get,set}() are only available if CONFIG_INET is set, thus add an empty definition into the header for the !CONFIG_INET case, where already other empty inline helpers are defined. This avoids ifdef kludge inside filter.c, but also vxlan and geneve themself where this facility can only be used with, depend on INET being set. For the !INET case TUNNEL_OPTIONS_PRESENT would never be set in flags. Fixes: 14ca0751 ("bpf: support for access to tunnel options") Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kubeček 提交于
One of our customers observed issues with FIB6 garbage collectors running in different network namespaces blocking each other, resulting in soft lockups (fib6_run_gc() initiated from timer runs always in forced mode). Now that FIB6 walkers are separated per namespace, there is no more need for instances of fib6_run_gc() in different namespaces blocking each other. There is still a call to icmp6_dst_gc() which operates on shared data but this function is protected by its own shared lock. Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kubeček 提交于
The IPv6 FIB data structures are separated per network namespace but there is still only one global walkers list and one global walker list lock. This means changes in one namespace unnecessarily interfere with walkers in other namespaces. Replace the global list with per-netns lists (and give each its own lock). Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
Dmitry reported that sctp_add_bind_addr may read more bytes than expected in case the parameter is a IPv4 addr supplied by the user through calls such as sctp_bindx_add(), because it always copies sizeof(union sctp_addr) while the buffer may be just a struct sockaddr_in, which is smaller. This patch then fixes it by limiting the memcpy to the min between the union size and a (new parameter) provided addr size. Where possible this parameter still is the size of that union, except for reading from user-provided buffers, which then it accounts for protocol type. Reported-by: NDmitry Vyukov <dvyukov@google.com> Tested-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
The assumptions from commit 0c1d70af ("net: use dst_cache for vxlan device"), 468dfffc ("geneve: add dst caching support") and 3c1cb4d2 ("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage when ip_tunnel_info is used is unfortunately not always valid as assumed. While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre with different remote dsts, tos, etc, therefore they cannot be assumed as packet independent. Right now vxlan, geneve, gre would cache the dst for eBPF and every packet would reuse the same entry that was first created on the initial route lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have a different one. Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test in vxlan needs to be handeled differently in this context as it is currently inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable() helper is added for the three tunnel cases, which checks if we can use dst cache. Fixes: 0c1d70af ("net: use dst_cache for vxlan device") Fixes: 468dfffc ("geneve: add dst caching support") Fixes: 3c1cb4d2 ("net/ipv4: add dst cache support for gre lwtunnels") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Commit 7d672345 ("bpf: add generic bpf_csum_diff helper") added a generic checksum diff helper that can feed bpf_l4_csum_replace() with a target __wsum diff that is to be applied to the L4 checksum. This facility is very flexible, can be cascaded, allows for adding, removing, or diffing data, or for calculating the pseudo header checksum from scratch, but it can also be reused for working with the IPv4 header checksum. Thus, analogous to bpf_l4_csum_replace(), add a case for header field value of 0 to change the checksum at a given offset through a new helper csum_replace_by_diff(). Also, in addition to that, this provides an easy to use interface for feeding precalculated diffs f.e. coming from a map. It nicely complements bpf_l3_csum_replace() that currently allows only for csum updates of 2 and 4 byte diffs. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 3月, 2016 1 次提交
-
-
由 Julian Anastasov 提交于
Jiri Bohac is reporting for a problem where the attempt to reschedule existing connection to another real server needs proper redirect for the conntrack used by the IPVS connection. For example, when IPVS connection is created to NAT-ed real server we alter the reply direction of conntrack. If we later decide to select different real server we can not alter again the conntrack. And if we expire the old connection, the new connection is left without conntrack. So, the only way to redirect both the IPVS connection and the Netfilter's conntrack is to drop the SYN packet that hits existing connection, to wait for the next jiffie to expire the old connection and its conntrack and to rely on client's retransmission to create new connection as usually. Jiri Bohac provided a fix that drops all SYNs on rescheduling, I extended his patch to do such drops only for connections that use conntrack. Here is the original report from Jiri Bohac: Since commit dc7b3eb9 ("ipvs: Fix reuse connection if real server is dead"), new connections to dead servers are redistributed immediately to new servers. The old connection is expired using ip_vs_conn_expire_now() which sets the connection timer to expire immediately. However, before the timer callback, ip_vs_conn_expire(), is run to clean the connection's conntrack entry, the new redistributed connection may already be established and its conntrack removed instead. Fix this by dropping the first packet of the new connection instead, like we do when the destination server is not available. The timer will have deleted the old conntrack entry long before the first packet of the new connection is retransmitted. Fixes: dc7b3eb9 ("ipvs: Fix reuse connection if real server is dead") Signed-off-by: NJiri Bohac <jbohac@suse.cz> Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
- 04 3月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Some devices declare a high number of TX queues, then set a much lower real_num_tx_queues This cause setups using fq_codel, sfq or fq as the default qdisc to consume more memory than really needed. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 3月, 2016 1 次提交
-
-
由 Pablo Neira Ayuso 提交于
Complete masquerading support by allowing port range selection. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 02 3月, 2016 1 次提交
-
-
由 Deepa Dinamani 提交于
ICMP timestamp messages and IP source route options require timestamps to be in milliseconds modulo 24 hours from midnight UT format. Add inet_current_timestamp() function to support this. The function returns the required timestamp in network byte order. Timestamp calculation is also changed to call ktime_get_real_ts64() which uses struct timespec64. struct timespec64 is y2038 safe. Previously it called getnstimeofday() which uses struct timespec. struct timespec is not y2038 safe. Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-