- 12 3月, 2015 6 次提交
-
-
由 Eric Dumazet 提交于
I forgot to use write_pnet() in three locations. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 33cf7c90 ("net: add real socket cookies") Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
A long standing problem in netlink socket dumps is the use of kernel socket addresses as cookies. 1) It is a security concern. 2) Sockets can be reused quite quickly, so there is no guarantee a cookie is used once and identify a flow. 3) request sock, establish sock, and timewait socks for a given flow have different cookies. Part of our effort to bring better TCP statistics requires to switch to a different allocator. In this patch, I chose to use a per network namespace 64bit generator, and to use it only in the case a socket needs to be dumped to netlink. (This might be refined later if needed) Note that I tried to carry cookies from request sock, to establish sock, then timewait sockets. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
When we merged the tries for local and main I had overlooked the iterator for /proc/net/route. As a result it was outputting both local and main when the two tries were merged. This patch resolves that by only providing output for aliases that are actually in the main trie. As a result we should go back to the original behavior which I assume will be necessary to maintain legacy support. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
The 0-day kernel test infrastructure reported a use of uninitialized variable warning for local_table due to the fact that the local and main allocations had been swapped from the original setup. This change corrects that by making it so that we free the main table if the local table allocation fails. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
Move rtnl_lock() before the call to fib4_rules_exit so that fib_table_flush_external is called under RTNL. Fixes: 104616e7 ("switchdev: don't support custom ip rules, for now") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Acked-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Reviewed-by: NJiri Pirko <jiri@resnulli.us> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch is meant to collapse local and main into one by converting tb_data from an array to a pointer. Doing this allows us to point the local table into the main while maintaining the same variables in the table. As such the tb_data was converted from an array to a pointer, and a new array called data is added in order to still provide an object for tb_data to point to. In order to track the origin of the fib aliases a tb_id value was added in a hole that existed on 64b systems. Using this we can also reverse the merge in the event that custom FIB rules are enabled. With this patch I am seeing an improvement of 20ns to 30ns for routing lookups as long as custom rules are not enabled, with custom rules enabled we fall back to split tables and the original behavior. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 3月, 2015 4 次提交
-
-
由 Alexander Duyck 提交于
If the inflate call failed it would return NULL. As a result tp would be set to NULL and cause use to trigger a NULL pointer dereference in should_halve if the inflate failed on the first attempt. In order to prevent this we should decrement max_work before we actually attempt to inflate as this will force us to exit before attempting to halve a node we should have inflated. In order to keep things symmetric between inflate and halve I went ahead and also moved the decrement of max_work for the halve case as well so we take care of that before we actually attempt to halve the tnode. Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
In the case of a trie that had no tnodes with a key of 0 the initial look-up would fail resulting in an out-of-bounds cindex on the first tnode. This resulted in an entire trie being skipped. In order resolve this I have updated the cindex logic in the initial look-up so that if the key is zero we will always traverse the child zero path. Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Tested-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
diag dumpers should not modify the request. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Remove all inline keywords, add some const, and cleanup style. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 3月, 2015 3 次提交
-
-
由 Florian Westphal 提交于
make C=1 CF=-D__CHECK_ENDIAN__ shows following: net/bridge/netfilter/nft_reject_bridge.c:65:50: warning: incorrect type in argument 3 (different base types) net/bridge/netfilter/nft_reject_bridge.c:65:50: expected restricted __be16 [usertype] protocol [..] net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: cast from restricted __be16 net/bridge/netfilter/nft_reject_bridge.c:102:37: warning: incorrect type in argument 1 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:121:50: warning: incorrect type in argument 3 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:168:52: warning: incorrect type in argument 3 (different base types) [..] net/bridge/netfilter/nft_reject_bridge.c:233:52: warning: incorrect type in argument 3 (different base types) [..] Caused by two (harmless) errors: 1. htons() instead of ntohs() 2. __be16 for protocol in nf_reject_ipXhdr_put API, use u8 instead. Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Scott Feldman 提交于
Pass in the netlink flags (NLM_F_*) into switchdev driver for IPv4 FIB add op to allow driver to 1) optimize hardware updates, 2) handle ip route prepend and append commands correctly. Suggested-by: NJamal Hadi Salim <jhs@mojatatu.com> Suggested-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NScott Feldman <sfeldma@gmail.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
After my change to neigh_hh_init to obtain the protocol from the neigh_table there are no more users of protocol in struct dst_ops. Remove the protocol field from dst_ops and all of it's initializers. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 3月, 2015 1 次提交
-
-
由 Willem de Bruijn 提交于
When reading from the error queue, msg_name and msg_control are only populated for some errors. A new exception for empty timestamp skbs added a false positive on icmp errors without payload. `traceroute -M udpconn` only displayed gateways that return payload with the icmp error: the embedded network headers are pulled before sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise. Fix this regression by refining when msg_name and msg_control branches are taken. The solutions for the two fields are independent. msg_name only makes sense for errors that configure serr->port and serr->addr_offset. Test the first instead of skb->len. This also fixes another issue. saddr could hold the wrong data, as serr->addr_offset is not initialized in some code paths, pointing to the start of the network header. It is only valid when serr->port is set (non-zero). msg_control support differs between IPv4 and IPv6. IPv4 only honors requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The skb->len test can simply be removed, because skb->dev is also tested and never true for empty skbs. IPv6 honors requests for all errors aside from local errors and timestamps on empty skbs. In both cases, make the policy more explicit by moving this logic to a new function that decides whether to process msg_control and that optionally prepares the necessary fields in skb->cb[]. After this change, the IPv4 and IPv6 paths are more similar. The last case is rxrpc. Here, simply refine to only match timestamps. Fixes: 49ca0d8b ("net-timestamp: no-payload option") Reported-by: NJan Niehusmann <jan@gondor.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- Changes v1->v2 - fix local origin test inversion in ip6_datagram_support_cmsg - make v4 and v6 code paths more similar by introducing analogous ipv4_datagram_support_cmsg - fix compile bug in rxrpc Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 3月, 2015 12 次提交
-
-
由 Alexander Duyck 提交于
This change makes it so that the root of the trie contains a key_vector, by doing this we make room to essentially collapse the entire trie by at least one cache line as we can store the information about the tnode or leaf that is pointed to in the root. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change pulls the parent pointer from the key_vector and places it in the tnode structure. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This pulls the information about the child array out of the key_vector and places it in the tnode since that is where it is needed. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
RCU is only needed once for the entire node, not once per key_vector so we can pull that out and move it to the tnode structure. In addition add accessors to be used inside the RCU functions so that we can more easily get from the key vector to either the tnode or the trie pointers. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change pulls the fields not explicitly needed in the key_vector and placed them in the new tnode structure. By doing this we will eventually be able to reduce the key_vector down to 16 bytes on 64 bit systems, and 12 bytes on 32 bit systems. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
We are now checking the length of a key_vector instead of a tnode so it makes sense to probably just rename this to child_length since it would probably even be applicable to a leaf. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
I am replacing the tnode_get_child call with get_child since we are techically pulling the child out of a key_vector now and not a tnode. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
Rename the tnode to key_vector. The key_vector will be the eventual container for all of the information needed by either a leaf or a tnode. The final result should be much smaller than the 40 bytes currently needed for either one. This also updates the trie struct so that it contains an array of size 1 of tnode pointers. This is to bring the structure more inline with how an actual tnode itself is configured. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
Resize related functions now all return a pointer to the pointer that references the object that was resized. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change just does a couple of minor cleanups on fib_table_flush_external. Specifically it addresses the fact that resize was being called even though nothing was being removed from the table, and it drops an unecessary indent since we could just call continue on the inverse of the fi && flag check. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Fan Du 提交于
As per RFC4821 7.3. Selecting Probe Size, a probe timer should be armed once probing has converged. Once this timer expired, probing again to take advantage of any path PMTU change. The recommended probing interval is 10 minutes per RFC1981. Probing interval could be sysctled by sysctl_tcp_probe_interval. Eric Dumazet suggested to implement pseudo timer based on 32bits jiffies tcp_time_stamp instead of using classic timer for such rare event. Signed-off-by: NFan Du <fan.du@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Fan Du 提交于
Current probe_size is chosen by doubling mss_cache, the probing process will end shortly with a sub-optimal mss size, and the link mtu will not be taken full advantage of, in return, this will make user to tweak tcp_base_mss with care. Use binary search to choose probe_size in a fine granularity manner, an optimal mss will be found to boost performance as its maxmium. In addition, introduce a sysctl_tcp_probe_threshold to control when probing will stop in respect to the width of search range. Test env: Docker instance with vxlan encapuslation(82599EB) iperf -c 10.0.0.24 -t 60 before this patch: 1.26 Gbits/sec After this patch: increase 26% 1.59 Gbits/sec Signed-off-by: NFan Du <fan.du@intel.com> Acked-by: NJohn Heffner <johnwheffner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 3月, 2015 8 次提交
-
-
由 David S. Miller 提交于
net/ipv4/fib_trie.c: In function ‘fib_table_flush_external’: net/ipv4/fib_trie.c:1572:6: warning: unused variable ‘found’ [-Wunused-variable] int found = 0; ^ net/ipv4/fib_trie.c:1571:16: warning: unused variable ‘slen’ [-Wunused-variable] unsigned char slen; ^ Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Scott Feldman 提交于
Call into the switchdev driver any time an IPv4 fib entry is added/modified/deleted from the kernel's FIB. The switchdev driver may or may not install the route to the offload device. In the case where the driver tries to install the route and something goes wrong (device's routing table is full, etc), then all of the offloaded routes will be flushed from the device, route forwarding falls back to the kernel, and no more routes are offloading. We can refine this logic later. For now, use the simplist model of offloading routes up to the point of failure, and then on failure, undo everything and mark IPv4 offloading disabled. Signed-off-by: NScott Feldman <sfeldma@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Scott Feldman 提交于
Keep switchdev FIB offload model simple for now and don't allow custom ip rules. Signed-off-by: NScott Feldman <sfeldma@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
timewait sockets now share a common base with established sockets. inet_twsk_diag_dump() can use inet_diag_bc_sk() instead of duplicating code, granted that inet_diag_bc_sk() does proper userlocks initialization. twsk_build_assert() will catch any future changes that could break the assumptions. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
With some mss values, it is possible tcp_xmit_size_goal() puts one segment more in TSO packet than tcp_tso_autosize(). We send then one TSO packet followed by one single MSS. It is not a serious bug, but we can do slightly better, especially for drivers using netif_set_gso_max_size() to lower gso_max_size. Using same formula avoids these corner cases and makes tcp_xmit_size_goal() a bit faster. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 605ad7f1 ("tcp: refine TSO autosizing") Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Drozdov 提交于
ip_check_defrag() may be used by af_packet to defragment outgoing packets. skb_network_offset() of af_packet's outgoing packets is not zero. Signed-off-by: NAlexander Drozdov <al.drozdov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pablo Neira Ayuso 提交于
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
xt_cluster supersedes ipt_CLUSTERIP since it can be also used in gateway configurations (not only from the backend side). ipt_CLUSTER is also known to leak the netdev that it uses on device removal, which requires a rather large fix to workaround the problem: http://patchwork.ozlabs.org/patch/358629/ So let's deprecate this so we can probably kill code this in the future. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 05 3月, 2015 6 次提交
-
-
由 Alexander Duyck 提交于
This patch adds code to prevent us from attempting to allocate a tnode with a size larger than what can be represented by size_t. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change updates the fib_table_lookup function so that it is in sync with the fib_find_node function in terms of the explanation for the index check based on the bits value. I have also updated it from doing a mask to just doing a compare as I have found that seems to provide more options to the compiler as I have seen it turn this into a shift of the value and test under some circumstances. In addition I addressed one minor issue in which we kept computing the key ^ n->key when checking the fib aliases. I pulled the xor out of the loop in order to reduce the number of memory reads in the lookup. As a result we should save a couple cycles since the xor is only done once much earlier in the lookup. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
The fib_table was wrapped in several places with an rcu_read_lock/rcu_read_unlock however after looking over the code I found several spots where the tables were being accessed as just standard pointers without any protections. This change fixes that so that all of the proper protections are in place when accessing the table to take RCU replacement or removal of the table into account. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
If we are going to compact the leaf and tnode we first need to make sure the fields are all in the same place. In that regard I am moving the leaf pointer which represents the fib_alias hash list to occupy what is currently the first key_vector pointer. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change makes it so that the insert and delete functions make use of the tnode pointer returned in the fib_find_node call. By doing this we will not have to rely on the parent pointer in the leaf which will be going away soon. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change makes it so that the parent pointer is returned by reference in fib_find_node. By doing this I can use it to find the parent node when I am performing an insertion and I don't have to look for it again in fib_insert_node. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-