- 19 3月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
const qualifiers ease code review by making clear which objects are not written in a function. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2015 8 次提交
-
-
由 Eric Dumazet 提交于
While testing last patch series, I found req sock refcounting was wrong. We must set skc_refcnt to 1 for all request socks added in hashes, but also on request sockets created by FastOpen or syncookies. It is tricky because we need to defer this initialization so that future RCU lookups do not try to take a refcount on a not yet fully initialized request socket. Also get rid of ireq_refcnt alias. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 13854e5a ("inet: add proper refcounting to request sock") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
It is not because a TCP listener is FastOpen ready that all incoming sockets actually used FastOpen. Avoid taking queue->fastopenq->lock if not needed. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
The listener field in struct tcp_request_sock is a pointer back to the listener. We now have req->rsk_listener, so TCP only needs one boolean and not a full pointer. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Once we'll be able to lookup request sockets in ehash table, we'll need to get access to listener which created this request. This avoid doing a lookup to find the listener, which benefits for a more solid SO_REUSEPORT, and is needed once we no longer queue request sock into a listener private queue. Note that 'struct tcp_request_sock'->listener could be reduced to a single bit, as TFO listener should match req->rsk_listener. TFO will no longer need to hold a reference on the listener. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
inet_reqsk_alloc() is becoming fat and should not be inlined. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
listener socket can be used to set net pointer, and will be later used to hold a reference on listener. Add a const qualifier to first argument (struct request_sock_ops *), and factorize all write_pnet(&ireq->ireq_net, sock_net(sk)); Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcp_oow_rate_limited() is hardly used in fast path, there is no point inlining it. Signed-of-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This big helper is called once from tcp_conn_request(), there is no point having it in an include. Compiler will inline it anyway. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 3月, 2015 5 次提交
-
-
由 Eric Dumazet 提交于
Changes in tcp_metric hash table are protected by tcp_metrics_lock only, not by genl_mutex While we are at it use deref_locked() instead of rcu_dereference() in tcp_new() to avoid unnecessary barrier, as we hold tcp_metrics_lock as well. Reported-by: NAndrew Vagin <avagin@parallels.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 098a697b ("tcp_metrics: Use a single hash table for all network namespaces.") Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
reqsk_put() is the generic function that should be used to release a refcount (and automatically call reqsk_free()) reqsk_free() might be called if refcount is known to be 0 or undefined. refcnt is set to one in inet_csk_reqsk_queue_add() As request socks are not yet in global ehash table, I added temporary debugging checks in reqsk_put() and reqsk_free() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
sock_edemux() is not used in fast path, and should really call sock_gen_put() to save some code. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
inet_diag_fill_req() is renamed to inet_req_diag_fill() and moved up, so that it can be called fom sk_diag_fill() inet_diag_bc_sk() is ready to handle request socks. inet_twsk_diag_dump() is no longer needed. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When a request socket is created, we do not cache ip route dst entry, like for timewait sockets. Let's use sk_fullsock() helper. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 3月, 2015 3 次提交
-
-
由 Eric Dumazet 提交于
Now the three type of sockets share a common base, we can factorize code in inet_diag_msg_common_fill(). inet_diag_entry no longer requires saddr_storage & daddr_storage and the extra copies. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
inet_sk_diag_fill() only copes with non timewait and non request socks Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Once request socks will be in ehash table, they will need to have a valid ir_iff field. This is currently true only for IPv6. This patch extends support for IPv4 as well. This means inet_diag_fill_req() can now properly use ir_iif, which is better for IPv6 link locals anyway, as request sockets and established sockets will propagate consistent netlink idiag_if. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2015 14 次提交
-
-
由 Eric W. Biederman 提交于
Now that all of the operations are safe on a single hash table accross network namespaces, allocate a single global hash table and update the code to use it. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Rewrite tcp_metrics_flush_all so that it can cope with entries from different network namespaces on it's hash chain. This is based on the logic in tcp_metrics_nl_cmd_del for deleting a selection of entries from a tcp metrics hash chain. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
tcp_metrics_flush_all always returns 0. Remove the unnecessary return code. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
In preparation for using one tcp metrics hash table for all network namespaces add a field tcpm_net to struct tcp_metrics_block, and verify that field on all hash table lookups. Make the field tcpm_net of type possible_net_t so it takes no space when network namespaces are disabled. Further add a function tm_net to read that field so we can be efficient when network namespaces are disabled and concise the rest of the time. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
In preparation for using one hash table for all network namespaces mix the network namespace into the hash value. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
There is not a practical way to cleanup during boot so just panic if there is a problem initializing tcp_metrics. That will at least give us a clear place to start debugging if something does go wrong. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Before inserting request socks into general hash table, fill their socket family. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
ireq->ir_num contains local port, use it. Also, get_openreq4() dumping listen_sk->refcnt makes litle sense. inet_diag_fill_req() can also use ireq->ir_num Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
sock_edemux() & sock_gen_put() should be ready to cope with request socks. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
I forgot to update dccp_v6_conn_request() & cookie_v6_check(). They both need to set ireq->ireq_net and ireq->ir_cookie Lets clear ireq->ir_cookie in inet_reqsk_alloc() Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 33cf7c90 ("net: add real socket cookies") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change makes it so that we should always have a deterministic ordering for the main and local aliases within the merged table when two leaves overlap. So for example if we have a leaf with a key of 192.168.254.0. If we previously added two aliases with a prefix length of 24 from both local and main the first entry would be first and the second would be second. When I was coding this I had added a WARN_ON should such a situation occur as I wasn't sure how likely it would be. However this WARN_ON has been triggered so this is something that should be addressed. With this patch the ordering of the aliases is as follows. First they are sorted on prefix length, then on their table ID, then tos, and finally priority. This way what we end up doing is essentially interleaving the two tables on what used to be leaf_info structure boundaries. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
The function fib_unmerge assumed the local table had already been allocated. If that is not the case however when custom rules are applied then this can result in a NULL pointer dereference. In order to prevent this we must check the value of the local table pointer and if it is NULL simply return 0 as there is no local table to separate from the main. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Reported-by: NMadhu Challa <challa@noironetworks.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Having to say > #ifdef CONFIG_NET_NS > struct net *net; > #endif in structures is a little bit wordy and a little bit error prone. Instead it is possible to say: > typedef struct { > #ifdef CONFIG_NET_NS > struct net *net; > #endif > } possible_net_t; And then in a header say: > possible_net_t net; Which is cleaner and easier to use and easier to test, as the possible_net_t is always there no matter what the compile options. Further this allows read_pnet and write_pnet to be functions in all cases which is better at catching typos. This change adds possible_net_t, updates the definitions of read_pnet and write_pnet, updates optional struct net * variables that write_pnet uses on to have the type possible_net_t, and finally fixes up the b0rked users of read_pnet and write_pnet. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
hold_net and release_net were an idea that turned out to be useless. The code has been disabled since 2008. Kill the code it is long past due. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 3月, 2015 6 次提交
-
-
由 Eric Dumazet 提交于
I forgot to use write_pnet() in three locations. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 33cf7c90 ("net: add real socket cookies") Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
A long standing problem in netlink socket dumps is the use of kernel socket addresses as cookies. 1) It is a security concern. 2) Sockets can be reused quite quickly, so there is no guarantee a cookie is used once and identify a flow. 3) request sock, establish sock, and timewait socks for a given flow have different cookies. Part of our effort to bring better TCP statistics requires to switch to a different allocator. In this patch, I chose to use a per network namespace 64bit generator, and to use it only in the case a socket needs to be dumped to netlink. (This might be refined later if needed) Note that I tried to carry cookies from request sock, to establish sock, then timewait sockets. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
When we merged the tries for local and main I had overlooked the iterator for /proc/net/route. As a result it was outputting both local and main when the two tries were merged. This patch resolves that by only providing output for aliases that are actually in the main trie. As a result we should go back to the original behavior which I assume will be necessary to maintain legacy support. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
The 0-day kernel test infrastructure reported a use of uninitialized variable warning for local_table due to the fact that the local and main allocations had been swapped from the original setup. This change corrects that by making it so that we free the main table if the local table allocation fails. Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
Move rtnl_lock() before the call to fib4_rules_exit so that fib_table_flush_external is called under RTNL. Fixes: 104616e7 ("switchdev: don't support custom ip rules, for now") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Acked-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Reviewed-by: NJiri Pirko <jiri@resnulli.us> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch is meant to collapse local and main into one by converting tb_data from an array to a pointer. Doing this allows us to point the local table into the main while maintaining the same variables in the table. As such the tb_data was converted from an array to a pointer, and a new array called data is added in order to still provide an object for tb_data to point to. In order to track the origin of the fib aliases a tb_id value was added in a hole that existed on 64b systems. Using this we can also reverse the merge in the event that custom FIB rules are enabled. With this patch I am seeing an improvement of 20ns to 30ns for routing lookups as long as custom rules are not enabled, with custom rules enabled we fall back to split tables and the original behavior. Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 3月, 2015 3 次提交
-
-
由 Alexander Duyck 提交于
If the inflate call failed it would return NULL. As a result tp would be set to NULL and cause use to trigger a NULL pointer dereference in should_halve if the inflate failed on the first attempt. In order to prevent this we should decrement max_work before we actually attempt to inflate as this will force us to exit before attempting to halve a node we should have inflated. In order to keep things symmetric between inflate and halve I went ahead and also moved the decrement of max_work for the halve case as well so we take care of that before we actually attempt to halve the tnode. Fixes: 88bae714 ("fib_trie: Add key vector to root, return parent key_vector in resize") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
In the case of a trie that had no tnodes with a key of 0 the initial look-up would fail resulting in an out-of-bounds cindex on the first tnode. This resulted in an entire trie being skipped. In order resolve this I have updated the cindex logic in the initial look-up so that if the key is zero we will always traverse the child zero path. Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com> Tested-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
diag dumpers should not modify the request. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-