- 18 6月, 2017 3 次提交
-
-
由 Wei Wang 提交于
DST_NOCACHE flag check has been removed from dst_release() and dst_hold_safe() in a previous patch because all the dst are now ref counted properly and can be released based on refcnt only. Looking at the rest of the DST_NOCACHE use, all of them can now be removed or replaced with other checks. So this patch gets rid of all the DST_NOCACHE usage and remove this flag completely. Signed-off-by: NWei Wang <weiwan@google.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Wang 提交于
Now that all the components have been changed to release dst based on refcnt only and not depend on dst gc anymore, we can remove the temporary flag DST_NOGC. Note that we also need to remove the DST_NOCACHE check in dst_release() and dst_hold_safe() because now all the dst are released based on refcnt and behaves as DST_NOCACHE. Signed-off-by: NWei Wang <weiwan@google.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Wang 提交于
During the creation of xfrm_dst bundle, always take ref count when allocating the dst. This way, xfrm_bundle_create() will form a linked list of dst with dst->child pointing to a ref counted dst child. And the returned dst pointer is also ref counted. This makes the link from the flow cache to this dst now ref counted properly. As the dst is always ref counted properly, we can safely mark DST_NOGC flag so dst_release() will release dst based on refcnt only. And dst gc is no longer needed and all dst_free() and its related function calls should be replaced with dst_release() or dst_release_immediate(). The special handling logic for dst->child in dst_destroy() can be replaced with a simple dst_release_immediate() call on the child to release the whole list linked by dst->child pointer. Previously used DST_NOHASH flag is not needed anymore as well. The reason that DST_NOHASH is used in the existing code is mainly to prevent the dst inserted in the fib tree to be wrongly destroyed during the deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH flag is marked in all the dst children except the one which is in the fib tree. However, with this patch series to remove dst gc logic and release dst only based on ref count, it is safe to release all the children from a xfrm_dst bundle as long as the dst children are all ref counted properly which is already the case in the existing code. So, this patch removes the use of DST_NOHASH flag. Signed-off-by: NWei Wang <weiwan@google.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 6月, 2017 2 次提交
-
-
由 Antony Antony 提交于
Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement to userland. Only add if XFRMA_ENCAP was in user migrate request. Signed-off-by: NAntony Antony <antony@phenome.org> Reviewed-by: NRichard Guy Briggs <rgb@tricolour.ca> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Antony Antony 提交于
Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional netlink attribute XFRMA_ENCAP. The devices that support IKE MOBIKE extension (RFC-4555 Section 3.8) could go to sleep for a few minutes and wake up. When it wake up the NAT mapping could have expired, the device send a MOBIKE UPDATE_SA message to migrate the IPsec SA. The change could be a change UDP encapsulation port, IP address, or both. Reported-by: NPaul Wouters <pwouters@redhat.com> Signed-off-by: NAntony Antony <antony@phenome.org> Reviewed-by: NRichard Guy Briggs <rgb@tricolour.ca> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 04 5月, 2017 1 次提交
-
-
由 Sabrina Dubroca 提交于
When CONFIG_XFRM_SUB_POLICY=y, xfrm_dst stores a copy of the flowi for that dst. Unfortunately, the code that allocates and fills this copy doesn't care about what type of flowi (flowi, flowi4, flowi6) gets passed. In multiple code paths (from raw_sendmsg, from TCP when replying to a FIN, in vxlan, geneve, and gre), the flowi that gets passed to xfrm is actually an on-stack flowi4, so we end up reading stuff from the stack past the end of the flowi4 struct. Since xfrm_dst->origin isn't used anywhere following commit ca116922 ("xfrm: Eliminate "fl" and "pol" args to xfrm_bundle_ok()."), just get rid of it. xfrm_dst->partner isn't used either, so get rid of that too. Fixes: 9d6ec938 ("ipv4: Use flowi4 in public route lookup interfaces.") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 26 4月, 2017 1 次提交
-
-
由 Xin Long 提交于
Now xfrm garbage collection can be triggered by 'ip xfrm policy del'. These is no reason not to do it after flushing policies, especially considering that 'garbage collection deferred' is only triggered when it reaches gc_thresh. It's no good that the policy is gone but the xdst still hold there. The worse thing is that xdst->route/orig_dst is also hold and can not be released even if the orig_dst is already expired. This patch is to do the garbage collection if there is any policy removed in xfrm_policy_flush. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 14 4月, 2017 2 次提交
-
-
由 Steffen Klassert 提交于
This patch adds all the bits that are needed to do IPsec hardware offload for IPsec states and ESP packets. We add xfrmdev_ops to the net_device. xfrmdev_ops has function pointers that are needed to manage the xfrm states in the hardware and to do a per packet offloading decision. Joint work with: Ilan Tayari <ilant@mellanox.com> Guy Shapiro <guysh@mellanox.com> Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: NGuy Shapiro <guysh@mellanox.com> Signed-off-by: NIlan Tayari <ilant@mellanox.com> Signed-off-by: NYossi Kuperman <yossiku@mellanox.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Steffen Klassert 提交于
This is needed for the upcomming IPsec device offloading. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 27 2月, 2017 1 次提交
-
-
由 Julian Anastasov 提交于
Fix xfrm_neigh_lookup to provide dst->path to the neigh_lookup dst_ops method. When skb is provided, the IP address in packet should already match the dst->path address family. But for the non-skb case, we should consider the last tunnel address as nexthop address. Fixes: f894cbf8 ("net: Add optional SKB arg to dst_ops->neigh_lookup().") Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 2月, 2017 1 次提交
-
-
由 Steffen Klassert 提交于
On IPv4-mapped IPv6 addresses sk_family is AF_INET6, but the flow informations are created based on AF_INET. So the routing set up 'struct flowi4' but we try to access 'struct flowi6' what leads to an out of bounds access. Fix this by using the family we get with the dst_entry, like we do it for the standard policy lookup. Reported-by: NDmitry Vyukov <dvyukov@google.com> Tested-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 09 2月, 2017 7 次提交
-
-
由 Florian Westphal 提交于
Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Alternative is to keep it an make the (unused) afinfo arg const to avoid the compiler warnings once the afinfo structs get constified. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Only needed it to register the policy backend at init time. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Just call xfrm_garbage_collect_deferred() directly. This gets rid of a write to afinfo in register/unregister and allows to constify afinfo later on. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Nothing checks the return value. Also, the errors returned on unregister are impossible (we only support INET and INET6, so no way xfrm_policy_afinfo[afinfo->family] can be anything other than 'afinfo' itself). Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
The comment makes it look like get_tos() is used to validate something, but it turns out the comment was about xfrm_find_bundle() which got removed years ago. xfrm_get_tos will return either the tos (ipv4) or 0 (ipv6). Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Dmitry reports following splat: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 0 PID: 13059 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170207 #1 [..] spin_lock_bh include/linux/spinlock.h:304 [inline] xfrm_policy_flush+0x32/0x470 net/xfrm/xfrm_policy.c:963 xfrm_policy_fini+0xbf/0x560 net/xfrm/xfrm_policy.c:3041 xfrm_net_init+0x79f/0x9e0 net/xfrm/xfrm_policy.c:3091 ops_init+0x10a/0x530 net/core/net_namespace.c:115 setup_net+0x2ed/0x690 net/core/net_namespace.c:291 copy_net_ns+0x26c/0x530 net/core/net_namespace.c:396 create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205 SYSC_unshare kernel/fork.c:2281 [inline] Problem is that when we get error during xfrm_net_init we will call xfrm_policy_fini which will acquire xfrm_policy_lock before it was initialized. Just move it around so locks get set up first. Reported-by: NDmitry Vyukov <dvyukov@google.com> Fixes: 283bc9f3 ("xfrm: Namespacify xfrm state/policy locks") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 08 2月, 2017 1 次提交
-
-
由 Julian Anastasov 提交于
Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6 to lookup and confirm the neighbour. Its usage via the new helper dst_confirm_neigh() should be restricted to MSG_PROBE users for performance reasons. For XFRM prefer the last tunnel address, if present. With help from Steffen Klassert. Signed-off-by: NJulian Anastasov <ja@ssi.bg> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 1月, 2017 1 次提交
-
-
由 Alexander Alemayhu 提交于
o s/descentant/descendant o s/workarbound/workaround Signed-off-by: NAlexander Alemayhu <alexander@alemayhu.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 18 11月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
if we succeed grabbing the refcount, then if (err && !xfrm_pol_hold_rcu) will evaluate to false so this hits last else branch which then sets policy to ERR_PTR(0). Fixes: ae33786f ("xfrm: policy: only use rcu in xfrm_sk_policy_lookup") Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 10 11月, 2016 1 次提交
-
-
Install the callbacks via the state machine. Use multi state support to avoid custom list handling for the multiple instances. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org Cc: rt@linutronix.de Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20161103145021.28528-10-bigeasy@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 24 8月, 2016 1 次提交
-
-
由 Steffen Klassert 提交于
An earlier patch accidentally replaced a write_lock_bh with a spin_unlock_bh. Fix this by using spin_lock_bh instead. Fixes: 9d0380df ("xfrm: policy: convert policy_lock to spinlock") Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 12 8月, 2016 8 次提交
-
-
由 Florian Westphal 提交于
After earlier patches conversions all spots acquire the writer lock and we can now convert this to a normal spinlock. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
It doesn't seem that important. We now get inconsistent view of the counters, but those are stale anyway right after we drop the lock. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Don't acquire the readlock anymore and rely on rcu alone. In case writer on other CPU changed policy at the wrong moment (after we obtained sk policy pointer but before we could obtain the reference) just repeat the lookup. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
side effect: no longer disables BH (should be fine). Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
If we don't hold the policy lock anymore the refcnt might already be 0, i.e. policy struct is about to be free'd. Switch to atomic_inc_not_zero to avoid this. On removal policies are already unlinked from the tables (lists) before the last _put occurs so we are not supposed to find the same 'dead' entry on the next loop, so its safe to just repeat the lookup. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Once xfrm_policy_lookup_bytype doesn't grab xfrm_policy_lock anymore its possible for a hash resize to occur in parallel. Use sequence counter to block lookup in case a resize is in progress and to also re-lookup in case hash table was altered in the mean time (might cause use to not find the best-match). Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Since commit 56f04730 ("xfrm: add rcu grace period in xfrm_policy_destroy()") xfrm policy objects are already free'd via rcu. In order to make more places lockless (i.e. use rcu_read_lock instead of grabbing read-side of policy rwlock) we only need to: - use rcu_assign_pointer to store address of new hash table backend memory - add rcu barrier so that freeing of old memory is delayed (expansion and free happens from system workqueue, so synchronize_rcu is fine) - use rcu_dereference to fetch current address of the hash table. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
This is required once we allow lockless readers. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 29 7月, 2016 1 次提交
-
-
由 Tobias Brunner 提交于
Whenever thresholds are changed the hash tables are rebuilt. This is done by enumerating all policies and hashing and inserting them into the right table according to the thresholds and direction. Because socket policies are also contained in net->xfrm.policy_all but no hash tables are defined for their direction (dir + XFRM_POLICY_MAX) this causes a NULL or invalid pointer dereference after returning from policy_hash_bysel() if the rebuild is done while any socket policies are installed. Since the rebuild after changing thresholds is scheduled this crash could even occur if the userland sets thresholds seemingly before installing any socket policies. Fixes: 53c2e285 ("xfrm: Do not hash socket policies") Signed-off-by: NTobias Brunner <tobias@strongswan.org> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 12 12月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
XFRM can deal with SYNACK messages, sent while listener socket is not locked. We add proper rcu protection to __xfrm_sk_clone_policy() and xfrm_sk_policy_lookup() This might serve as the first step to remove xfrm.xfrm_policy_lock use in fast path. Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We will soon switch sk->sk_policy[] to RCU protection, as SYNACK packets are sent while listener socket is not locked. This patch simply adds RCU grace period before struct xfrm_policy freeing, and the corresponding rcu_head in struct xfrm_policy. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 12月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
TCP SYNACK messages might now be attached to request sockets. XFRM needs to get back to a listener socket. Adds new helpers that might be used elsewhere : sk_to_full_sk() and sk_const_to_full_sk() Note: We also need to add RCU protection for xfrm lookups, now TCP/DCCP have lockless listener processing. This will be addressed in separate patches. Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener") Reported-by: NDave Jones <davej@codemonkey.org.uk> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 11月, 2015 1 次提交
-
-
由 Dan Streetman 提交于
Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops templates; their dst_entries counters will never be used. Move the xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init and dst_entries_destroy for each net namespace. The ipv4 and ipv6 xfrms each create dst_ops template, and perform dst_entries_init on the templates. The template values are copied to each net namespace's xfrm.xfrm*_dst_ops. The problem there is the dst_ops pcpuc_entries field is a percpu counter and cannot be used correctly by simply copying it to another object. The result of this is a very subtle bug; changes to the dst entries counter from one net namespace may sometimes get applied to a different net namespace dst entries counter. This is because of how the percpu counter works; it has a main count field as well as a pointer to the percpu variables. Each net namespace maintains its own main count variable, but all point to one set of percpu variables. When any net namespace happens to change one of the percpu variables to outside its small batch range, its count is moved to the net namespace's main count variable. So with multiple net namespaces operating concurrently, the dst_ops entries counter can stray from the actual value that it should be; if counts are consistently moved from one net namespace to another (which my testing showed is likely), then one net namespace winds up with a negative dst_ops count while another winds up with a continually increasing count, eventually reaching its gc_thresh limit, which causes all new traffic on the net namespace to fail with -ENOBUFS. Signed-off-by: NDan Streetman <dan.streetman@canonical.com> Signed-off-by: NDan Streetman <ddstreet@ieee.org> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
- 08 10月, 2015 3 次提交
-
-
由 Eric W. Biederman 提交于
The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Replace dst_output_okfn with dst_output Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 9月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
Very soon, TCP stack might call inet_csk_route_req(), which calls inet_csk_route_req() with an unlocked listener socket, so we need to make sure ip_route_output_flow() is not trying to change any field from its socket argument. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-