Commits · e9a441b6e729e16092fcc18e3962b952a01d1e3c · openanolis / cloud-kernel

30 Mar, 2018 · 1 commit

xfrm: Register xfrm_dev_notifier in appropriate place · e9a441b6

Committed by Kirill Tkhai on Mar 29, 2018

Currently, the driver registers it from the pernet_operations::init method,
and this breaks modularity, because initialization of a net namespace
and netdevice notifiers are orthogonal actions. We don't have
per-namespace netdevice notifiers; all of them are global for all
devices in all namespaces.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Mar, 2018 · 1 commit

net: Drop pernet_operations::async · 2f635cee

Committed by Kirill Tkhai on Mar 27, 2018

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Mar, 2018 · 1 commit

xfrm_policy: use true and false for boolean values · 415a1329

Committed by Gustavo A. R. Silva on Mar 05, 2018

Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

27 Feb, 2018 · 1 commit

xfrm: mark kmem_caches as __ro_after_init · f8c3d0dd

Committed by Alexey Dobriyan on Feb 24, 2018

Kmem caches aren't relocated once set up.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

20 Feb, 2018 · 1 commit

xfrm: Fix infinite loop in xfrm_get_dst_nexthop with transport mode. · 013cb81e

Committed by Steffen Klassert on Feb 19, 2018

On transport mode we forget to fetch the child dst_entry
before we continue the while loop, this leads to an infinite
loop. Fix this by fetching the child dst_entry before we
continue the while loop.

Fixes: 0f6c480f ("xfrm: Move dst->path into struct xfrm_dst")
Reported-by: syzbot+7d03c810e50aaedef98a@syzkaller.appspotmail.com
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

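Below is a minimal user-space sketch of the corrected loop. It uses a simplified bundle model: `struct model_dst` and `model_get_nexthop()` are hypothetical stand-ins, not the kernel's `xfrm_get_dst_nexthop()`. The sketch only shows why the child entry has to be fetched before the transport-mode `continue`.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct dst_entry/xfrm_dst. */
enum model_mode { MODEL_TRANSPORT, MODEL_TUNNEL };

struct model_dst {
    int is_xfrm;               /* non-zero for IPsec (xfrm) entries */
    enum model_mode mode;      /* transform mode of this entry */
    const char *tunnel_daddr;  /* outer destination used in tunnel mode */
    struct model_dst *child;   /* next entry down the bundle */
};

/* Walk the bundle and return the next-hop address.  The cursor moves to
 * the child *before* the transport-mode `continue`; skipping that step
 * re-examines the same entry forever, which is the bug described above. */
const char *model_get_nexthop(const struct model_dst *dst, const char *daddr)
{
    while (dst && dst->is_xfrm) {
        const struct model_dst *cur = dst;

        dst = cur->child;              /* advance first */
        if (cur->mode == MODEL_TRANSPORT)
            continue;                  /* transport mode keeps daddr */
        daddr = cur->tunnel_daddr;     /* tunnel mode switches to outer daddr */
    }
    return daddr;
}
```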

19 Feb, 2018 · 1 commit

xfrm: do not call rcu_read_unlock when afinfo is NULL in xfrm_get_tos · 143a4454

Committed by Xin Long on Feb 17, 2018

When xfrm_policy_get_afinfo returns NULL, it will not hold rcu
read lock. In this case, rcu_read_unlock should not be called
in xfrm_get_tos, just like other places where it's calling
xfrm_policy_get_afinfo.

Fixes: f5e2bb4f ("xfrm: policy: xfrm_get_tos cannot fail")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

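The locking contract can be sketched in user space as follows. This model is not kernel code and rests on assumptions. `lock_depth` stands in for the RCU read-side nesting count. `model_get_afinfo()` and `model_get_tos()` are hypothetical analogues of `xfrm_policy_get_afinfo()` and `xfrm_get_tos()`.

```c
#include <stddef.h>

/* lock_depth models the RCU read-side nesting count (an assumption for
 * illustration; real RCU does not expose such a counter this way). */
static int lock_depth;

struct model_afinfo { int tos; };
static struct model_afinfo model_afinfo_inet = { .tos = 0x10 };

/* Hypothetical analogue of xfrm_policy_get_afinfo(): the lock is taken
 * only when a non-NULL result is returned. */
static struct model_afinfo *model_get_afinfo(int family)
{
    if (family != 2)            /* pretend only family 2 is registered */
        return NULL;            /* failure: lock NOT held */
    lock_depth++;               /* success: lock held */
    return &model_afinfo_inet;
}

static void model_unlock(void)
{
    lock_depth--;
}

/* Hypothetical analogue of xfrm_get_tos() after the fix. */
int model_get_tos(int family)
{
    struct model_afinfo *afinfo = model_get_afinfo(family);
    int tos;

    if (!afinfo)
        return 0;               /* nothing to unlock on this path */
    tos = afinfo->tos;
    model_unlock();             /* balances the lock taken on success */
    return tos;
}
```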

13 Feb, 2018 · 2 commits

net: Convert pernet_subsys, registered from inet_init() · f84c6821

Committed by Kirill Tkhai on Feb 13, 2018

arp_net_ops just adds/removes a /proc entry.

devinet_ops allocates and frees duplicate of init_net tables
and (un)registers sysctl entries.

fib_net_ops allocates and frees pernet tables, creates/destroys
netlink socket and (un)initializes /proc entries. Foreign
pernet_operations do not touch them.

ip_rt_proc_ops only modifies pernet /proc entries.

xfrm_net_ops creates/destroys /proc entries, allocates/frees
pernet statistics, hashes and tables, and (un)initializes
sysctl files. These are not touched by foreign pernet_operations.

xfrm4_net_ops allocates/frees private pernet memory, and
configures sysctls.

sysctl_route_ops creates/destroys sysctls.

rt_genid_ops only initializes fields of just allocated net.

ipv4_inetpeer_ops allocates/frees net private memory.

igmp_net_ops just creates/destroys /proc files and a socket that
no one else is interested in.

tcp_sk_ops seems to be safe, because tcp_sk_init() does not
depend on any other pernet_operations modifications. Iteration
over the hash table in inet_twsk_purge() is done under the RCU lock,
and it's safe to iterate the table this way. Removal from
the table happens in inet_twsk_deschedule_put(), but this
function is safe without any external locks, as it is synchronized
internally; there are many examples of it being used in different
contexts. So, it's safe to leave tcp_sk_exit_batch() unlocked.

tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.

udplite4_net_ops only creates/destroys pernet /proc file.

icmp_sk_ops creates percpu sockets, not touched by foreign
pernet_operations.

ipmr_net_ops creates/destroys pernet fib tables, (un)registers
fib rules and /proc files. This seems to be safe to execute
in parallel with foreign pernet_operations.

af_inet_ops just sets up default parameters of newly created net.

ipv4_mib_ops creates and destroys pernet percpu statistics.

raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
and ip_proc_ops only create/destroy pernet /proc files.

ip4_frags_ops creates and destroys sysctl file.

So, it's safe to make the pernet_operations async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Fix policy hold queue after flowcache removal. · 2471c981

Committed by Steffen Klassert on Feb 01, 2018

Now that the flowcache is removed we need to generate
a new dummy bundle every time we check if the needed
SAs are in place because the dummy bundle is not cached
anymore. Fix it by passing the XFRM_LOOKUP_QUEUE flag
to xfrm_lookup(). This makes sure that we get a dummy
bundle in case the SAs are not yet in place.

Fixes: 3ca28286 ("xfrm_policy: bypass flow_cache_lookup")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

10 Jan, 2018 · 1 commit

xfrm: Fix a race in the xdst pcpu cache. · 76a42011

Committed by Steffen Klassert on Jan 10, 2018

We need to run xfrm_resolve_and_create_bundle() with
bottom halves off. Otherwise we may reuse an already
released dst_entry when the xfrm lookup functions are
called from process context.

Fixes: c30d78c14a813db39a647b6a348b428 ("xfrm: add xdst pcpu cache")
Reported-by: Darius Ski <darius.ski@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

08 Jan, 2018 · 1 commit

xfrm: don't call xfrm_policy_cache_flush while holding spinlock · b1bdcb59

Committed by Florian Westphal on Jan 06, 2018

xfrm_policy_cache_flush can sleep, so it cannot be called while holding
a spinlock.  We could release the lock first, but I don't see why we need
to invoke this function here in the first place; the packet path won't reuse
an xdst entry unless it's still valid.

While at it, add an annotation to xfrm_policy_cache_flush, it would
have probably caught this bug sooner.

Fixes: ec30d78c ("xfrm: add xdst pcpu cache")
Reported-by: syzbot+e149f7d1328c26f9c12f@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

30 Dec, 2017 · 1 commit

xfrm: skip policies marked as dead while rehashing · 862591bf

Committed by Florian Westphal on Dec 27, 2017

syzkaller triggered following KASAN splat:

BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..]
Workqueue: events xfrm_hash_rebuild [..]
 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
 xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
 process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
 worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..]

The reproducer triggers:

        if (error) {
                list_move_tail(&walk->walk.all, &x->all);
                goto out;
        }

in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump
callback returns -ENOBUFS).

In this case, *walk is located in the pfkey socket struct, so this socket
becomes visible in the global policy list.

It looks like this is intentional -- phony walker has walk.dead set to 1
and all other places skip such "policies".

Ccing original authors of the two commits that seem to expose this
issue (first patch missed ->dead check, second patch adds pfkey
sockets to policies dumper list).

Fixes: 880a6fab ("xfrm: configure policy hash table thresholds by netlink")
Fixes: 12a169e7 ("ipsec: Put dumpers on the dump list")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Timo Teras <timo.teras@iki.fi>
Cc: Christophe Gouault <christophe.gouault@6wind.com>
Reported-by: syzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

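Here is a simplified sketch of the skip rule. It assumes walkers and policies share one list and are told apart only by a `dead` flag. `struct model_entry` and `model_rehash_count()` are hypothetical, and the sketch omits the real hash-table rebuild.

```c
#include <stddef.h>

/* Hypothetical model of the global policy list: dump walkers sit in the
 * same list as real policies but are marked dead. */
struct model_entry {
    int dead;                  /* non-zero for phony walker entries */
    struct model_entry *next;
};

/* Count the entries a rehash would reinsert.  Dead entries are skipped,
 * so walker objects are never treated as if they were policies. */
int model_rehash_count(const struct model_entry *head)
{
    int n = 0;

    for (const struct model_entry *e = head; e; e = e->next) {
        if (e->dead)
            continue;
        n++;
    }
    return n;
}
```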

12 Dec, 2017 · 1 commit

xfrm: put policies when reusing pcpu xdst entry · d2950278

Committed by Florian Westphal on Dec 11, 2017

We need to put the policies when re-using the pcpu xdst entry, else
this leaks the reference.

Fixes: ec30d78c ("xfrm: add xdst pcpu cache")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

01 Dec, 2017 · 1 commit

xfrm: Fix stack-out-of-bounds read on socket policy lookup. · ddc47e44

Committed by Steffen Klassert on Nov 29, 2017

When we do tunnel or beet mode, we pass saddr and daddr from the
template to xfrm_state_find(), this is ok. On transport mode,
we pass the addresses from the flowi, assuming that the IP
addresses (and address family) don't change during transformation.
This assumption is wrong in the IPv4 mapped IPv6 case, packet
is IPv4 and template is IPv6.

Fix this by catching address family mismatches of the policy
and the flow before we do the lookup.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

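The early check is illustrated by the hedged sketch below. `struct model_policy` and `model_sk_policy_lookup()` are hypothetical. The sketch only shows that a policy whose family differs from the flow's is treated as "no match" before any state lookup runs.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical policy object carrying only its address family. */
struct model_policy { int family; };

/* Reject a socket policy whose family does not match the flow's family
 * (e.g. an IPv6 policy for an IPv4-mapped flow) before it is used to
 * look up states with the flow's addresses. */
const struct model_policy *
model_sk_policy_lookup(const struct model_policy *pol, int flow_family)
{
    if (pol && pol->family != flow_family)
        return NULL;   /* mismatch: behave as if no policy matched */
    return pol;
}
```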

30 Nov, 2017 · 5 commits

xfrm: Stop using dst->next in bundle construction. · 5492093d

Committed by David Miller on Nov 28, 2017

While building ipsec bundles, blocks of xfrm dsts are linked together
using dst->next from bottom to the top.

The only thing this is used for is initializing the pmtu values of the
xfrm stack, and for updating the mtu values at xfrm_bundle_ok() time.

The bundle pmtu entries must be processed in this order so that pmtu
values lower in the stack of routes can propagate up to the higher
ones.

Avoid using dst->next by simply maintaining an array of dst pointers
as we already do for the xfrm_state objects when building the bundle.
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>

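Here is a sketch of the array-based propagation. It simplifies by assuming each layer's own MTU is already known and by ignoring header overhead. The function `model_propagate_pmtu()` is hypothetical. Index 0 is the top of the bundle and the last index the bottom, mirroring the bottom-to-top order described above.

```c
#include <stddef.h>

/* own_mtu[i] is the MTU of layer i on its own; eff_mtu[i] receives the
 * effective MTU.  A layer can never exceed the layer below it, so the
 * values are propagated from the bottom (index n-1) up to the top
 * (index 0) using a plain array instead of a dst->next chain. */
void model_propagate_pmtu(const unsigned own_mtu[], unsigned eff_mtu[], size_t n)
{
    if (n == 0)
        return;
    eff_mtu[n - 1] = own_mtu[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        unsigned lower = eff_mtu[i + 1];

        eff_mtu[i] = own_mtu[i] < lower ? own_mtu[i] : lower;
    }
}
```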

xfrm: Move dst->path into struct xfrm_dst · 0f6c480f

Committed by David Miller on Nov 28, 2017

The first member of an IPSEC route bundle chain sets its dst->path to
the underlying ipv4/ipv6 route that carries the bundle.

Stated another way, if one were to follow the xfrm_dst->child chain of
the bundle, the final non-NULL pointer would be the path and point to
either an ipv4 or an ipv6 route.

This is largely used to make sure that PMTU events propagate down to
the correct ipv4 or ipv6 route.

When we don't have the top of an IPSEC bundle, 'dst->path == dst'.

Move it down into xfrm_dst and key off of dst->xfrm.
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>

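Following the child chain to find the path can be sketched as below. The types and helpers are hypothetical user-space stand-ins. As the text says, only IPsec entries have a child (here, entries with `is_xfrm != 0`), so a plain route is its own path.

```c
#include <stddef.h>

/* Hypothetical, stripped-down dst: only IPsec entries use `child`. */
struct model_dst {
    int is_xfrm;
    struct model_dst *child;
};

static struct model_dst *model_dst_child(struct model_dst *d)
{
    return d->is_xfrm ? d->child : NULL;   /* non-xfrm routes have no child */
}

/* Follow the child chain from the top of a bundle; the last entry is the
 * underlying IPv4/IPv6 route, i.e. the path. */
struct model_dst *model_dst_path(struct model_dst *d)
{
    struct model_dst *c;

    while ((c = model_dst_child(d)) != NULL)
        d = c;
    return d;
}
```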

ipsec: Create and use new helpers for dst child access. · 45b018be

Committed by David Miller on Nov 28, 2017

This will make a future change moving the dst->child pointer less
invasive.
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Eric Dumazet <edumazet@google.com>

net: Create and use new helper xfrm_dst_child(). · b92cf4aa

Committed by David Miller on Nov 28, 2017

Only IPSEC routes have a non-NULL dst->child pointer.  And IPSEC
routes are identified by a non-NULL dst->xfrm pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: xfrm: allow clearing socket xfrm policies. · be8f8284

Committed by Lorenzo Colitti on Nov 20, 2017

Currently it is possible to add or update socket policies, but
not clear them. Therefore, once a socket policy has been applied,
the socket cannot be used for unencrypted traffic.

This patch allows (privileged) users to clear socket policies by
passing in a NULL pointer and zero length argument to the
{IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both
the incoming and outgoing policies being cleared.

The simple approach taken in this patch cannot clear socket
policies in only one direction. If desired this could be added
in the future, for example by continuing to pass in a length of
zero (which currently is guaranteed to return EMSGSIZE) and
making the policy be a pointer to an integer that contains one
of the XFRM_POLICY_{IN,OUT} enum values.

An alternative would have been to interpret the length as a
signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the
input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output
policy.

Tested: https://android-review.googlesource.com/539816
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

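From userspace, clearing both directions might look like the sketch below. It assumes a kernel that includes this change and a caller with the necessary privilege. `clear_socket_xfrm_policy()` is a hypothetical helper. The IPv6 variant would use `IPPROTO_IPV6` with `IPV6_XFRM_POLICY` instead.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef IP_XFRM_POLICY
#define IP_XFRM_POLICY 17   /* value from <linux/in.h>, if libc lacks it */
#endif

/* Clear both the inbound and outbound xfrm policies of an IPv4 socket by
 * passing a NULL pointer and a zero length.  Returns setsockopt()'s
 * result: 0 on success, -1 with errno set (e.g. when unprivileged). */
int clear_socket_xfrm_policy(int fd)
{
    return setsockopt(fd, IPPROTO_IP, IP_XFRM_POLICY, NULL, 0);
}
```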

15 Nov, 2017 · 1 commit

Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find." · 94802151

Committed by Steffen Klassert on Nov 15, 2017

This reverts commit c9f3f813.

This commit breaks transport mode when the policy template
has wildcard addresses configured, so revert it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

14 Nov, 2017 · 1 commit

xfrm: Copy policy family in clone_policy · 0e74aa1d

Committed by Herbert Xu on Nov 10, 2017

The syzbot found an ancient bug in the IPsec code.  When we cloned
a socket policy (for example, for a child TCP socket derived from a
listening socket), we did not copy the family field.  This results
in a live policy with a zero family field.  This triggers a BUG_ON
check in the af_key code when the cloned policy is retrieved.

This patch fixes it by copying the family field over.
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

03 Nov, 2017 · 2 commits

xfrm: Fix stack-out-of-bounds read in xfrm_state_find. · c9f3f813

Committed by Steffen Klassert on Nov 02, 2017

When we do tunnel or beet mode, we pass saddr and daddr from the
template to xfrm_state_find(), this is ok. On transport mode,
we pass the addresses from the flowi, assuming that the IP
addresses (and address family) don't change during transformation.
This assumption is wrong in the IPv4 mapped IPv6 case, packet
is IPv4 and template is IPv6. Fix this by using the addresses
from the template unconditionally.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: do unconditional template resolution before pcpu cache check · cf379667

Committed by Florian Westphal on Nov 02, 2017

Stephen Smalley says:
 Since 4.14-rc1, the selinux-testsuite has been encountering sporadic
 failures during testing of labeled IPSEC. git bisect pointed to
 commit ec30d ("xfrm: add xdst pcpu cache").
 The xdst pcpu cache is only checking that the policies are the same,
 but does not validate that the policy, state, and flow match with respect
 to security context labeling.
 As a result, the wrong SA could be used and the receiver could end up
 performing permission checking and providing SO_PEERSEC or SCM_SECURITY
 values for the wrong security context.

This fix makes it so that we always do the template resolution, and
then check that the found states match those in the pcpu bundle.

This has the disadvantage of doing a bit more work (lookup in state hash
table) if we can reuse the xdst entry (we only avoid xdst alloc/free)
but we don't add a lot of extra work in case we can't reuse.

The xfrm_pol_dead() check is removed; the reasoning is that
xfrm_tmpl_resolve does all the needed checks.

Cc: Paul Moore <paul@paul-moore.com>
Fixes: ec30d78c ("xfrm: add xdst pcpu cache")
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Tested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

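The reuse condition amounts to comparing the freshly resolved states with those recorded in the cached bundle. The sketch below shows that comparison. `struct model_bundle` and `model_bundle_reusable()` are hypothetical, and the limit of four states is arbitrary.

```c
#include <stddef.h>

/* Hypothetical state (SA) and cached bundle types. */
struct model_state { int id; };

struct model_bundle {
    size_t num_xfrms;
    const struct model_state *xfrm[4];   /* states the bundle was built from */
};

/* Template resolution always runs first; the cached bundle is reused only
 * when the resolved states are exactly the ones it was built from, so a
 * state with a different security label is never picked up from cache. */
int model_bundle_reusable(const struct model_bundle *b,
                          const struct model_state *const resolved[],
                          size_t n)
{
    if (!b || b->num_xfrms != n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (b->xfrm[i] != resolved[i])   /* compare identities, not ids */
            return 0;
    return 1;
}
```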

24 Oct, 2017 · 1 commit

xfrm: Fix xfrm_dst_cache memleak · ec650b23

Committed by Steffen Klassert on Oct 24, 2017

We have a memleak whenever a flow matches a policy without
a matching SA. In this case we generate a dummy bundle and
take an additional refcount on the dst_entry. This was needed
as long as we had the flowcache. The flowcache removal patches
deleted all related refcounts but forgot the one for the
dummy bundle case. Fix the memleak by removing this refcount.

Fixes: 3ca28286 ("xfrm_policy: bypass flow_cache_lookup")
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

18 Oct, 2017 · 1 commit

xfrm: Convert timers to use timer_setup() · c3aed709

Committed by Kees Cook on Oct 16, 2017

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
helper to pass the timer pointer explicitly.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

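`from_timer()` relies on the container_of pattern. The user-space sketch below illustrates that pattern with hypothetical names (`model_timer`, `MODEL_CONTAINER_OF`, `model_policy`). It is not the kernel timer API.

```c
#include <stddef.h>

/* Hypothetical timer type: the callback receives only the timer. */
struct model_timer {
    void (*function)(struct model_timer *t);
};

/* container_of-style recovery of the enclosing object. */
#define MODEL_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct model_policy {
    int expired;
    struct model_timer timer;   /* embedded, like the timer in struct xfrm_policy */
};

static void model_policy_timer(struct model_timer *t)
{
    struct model_policy *pol = MODEL_CONTAINER_OF(t, struct model_policy, timer);

    pol->expired = 1;
}

void model_policy_init(struct model_policy *pol)
{
    pol->expired = 0;
    pol->timer.function = model_policy_timer;   /* analogous to timer_setup() */
}

/* Simulate expiry: only the timer pointer is handed to the callback. */
void model_timer_fire(struct model_timer *t)
{
    t->function(t);
}
```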

11 Oct, 2017 · 1 commit

ipsec: Fix dst leak in xfrm_bundle_create(). · 10a7ef33

Committed by David Miller on Oct 10, 2017

If we cannot find a suitable inner_mode value, we will leak
the currently allocated 'xdst'.

The fix is to make sure it is linked into the chain before
erroring out.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

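The "link first, then check" ordering can be sketched in user space as follows. `model_bundle_create()` and `live_nodes` are hypothetical. `fail_at` simulates the point where no suitable inner mode is found.

```c
#include <stdlib.h>

/* Hypothetical bundle node and a live-allocation counter to spot leaks. */
struct model_xdst { struct model_xdst *next; };

static int live_nodes;

static void model_free_chain(struct model_xdst *head)
{
    while (head) {
        struct model_xdst *n = head->next;

        free(head);
        live_nodes--;
        head = n;
    }
}

/* Build `count` nodes, failing at index fail_at (never if fail_at < 0).
 * Each node is linked into the chain before the check that may fail, so
 * the shared error path frees it together with the rest of the chain. */
struct model_xdst *model_bundle_create(int count, int fail_at)
{
    struct model_xdst *head = NULL;

    for (int i = 0; i < count; i++) {
        struct model_xdst *x = malloc(sizeof(*x));

        if (!x)
            goto err;
        live_nodes++;
        x->next = head;      /* link first ... */
        head = x;
        if (i == fail_at)    /* ... then run checks that may fail */
            goto err;
    }
    return head;
err:
    model_free_chain(head);
    return NULL;
}
```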

24 Aug, 2017 · 1 commit

net: xfrm: don't double-hold dst when sk_policy in use. · 8a4b5784

Committed by Lorenzo Colitti on Aug 23, 2017

While removing dst_entry garbage collection, commit 52df157f
("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
changed xfrm_resolve_and_create_bundle so it returns an xdst with
a refcount of 1 instead of 0.

However, it did not delete the dst_hold performed by xfrm_lookup
when a per-socket policy is in use. This means that when a
socket policy is in use, dst entries returned by xfrm_lookup have
a refcount of 2, and are not freed when no longer in use.

Cc: Wei Wang <weiwan@google.com>
Fixes: 52df157f ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
Tested: https://android-review.googlesource.com/417481
Tested: https://android-review.googlesource.com/418659
Tested: https://android-review.googlesource.com/424463
Tested: https://android-review.googlesource.com/452776 passes on net-next
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

11 Aug, 2017 · 1 commit

net: xfrm: support setting an output mark. · 077fbac4

Committed by Lorenzo Colitti on Aug 11, 2017

On systems that use mark-based routing it may be necessary for
routing lookups to use marks in order for packets to be routed
correctly. An example of such a system is Android, which uses
socket marks to route packets via different networks.

Currently, routing lookups in tunnel mode always use a mark of
zero, making routing incorrect on such systems.

This patch adds a new output_mark element to the xfrm state and
a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
mark differs from the existing xfrm mark in two ways:

1. The xfrm mark is used to match xfrm policies and states, while
the xfrm output mark is used to set the mark (and influence
the routing) of the packets emitted by those states.
2. The existing mark is constrained to be a subset of the bits of
the originating socket or transformed packet, but the output
mark is arbitrary and depends only on the state.

The use of a separate mark provides additional flexibility. For
example:

- A packet subject to two transforms (e.g., transport mode inside
tunnel mode) can have two different output marks applied to it,
one for the transport mode SA and one for the tunnel mode SA.
- On a system where socket marks determine routing, the packets
emitted by an IPsec tunnel can be routed based on a mark that
is determined by the tunnel, not by the marks of the
unencrypted packets.
- Support for setting the output marks can be introduced without
breaking any existing setups that employ both mark-based
routing and xfrm tunnel mode. Simply changing the code to use
the xfrm mark for routing output packets could change behaviour
in a way that breaks these setups.

If the output mark is unspecified or set to zero, the mark is not
set or changed.

Tested: make allyesconfig; make -j64
Tested: https://android-review.googlesource.com/452776
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

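The rule for applying the output mark can be sketched as follows. The types and `model_apply_output_mark()` are hypothetical. In the kernel the value is configured through the `XFRMA_OUTPUT_MARK` netlink attribute named above. Applying two states in sequence shows how nested transforms can each set their own mark.

```c
#include <stdint.h>

/* Hypothetical state and packet types carrying only the fields needed. */
struct model_state { uint32_t output_mark; };
struct model_skb { uint32_t mark; };

/* A non-zero output mark overrides the mark of packets emitted by the
 * state; zero means "unspecified" and leaves the packet's mark alone. */
void model_apply_output_mark(const struct model_state *x, struct model_skb *skb)
{
    if (x->output_mark)
        skb->mark = x->output_mark;
}
```

For example, a packet with mark 0x1 that passes through a transport-mode state with output mark 0x20 and then a tunnel-mode state with output mark 0x30 ends up with 0x20 after the first transform and 0x30 after the second.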

08 Aug, 2017 · 1 commit

xfrm: check that cached bundle is still valid · 13ead5c4

Committed by Florian Westphal on Aug 06, 2017

Quoting Ilan Tayari:
  1. Set up a host-to-host IPSec tunnel (or transport, doesn't matter)
  2. Ping over IPSec, or do something to populate the pcpu cache
  3. Join a MC group, then leave MC group
  4. Try to ping again using same CPU as before -> traffic
     doesn't egress the machine at all

Ilan debugged the problem down to the fact that the device of one of
the path dsts points to lo due to an earlier dst_dev_put().
In this case, the dst is marked as DEAD and we cannot reuse the bundle.

The cache only asserted that the requested policy and that of the cached
bundle match, but it's not enough; also verify the path is still valid.

Fixes: ec30d78c ("xfrm: add xdst pcpu cache")
Reported-by: Ayham Masood <ayhamm@mellanox.com>
Tested-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

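Here is a simplified model of the per-CPU cache check. The cached bundle is reused only if it was built for the same policy and its path is still valid. All names here (`model_bundle`, `MODEL_NR_CPUS`, `model_cache_lookup()`) are hypothetical. The sketch ignores the per-CPU access rules the kernel needs.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical cached bundle: the policy it was built for and a flag
 * that is set once the underlying route is no longer usable. */
struct model_bundle {
    const void *policy;
    int path_dead;
};

static struct model_bundle *model_pcpu_cache[MODEL_NR_CPUS];

/* Reuse the last bundle built on this CPU only if (a) the policy matches
 * and (b) the path is still valid.  Checking only (a) would hand out a
 * bundle whose path device had been moved to lo. */
struct model_bundle *model_cache_lookup(int cpu, const void *policy)
{
    struct model_bundle *b = model_pcpu_cache[cpu];

    if (b && b->policy == policy && !b->path_dead)
        return b;    /* hit */
    return NULL;     /* miss: the caller builds a new bundle */
}

void model_cache_store(int cpu, struct model_bundle *b)
{
    model_pcpu_cache[cpu] = b;
}
```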

03 Aug, 2017 · 1 commit

xfrm: policy: check policy direction value · 7bab0963

Committed by Vladis Dronov on Aug 02, 2017

The 'dir' parameter in xfrm_migrate() is a user-controlled byte which is used
as an array index. This can lead to an out-of-bounds access, a kernel lockup and
DoS. Add a check for the 'dir' value.

This fixes CVE-2017-11600.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1474928
Fixes: 80c9abaa ("[XFRM]: Extension for dynamic update of endpoint address(es)")
Cc: <stable@vger.kernel.org> # v2.6.21-rc1
Reported-by: "bo Zhang" <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

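The added check is a bounds test before indexing. In this sketch the enum mirrors the direction values from `<linux/xfrm.h>` but is redefined locally. `model_migrate()` is a hypothetical stand-in for `xfrm_migrate()`.

```c
#include <errno.h>

/* Mirrors the direction values in <linux/xfrm.h> (IN=0, OUT=1, FWD=2,
 * MAX=3), redefined so the sketch builds without kernel headers. */
enum { MODEL_POLICY_IN, MODEL_POLICY_OUT, MODEL_POLICY_FWD, MODEL_POLICY_MAX };

static int model_policy_count[MODEL_POLICY_MAX];

/* `dir` is a user-controlled byte, so it is range-checked before being
 * used as an array index. */
int model_migrate(unsigned char dir)
{
    if (dir >= MODEL_POLICY_MAX)
        return -EINVAL;
    model_policy_count[dir]++;
    return 0;
}
```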

19 Jul, 2017 · 7 commits

xfrm: add xdst pcpu cache · ec30d78c

Committed by Florian Westphal on Jul 17, 2017

Retain the last used xfrm_dst in a pcpu cache.
On the next request, reuse this dst if the policies are the same.

The cache will not help with strict RR workloads as there is no hit.

The packet-path part of the cache is reasonably small. The notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference; there are also calls to the flush
operation when userspace deletes SAs so modules can be removed.

We need to run the dst_release on the correct cpu to avoid races with
packet path.  This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().

Test results using 4 network namespaces and null encryption:

ns1           ns2          -> ns3           -> ns4
netperf -> xfrm/null enc   -> xfrm/null dec -> netserver

what            TCP_STREAM   UDP_STREAM   UDP_RR
Flow cache:     14644.61     294.35       327231.64
No flow cache:  14349.81     242.64       202301.72
Pcpu cache:     14629.70     292.21       205595.22

UDP tests used 64-byte packets, tests ran for one minute each, and
each value is the average over ten iterations.

'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: remove flow cache · 09c75704

Committed by Florian Westphal on Jul 17, 2017

After rcu conversions performance degradation in forward tests isn't that
noticeable anymore.

See next patch for some numbers.

A followup patch could then also remove genid from the policies
as we do not cache bundles anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm_policy: make xfrm_bundle_lookup return xfrm dst object · bd45c539

Committed by Florian Westphal on Jul 17, 2017

This allows removing the flow cache object embedded in struct xfrm_dst.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm_policy: remove xfrm_policy_lookup · 86dc8ee0

Committed by Florian Westphal on Jul 17, 2017

This removes the wrapper and renames the __xfrm_policy_lookup variant
to get rid of another place that used flow cache objects.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm_policy: kill flow to policy dir conversion · aff669bc

Committed by Florian Westphal on Jul 17, 2017

XFRM_POLICY_IN/OUT/FWD are identical to FLOW_DIR_*, so gcc already
removed this function as it just returns the argument.  Again, no
code change.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm_policy: remove always true/false branches · 855dad99

Committed by Florian Westphal on Jul 17, 2017

After the previous change, oldflo and xdst are always NULL.
These branches were already removed by gcc; this doesn't change code.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm_policy: bypass flow_cache_lookup · 3ca28286

Committed by Florian Westphal on Jul 17, 2017

Instead of consulting flow cache, call the xfrm bundle/policy lookup
functions directly.  This pretends the flow cache had no entry.

This helps to gradually remove flow cache integration,
followup commit will remove the dead code that this change adds.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Jul, 2017 · 1 commit

net, xfrm: convert xfrm_policy.refcnt from atomic_t to refcount_t · 850a6212

Committed by Reshetova, Elena on Jul 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

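For illustration, here is a simplified user-space sketch of saturating reference-count semantics using C11 atomics. It is not the kernel's `refcount_t` implementation. The `model_*` names and the choice of `UINT_MAX` as the saturation value are assumptions.

```c
#include <limits.h>
#include <stdatomic.h>

typedef struct { atomic_uint refs; } model_refcount_t;

#define MODEL_REFCOUNT_SATURATED UINT_MAX

/* Increment unless saturated: once pinned at the saturation value the
 * counter stays there, so the object leaks instead of being freed while
 * still in use.  A plain atomic counter would wrap around instead. */
void model_refcount_inc(model_refcount_t *r)
{
    unsigned old = atomic_load(&r->refs);

    do {
        if (old == MODEL_REFCOUNT_SATURATED)
            return;
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}

/* Returns 1 when the last reference was dropped. */
int model_refcount_dec_and_test(model_refcount_t *r)
{
    unsigned old = atomic_load(&r->refs);

    do {
        if (old == MODEL_REFCOUNT_SATURATED || old == 0)
            return 0;   /* saturated: never freed; 0: underflow, ignored here */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
    return old == 1;
}
```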

18 Jun, 2017 · 3 commits

net: remove DST_NOCACHE flag · a4c2fd7f

Committed by Wei Wang on Jun 17, 2017

DST_NOCACHE flag check has been removed from dst_release() and
dst_hold_safe() in a previous patch because all the dst are now ref
counted properly and can be released based on refcnt only.
Looking at the rest of the DST_NOCACHE use, all of them can now be
removed or replaced with other checks.
So this patch gets rid of all the DST_NOCACHE usage and removes this flag
completely.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove DST_NOGC flag · b2a9c0ed

Committed by Wei Wang on Jun 17, 2017

Now that all the components have been changed to release dst based on
refcnt only and not depend on dst gc anymore, we can remove the
temporary flag DST_NOGC.

Note that we also need to remove the DST_NOCACHE check in dst_release()
and dst_hold_safe() because now all the dst are released based on refcnt
and behave as DST_NOCACHE.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: take refcnt of dst when creating struct xfrm_dst bundle · 52df157f

Committed by Wei Wang on Jun 17, 2017

During the creation of xfrm_dst bundle, always take ref count when
allocating the dst. This way, xfrm_bundle_create() will form a linked
list of dst with dst->child pointing to a ref counted dst child. And
the returned dst pointer is also ref counted. This makes the link from
the flow cache to this dst now ref counted properly.
As the dst is always ref counted properly, we can safely mark
DST_NOGC flag so dst_release() will release dst based on refcnt only.
And dst gc is no longer needed and all dst_free() and its related
function calls should be replaced with dst_release() or
dst_release_immediate().

The special handling logic for dst->child in dst_destroy() can be
replaced with a simple dst_release_immediate() call on the child to
release the whole list linked by dst->child pointer.
Previously used DST_NOHASH flag is not needed anymore as well. The
reason that DST_NOHASH is used in the existing code is mainly to prevent
the dst inserted in the fib tree to be wrongly destroyed during the
deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH
flag is marked in all the dst children except the one which is in the
fib tree.
However, with this patch series to remove dst gc logic and release dst
only based on ref count, it is safe to release all the children from a
xfrm_dst bundle as long as the dst children are all ref counted
properly which is already the case in the existing code.
So, this patch removes the use of DST_NOHASH flag.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

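Releasing a bundle purely by reference count can be sketched as follows. The structure and helpers are hypothetical and non-atomic for brevity. Each entry owns one reference on its child. Dropping the top reference therefore walks down the chain until it reaches an entry that is still referenced elsewhere.

```c
#include <stdlib.h>

/* Hypothetical, non-atomic stand-in for a ref-counted dst. */
struct model_dst {
    int refcnt;
    struct model_dst *child;
};

static int model_freed;   /* counts frees, for demonstration */

/* Drop one reference; when it was the last one, free the entry and drop
 * the reference it held on its child, continuing down the chain. */
void model_dst_release(struct model_dst *d)
{
    while (d && --d->refcnt == 0) {
        struct model_dst *child = d->child;

        free(d);
        model_freed++;
        d = child;
    }
}

/* The new entry starts with one reference owned by the caller and takes
 * over the caller's reference on `child`. */
struct model_dst *model_dst_alloc(struct model_dst *child)
{
    struct model_dst *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    d->refcnt = 1;
    d->child = child;
    return d;
}
```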

openanolis / cloud-kernel · synced successfully more than 1 year ago