提交 · 77dcc6233e0def71e104d728ab5a39c2fca51127 · openeuler / Kernel

05 2月, 2019 1 次提交

xfrm: destroy xfrm_state synchronously on net exit path · f75a2804

由 Cong Wang 提交于 1月 31, 2019

xfrm_state_put() moves struct xfrm_state to the GC list
and schedules the GC work to clean it up. On net exit call
path, xfrm_state_flush() is called to clean up and
xfrm_flush_gc() is called to wait for the GC work to complete
before exit.

However, this doesn't work because one of the ->destructor(),
ipcomp_destroy(), schedules the same GC work again inside
the GC work. It is hard to wait for such a nested async
callback. This is also why syzbot still reports the following
warning:

 WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
 ...
  ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
  cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
  process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
  worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
  kthread+0x357/0x430 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

In fact, it is perfectly fine to bypass GC and destroy xfrm_state
synchronously on net exit call path, because it is in process context
and doesn't need a work struct to do any blocking work.

This patch introduces xfrm_state_put_sync() which simply bypasses
GC, and lets its callers to decide whether to use this synchronous
version. On net exit path, xfrm_state_fini() and
xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
blocking, it can use xfrm_state_put_sync() directly too.

Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
reflect this change.

Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

f75a2804

23 11月, 2018 1 次提交

xfrm_user: fix freeing of xfrm states on acquire · 4a135e53

由 Mathias Krause 提交于 11月 21, 2018

Commit 565f0fa9 ("xfrm: use a dedicated slab cache for struct
xfrm_state") moved xfrm state objects to use their own slab cache.
However, it missed to adapt xfrm_user to use this new cache when
freeing xfrm states.

Fix this by introducing and make use of a new helper for freeing
xfrm_state objects.

Fixes: 565f0fa9 ("xfrm: use a dedicated slab cache for struct xfrm_state")
Reported-by: NPan Bian <bianpan2016@163.com>
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

4a135e53

06 11月, 2018 1 次提交

xfrm: Fix bucket count reported to userspace · ca92e173

由 Benjamin Poirier 提交于 11月 05, 2018

sadhcnt is reported by `ip -s xfrm state count` as "buckets count", not the
hash mask.

Fixes: 28d8909b ("[XFRM]: Export SAD info.")
Signed-off-by: NBenjamin Poirier <bpoirier@suse.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

ca92e173

01 11月, 2018 1 次提交

compat: Cleanup in_compat_syscall() callers · 98f76206

由 Dmitry Safonov 提交于 10月 12, 2018

Now that in_compat_syscall() is consistent on all architectures and does
not longer report true on native i686, the workarounds (ifdeffery and
helpers) can be removed.
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-efi@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com

98f76206

20 7月, 2018 2 次提交

xfrm: Allow xfrmi if_id to be updated by UPDSA · 5baf4f9c

由 Nathan Harold 提交于 7月 19, 2018

Allow attaching an SA to an xfrm interface id after
the creation of the SA, so that tasks such as keying
which must be done as the SA is created, can remain
separate from the decision on how to route traffic
from an SA. This permits SA creation to be decomposed
in to three separate steps:
1) allocation of a SPI
2) algorithm and key negotiation
3) insertion into the data path
Signed-off-by: NNathan Harold <nharold@google.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

5baf4f9c

xfrm: Remove xfrmi interface ID from flowi · bc56b334

由 Benedict Wong 提交于 7月 19, 2018

In order to remove performance impact of having the extra u32 in every
single flowi, this change removes the flowi_xfrm struct, prefering to
take the if_id as a method parameter where needed.

In the inbound direction, if_id is only needed during the
__xfrm_check_policy() function, and the if_id can be determined at that
point based on the skb. As such, xfrmi_decode_session() is only called
with the skb in __xfrm_check_policy().

In the outbound direction, the only place where if_id is needed is the
xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
directly passed into the xfrm_lookup_with_ifid() call. All existing
callers can still call xfrm_lookup(), which uses a default if_id of 0.

This change does not change any behavior of XFRMIs except for improving
overall system performance via flowi size reduction.

This change has been tested against the Android Kernel Networking Tests:

https://android.googlesource.com/kernel/tests/+/master/net/testSigned-off-by: NBenedict Wong <benedictwong@google.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

bc56b334

11 7月, 2018 1 次提交

xfrm: use time64_t for in-kernel timestamps · 386c5680

由 Arnd Bergmann 提交于 7月 11, 2018

The lifetime managment uses '__u64' timestamps on the user space
interface, but 'unsigned long' for reading the current time in the kernel
with get_seconds().

While this is probably safe beyond y2038, it will still overflow in 2106,
and the get_seconds() call is deprecated because fo that.

This changes the xfrm time handling to use time64_t consistently, along
with reading the time using the safer ktime_get_real_seconds(). It still
suffers from problems that can happen from a concurrent settimeofday()
call or (to a lesser degree) a leap second update, but since the time
stamps are part of the user API, there is nothing we can do to prevent
that.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

386c5680

01 7月, 2018 1 次提交

xfrm: Allow Set Mark to be Updated Using UPDSA · 6d8e85ff

由 Nathan Harold 提交于 6月 29, 2018

Allow UPDSA to change "set mark" to permit
policy separation of packet routing decisions from
SA keying in systems that use mark-based routing.

The set mark, used as a routing and firewall mark
for outbound packets, is made update-able which
allows routing decisions to be handled independently
of keying/SA creation. To maintain consistency with
other optional attributes, the set mark is only
updated if sent with a non-zero value.

The per-SA lock and the xfrm_state_lock are taken in
that order to avoid a deadlock with
xfrm_timer_handler(), which also takes the locks in
that order.
Signed-off-by: NNathan Harold <nharold@google.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

6d8e85ff

25 6月, 2018 1 次提交

xfrm: policy: remove pcpu policy cache · e4db5b61

由 Florian Westphal 提交于 6月 25, 2018

Kristian Evensen says:
  In a project I am involved in, we are running ipsec (Strongswan) on
  different mt7621-based routers. Each router is configured as an
  initiator and has around ~30 tunnels to different responders (running
  on misc. devices). Before the flow cache was removed (kernel 4.9), we
  got a combined throughput of around 70Mbit/s for all tunnels on one
  router. However, we recently switched to kernel 4.14 (4.14.48), and
  the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
  drop of around 20%. Reverting the flow cache removal restores, as
  expected, performance levels to that of kernel 4.9.

When pcpu xdst exists, it has to be validated first before it can be
used.

A negative hit thus increases cost vs. no-cache.

As number of tunnels increases, hit rate decreases so this pcpu caching
isn't a viable strategy.

Furthermore, the xdst cache also needs to run with BH off, so when
removing this the bh disable/enable pairs can be removed too.

Kristian tested a 4.14.y backport of this change and reported
increased performance:

  In our tests, the throughput reduction has been reduced from around -20%
  to -5%. We also see that the overall throughput is independent of the
  number of tunnels, while before the throughput was reduced as the number
  of tunnels increased.
Reported-by: NKristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

e4db5b61

23 6月, 2018 1 次提交

xfrm: Add a new lookup key to match xfrm interfaces. · 7e652640

由 Steffen Klassert 提交于 6月 12, 2018

This patch adds the xfrm interface id as a lookup key
for xfrm states and policies. With this we can assign
states and policies to virtual xfrm interfaces.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NShannon Nelson <shannon.nelson@oracle.com>
Acked-by: NBenedict Wong <benedictwong@google.com>
Tested-by: NBenedict Wong <benedictwong@google.com>
Tested-by: NAntony Antony <antony@phenome.org>
Reviewed-by: NEyal Birger <eyal.birger@gmail.com>

7e652640

04 5月, 2018 1 次提交

xfrm: use a dedicated slab cache for struct xfrm_state · 565f0fa9

由 Mathias Krause 提交于 5月 03, 2018

struct xfrm_state is rather large (768 bytes here) and therefore wastes
quite a lot of memory as it falls into the kmalloc-1024 slab cache,
leaving 256 bytes of unused memory per XFRM state object -- a net waste
of 25%.

Using a dedicated slab cache for struct xfrm_state reduces the level of
internal fragmentation to a minimum.

On my configuration SLUB chooses to create a slab cache covering 4
pages holding 21 objects, resulting in an average memory waste of ~13
bytes per object -- a net waste of only 1.6%.

In my tests this led to memory savings of roughly 2.3MB for 10k XFRM
states.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

565f0fa9

16 4月, 2018 1 次提交

xfrm: Fix warning in xfrm6_tunnel_net_exit. · b48c05ab

由 Steffen Klassert 提交于 4月 16, 2018

We need to make sure that all states are really deleted
before we check that the state lists are empty. Otherwise
we trigger a warning.

Fixes: baeb0dbb ("xfrm6_tunnel: exit_net cleanup check added")
Reported-and-tested-by:syzbot+777bf170a89e7b326405@syzkaller.appspotmail.com
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

b48c05ab

02 2月, 2018 1 次提交

xfrm: Refuse to insert 32 bit userspace socket policies on 64 bit systems · 19d7df69

由 Steffen Klassert 提交于 2月 01, 2018

We don't have a compat layer for xfrm, so userspace and kernel
structures have different sizes in this case. This results in
a broken configuration, so refuse to configure socket policies
when trying to insert from 32 bit userspace as we do it already
with policies inserted via netlink.

Reported-and-tested-by: syzbot+e1a1577ca8bcb47b769a@syzkaller.appspotmail.com
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

19d7df69

23 1月, 2018 1 次提交

xfrm: fix boolean assignment in xfrm_get_type_offload · 545d8ae7

由 Gustavo A. R. Silva 提交于 1月 22, 2018

Assign true or false to boolean variables instead of an integer value.

This issue was detected with the help of Coccinelle.

Fixes: ffdb5211 ("xfrm: Auto-load xfrm offload modules")
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

545d8ae7

18 1月, 2018 1 次提交

xfrm: Add SA to hardware at the end of xfrm_state_construct() · cc01572e

由 Yossi Kuperman 提交于 1月 17, 2018

Current code configures the hardware with a new SA before the state has been
fully initialized. During this time interval, an incoming ESP packet can cause
a crash due to a NULL dereference. More specifically, xfrm_input() considers
the packet as valid, and yet, anti-replay mechanism is not initialized.

Move hardware configuration to the end of xfrm_state_construct(), and mark
the state as valid once the SA is fully initialized.

Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: NAviad Yehezkel <aviadye@mellnaox.com>
Signed-off-by: NAviv Heller <avivh@mellanox.com>
Signed-off-by: NYossi Kuperman <yossiku@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

cc01572e

31 12月, 2017 1 次提交

xfrm: fix rcu usage in xfrm_get_type_offload · 2f10a61c

由 Sabrina Dubroca 提交于 12月 31, 2017

request_module can sleep, thus we cannot hold rcu_read_lock() while
calling it. The function also jumps back and takes rcu_read_lock()
again (in xfrm_state_get_afinfo()), resulting in an imbalance.

This codepath is triggered whenever a new offloaded state is created.

Fixes: ffdb5211 ("xfrm: Auto-load xfrm offload modules")
Reported-by: syzbot+ca425f44816d749e8eb49755567a75ee48cf4a30@syzkaller.appspotmail.com
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

2f10a61c

30 12月, 2017 1 次提交

xfrm: Forbid state updates from changing encap type · 257a4b01

由 Herbert Xu 提交于 12月 26, 2017

Currently we allow state updates to competely replace the contents
of x->encap.  This is bad because on the user side ESP only sets up
header lengths depending on encap_type once when the state is first
created.  This could result in the header lengths getting out of
sync with the actual state configuration.

In practice key managers will never do a state update to change the
encapsulation type.  Only the port numbers need to be changed as the
peer NAT entry is updated.

Therefore this patch adds a check in xfrm_state_update to forbid
any changes to the encap_type.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

257a4b01

08 12月, 2017 1 次提交

xfrm: fix xfrm_do_migrate() with AEAD e.g(AES-GCM) · 75bf50f4

由 Antony Antony 提交于 12月 07, 2017

copy geniv when cloning the xfrm state.

x->geniv was not copied to the new state and migration would fail.

xfrm_do_migrate
  ..
  xfrm_state_clone()
   ..
   ..
   esp_init_aead()
   crypto_alloc_aead()
    crypto_alloc_tfm()
     crypto_find_alg() return EAGAIN and failed
Signed-off-by: NAntony Antony <antony@phenome.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

75bf50f4

30 11月, 2017 1 次提交

net: xfrm: allow clearing socket xfrm policies. · be8f8284

由 Lorenzo Colitti 提交于 11月 20, 2017

Currently it is possible to add or update socket policies, but
not clear them. Therefore, once a socket policy has been applied,
the socket cannot be used for unencrypted traffic.

This patch allows (privileged) users to clear socket policies by
passing in a NULL pointer and zero length argument to the
{IP,IPV6}_{IPSEC,XFRM}_POLICY setsockopts. This results in both
the incoming and outgoing policies being cleared.

The simple approach taken in this patch cannot clear socket
policies in only one direction. If desired this could be added
in the future, for example by continuing to pass in a length of
zero (which currently is guaranteed to return EMSGSIZE) and
making the policy be a pointer to an integer that contains one
of the XFRM_POLICY_{IN,OUT} enum values.

An alternative would have been to interpret the length as a
signed integer and use XFRM_POLICY_IN (i.e., 0) to clear the
input policy and -XFRM_POLICY_OUT (i.e., -1) to clear the output
policy.

Tested: https://android-review.googlesource.com/539816Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

be8f8284

22 11月, 2017 1 次提交

treewide: setup_timer() -> timer_setup() · e99e88a9

由 Kees Cook 提交于 10月 16, 2017

This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
 _E->_timer@_stl.function = _callback;
|
 _E->_timer@_stl.function = &_callback;
|
 _E->_timer@_stl.function = (_cast_func)_callback;
|
 _E->_timer@_stl.function = (_cast_func)&_callback;
|
 _E._timer@_stl.function = _callback;
|
 _E._timer@_stl.function = &_callback;
|
 _E._timer@_stl.function = (_cast_func)_callback;
|
 _E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }
Signed-off-by: NKees Cook <keescook@chromium.org>

e99e88a9

26 10月, 2017 1 次提交

xfrm: Clear sk_dst_cache when applying per-socket policy. · 2b06cdf3

由 Jonathan Basseri 提交于 10月 25, 2017

If a socket has a valid dst cache, then xfrm_lookup_route will get
skipped. However, the cache is not invalidated when applying policy to a
socket (i.e. IPV6_XFRM_POLICY). The result is that new policies are
sometimes ignored on those sockets. (Note: This was broken for IPv4 and
IPv6 at different times.)

This can be demonstrated like so,
1. Create UDP socket.
2. connect() the socket.
3. Apply an outbound XFRM policy to the socket. (setsockopt)
4. send() data on the socket.

Packets will continue to be sent in the clear instead of matching an
xfrm or returning a no-match error (EAGAIN). This affects calls to
send() and not sendto().

Invalidating the sk_dst_cache is necessary to correctly apply xfrm
policies. Since we do this in xfrm_user_policy(), the sk_lock was
already acquired in either do_ip_setsockopt() or do_ipv6_setsockopt(),
and we may call __sk_dst_reset().

Performance impact should be negligible, since this code is only called
when changing xfrm policy, and only affects the socket in question.

Fixes: 00bc0ef5 ("ipv6: Skip XFRM lookup if dst_entry in socket cache is valid")
Tested: https://android-review.googlesource.com/517555
Tested: https://android-review.googlesource.com/418659Signed-off-by: NJonathan Basseri <misterikkit@google.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

2b06cdf3

28 9月, 2017 1 次提交

xfrm: don't call xfrm_policy_cache_flush under xfrm_state_lock · dd269db8

由 Artem Savkov 提交于 9月 27, 2017

I might be wrong but it doesn't look like xfrm_state_lock is required
for xfrm_policy_cache_flush and calling it under this lock triggers both
"sleeping function called from invalid context" and "possible circular
locking dependency detected" warnings on flush.

Fixes: ec30d78c xfrm: add xdst pcpu cache
Signed-off-by: NArtem Savkov <asavkov@redhat.com>
Acked-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

dd269db8

02 8月, 2017 2 次提交

xfrm: fix null pointer dereference on state and tmpl sort · 3f5a95ad

由 Koichiro Den 提交于 8月 01, 2017

Creating sub policy that matches the same outer flow as main policy does
leads to a null pointer dereference if the outer mode's family is ipv4.
For userspace compatibility, this patch just eliminates the crash i.e.,
does not introduce any new sorting rule, which would fruitlessly affect
all but the aforementioned case.
Signed-off-by: NKoichiro Den <den@klaipeden.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

3f5a95ad

xfrm: Auto-load xfrm offload modules · ffdb5211

由 Ilan Tayari 提交于 8月 01, 2017

IPSec crypto offload depends on the protocol-specific
offload module (such as esp_offload.ko).

When the user installs an SA with crypto-offload, load
the offload module automatically, in the same way
that the protocol module is loaded (such as esp.ko)
Signed-off-by: NIlan Tayari <ilant@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

ffdb5211

19 7月, 2017 1 次提交

xfrm: add xdst pcpu cache · ec30d78c

由 Florian Westphal 提交于 7月 17, 2017

retain last used xfrm_dst in a pcpu cache.
On next request, reuse this dst if the policies are the same.

The cache will not help with strict RR workloads as there is no hit.

The cache packet-path part is reasonably small, the notifier part is
needed so we do not add long hangs when a device is dismantled but some
pcpu xdst still holds a reference, there are also calls to the flush
operation when userspace deletes SAs so modules can be removed
(there is no hit.

We need to run the dst_release on the correct cpu to avoid races with
packet path.  This is done by adding a work_struct for each cpu and then
doing the actual test/release on each affected cpu via schedule_work_on().

Test results using 4 network namespaces and null encryption:

ns1           ns2          -> ns3           -> ns4
netperf -> xfrm/null enc   -> xfrm/null dec -> netserver

what                    TCP_STREAM      UDP_STREAM      UDP_RR
Flow cache:             14644.61        294.35          327231.64
No flow cache:		14349.81	242.64		202301.72
Pcpu cache:		14629.70	292.21		205595.22

UDP tests used 64byte packets, tests ran for one minute each,
value is average over ten iterations.

'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
series but without this patch.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec30d78c

05 7月, 2017 1 次提交

net, xfrm: convert xfrm_state.refcnt from atomic_t to refcount_t · 88755e9c

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88755e9c

30 6月, 2017 1 次提交
- A
  xfrm_user_policy(): don't open-code memdup_user() · 7f2d17c6
  由 Al Viro 提交于 5月 13, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  7f2d17c6
07 6月, 2017 2 次提交

xfrm: add UDP encapsulation port in migrate message · 8bafd730

由 Antony Antony 提交于 6月 06, 2017

Add XFRMA_ENCAP, UDP encapsulation port, to km_migrate announcement
to userland. Only add if XFRMA_ENCAP was in user migrate request.
Signed-off-by: NAntony Antony <antony@phenome.org>
Reviewed-by: NRichard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

8bafd730

xfrm: extend MIGRATE with UDP encapsulation port · 4ab47d47

由 Antony Antony 提交于 6月 06, 2017

Add UDP encapsulation port to XFRM_MSG_MIGRATE using an optional
netlink attribute XFRMA_ENCAP.

The devices that support IKE MOBIKE extension (RFC-4555 Section 3.8)
could go to sleep for a few minutes and wake up. When it wake up the
NAT mapping could have expired, the device send a MOBIKE UPDATE_SA
message to migrate the IPsec SA. The change could be a change UDP
encapsulation port, IP address, or both.
Reported-by: NPaul Wouters <pwouters@redhat.com>
Signed-off-by: NAntony Antony <antony@phenome.org>
Reviewed-by: NRichard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

4ab47d47

19 5月, 2017 1 次提交

xfrm: fix state migration copy replay sequence numbers · a486cd23

由 Antony Antony 提交于 5月 19, 2017

During xfrm migration copy replay and preplay sequence numbers
from the previous state.

Here is a tcpdump output showing the problem.
10.0.10.46 is running vanilla kernel, is the IKE/IPsec responder.
After the migration it sent wrong sequence number, reset to 1.
The migration is from 10.0.0.52 to 10.0.0.53.

IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136
IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136

IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R]
IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa inf2[R]

IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136

NOTE: next sequence is wrong 0x1

IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136
IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136
IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136
Signed-off-by: NAntony Antony <antony@phenome.org>
Reviewed-by: NRichard Guy Briggs <rgb@tricolour.ca>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

a486cd23

16 5月, 2017 1 次提交

xfrm: use memdup_user · a133d930

由 Geliang Tang 提交于 5月 06, 2017

Use memdup_user() helper instead of open-coding to simplify the code.
Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

a133d930

14 4月, 2017 2 次提交

xfrm: Add an IPsec hardware offloading API · d77e38e6

由 Steffen Klassert 提交于 4月 14, 2017

This patch adds all the bits that are needed to do
IPsec hardware offload for IPsec states and ESP packets.
We add xfrmdev_ops to the net_device. xfrmdev_ops has
function pointers that are needed to manage the xfrm
states in the hardware and to do a per packet
offloading decision.

Joint work with:
Ilan Tayari <ilant@mellanox.com>
Guy Shapiro <guysh@mellanox.com>
Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: NGuy Shapiro <guysh@mellanox.com>
Signed-off-by: NIlan Tayari <ilant@mellanox.com>
Signed-off-by: NYossi Kuperman <yossiku@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

d77e38e6

xfrm: Add a xfrm type offload. · 9d389d7f

由 Steffen Klassert 提交于 4月 14, 2017

We add a struct  xfrm_type_offload so that we have the offloaded
codepath separated to the non offloaded codepath. With this the
non offloade and the offloaded codepath can coexist.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

9d389d7f

16 1月, 2017 1 次提交

xfrm: fix possible null deref in xfrm_init_tempstate · 3819a35f

由 Florian Westphal 提交于 1月 13, 2017

Dan reports following smatch warning:
 net/xfrm/xfrm_state.c:659
 error: we previously assumed 'afinfo' could be null (see line 651)

 649  struct xfrm_state_afinfo *afinfo = xfrm_state_afinfo_get_rcu(family);
 651  if (afinfo)
		...
 658  }
 659  afinfo->init_temprop(x, tmpl, daddr, saddr);

I am resonably sure afinfo cannot be NULL here.

xfrm_state4.c and state6.c are both part of ipv4/ipv6 (depends on
CONFIG_XFRM, a boolean) but even if ipv6 is a module state6.c can't
be removed (ipv6 lacks module_exit so it cannot be removed).

The only callers for xfrm6_fini that leads to state backend unregister
are error unwinding paths that can be called during ipv6 init function.

So after ipv6 module is loaded successfully the state backend cannot go
away anymore.

The family value from policy lookup path is taken from dst_entry, so
that should always be AF_INET(6).

However, since this silences the warning and avoids readers of this
code wondering about possible null deref it seems preferrable to
be defensive and just add the old check back.

Fixes: 711059b9 ("xfrm: add and use xfrm_state_afinfo_get_rcu")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

3819a35f

10 1月, 2017 4 次提交

xfrm: state: simplify rcu_read_unlock handling in two spots · 75cda62d

由 Florian Westphal 提交于 1月 09, 2017

Instead of:
  if (foo) {
      unlock();
      return bar();
   }
   unlock();
do:
   unlock();
   if (foo)
       return bar();

This is ok because rcu protected structure is only dereferenced before
the conditional.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

75cda62d

xfrm: add and use xfrm_state_afinfo_get_rcu · 711059b9

由 Florian Westphal 提交于 1月 09, 2017

xfrm_init_tempstate is always called from within rcu read side section.
We can thus use a simpler function that doesn't call rcu_read_lock
again.

While at it, also make xfrm_init_tempstate return value void, the
return value was never tested.

A followup patch will replace remaining callers of xfrm_state_get_afinfo
with xfrm_state_afinfo_get_rcu variant and then remove the 'old'
get_afinfo interface.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

711059b9

xfrm: remove xfrm_state_put_afinfo · af5d27c4

由 Florian Westphal 提交于 1月 09, 2017

commit 44abdc30
("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
xfrm_state_put_afinfo equivalent to rcu_read_unlock.

Use spatch to replace it with direct calls to rcu_read_unlock:

@@
struct xfrm_state_afinfo *a;
@@

-  xfrm_state_put_afinfo(a);
+  rcu_read_unlock();

old:
 text    data     bss     dec     hex filename
22570      72     424   23066    5a1a xfrm_state.o
 1612       0       0    1612     64c xfrm_output.o
new:
22554      72     424   23050    5a0a xfrm_state.o
 1596       0       0    1596     63c xfrm_output.o
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

af5d27c4

xfrm: avoid rcu sparse warning · 423826a7

由 Florian Westphal 提交于 1月 09, 2017

xfrm/xfrm_state.c:1973:21: error: incompatible types in comparison expression (different address spaces)

Harmless, but lets fix it to reduce the noise.

While at it, get rid of unneeded NULL check, its never hit:

net/ipv4/xfrm4_state.c: xfrm_state_register_afinfo(&xfrm4_state_afinfo);
net/ipv6/xfrm6_state.c: return xfrm_state_register_afinfo(&xfrm6_state_afinfo);
net/ipv6/xfrm6_state.c: xfrm_state_unregister_afinfo(&xfrm6_state_afinfo);

... are the only callsites.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

423826a7

06 1月, 2017 1 次提交

xfrm: state: do not acquire lock in get_mtu helpers · b3b73b8e

由 Florian Westphal 提交于 1月 05, 2017

Once flow cache gets removed the mtu initialisation happens for every skb
that gets an xfrm attached, so this lock starts to show up in perf.

It is not obvious why this lock is required -- the caller holds
reference on the state struct, type->destructor is only called from the
state gc worker (all state structs on gc list must have refcount 0).

xfrm_init_state already has been called (else private data accessed
by type->get_mtu() would not be set up).

So just remove the lock -- the race on the state (DEAD?) doesn't
matter (could change right after dropping the lock too).
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

b3b73b8e

04 1月, 2017 1 次提交

xfrm: trivial typos · 1365e547

由 Alexander Alemayhu 提交于 1月 03, 2017

o s/descentant/descendant
o s/workarbound/workaround
Signed-off-by: NAlexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

1365e547

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功