Commits · 809d5fd3afdc7fc6531894e14a892ac1d9f1689a · openeuler / Kernel

10 Nov, 2022 2 commits

net/tls: Remove the context from the list in tls_device_down · 809d5fd3

Authored by Maxim Mikityanskiy on Nov 10, 2022

stable inclusion
from stable-v5.10.135
commit 4c1318dabeb98ad9650909e9abb650b523aada15
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZWFM

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4c1318dabeb98ad9650909e9abb650b523aada15

--------------------------------

commit f6336724 upstream.

tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.
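The ordering described above (unlink first, free second) can be modelled in plain C. This is a userspace illustration only: struct ctx, ctx_list_del and ctx_release are hypothetical names for this sketch, not the kernel's list or TLS APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a TLS context kept on an intrusive list. */
struct ctx {
    struct ctx *prev, *next;
    int refcount;
};

/* Initialise an empty circular list head. */
static void ctx_list_init(struct ctx *head)
{
    head->prev = head->next = head;
}

/* Insert a context right after the list head. */
static void ctx_list_add(struct ctx *head, struct ctx *c)
{
    c->next = head->next;
    c->prev = head;
    head->next->prev = c;
    head->next = c;
}

/* Unlink the node and point it at itself so a second unlink is harmless. */
static void ctx_list_del(struct ctx *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
    c->prev = c->next = c;
}

/* Count the contexts currently linked on the list. */
static size_t ctx_list_len(const struct ctx *head)
{
    size_t n = 0;

    for (const struct ctx *c = head->next; c != head; c = c->next)
        n++;
    return n;
}

/*
 * Drop one reference. When the last reference goes away, unlink the
 * context before freeing it, so no list keeps a dangling pointer.
 * Returns true if the context was freed.
 */
static bool ctx_release(struct ctx *c)
{
    if (--c->refcount > 0)
        return false;
    ctx_list_del(c);
    free(c);
    return true;
}
```

In this model a context holding two references survives the first ctx_release() and is both unlinked and freed by the second, so the list never refers to released memory.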

Fixes: 3740651b ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

net/tls: Fix race in TLS device down flow · c5ba5836

Authored by Tariq Toukan on Nov 10, 2022

stable inclusion
from stable-v5.10.134
commit e80ff0b9661384d40e97a0a7d5cc8ae2a00c785d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e80ff0b9661384d40e97a0a7d5cc8ae2a00c785d

--------------------------------

[ Upstream commit f08d8c1b ]

Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.

In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
  all the way until it flushes the gc work, and returns without freeing
  the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
  work and frees the context's device resources.

Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization.  This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

02 Nov, 2022 1 commit

net/tls: Check for errors in tls_device_init · 503aef46

Authored by Tariq Toukan on Nov 02, 2022

stable inclusion
from stable-v5.10.132
commit c713de1d80a5d7035dc7f667b485bded83b4e74a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c713de1d80a5d7035dc7f667b485bded83b4e74a

--------------------------------

[ Upstream commit 3d8c51b2 ]

Add missing error checks in tls_device_init.
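As a generic illustration of what checking errors in an init path involves, the sketch below propagates each failure and undoes earlier steps in reverse order. The step_init, step_exit and subsystem_init names are hypothetical; this is not the body of tls_device_init.

```c
#include <errno.h>

static int steps_done; /* counts init steps currently in effect */

/* Hypothetical init step; `fail` forces an error for demonstration. */
static int step_init(int fail)
{
    if (fail)
        return -ENOMEM;
    steps_done++;
    return 0;
}

/* Undo one previously successful step. */
static void step_exit(void)
{
    steps_done--;
}

/* Propagate each error and unwind the steps that already succeeded. */
static int subsystem_init(int fail_first, int fail_second)
{
    int err;

    err = step_init(fail_first);
    if (err)
        return err;

    err = step_init(fail_second);
    if (err)
        goto err_undo_first;

    return 0;

err_undo_first:
    step_exit();
    return err;
}
```

The caller sees the original error code, and a failed init leaves no partially initialised state behind.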

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

16 Aug, 2022 1 commit

tls: Fix context leak on tls_device_down · a4f4e195

Authored by Maxim Mikityanskiy on Aug 16, 2022

stable inclusion
from stable-v5.10.117
commit fccf4bf3f25dff4aa47d80926298f3707b8d1072
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L66B

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fccf4bf3f25dff4aa47d80926298f3707b8d1072

--------------------------------

[ Upstream commit 3740651b ]

The commit cited below claims to fix a use-after-free condition after
tls_device_down. Apparently, the description wasn't fully accurate. The
context stayed alive, but ctx->netdev became NULL, and the offload was
torn down without a proper fallback, so a bug was present, but a
different kind of bug.

Due to misunderstanding of the issue, the original patch dropped the
refcount_dec_and_test line for the context to avoid the alleged
premature deallocation. That line has to be restored, because it matches
the refcount_inc_not_zero from the same function, otherwise the contexts
that survived tls_device_down are leaked.

This patch fixes the described issue by restoring refcount_dec_and_test.
After this change, there is no leak anymore, and the fallback to
software kTLS still works.

Fixes: c55dcdd4 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

02 Aug, 2022 1 commit

tls: Skip tls_append_frag on zero copy size · cd846279

Authored by Maxim Mikityanskiy on Aug 02, 2022

stable inclusion
from stable-v5.10.114
commit 1781beb87935d39f47af6553e4b3581f834ea79b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5IY1V

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1781beb87935d39f47af6553e4b3581f834ea79b

--------------------------------

[ Upstream commit a0df7194 ]

Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.

If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.

This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.
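A minimal sketch of the guard, assuming a simplified record that only tracks fragment sizes. struct record, append_frag and push_frag_data are hypothetical names for this illustration and do not mirror the kernel structures.

```c
#include <stddef.h>

#define MAX_FRAGS 8

/* Hypothetical record: a list of fragment sizes plus a running length. */
struct record {
    size_t frag_len[MAX_FRAGS];
    size_t num_frags;
    size_t len;
};

/* Appends a fragment unconditionally, even an empty one. */
static void append_frag(struct record *rec, size_t size)
{
    rec->frag_len[rec->num_frags++] = size;
    rec->len += size;
}

/*
 * Copy at most the space left in the record. When the record is already
 * full the copy size is zero and the append is skipped, so no empty
 * fragment is created. Returns the number of bytes accepted.
 */
static size_t push_frag_data(struct record *rec, size_t max_record_len,
                             size_t want)
{
    size_t room = rec->len < max_record_len ? max_record_len - rec->len : 0;
    size_t copy = want < room ? want : room;

    if (copy == 0 || rec->num_frags == MAX_FRAGS)
        return 0;

    append_frag(rec, copy);
    return copy;
}
```

With a 100-byte limit, a first push of 100 bytes creates one fragment, and a further push on the full record returns 0 without adding a zero-length fragment.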

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

11 Nov, 2021 1 commit

skbuff: add a parameter to __skb_frag_unref · 2556a6f0

Authored by Matteo Croce on Nov 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit c420c989
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c420c98982fa

----------------------------------------------------------------------

This is a prerequisite patch, the next one is enabling recycling of
skbs and fragments. Add an extra argument on __skb_frag_unref() to
handle recycling, and update the current users of the function with that.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Yongxin Li <liyongxin1@huawei.com>
Signed-off-by: Junxin Chen <chenjunxin1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

15 Jun, 2021 2 commits

net/tls: Fix use-after-free after the TLS device goes down and up · aa3905c0

Authored by Maxim Mikityanskiy on Jun 15, 2021

stable inclusion
from stable-5.10.43
commit f1d4184f128dede82a59a841658ed40d4e6d3aa2
bugzilla: 109284
CVE: NA

--------------------------------

[ Upstream commit c55dcdd4 ]

When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.

This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.

On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.

The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).

A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

net/tls: Replace TLS_RX_SYNC_RUNNING with RCU · ea55ff3c

Authored by Maxim Mikityanskiy on Jun 15, 2021

stable inclusion
from stable-5.10.43
commit 874ece252ed269f5ac1f55167a3f2735ab0f249f
bugzilla: 109284
CVE: NA

--------------------------------

[ Upstream commit 05fc8b6c ]

RCU synchronization is guaranteed to finish in finite time, unlike a
busy loop that polls a flag. This patch is a preparation for the bugfix
in the next patch, where the same synchronize_net() call will also be
used to sync with the TX datapath.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

26 Nov, 2020 1 commit

net/tls: Protect from calling tls_dev_del for TLS RX twice · 025cc2fb

Authored by Maxim Mikityanskiy on Nov 25, 2020

tls_device_offload_cleanup_rx doesn't clear tls_ctx->netdev after
calling tls_dev_del if TLS TX offload is also enabled. Clearing
tls_ctx->netdev gets postponed until tls_device_gc_task. It leaves a
time frame when tls_device_down may get called and call tls_dev_del for
RX one extra time, confusing the driver, which may lead to a crash.

This patch corrects this racy behavior by adding a flag to prevent
tls_device_down from calling tls_dev_del the second time.

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20201125221810.69870-1-saeedm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

18 Nov, 2020 1 commit

net/tls: Fix wrong record sn in async mode of device resync · 138559b9

Authored by Tariq Toukan on Nov 15, 2020

In async_resync mode, we log the TCP seq of records until the async request
is completed. Later, in case one of the logged seqs matches the resync
request, we return it, together with its record serial number. Before this
fix, we mistakenly returned the serial number of the current record
instead.
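The lookup described above can be sketched as a small log of (TCP seq, record serial number) pairs. Storing the serial number next to each logged seq is an assumption made for this illustration, and struct resync_log, resync_log_add and resync_log_match are hypothetical names; the kernel may derive the number differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RESYNC_LOG_MAX 13 /* arbitrary capacity for this sketch */

/* One logged record: the TCP sequence where it starts and its serial number. */
struct logged_rec {
    uint32_t tcp_seq;
    uint64_t rec_sn;
};

struct resync_log {
    struct logged_rec entries[RESYNC_LOG_MAX];
    size_t count;
};

/* Remember a processed record while the async request is outstanding. */
static void resync_log_add(struct resync_log *log, uint32_t tcp_seq,
                           uint64_t rec_sn)
{
    if (log->count < RESYNC_LOG_MAX) {
        log->entries[log->count].tcp_seq = tcp_seq;
        log->entries[log->count].rec_sn = rec_sn;
        log->count++;
    }
}

/*
 * On completion of the async request, look up the requested sequence.
 * On a match, report the serial number that belongs to the matching
 * entry, not the serial number of the record being processed now.
 */
static bool resync_log_match(const struct resync_log *log, uint32_t req_seq,
                             uint64_t *rec_sn)
{
    for (size_t i = 0; i < log->count; i++) {
        if (log->entries[i].tcp_seq == req_seq) {
            *rec_sn = log->entries[i].rec_sn;
            return true;
        }
    }
    return false;
}
```

If records starting at seqs 1000, 1400 and 1900 were logged with serial numbers 7, 8 and 9, a resync request for seq 1400 yields 8 even though record 9 is the current one.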

Fixes: ed9b7646 ("net/tls: Add asynchronous resync")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Boris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20201115131448.2702-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

10 Oct, 2020 1 commit

net/tls: sendfile fails with ktls offload · ea1dd3e9

Authored by Rohit Maheshwari on Oct 08, 2020

When sendpage is first called and there is more data to come, 'more' is
set in tls_push_data(), which in turn sets pending_open_record_frags.
However, when no data is left in the file and tls_push_data() is called
for the last time, pending_open_record_frags is not reset. Later, when
the 2-byte encrypted alert arrives via sendmsg, the code first checks
pending_open_record_frags and, since it is still set, creates a record
with 0 data bytes to encrypt. The record length is then only
prepend_size + tag_size, which causes problems.

We should set/reset pending_open_record_frags based on the 'more' bit.
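A minimal sketch of the flag handling, assuming a reduced TX state. struct tx_state and push_data_more are hypothetical names for this illustration; the point is only that the flag is assigned from 'more' on every call rather than set once and never cleared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of per-socket TX state. */
struct tx_state {
    bool pending_open_record_frags;
    size_t open_record_len;
    size_t records_closed;
};

/*
 * Queue `len` bytes into the open record. `more` says whether the caller
 * has further data for the same record. The flag is assigned on every
 * call, so the final call (more == false) clears it.
 */
static void push_data_more(struct tx_state *tx, size_t len, bool more)
{
    tx->open_record_len += len;
    if (!more && tx->open_record_len > 0) {
        tx->records_closed++;
        tx->open_record_len = 0;
    }
    tx->pending_open_record_frags = more;
}
```

After the last chunk is pushed with more == false, a later sender finds the flag cleared and has no reason to close an empty record.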

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

12 Aug, 2020 1 commit

net/tls: Fix kmap usage · b06c19d9

Authored by Ira Weiny on Aug 10, 2020

When MSG_OOB is specified to tls_device_sendpage() the mapped page is
never unmapped.

Hold off mapping the page until after the flags are checked and the page
is actually needed.
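The reordering can be illustrated with a userspace model in which mapping is just a counter. FLAG_OOB, map_page, unmap_page and send_page are hypothetical stand-ins for this sketch, not the kernel's kmap API or tls_device_sendpage.

```c
#include <errno.h>
#include <stddef.h>

#define FLAG_OOB 0x1 /* hypothetical stand-in for MSG_OOB */

static int mapped_pages;       /* number of currently mapped pages */
static char page_storage[64];  /* pretend page contents */

/* Model of mapping a page: bump the counter and hand out an address. */
static char *map_page(void)
{
    mapped_pages++;
    return page_storage;
}

/* Model of unmapping: every successful map_page() needs one of these. */
static void unmap_page(void)
{
    mapped_pages--;
}

/* Validate the flags first and map only once the page is really needed. */
static int send_page(int flags, size_t size)
{
    char *kaddr;

    if (flags & FLAG_OOB)
        return -EOPNOTSUPP; /* early exit: nothing is mapped yet */

    kaddr = map_page();
    (void)kaddr; /* a real sender would consume `size` bytes here */
    (void)size;
    unmap_page();
    return 0;
}
```

Because the early-exit path runs before map_page(), the mapping count returns to zero on both the error path and the success path.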

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Jun, 2020 2 commits

net/tls: Add asynchronous resync · ed9b7646

Authored by Boris Pismenny on Jun 08, 2020

This patch adds support for asynchronous resynchronization in tls_device.
Async resync follows two distinct stages:

1. The NIC driver indicates that it would like to resync on some TLS
record within the received packet (P), but the driver does not
know (yet) which of the TLS records within the packet.
At this stage, the NIC driver will query the device to find the exact
TCP sequence for resync (tcpsn), however, the driver does not wait
for the device to provide the response.

2. Eventually, the device responds, and the driver provides the tcpsn
within the resync packet to KTLS. Now, KTLS can check the tcpsn against
any processed TLS records within packet P, and also against any record
that is processed in the future within packet P.

The asynchronous resync path simplifies the device driver, as it can
save bits on the packet completion (32-bit TCP sequence), and pass this
information on an asynchronous command instead.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Revert "net/tls: Add force_resync for driver resync" · acb5a07a

Authored by Boris Pismenny on Jun 08, 2020

This reverts commit b3ae2459.
Revert the force resync API.
Not in use. To be replaced by a better async resync API downstream.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

28 May, 2020 1 commit

net/tls: Add force_resync for driver resync · b3ae2459

Authored by Tariq Toukan on May 27, 2020

This patch adds a field to the tls rx offload context which enables
drivers to force a send_resync call.

This field can be used by drivers to request a resync at the next
possible tls record. It is beneficial for hardware that provides the
resync sequence number asynchronously. In such cases, the packet that
triggered the resync does not contain the information required for a
resync. Instead, the driver requests resync for all the following
TLS records until the asynchronous notification with the resync request
TCP sequence arrives.

A following series for mlx5e ConnectX-6DX TLS RX offload support will
use this mechanism.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Mar, 2020 1 commit

net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE · d5bee737

Authored by Jakub Sitnicki on Mar 17, 2020

sockmap performs lockless writes to sk->sk_prot on the following paths:

tcp_bpf_{recvmsg|sendmsg} / sock_map_unref
  sk_psock_put
    sk_psock_drop
      sk_psock_restore_proto
        WRITE_ONCE(sk->sk_prot, proto)

To prevent load/store tearing [1], and to make tooling aware of intentional
shared access [2], we need to annotate other sites that access sk_prot with
READ_ONCE/WRITE_ONCE macros.
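For readers unfamiliar with the annotations, here is a simplified userspace approximation. The macro bodies below are an assumption for illustration (they rely on the GCC/Clang __typeof__ extension and are much simpler than the kernel definitions), and struct proto and struct sock are reduced to hypothetical one-field versions.

```c
/*
 * Simplified approximations of the kernel macros: a volatile access
 * forces the compiler to emit exactly one load or store of the object.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct proto { int id; };               /* hypothetical, heavily reduced */
struct sock  { struct proto *sk_prot; };

/* Reader side: load the pointer once, then dereference the local copy. */
static int sock_proto_id(struct sock *sk)
{
    struct proto *prot = READ_ONCE(sk->sk_prot);

    return prot->id;
}

/* Writer side: publish the new pointer with a single store. */
static void sock_set_proto(struct sock *sk, struct proto *prot)
{
    WRITE_ONCE(sk->sk_prot, prot);
}
```

Taking a local copy of the pointer matters because a plain sk->sk_prot expression may be reloaded by the compiler, so two uses in one function could observe different values if a lockless writer intervenes.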

Change done with Coccinelle with following semantic patch:

@@
expression E;
identifier I;
struct sock *sk;
identifier sk_prot =~ "^sk_prot$";
@@
(
 E =
-sk->sk_prot
+READ_ONCE(sk->sk_prot)
|
-sk->sk_prot = E
+WRITE_ONCE(sk->sk_prot, E)
|
-sk->sk_prot
+READ_ONCE(sk->sk_prot)
 ->I
)

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Feb, 2020 1 commit

net/tls: Fix to avoid gettig invalid tls record · 06f5201c

Authored by Rohit Maheshwari on Feb 19, 2020

The current code doesn't check whether the TCP sequence number starts at
(or after) the first record's start sequence number. It only checks
whether the sequence number is before the first record's end sequence
number. This problem is always possible in the retransmit case. If the
record that a requested sequence number belongs to has already been
deleted, tls_get_record starts walking the list and, as per the check,
tests whether the sequence number is before the end sequence of the first
record. That is always true, so it always returns the first record when
it should in fact return NULL.

As part of the fix, start walking the records only if the sequence number
lies within the list; otherwise return NULL.

One more check is added: the driver looks for the start marker record to
handle TCP packets that precede the TLS offload start sequence number.
Hence, return the first record if it is the TLS start marker and the
sequence number is before the first record's starting sequence number.
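The range checks can be sketched over a plain array of records. This is an illustration under stated assumptions: struct rec, get_record and the convention that a zero-length record is the start marker are hypothetical choices for this sketch, not the kernel's tls_get_record.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Hypothetical record descriptor; len == 0 marks the offload start marker. */
struct rec {
    uint32_t end_seq; /* sequence number just past the record */
    uint32_t len;
};

static bool is_start_marker(const struct rec *r)
{
    return r->len == 0;
}

/*
 * Records are ordered by sequence number. Return the record that
 * contains `seq`, the start marker for sequences that precede the
 * offload start, or NULL when `seq` lies outside the list.
 */
static const struct rec *get_record(const struct rec *recs, size_t n,
                                    uint32_t seq)
{
    if (n == 0)
        return NULL;

    uint32_t first_start = recs[0].end_seq - recs[0].len;
    uint32_t last_end = recs[n - 1].end_seq;

    /* Before the first record: only a start marker may be returned. */
    if (seq_before(seq, first_start))
        return is_start_marker(&recs[0]) ? &recs[0] : NULL;

    /* At or past the end of the last record: nothing to return. */
    if (!seq_before(seq, last_end))
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (seq_before(seq, recs[i].end_seq))
            return &recs[i];
    }
    return NULL;
}
```

With records covering [1000, 1100) and [1100, 1200), a retransmit at seq 900 now yields NULL instead of the first record, unless the first entry is a start marker.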

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Dec, 2019 1 commit

net/tls: add helper for testing if socket is RX offloaded · 8d5a49e9

Authored by Jakub Kicinski on Dec 17, 2019

There is currently no way for a driver to reliably check that
the socket it has looked up is in fact RX offloaded. Add
a helper. This allows drivers to catch misbehaving firmware.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Dec, 2019 1 commit

net/tls: Fix return values to avoid ENOTSUPP · 4a5cdc60

Authored by Valentin Vidic on Dec 05, 2019

ENOTSUPP is not available in userspace, for example:

  setsockopt failed, 524, Unknown error 524

Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Nov, 2019 2 commits

net/tls: add a TX lock · 79ffe608

Authored by Jakub Kicinski on Nov 05, 2019

TLS TX needs to release and re-acquire the socket lock if send buffer
fills up.

TLS SW TX path currently depends on only allowing one thread to enter
the function by the abuse of sk_write_pending. If another writer is
already waiting for memory no new ones are allowed in.

This has two problems:
 - writers don't wake other threads up when they leave the kernel;
   meaning that this scheme works for single extra thread (second
   application thread or delayed work) because memory becoming
   available will send a wake up request, but as Mallesham and
   Pooja report with larger number of threads it leads to threads
   being put to sleep indefinitely;
 - the delayed work does not get _scheduled_ but it may _run_ when
   other writers are present leading to crashes as writers don't
   expect state to change under their feet (same records get pushed
   and freed multiple times); it's hard to reliably bail from the
   work, however, because the mere presence of a writer does not
   guarantee that the writer will push pending records before exiting.

Ensuring wakeups always happen will make the code basically open
code a mutex. Just use a mutex.

The TLS HW TX path does not have any locking (not even the
sk_write_pending hack), yet it uses a per-socket sg_tx_data
array to push records.

Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com>
Reported-by: Pooja Trivedi <poojatrivedi@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: don't pay attention to sk_write_pending when pushing partial records · 02b1fa07

Authored by Jakub Kicinski on Nov 05, 2019

sk_write_pending being not zero does not guarantee that partial
record will be pushed. If the thread waiting for memory times out
the pending record may get stuck.

In case of tls_device there is no path where partial record is
set and writer present in the first place. Partial record is
set only in tls_push_sg() and tls_push_sg() will return an
error immediately. All tls_device callers of tls_push_sg()
will return (and not wait for memory) if it failed.

Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Oct, 2019 3 commits

net/tls: pass context to tls_device_decrypted() · 4de30a8d

Authored by Jakub Kicinski on Oct 06, 2019

Avoid unnecessary pointer chasing and calculations, callers already
have most of the state tls_device_decrypted() needs.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: make allocation failure unlikely · 34ef1ed1

Authored by Jakub Kicinski on Oct 06, 2019

Make sure GCC realizes it's unlikely that allocations will fail.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: mark sk->err being set as unlikely · 93277b25

Authored by Jakub Kicinski on Oct 06, 2019

Tell GCC sk->err is not likely to be set.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Oct, 2019 3 commits

net/tls: add TlsDeviceRxResync statistic · a4d26fdb

Authored by Jakub Kicinski on Oct 04, 2019

Add a statistic for number of RX resyncs sent down to the NIC.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: add device decrypted trace point · 9ec1c6ac

Authored by Jakub Kicinski on Oct 04, 2019

Add a tracepoint to the TLS offload's fast path. This tracepoint
can be used to track the decrypted and encrypted status of received
records. Records decrypted by the device should have decrypted set
to 1, records which have neither encrypted nor decrypted set are
partially decrypted, require re-encryption and therefore are most
expensive to deal with.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: add tracing for device/offload events · 8538d29c

Authored by Jakub Kicinski on Oct 04, 2019

Add tracing of device-related interaction to aid performance
analysis, especially around resync:

 tls:tls_device_offload_set
 tls:tls_device_rx_resync_send
 tls:tls_device_rx_resync_nh_schedule
 tls:tls_device_rx_resync_nh_delay
 tls:tls_device_tx_resync_req
 tls:tls_device_tx_resync_send

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Sep, 2019 4 commits

net/tls: align non temporal copy to cache lines · e681cc60

Authored by Jakub Kicinski on Sep 06, 2019

Unlike normal TCP code TLS has to touch the cache lines
it copies into to fill header info. On memory-heavy workloads
having non temporal stores and normal accesses targeting
the same cache line leads to significant overhead.
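One way to keep the two kinds of store off the same cache line is to split each copy into an unaligned head, a run of whole cache lines, and a tail. The sketch below only computes that split; CACHE_LINE, struct copy_plan and plan_copy are hypothetical names, and the 64-byte line size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed cache line size for this sketch */

struct copy_plan {
    size_t head; /* bytes up to the next cache-line boundary (normal stores) */
    size_t body; /* whole cache lines (candidates for non-temporal stores) */
    size_t tail; /* remainder after the last whole line (normal stores) */
};

/* Split a copy of `bytes` bytes to destination address `addr`. */
static struct copy_plan plan_copy(uintptr_t addr, size_t bytes)
{
    struct copy_plan p;

    /* Distance to the next boundary; zero when already aligned. */
    p.head = (CACHE_LINE - (addr % CACHE_LINE)) % CACHE_LINE;
    if (p.head > bytes)
        p.head = bytes;
    bytes -= p.head;

    /* Round the rest down to a whole number of cache lines. */
    p.body = bytes - (bytes % CACHE_LINE);
    p.tail = bytes - p.body;
    return p;
}
```

For a destination at offset 13 within a line and 200 bytes to copy, the plan is a 51-byte head, a 128-byte body and a 21-byte tail, so only the body would use non-temporal stores.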

Measured 3% overhead running 3600 round robin connections
with additional memory heavy workload.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: remove the record tail optimization · e7b159a4

Authored by Jakub Kicinski on Sep 06, 2019

For TLS device offload the tag/message authentication code are
filled in by the device. The kernel merely reserves space for
them. Because the device overwrites it, the contents of the tag
do not matter. Current code tries to save space by reusing the
header as the tag. This, however, leads to an additional frag
being created and defeats buffer coalescing (which trickles
all the way down to the drivers).

Remove this optimization, and try to allocate the space for
the tag in the usual way, leave the memory uninitialized.
If memory allocation fails rewind the record pointer so that
we use the already copied user data as tag.

Note that the optimization was actually buggy, as the tag
for TLS 1.2 is 16 bytes, but header is just 13, so the reuse
may have looked past the end of the page.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: use RCU for the adder to the offload record list · d4774ac0

Authored by Jakub Kicinski on Sep 06, 2019

All modifications to TLS record list happen under the socket
lock. Since records form an ordered queue readers are only
concerned about elements being removed, additions can happen
concurrently.

Use RCU primitives to ensure the correct access types
(READ_ONCE/WRITE_ONCE).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: unref frags in order · 7ccd4519

Authored by Jakub Kicinski on Sep 06, 2019

It's generally more cache friendly to walk arrays in order,
especially those which are likely not in cache.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Sep, 2019 3 commits

net/tls: dedup the record cleanup · 6e3d02b6

Authored by Jakub Kicinski on Sep 02, 2019

If the retransmit record hint falls into the cleanup window we will
free it by just walking the list. No need to duplicate the code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: narrow down the critical area of device_offload_lock · 3544c98a

Authored by Jakub Kicinski on Sep 02, 2019

On setsockopt path we need to hold device_offload_lock from
the moment we check netdev is up until the context is fully
ready to be added to the tls_device_list.

No need to hold it around the get_netdev_for_sock().
Change the code and remove the confusing comment.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/tls: don't jump to return · 90962b48

Authored by Jakub Kicinski on Sep 02, 2019

Reusing parts of error path for normal exit will make
next commit harder to read, untangle the two.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Sep, 2019 1 commit

net/tls: use RCU protection on icsk->icsk_ulp_data · 15a7dea7

Authored by Jakub Kicinski on Aug 30, 2019

We need to make sure context does not get freed while diag
code is interrogating it. Free struct tls_context with
kfree_rcu().

We add the __rcu annotation directly in icsk, and cast it
away in the datapath accessor. Presumably all ULPs will
do a similar thing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Aug, 2019 1 commit

net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662

Authored by Jakub Kicinski on Aug 07, 2019

sk_validate_xmit_skb() and drivers depend on the sk member of
struct sk_buff to identify segments requiring encryption.
Any operation which removes or does not preserve the original TLS
socket such as skb_orphan() or skb_clone() will cause clear text
leaks.

Make the TCP socket underlying an offloaded TLS connection
mark all skbs as decrypted, if TLS TX is in offload mode.
Then in sk_validate_xmit_skb() catch skbs which have no socket
(or a socket with no validation) and decrypted flag set.

Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
sk->sk_validate_xmit_skb are slightly interchangeable right now,
they all imply TLS offload. The new checks are guarded by
CONFIG_TLS_DEVICE because that's the option guarding the
sk_buff->decrypted member.

Second, smaller issue with orphaning is that it breaks
the guarantee that packets will be delivered to device
queues in-order. All TLS offload drivers depend on that
scheduling property. This means skb_orphan_partial()'s
trick of preserving partial socket references will cause
issues in the drivers. We need a full orphan, and as a
result netem delay/throttling will cause all TLS offload
skbs to be dropped.

Reusing the sk_buff->decrypted flag also protects from
leaking clear text when incoming, decrypted skb is redirected
(e.g. by TC).

See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
through ULP") for justification why the internal flag is safe.
The only location which could leak the flag in is tcp_bpf_sendmsg(),
which is taken care of by clearing the previously unused bit.

v2:
 - remove superfluous decrypted mark copy (Willem);
 - remove the stale doc entry (Boris);
 - rely entirely on EOR marking to prevent coalescing (Boris);
 - use an internal sendpages flag instead of marking the socket
   (Boris).
v3 (Willem):
 - reorganize the can_skb_orphan_partial() condition;
 - fix the flag leak-in through tcp_bpf_sendmsg.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Jul, 2019 1 commit

net: Use skb_frag_off accessors · b54c9d5b

Authored by Jonathan Lemon on Jul 30, 2019

Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Jul, 2019 1 commit

net: Use skb accessors in network core · d8e18a51

Authored by Matthew Wilcox (Oracle) on Jul 22, 2019

In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Jul, 2019 2 commits

net/tls: remove sock unlock/lock around strp_done() · 313ab004

Authored by John Fastabend on Jul 19, 2019

The tls close() callback currently drops the sock lock to call
strp_done(). Split up the RX cleanup into stopping the strparser
and releasing most resources, syncing strparser and finally
freeing the context.

To avoid the need for a strp_done() call on the cleanup path
of device offload make sure we don't arm the strparser until
we are sure init will be successful.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

net/tls: don't arm strparser immediately in tls_set_sw_offload() · 318892ac

Authored by Jakub Kicinski on Jul 19, 2019

In tls_set_device_offload_rx() we prepare the software context
for RX fallback and proceed to add the connection to the device.
Unfortunately, software context prep includes arming strparser
so in case of a later error we have to release the socket lock
to call strp_done().

In preparation for not releasing the socket lock half way through
callbacks move arming strparser into a separate function.
Following patches will make use of that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

openeuler / Kernel: synced successfully about 2 years ago