1. 16 Jan 2020, 3 commits
    • bpf: Sockmap/tls, fix pop data with SK_DROP return code · 7361d448
      Committed by John Fastabend
      When the user returns SK_DROP we need to reset the number of copied
      bytes to indicate to the user that the bytes were dropped and not
      sent. If we don't reset the copied arg, sendmsg will return as if
      those bytes were copied, giving the user a positive return value.
      
      This works as expected today except in the case where the user also
      pops bytes. In the pop case sg.size is reduced, but we don't correctly
      account for this when the copied byte count is reset. The popped bytes
      are not accounted for and we return a small positive value, potentially
      confusing the user.
      
      The reason this happens is a typo where we do the wrong comparison
      when accounting for popped bytes. In this fix notice the if/else is
      not needed, and that we have a similar problem if we push data, except
      it's not visible to the user because if delta is larger than sg.size
      we return a negative value, so it appears as an error regardless.
      
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com
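      The accounting above can be illustrated with a small, self-contained C
      sketch. It is not the kernel patch; sg_size, copied and delta are local
      names invented for this example, and the only point it makes is that the
      popped bytes (delta) must be folded into the reset on SK_DROP.

      #include <stdio.h>

      static long sendmsg_result(long queued, long popped, int drop)
      {
          long sg_size_before = queued;           /* bytes queued into the msg */
          long sg_size_after  = queued - popped;  /* after bpf_msg_pop_data()  */
          long copied = queued;                   /* what sendmsg would report */

          /* delta is how much the BPF program shrank the message */
          long delta = sg_size_before - sg_size_after;

          if (drop) {
              /* on SK_DROP, undo both the bytes still in the record and
               * the bytes the program popped, so nothing counts as sent */
              copied -= sg_size_after + delta;
          }
          return copied;
      }

      int main(void)
      {
          /* 100 bytes sent, 30 popped, verdict SK_DROP: report 0, not 30 */
          printf("%ld\n", sendmsg_result(100, 30, 1));
          return 0;
      }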
    • bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining · 9aaaa568
      Committed by John Fastabend
      It's possible, through a set of push, pop, and apply helper calls, to
      construct an skmsg, which is just a ring of scatterlist elements, with
      the start value larger than the end value. For example,
      
            end       start
        |_0_|_1_| ... |_n_|_n+1_|
      
      Here end points at 1 and start points at n, so the valid elements are
      the set {n, n+1, 0, 1}.
      
      Currently, because we don't build the correct chain, only {n, n+1} will
      be sent. This adds a check and an sg_chain call to correctly submit the
      above to the crypto and TLS send path.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com
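      A standalone illustration of the wrapped ring described above. In the
      real code the ring is made of scatterlist elements and the fix adds an
      sg_chain() call; here a plain int array stands in for the scatterlist,
      so only the wrap-around walk is shown.

      #include <stdio.h>

      #define RING_SIZE 6

      /* Visit the valid elements of a ring whose live region runs from
       * 'start' (inclusive) to 'end' (exclusive), possibly wrapping. */
      static void visit_valid(const int *ring, int start, int end)
      {
          if (start <= end) {
              for (int i = start; i < end; i++)        /* simple case */
                  printf("elem %d = %d\n", i, ring[i]);
              return;
          }
          /* wrapped case: the tail run first, then chain back to the
           * head run -- analogous to the extra sg_chain() in the fix */
          for (int i = start; i < RING_SIZE; i++)
              printf("elem %d = %d\n", i, ring[i]);
          for (int i = 0; i < end; i++)
              printf("elem %d = %d\n", i, ring[i]);
      }

      int main(void)
      {
          int ring[RING_SIZE] = { 10, 11, 12, 13, 14, 15 };

          /* start = 4, end = 2: valid set is {4, 5, 0, 1}, matching the
           * {n, n+1, 0, 1} example in the commit message */
          visit_valid(ring, 4, 2);
          return 0;
      }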
    • bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt buf · d468e477
      Committed by John Fastabend
      It is possible, using the push helper, to build a plaintext buffer that
      is larger than the allocated encrypt buffer. When this record is pushed
      to the crypto layers this can result in a NULL pointer dereference,
      because the crypto API expects the encrypt buffer to be large enough to
      fit the plaintext buffer. Kernel splat below.
      
      To resolve this, catch the cases where it can happen and split the
      buffer into two records to send individually. Unfortunately, there is
      still one case to handle, where the split creates a zero-sized buffer.
      In this case we merge the buffers and unmark the split. This happens
      when apply is zero and the user pushed data beyond the encrypt buffer.
      This fixes the original case as well, because the split allocated an
      encrypt buffer larger than the plaintext buffer and the merge simply
      moves the pointers around, so we now have a reference to the new
      (larger) encrypt buffer.

      Perhaps it's not ideal, but it seems the best solution for a fixes
      branch and avoids having to handle the two cases, (a) an apply that
      needs a split and (b) the non-apply case, differently. These are edge
      cases anyway, so optimizing them seems unnecessary unless someone wants
      to do so later in the -next branches.
      
      [  306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [...]
      [  306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
      [...]
      [  306.770350] Call Trace:
      [  306.770956]  scatterwalk_map_and_copy+0x6c/0x80
      [  306.772026]  gcm_enc_copy_hash+0x4b/0x50
      [  306.772925]  gcm_hash_crypt_remain_continue+0xef/0x110
      [  306.774138]  gcm_hash_crypt_continue+0xa1/0xb0
      [  306.775103]  ? gcm_hash_crypt_continue+0xa1/0xb0
      [  306.776103]  gcm_hash_assoc_remain_continue+0x94/0xa0
      [  306.777170]  gcm_hash_assoc_continue+0x9d/0xb0
      [  306.778239]  gcm_hash_init_continue+0x8f/0xa0
      [  306.779121]  gcm_hash+0x73/0x80
      [  306.779762]  gcm_encrypt_continue+0x6d/0x80
      [  306.780582]  crypto_gcm_encrypt+0xcb/0xe0
      [  306.781474]  crypto_aead_encrypt+0x1f/0x30
      [  306.782353]  tls_push_record+0x3b9/0xb20 [tls]
      [  306.783314]  ? sk_psock_msg_verdict+0x199/0x300
      [  306.784287]  bpf_exec_tx_verdict+0x3f2/0x680 [tls]
      [  306.785357]  tls_sw_sendmsg+0x4a3/0x6a0 [tls]
      
      test_sockmap test signature to trigger the bug:
      
      [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
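      A hedged sketch of the split/merge decision described above. The struct,
      helper and the 'fits' parameter are invented for illustration; only the
      shape of the logic (split when the plaintext outgrows the encrypt
      buffer, merge back when the split piece would be empty) follows the
      commit message.

      #include <stdio.h>

      struct rec_sketch {
          long plaintext;   /* bytes queued by the user / push helper */
          long enc_cap;     /* size of the allocated encrypt buffer   */
      };

      /* Return how many bytes the first record should carry.  'fits' is how
       * many of the plaintext bytes the current encrypt buffer can actually
       * cover (0 when everything was pushed past its end with apply == 0). */
      static long prepare_record(struct rec_sketch *rec, long fits)
      {
          if (rec->plaintext <= rec->enc_cap)
              return rec->plaintext;               /* fits, no split needed */

          if (fits == 0) {
              /* zero-sized split: merge instead and give the single
               * record a large-enough encrypt buffer */
              rec->enc_cap = rec->plaintext;
              return rec->plaintext;
          }
          /* normal split: send what fits now, the rest goes into a
           * second record with its own (larger) encrypt buffer */
          return fits;
      }

      int main(void)
      {
          struct rec_sketch a = { .plaintext = 20000, .enc_cap = 16384 };
          struct rec_sketch b = { .plaintext = 20000, .enc_cap = 16384 };

          printf("normal split, first record: %ld\n", prepare_record(&a, 16384));
          printf("zero split -> merge:        %ld\n", prepare_record(&b, 0));
          return 0;
      }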
  2. 07 Dec 2019, 1 commit
  3. 29 Nov 2019, 4 commits
  4. 20 Nov 2019, 1 commit
  5. 07 Nov 2019, 2 commits
    • net/tls: add a TX lock · 79ffe608
      Committed by Jakub Kicinski
      TLS TX needs to release and re-acquire the socket lock if the send
      buffer fills up.

      The TLS SW TX path currently depends on only allowing one thread to
      enter the function, by abusing sk_write_pending. If another writer is
      already waiting for memory, no new ones are allowed in.
      
      This has two problems:
       - writers don't wake other threads up when they leave the kernel,
         meaning that this scheme works for a single extra thread (a second
         application thread or the delayed work) because memory becoming
         available will send a wake-up request, but as Mallesham and
         Pooja report, with a larger number of threads it leads to threads
         being put to sleep indefinitely;
       - the delayed work does not get _scheduled_ but it may _run_ when
         other writers are present, leading to crashes, as writers don't
         expect state to change under their feet (the same records get
         pushed and freed multiple times); it's hard to reliably bail from
         the work, however, because the mere presence of a writer does not
         guarantee that the writer will push pending records before exiting.
      
      Ensuring wakeups always happen would make the code basically open-code
      a mutex. Just use a mutex.
      
      The TLS HW TX path does not have any locking (not even the
      sk_write_pending hack), yet it uses a per-socket sg_tx_data
      array to push records.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com>
      Reported-by: Pooja Trivedi <poojatrivedi@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
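      A userspace analogue of the change above, using pthreads to stand in
      for the kernel mutex: every writer simply serializes on one lock
      instead of relying on sk_write_pending to keep extra writers out. This
      is an illustration of the locking model, not the kernel code.

      #include <pthread.h>
      #include <stdio.h>

      static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
      static long bytes_queued;

      static void *writer(void *arg)
      {
          long n = (long)arg;

          /* every sender (application thread or delayed work in the real
           * code) takes the same lock, so records can never be pushed or
           * freed underneath another writer */
          pthread_mutex_lock(&tx_lock);
          bytes_queued += n;
          pthread_mutex_unlock(&tx_lock);
          return NULL;
      }

      int main(void)
      {
          pthread_t t[4];

          for (long i = 0; i < 4; i++)
              pthread_create(&t[i], NULL, writer, (void *)100L);
          for (int i = 0; i < 4; i++)
              pthread_join(t[i], NULL);
          printf("queued %ld bytes\n", bytes_queued);   /* always 400 */
          return 0;
      }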
    • net/tls: don't pay attention to sk_write_pending when pushing partial records · 02b1fa07
      Committed by Jakub Kicinski
      sk_write_pending being non-zero does not guarantee that a partial
      record will be pushed. If the thread waiting for memory times out,
      the pending record may get stuck.

      In the case of tls_device there is no path where a partial record is
      set and a writer is present in the first place. A partial record is
      set only in tls_push_sg(), and tls_push_sg() will return an error
      immediately. All tls_device callers of tls_push_sg() will return (and
      not wait for memory) if it failed.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 07 Oct 2019, 3 commits
  7. 06 Oct 2019, 1 commit
  8. 05 Sep 2019, 1 commit
  9. 22 Jul 2019, 3 commits
  10. 08 Jul 2019, 1 commit
  11. 13 Jun 2019, 1 commit
    • net: tls, correctly account for copied bytes with multiple sk_msgs · 648ee6ce
      Committed by John Fastabend
      tls_sw_do_sendpage needs to return the total number of bytes sent
      regardless of how many sk_msgs are allocated. Unfortunately, copied
      (the value we return up the stack) is zeroed before each new sk_msg is
      allocated, so we only return the copied size of the last sk_msg used.
      
      The caller (splice, etc.) of sendpage will then believe only part
      of its data was sent and will send the missing chunks again. However,
      because the data actually was sent, the receiver will get multiple
      copies of the same data.
      
      To reproduce this, do multiple sendfile calls with a length close to
      the max record size. This will in turn call splice/sendpage; sendpage
      may use multiple sk_msgs in this case and then return the incorrect
      number of bytes. This will cause splice to resend, creating duplicate
      data on the receiver. Andre created a C program that can easily
      generate this case, so we will push a similar selftest for this to
      bpf-next shortly.
      
      The fix is to _not_ zero the copied field, so that the total number of
      sent bytes is returned.
      Reported-by: Steinar H. Gunderson <steinar+kernel@gunderson.no>
      Reported-by: Andre Tomt <andre@tomt.net>
      Tested-by: Andre Tomt <andre@tomt.net>
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
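      A minimal sketch of the accounting fix: 'copied' must accumulate across
      every sk_msg used rather than being reset per message. The loop below
      is an illustration with an invented per-message capacity, not the
      tls_sw code itself.

      #include <stdio.h>

      #define MAX_PER_MSG 16384      /* stand-in for the per-sk_msg capacity */

      static long do_sendpage(long total)
      {
          long copied = 0;               /* NOT reset inside the loop */

          while (total > 0) {
              long chunk = total > MAX_PER_MSG ? MAX_PER_MSG : total;

              /* ... allocate an sk_msg and transmit 'chunk' bytes ... */
              copied += chunk;           /* accumulate across sk_msgs */
              total  -= chunk;
          }
          return copied;                 /* total bytes actually sent */
      }

      int main(void)
      {
          /* two sk_msgs are needed; the caller must still see 20000,
           * otherwise splice() retransmits the "missing" part */
          printf("%ld\n", do_sendpage(20000));
          return 0;
      }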
  12. 12 Jun 2019, 3 commits
    • net/tls: add kernel-driven TLS RX resync · f953d33b
      Committed by Jakub Kicinski
      A TLS offload device may lose sync with the TCP stream if packets
      arrive out of order.  Drivers can currently request a resync at
      a specific TCP sequence number.  When a record is found starting
      at that sequence number, the kernel will inform the device of the
      corresponding record number.
      
      This requires the device to constantly scan the stream for a
      known pattern (constant bytes of the header) after sync is lost.
      
      This patch adds an alternative approach which is entirely under
      the control of the kernel.  The kernel tracks records it had to fully
      decrypt, even though the TLS socket is in TLS_HW mode.  If multiple
      records did not have any decrypted parts, it's a pretty strong
      indication that the device is out of sync.
      
      We choose the min number of fully encrypted records to be 2,
      which should hopefully be more than will get retransmitted at
      a time.
      
      After the kernel decides the device is out of sync it schedules a
      resync request.  If the TCP socket is empty the resync is performed
      immediately.  If the socket is not empty we leave it to the record
      parser to resync when the next record comes in.
      
      Before resyncing in the message parser we peek at the TCP socket and
      don't attempt the sync if the socket already has some of the
      next record queued.
      
      On resync failure (encrypted data continues to flow in) we
      retry with exponential backoff, up to once every 128 records
      (with a 16k record that's at most once every 2M of data).
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
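      A hedged, standalone sketch of the resync policy described above. The
      struct and counters are invented; only the thresholds (two fully
      software-decrypted records, backoff capped at 128 records) come from
      the commit message.

      #include <stdbool.h>
      #include <stdio.h>

      struct resync_sketch {
          unsigned int sw_decrypted;   /* consecutive fully-decrypted records */
          unsigned int backoff;        /* records to wait between attempts    */
          unsigned int wait;           /* countdown to the next attempt       */
      };

      /* Called once per parsed record; returns true when a resync request
       * should be scheduled now. */
      static bool record_done(struct resync_sketch *s, bool device_decrypted)
      {
          if (device_decrypted) {
              s->sw_decrypted = 0;         /* device is in sync again */
              s->backoff = 1;
              s->wait = 0;
              return false;
          }

          if (++s->sw_decrypted < 2)       /* need 2 fully-encrypted records */
              return false;

          if (s->wait) {                   /* a previous attempt failed; wait */
              s->wait--;
              return false;
          }

          /* request a resync now; if encrypted data keeps flowing in, the
           * next attempt is twice as far away, capped at every 128 records */
          if (s->backoff < 128)
              s->backoff *= 2;
          s->wait = s->backoff;
          return true;
      }

      int main(void)
      {
          struct resync_sketch s = { .sw_decrypted = 0, .backoff = 1, .wait = 0 };
          int requests = 0;

          for (int i = 0; i < 1000; i++)   /* 1000 records, none device-decrypted */
              requests += record_done(&s, false);
          printf("resync requests: %d\n", requests);
          return 0;
      }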
    • net/tls: rename handle_device_resync() · fe58a5a0
      Committed by Jakub Kicinski
      The name handle_device_resync() doesn't describe the function very
      well. The function checks whether a resync should be issued upon
      parsing of a new record.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: pass record number as a byte array · 89fec474
      Committed by Jakub Kicinski
      The TLS offload code casts the record number to a u64.  The buffer
      should be aligned to 8 bytes, but it's actually a __be64, and
      the rest of the TLS code treats it as a big-endian integer.  Make the
      offload callbacks take a byte array; drivers can make the
      choice to do the ugly cast if they want to.

      Prepare for copying the record number onto the stack by
      defining a constant for the max size of the byte array.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
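      A small sketch of the byte-array interface described above. The names
      (REC_SEQ_MAX, resync_cb) are illustrative, not the kernel's; the point
      is that the callback receives the big-endian record number as bytes,
      and a driver may still do the conversion itself.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define REC_SEQ_MAX 8   /* TLS record sequence numbers are 64 bits */

      /* driver callback receives the big-endian sequence number as bytes */
      static void resync_cb(const uint8_t rcd_sn[REC_SEQ_MAX])
      {
          /* a driver that really wants a host-order u64 can still do the
           * (ugly) conversion itself */
          uint64_t sn = 0;
          for (int i = 0; i < REC_SEQ_MAX; i++)
              sn = (sn << 8) | rcd_sn[i];
          printf("record %llu\n", (unsigned long long)sn);
      }

      int main(void)
      {
          uint8_t rcd_sn[REC_SEQ_MAX] = { 0, 0, 0, 0, 0, 0, 0, 5 };
          uint8_t on_stack[REC_SEQ_MAX];

          /* copy onto the stack first, bounded by the fixed max size */
          memcpy(on_stack, rcd_sn, REC_SEQ_MAX);
          resync_cb(on_stack);
          return 0;
      }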
  13. 05 Jun 2019, 2 commits
  14. 27 May 2019, 2 commits
  15. 10 May 2019, 1 commit
  16. 21 Apr 2019, 1 commit
  17. 11 Apr 2019, 1 commit
  18. 30 Mar 2019, 1 commit
  19. 22 Mar 2019, 1 commit
  20. 21 Mar 2019, 1 commit
    • net/tls: Add support of AES128-CCM based ciphers · f295b3ae
      Committed by Vakul Garg
      Added support for AES128-CCM based record encryption. AES128-CCM is
      similar to AES128-GCM; both have the same salt/IV/MAC sizes. The
      notable difference between the two is that when invoking an AES128-CCM
      operation, the salt||nonce (which is passed as the IV) has to be
      prefixed with the hardcoded value '2'. Further, the CCM implementation
      in the kernel requires the IV passed in crypto_aead_request() to be a
      full 16 bytes. Therefore, the record structure 'struct tls_rec' has
      been modified to reserve 16 bytes for the IV. This works for both GCM
      and CCM based ciphers.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
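      A hedged sketch of the IV layout described above, assuming the usual
      TLS AES-128-CCM parameters (4-byte salt, 8-byte per-record nonce). The
      buffer layout and names are illustrative, not copied from the kernel
      sources.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define IV_BUF_LEN 16   /* the kernel CCM code wants a full 16-byte IV */
      #define SALT_LEN    4
      #define NONCE_LEN   8

      static void build_ccm_iv(uint8_t iv[IV_BUF_LEN],
                               const uint8_t salt[SALT_LEN],
                               const uint8_t nonce[NONCE_LEN])
      {
          memset(iv, 0, IV_BUF_LEN);
          iv[0] = 2;                                    /* hardcoded CCM prefix */
          memcpy(iv + 1, salt, SALT_LEN);               /* salt                 */
          memcpy(iv + 1 + SALT_LEN, nonce, NONCE_LEN);  /* per-record nonce     */
          /* the remaining bytes stay zero so the buffer is a full 16 bytes */
      }

      int main(void)
      {
          uint8_t salt[SALT_LEN]   = { 0xde, 0xad, 0xbe, 0xef };
          uint8_t nonce[NONCE_LEN] = { 0, 0, 0, 0, 0, 0, 0, 1 };
          uint8_t iv[IV_BUF_LEN];

          build_ccm_iv(iv, salt, nonce);
          for (int i = 0; i < IV_BUF_LEN; i++)
              printf("%02x", iv[i]);
          printf("\n");
          return 0;
      }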
  21. 04 Mar 2019, 3 commits
  22. 25 Feb 2019, 1 commit
    • tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg · 2b794c40
      Committed by Vakul Garg
      The patch enables returning the record 'type' in msghdr for records
      that are retrieved with MSG_PEEK in recvmsg. Further, it prevents
      records peeked from the socket from getting clubbed with any other
      record of a different type when records are subsequently dequeued from
      strparser.
      
      For each record, we now retain its type in the sk_buff's control
      buffer cb[]. Inside the control buffer, the record's full length and
      offset are already stored by strparser in 'struct strp_msg'. We store
      the record type after 'struct strp_msg', inside 'struct tls_msg'. For
      TLS 1.2, the type is stored just after record dequeue. For TLS 1.3,
      the type is stored after the record has been decrypted.
      
      Inside process_rx_list(), before processing a non-data record, we
      check that we will be able to return the record type to the user
      application. If not, the decrypted records in the tls context's
      rx_list are left there without consuming any data.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
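      A hedged sketch of the control-buffer layout described above. The
      struct names mirror the description (strp_msg followed by tls_msg
      inside the skb's cb[]), but the field types and the 48-byte size are
      illustrative stand-ins, not the kernel definitions.

      #include <stdint.h>
      #include <stdio.h>

      struct strp_msg_sketch {        /* length/offset kept by strparser */
          int full_len;
          int offset;
      };

      struct tls_msg_sketch {         /* record type stored right after it */
          uint8_t control;            /* TLS record type, e.g. 23 = app data */
      };

      /* model an sk_buff cb[] area: a small per-skb scratch space */
      struct skb_sketch {
          char cb[48];
      };

      static struct tls_msg_sketch *tls_msg(struct skb_sketch *skb)
      {
          /* tls_msg lives immediately after strp_msg inside cb[] */
          return (struct tls_msg_sketch *)(skb->cb + sizeof(struct strp_msg_sketch));
      }

      int main(void)
      {
          struct skb_sketch skb = { { 0 } };

          tls_msg(&skb)->control = 22;    /* e.g. a handshake record */
          printf("record type %u\n", tls_msg(&skb)->control);
          return 0;
      }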
  23. 20 Feb 2019, 1 commit
  24. 13 Feb 2019, 1 commit
    • net/tls: Do not use async crypto for non-data records · c0ab4732
      Committed by Vakul Garg
      The addition of TLS 1.3 support broke the TLS 1.2 handshake when an
      async crypto accelerator is used. This is because the record type for
      non-data records is not propagated to the user application. Also, when
      async decryption happens, the decryption does not stop when two
      different types of records get dequeued and submitted for decryption.
      To address this, we decrypt TLS 1.2 non-data records in a synchronous
      way. We check whether the record we just processed has the same type
      as the previous one before checking for the async condition and
      jumping to dequeue the next record.
      
      Fixes: 130b392c ("net: tls: Add tls 1.3 support")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
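      A small sketch of the decision described above: TLS 1.2 non-data
      records are forced onto the synchronous decrypt path so their type is
      not lost among async completions. The constants and names are
      illustrative, not the kernel's.

      #include <stdbool.h>
      #include <stdio.h>

      #define TLS_1_3            0x0304
      #define RECORD_TYPE_DATA   23      /* application data */

      static bool decrypt_async_ok(int tls_version, int record_type)
      {
          /* TLS 1.3 hides the real type until after decryption, so the
           * check only applies to TLS 1.2: anything that is not
           * application data (handshake, alert, ...) goes synchronous */
          if (tls_version != TLS_1_3 && record_type != RECORD_TYPE_DATA)
              return false;
          return true;
      }

      int main(void)
      {
          printf("1.2 handshake async? %d\n", decrypt_async_ok(0x0303, 22)); /* 0 */
          printf("1.2 app data  async? %d\n", decrypt_async_ok(0x0303, 23)); /* 1 */
          return 0;
      }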