提交 · e97f0bc7be88b00baeecc0b7c46ca3a0ff85d2a4 · openanolis / cloud-kernel

02 5月, 2019 1 次提交

net/tls: fix refcount adjustment in fallback · e97f0bc7

由 Jakub Kicinski 提交于 4月 17, 2019

[ Upstream commit 9188d5ca454fd665145904267e726e9e8d122f5c ]

Unlike atomic_add(), refcount_add() does not deal well
with a negative argument.  TLS fallback code reallocates
the skb and is very likely to shrink the truesize, leading to:

[  189.513254] WARNING: CPU: 5 PID: 0 at lib/refcount.c:81 refcount_add_not_zero_checked+0x15c/0x180
 Call Trace:
  refcount_add_checked+0x6/0x40
  tls_enc_skb+0xb93/0x13e0 [tls]

Once wmem_allocated count saturates the application can no longer
send data on the socket.  This is similar to Eric's fixes for GSO,
TCP:
commit 7ec318fe ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()")
and UDP:
commit 575b65bc ("udp: avoid refcount_t saturation in __udp_gso_segment()").

Unlike the GSO case, for TLS fallback it's likely that the skb has
shrunk, so the "likely" annotation is the other way around (likely
branch being "sub").

Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NJohn Hurley <john.hurley@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e97f0bc7

13 1月, 2019 1 次提交

net/tls: Init routines in create_ctx · 997cb503

由 Atul Gupta 提交于 12月 11, 2018

[ Upstream commit 6c0563e442528733219afe15c749eb2cc365da3f ]

create_ctx is called from tls_init and tls_hw_prot
hence initialize function pointers in common routine.
Signed-off-by: NAtul Gupta <atul.gupta@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>

997cb503

10 1月, 2019 1 次提交

net/tls: allocate tls context using GFP_ATOMIC · f624d95c

由 Ganesh Goudar 提交于 12月 19, 2018

[ Upstream commit c6ec179a0082e2e76e3a72050c2b99d3d0f3da3f ]

create_ctx can be called from atomic context, hence use
GFP_ATOMIC instead of GFP_KERNEL.

[  395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421
[  395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl
[  395.996564] 2 locks held by openssl/16254:
[  396.010492]  #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0
[  396.029838]  #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280
[  396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G           O      4.20.0-rc6+ #25
[  396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017
[  396.083537] Call Trace:
[  396.096265]  dump_stack+0x5e/0x8b
[  396.109876]  ___might_sleep+0x216/0x250
[  396.123940]  kmem_cache_alloc_trace+0x1b0/0x240
[  396.138800]  create_ctx+0x1f/0x60
[  396.152504]  tls_init+0xbd/0x280
[  396.166135]  tcp_set_ulp+0x191/0x2d0
[  396.180035]  ? tcp_set_ulp+0x2c/0x2d0
[  396.193960]  do_tcp_setsockopt.isra.44+0x148/0x9a0
[  396.209013]  __sys_setsockopt+0x7c/0xe0
[  396.223054]  __x64_sys_setsockopt+0x20/0x30
[  396.237378]  do_syscall_64+0x4a/0x180
[  396.251200]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context")
Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

f624d95c

17 9月, 2018 1 次提交

tls: fix currently broken MSG_PEEK behavior · 50c6b58a

由 Daniel Borkmann 提交于 9月 14, 2018

In kTLS MSG_PEEK behavior is currently failing, strace example:

  [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
  [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
  [pid  2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2430] listen(4, 10)               = 0
  [pid  2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
  [pid  2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
  [pid  2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2430] close(4)                    = 0
  [pid  2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
  [pid  2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
  [pid  2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64

As can be seen from strace, there are two TLS records sent,
i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
peeking 'test_read_peektest_read_peektest'. This is clearly
wrong, and what happens is that given peek cannot call into
tls_sw_advance_skb() to unpause strparser and proceed with
the next skb, we end up looping over the current one, copying
the 'test_read_peek' over and over into the user provided
buffer.

Here, we can only peek into the currently held skb (current,
full TLS record) as otherwise we would end up having to hold
all the original skb(s) (depending on the peek depth) in a
separate queue when unpausing strparser to process next
records, minimally intrusive is to return only up to the
current record's size (which likely was what c46234eb
("tls: RX path for ktls") originally intended as well). Thus,
after patch we properly peek the first record:

  [pid  2046] wait4(2075,  <unfinished ...>
  [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
  [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
  [pid  2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2075] listen(4, 10)               = 0
  [pid  2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
  [pid  2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
  [pid  2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
  [pid  2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
  [pid  2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
  [pid  2075] close(4)                    = 0
  [pid  2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
  [pid  2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
  [pid  2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14

Fixes: c46234eb ("tls: RX path for ktls")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50c6b58a

14 9月, 2018 3 次提交

tls: clear key material from kernel memory when do_tls_setsockopt_conf fails · c844eb46

由 Sabrina Dubroca 提交于 9月 12, 2018

Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c844eb46

tls: zero the crypto information from tls_context before freeing · 86029d10

由 Sabrina Dubroca 提交于 9月 12, 2018

This contains key material in crypto_send_aes_gcm_128 and
crypto_recv_aes_gcm_128.

Introduce union tls_crypto_context, and replace the two identical
unions directly embedded in struct tls_context with it. We can then
use this union to clean up the memory in the new tls_ctx_free()
function.

Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86029d10

tls: don't copy the key out of tls12_crypto_info_aes_gcm_128 · 7cba09c6

由 Sabrina Dubroca 提交于 9月 12, 2018

There's no need to copy the key to an on-stack buffer before calling
crypto_aead_setkey().

Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7cba09c6

09 9月, 2018 1 次提交

net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC · 52ea992c

由 Vakul Garg 提交于 9月 06, 2018

tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.

Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52ea992c

23 8月, 2018 1 次提交

tls: possible hang when do_tcp_sendpages hits sndbuf is full case · 67db7cd2

由 John Fastabend 提交于 8月 22, 2018

Currently, the lower protocols sk_write_space handler is not called if
TLS is sending a scatterlist via  tls_push_sg. However, normally
tls_push_sg calls do_tcp_sendpage, which may be under memory pressure,
that in turn may trigger a wait via sk_wait_event. Typically, this
happens when the in-flight bytes exceed the sdnbuf size. In the normal
case when enough ACKs are received sk_write_space() will be called and
the sk_wait_event will be woken up allowing it to send more data
and/or return to the user.

But, in the TLS case because the sk_write_space() handler does not
wake up the events the above send will wait until the sndtimeo is
exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a
hang to the user (especially this impatient user). To fix this pass
the sk_write_space event to the lower layers sk_write_space event
which in the TCP case will wake any pending events.

I observed the above while integrating sockmap and ktls. It
initially appeared as test_sockmap (modified to use ktls) occasionally
hanging. To reliably reproduce this reduce the sndbuf size and stress
the tls layer by sending many 1B sends. This results in every byte
needing a header and each byte individually being sent to the crypto
layer.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

67db7cd2

17 8月, 2018 1 次提交

tcp, ulp: add alias for all ulp modules · 037b0b86

由 Daniel Borkmann 提交于 8月 16, 2018

Lets not turn the TCP ULP lookup into an arbitrary module loader as
we only intend to load ULP modules through this mechanism, not other
unrelated kernel modules:

  [root@bar]# cat foo.c
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <linux/tcp.h>
  #include <linux/in.h>

  int main(void)
  {
      int sock = socket(PF_INET, SOCK_STREAM, 0);
      setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
      return 0;
  }

  [root@bar]# gcc foo.c -O2 -Wall
  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  sctp                 1077248  4
  libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
  [root@bar]#

Fix it by adding module alias to TCP ULP modules, so probing module
via request_module() will be limited to tcp-ulp-[name]. The existing
modules like kTLS will load fine given tcp-ulp-tls alias, but others
will fail to load:

  [root@bar]# lsmod | grep sctp
  [root@bar]# ./a.out
  [root@bar]# lsmod | grep sctp
  [root@bar]#

Sockmap is not affected from this since it's either built-in or not.

Fixes: 734942cc ("tcp: ULP infrastructure")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

037b0b86

13 8月, 2018 1 次提交

net/tls: Combined memory allocation for decryption request · 0b243d00

由 Vakul Garg 提交于 8月 10, 2018

For preparing decryption request, several memory chunks are required
(aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
an accelerator, it is required that the buffers which are read by the
accelerator must be dma-able and not come from stack. The buffers for
aad and iv can be separately kmalloced each, but it is inefficient.
This patch does a combined allocation for preparing decryption request
and then segments into aead_req || sgin || sgout || iv || aad.
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b243d00

06 8月, 2018 1 次提交

net/tls: Mark the end in scatterlist table · cfb4099f

由 Vakul Garg 提交于 8月 02, 2018

Function zerocopy_from_iter() unmarks the 'end' in input sgtable while
adding new entries in it. The last entry in sgtable remained unmarked.
This results in KASAN error report on using apis like sg_nents(). Before
returning, the function needs to mark the 'end' in the last entry it
adds.
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfb4099f

03 8月, 2018 1 次提交

crypto: scatterwalk - remove 'chain' argument from scatterwalk_crypto_chain() · 8c30fbe6

由 Eric Biggers 提交于 7月 23, 2018

All callers pass chain=0 to scatterwalk_crypto_chain().

Remove this unneeded parameter.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

8c30fbe6

02 8月, 2018 1 次提交

net/tls: Use kmemdup to simplify the code · 969d5090

由 zhong jiang 提交于 8月 01, 2018

Kmemdup is better than kmalloc+memcpy. So replace them.
Signed-off-by: Nzhong jiang <zhongjiang@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

969d5090

31 7月, 2018 1 次提交

net/tls: Use socket data_ready callback on record availability · ad13acce

由 Vakul Garg 提交于 7月 30, 2018

On receipt of a complete tls record, use socket's saved data_ready
callback instead of state_change callback. In function tls_queue(),
the TLS record is queued in encrypted state. But the decryption
happen inline when tls_sw_recvmsg() or tls_sw_splice_read() get invoked.
So it should be ok to notify the waiting context about the availability
of data as soon as we could collect a full TLS record. For new data
availability notification, sk_data_ready callback is more appropriate.
It points to sock_def_readable() which wakes up specifically for EPOLLIN
event. This is in contrast to the socket callback sk_state_change which
points to sock_def_wakeup() which issues a wakeup unconditionally
(without event mask).
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad13acce

29 7月, 2018 2 次提交

tls: Fix improper revert in zerocopy_from_iter · 2da19ed3

由 Doron Roberts-Kedes 提交于 7月 26, 2018

The current code is problematic because the iov_iter is reverted and
never advanced in the non-error case. This patch skips the revert in the
non-error case. This patch also fixes the amount by which the iov_iter
is reverted. Currently, iov_iter is reverted by size, which can be
greater than the amount by which the iter was actually advanced.
Instead, only revert by the amount that the iter was advanced.

Fixes: 47187998 ("tls: Fix zerocopy_from_iter iov handling")
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2da19ed3

tls: Remove dead code in tls_sw_sendmsg · 5a3611ef

由 Doron Roberts-Kedes 提交于 7月 26, 2018

tls_push_record either returns 0 on success or a negative value on failure.
This patch removes code that would only be executed if tls_push_record
were to return a positive value.
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a3611ef

27 7月, 2018 2 次提交

tls: Skip zerocopy path for ITER_KVEC · 0a26cf3f

由 Doron Roberts-Kedes 提交于 7月 25, 2018

The zerocopy path ultimately calls iov_iter_get_pages, which defines the
step function for ITER_KVECs as simply, return -EFAULT. Taking the
non-zerocopy path for ITER_KVECs avoids the unnecessary fallback.

See https://lore.kernel.org/lkml/20150401023311.GL29656@ZenIV.linux.org.uk/T/#u
for a discussion of why zerocopy for vmalloc data is not a good idea.

Discovered while testing NBD traffic encrypted with ktls.

Fixes: c46234eb ("tls: RX path for ktls")
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a26cf3f

net/tls: Removed redundant checks for non-NULL · 201876b3

由 Vakul Garg 提交于 7月 24, 2018

Removed checks against non-NULL before calling kfree_skb() and
crypto_free_aead(). These functions are safe to be called with NULL
as an argument.
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

201876b3

21 7月, 2018 2 次提交

tls: check RCV_SHUTDOWN in tls_wait_data · fcf4793e

由 Doron Roberts-Kedes 提交于 7月 18, 2018

The current code does not check sk->sk_shutdown & RCV_SHUTDOWN.
tls_sw_recvmsg may return a positive value in the case where bytes have
already been copied when the socket is shutdown. sk->sk_err has been
cleared, causing the tls_wait_data to hang forever on a subsequent
invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg,
fixes this problem.

Fixes: c46234eb ("tls: RX path for ktls")
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fcf4793e

tls: Fix copy-paste error in tls_device_reencrypt · eecd6857

由 Gustavo A. R. Silva 提交于 7月 18, 2018

It seems that the proper structure to use in this particular
case is *skb_iter* instead of skb.

Addresses-Coverity-ID: 1471906 ("Copy-paste error")
Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eecd6857

17 7月, 2018 1 次提交

tls: Stricter error checking in zerocopy sendmsg path · 32da1221

由 Dave Watson 提交于 7月 12, 2018

In the zerocopy sendmsg() path, there are error checks to revert
the zerocopy if we get any error code.  syzkaller has discovered
that tls_push_record can return -ECONNRESET, which is fatal, and
happens after the point at which it is safe to revert the iter,
as we've already passed the memory to do_tcp_sendpages.

Previously this code could return -ENOMEM and we would want to
revert the iter, but AFAIK this no longer returns ENOMEM after
a447da7d ("tls: fix waitall behavior in tls_sw_recvmsg"),
so we fail for all error codes.

Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: NDave Watson <davejwatson@fb.com>
Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32da1221

16 7月, 2018 6 次提交

tls: Fix zerocopy_from_iter iov handling · 47187998

由 Boris Pismenny 提交于 7月 13, 2018

zerocopy_from_iter iterates over the message, but it doesn't revert the
updates made by the iov iteration. This patch fixes it. Now, the iov can
be used after calling zerocopy_from_iter.

Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47187998

tls: Add rx inline crypto offload · 4799ac81

由 Boris Pismenny 提交于 7月 13, 2018

This patch completes the generic infrastructure to offload TLS crypto to a
network device. It enables the kernel to skip decryption and
authentication of some skbs marked as decrypted by the NIC. In the fast
path, all packets received are decrypted by the NIC and the performance
is comparable to plain TCP.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC only decrypts packets that contain the expected TCP sequence number.
Out-Of-Order TCP packets are provided unmodified. As a result, at the
worst case a received TLS record consists of both plaintext and ciphertext
packets. These partially decrypted records must be reencrypted,
only to be decrypted.

The notable differences between SW KTLS Rx and this offload are as
follows:
1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.
2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW after HW lost track of TLS record framing in
the TCP stream.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4799ac81

tls: Fill software context without allocation · b190a587

由 Boris Pismenny 提交于 7月 13, 2018

This patch allows tls_set_sw_offload to fill the context in case it was
already allocated previously.

We will use it in TLS_DEVICE to fill the RX software context.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b190a587

tls: Split tls_sw_release_resources_rx · 39f56e1a

由 Boris Pismenny 提交于 7月 13, 2018

This patch splits tls_sw_release_resources_rx into two functions one
which releases all inner software tls structures and another that also
frees the containing structure.

In TLS_DEVICE we will need to release the software structures without
freeeing the containing structure, which contains other information.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39f56e1a

tls: Split decrypt_skb to two functions · dafb67f3

由 Boris Pismenny 提交于 7月 13, 2018

Previously, decrypt_skb also updated the TLS context.
Now, decrypt_skb only decrypts the payload using the current context,
while decrypt_skb_update also updates the state.

Later, in the tls_device Rx flow, we will use decrypt_skb directly.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dafb67f3

tls: Refactor tls_offload variable names · d80a1b9d

由 Boris Pismenny 提交于 7月 13, 2018

For symmetry, we rename tls_offload_context to
tls_offload_context_tx before we add tls_offload_context_rx.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d80a1b9d

13 7月, 2018 1 次提交

net/tls: Use aead_request_alloc/free for request alloc/free · d2bdd268

由 Vakul Garg 提交于 7月 11, 2018

Instead of kzalloc/free for aead_request allocation and free, use
functions aead_request_alloc(), aead_request_free(). It ensures that
any sensitive crypto material held in crypto transforms is securely
erased from memory.
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2bdd268

03 7月, 2018 1 次提交

tls: fix skb_to_sgvec returning unhandled error. · 52ee6ef3

由 Doron Roberts-Kedes 提交于 7月 02, 2018

The current code does not inspect the return value of skb_to_sgvec. This
can cause a nullptr kernel panic when the malformed sgvec is passed into
the crypto request.

Checking the return value of skb_to_sgvec and skipping decryption if it
is negative fixes this problem.

Fixes: c46234eb ("tls: RX path for ktls")
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52ee6ef3

29 6月, 2018 1 次提交

Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43

由 Linus Torvalds 提交于 6月 28, 2018

The poll() changes were not well thought out, and completely
unexplained.  They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead.  That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case.  The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
  individually because that would just be unnecessarily messy  - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a11e1d43

27 6月, 2018 1 次提交

net/tls: Remove VLA usage on nonce · 3463e51d

由 Kees Cook 提交于 6月 25, 2018

It looks like the prior VLA removal, commit b16520f7 ("net/tls: Remove
VLA usage"), and a new VLA addition, commit c46234eb ("tls: RX path
for ktls"), passed in the night. This removes the newly added VLA, which
happens to have its bounds based on the same max value.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3463e51d

24 6月, 2018 1 次提交

tls: Removed unused variable · 0ef8b456

由 Vakul Garg 提交于 6月 25, 2018

Removed unused variable 'rxm' from tls_queue().
Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ef8b456

16 6月, 2018 2 次提交

tls: fix waitall behavior in tls_sw_recvmsg · 06030dba

由 Daniel Borkmann 提交于 6月 15, 2018

Current behavior in tls_sw_recvmsg() is to wait for incoming tls
messages and copy up to exactly len bytes of data that the user
provided. This is problematic in the sense that i) if no packet
is currently queued in strparser we keep waiting until one has been
processed and pushed into tls receive layer for tls_wait_data() to
wake up and push the decrypted bits to user space. Given after
tls decryption, we're back at streaming data, use sock_rcvlowat()
hint from tcp socket instead. Retain current behavior with MSG_WAITALL
flag and otherwise use the hint target for breaking the loop and
returning to application. This is done if currently no ctx->recv_pkt
is ready, otherwise continue to process it from our strparser
backlog.

Fixes: c46234eb ("tls: RX path for ktls")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06030dba

tls: fix use-after-free in tls_push_record · a447da7d

由 Daniel Borkmann 提交于 6月 15, 2018

syzkaller managed to trigger a use-after-free in tls like the
following:

  BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
  Write of size 1 at addr ffff88037aa08000 by task a.out/2317

  CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  Call Trace:
   dump_stack+0x71/0xab
   print_address_description+0x6a/0x280
   kasan_report+0x258/0x380
   ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_sw_push_pending_record+0x2e/0x40 [tls]
   tls_sk_proto_close+0x3fe/0x710 [tls]
   ? tcp_check_oom+0x4c0/0x4c0
   ? tls_write_space+0x260/0x260 [tls]
   ? kmem_cache_free+0x88/0x1f0
   inet_release+0xd6/0x1b0
   __sock_release+0xc0/0x240
   sock_close+0x11/0x20
   __fput+0x22d/0x660
   task_work_run+0x114/0x1a0
   do_exit+0x71a/0x2780
   ? mm_update_next_owner+0x650/0x650
   ? handle_mm_fault+0x2f5/0x5f0
   ? __do_page_fault+0x44f/0xa50
   ? mm_fault_error+0x2d0/0x2d0
   do_group_exit+0xde/0x300
   __x64_sys_exit_group+0x3a/0x50
   do_syscall_64+0x9a/0x300
   ? page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened through fault injection where aead_req allocation in
tls_do_encryption() eventually failed and we returned -ENOMEM from
the function. Turns out that the use-after-free is triggered from
tls_sw_sendmsg() in the second tls_push_record(). The error then
triggers a jump to waiting for memory in sk_stream_wait_memory()
resp. returning immediately in case of MSG_DONTWAIT. What follows is
the trim_both_sgl(sk, orig_size), which drops elements from the sg
list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
when the socket is being closed, where tls_sk_proto_close() callback
is invoked. The tls_complete_pending_work() will figure that there's
a pending closed tls record to be flushed and thus calls into the
tls_push_pending_closed_record() from there. ctx->push_pending_record()
is called from the latter, which is the tls_sw_push_pending_record()
from sw path. This again calls into tls_push_record(). And here the
tls_fill_prepend() will panic since the buffer address has been freed
earlier via trim_both_sgl(). One way to fix it is to move the aead
request allocation out of tls_do_encryption() early into tls_push_record().
This means we don't prep the tls header and advance state to the
TLS_PENDING_CLOSED_RECORD before allocation which could potentially
fail happened. That fixes the issue on my side.

Fixes: 3c4d7559 ("tls: kernel TLS support")
Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a447da7d

12 6月, 2018 1 次提交

tls: fix NULL pointer dereference on poll · f6fadff3

由 Daniel Borkmann 提交于 6月 11, 2018

While hacking on kTLS, I ran into the following panic from an
unprivileged netserver / netperf TCP session:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
  Oops: 0010 [#1] SMP KASAN PTI
  CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  RIP: 0010:          (null)
  Code: Bad RIP value.
  RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
  RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
  RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
  RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
  R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
  R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
  FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ? tls_sw_poll+0xa4/0x160 [tls]
   ? sock_poll+0x20a/0x680
   ? do_select+0x77b/0x11a0
   ? poll_schedule_timeout.constprop.12+0x130/0x130
   ? pick_link+0xb00/0xb00
   ? read_word_at_a_time+0x13/0x20
   ? vfs_poll+0x270/0x270
   ? deref_stack_reg+0xad/0xe0
   ? __read_once_size_nocheck.constprop.6+0x10/0x10
  [...]

Debugging further, it turns out that calling into ctx->sk_poll() is
invalid since sk_poll itself is NULL which was saved from the original
TCP socket in order for tls_sw_poll() to invoke it.

Looks like the recent conversion from poll to poll_mask callback started
in 15252423 ("net: add support for ->poll_mask in proto_ops") missed
to eventually convert kTLS, too: TCP's ->poll was converted over to the
->poll_mask in commit 2c7d3dac ("net/tcp: convert to ->poll_mask")
and therefore kTLS wrongly saved the ->poll old one which is now NULL.

Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN |
POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in
tcp_poll_mask() as well that is mangled here.

Fixes: 2c7d3dac ("net/tcp: convert to ->poll_mask")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Watson <davejwatson@fb.com>
Tested-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6fadff3

07 6月, 2018 1 次提交

strparser: Add __strp_unpause and use it in ktls. · 7170e604

由 Doron Roberts-Kedes 提交于 6月 06, 2018

strp_unpause queues strp_work in order to parse any messages that
arrived while the strparser was paused. However, the process invoking
strp_unpause could eagerly parse a buffered message itself if it held
the sock lock.

__strp_unpause is an alternative to strp_pause that avoids the scheduling
overhead that results when a receiving thread unpauses the strparser
and waits for the next message to be delivered by the workqueue thread.

This patch more than doubled the IOPS achieved in a benchmark of NBD
traffic encrypted using ktls.
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7170e604

18 5月, 2018 1 次提交

tls: don't use stack memory in a scatterlist · 8ab6ffba

由 Matt Mullins 提交于 5月 16, 2018

scatterlist code expects virt_to_page() to work, which fails with
CONFIG_VMAP_STACK=y.

Fixes: c46234eb ("tls: RX path for ktls")
Signed-off-by: NMatt Mullins <mmullins@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ab6ffba

11 5月, 2018 1 次提交

tls: Fix tls_device initialization · 895262d8

由 Boris Pismenny 提交于 5月 10, 2018

Add sg table initialization to fix a BUG_ON encountered when enabling
CONFIG_DEBUG_SG.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

895262d8

08 5月, 2018 1 次提交

net/tls: Fix connection stall on partial tls record · 080324c3

由 Andre Tomt 提交于 5月 07, 2018

In the case of writing a partial tls record we forgot to clear the
ctx->in_tcp_sendpages flag, causing some connections to stall.

Fixes: c212d2c7 ("net/tls: Don't recursively call push_record during tls_write_space callbacks")
Signed-off-by: NAndre Tomt <andre@tomt.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

080324c3

openanolis / cloud-kernel 大约 2 年 前同步成功

openanolis / cloud-kernel
大约 2 年前同步成功