1. 07 June 2019 (2 commits)
  2. 05 June 2019 (2 commits)
  3. 28 April 2019 (3 commits)
  4. 11 April 2019 (2 commits)
  5. 21 March 2019 (1 commit)
    • net/tls: Add support of AES128-CCM based ciphers · f295b3ae
      Authored by Vakul Garg
      Added support for AES128-CCM based record encryption. AES128-CCM is
      similar to AES128-GCM; both have the same salt/IV/MAC sizes. The
      notable difference between the two is that when invoking an
      AES128-CCM operation, the salt||nonce (which is passed as the IV) has
      to be prefixed with the hardcoded value '2'. Further, the CCM
      implementation in the kernel requires the IV passed in
      crypto_aead_request() to be a full '16' bytes. Therefore, the record
      structure 'struct tls_rec' has been modified to reserve '16' bytes
      for the IV. This works for both GCM and CCM based ciphers.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f295b3ae
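      The IV construction described above can be sketched in plain C. This is a standalone model under assumed sizes (4-byte salt, 8-byte explicit nonce, 16-byte buffer); the constant names and helper are illustrative, not the kernel's.

```c
#include <stdint.h>
#include <string.h>

/* Standalone sketch of the CCM IV layout: byte 0 carries the
 * hardcoded value 2, followed by salt || nonce, with the rest of
 * the 16-byte buffer zeroed. Names are illustrative. */
#define TLS_AES_CCM_IV_B0    2   /* hardcoded CCM prefix */
#define TLS_CIPHER_SALT_SIZE 4
#define TLS_CIPHER_IV_SIZE   8
#define TLS_MAX_IV_SIZE      16  /* CCM needs the full 16 bytes */

static void tls_build_ccm_iv(uint8_t out[TLS_MAX_IV_SIZE],
                             const uint8_t *salt, const uint8_t *nonce)
{
    memset(out, 0, TLS_MAX_IV_SIZE);
    out[0] = TLS_AES_CCM_IV_B0;                  /* prefix '2' */
    memcpy(out + 1, salt, TLS_CIPHER_SALT_SIZE); /* salt || nonce */
    memcpy(out + 1 + TLS_CIPHER_SALT_SIZE, nonce, TLS_CIPHER_IV_SIZE);
}
```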
  6. 04 March 2019 (2 commits)
    • tls: Fix write space handling · 7463d3a2
      Authored by Boris Pismenny
      TLS device cannot use the SW context. This patch restores the
      original tls_device write-space handler and moves the SW/device
      specific portions to the relevant files.
      
      Also, we remove the write_space call for the tls_sw flow, because it
      handles partial records in its delayed TX work handler.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7463d3a2
    • tls: Fix tls_device handling of partial records · 94850257
      Authored by Boris Pismenny
      Clean up the handling of partial records while fixing a bug where the
      tls_push_pending_closed_record function uses the software TLS
      context instead of the hardware context.
      
      The bug resulted in the following crash:
      [   88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [   88.793271] #PF error: [normal kernel read fault]
      [   88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0
      [   88.795958] Oops: 0000 [#1] SMP PTI
      [   88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3
      [   88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [   88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls]
      [   88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd
      4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [   88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213
      [   88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000
      [   88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980
      [   88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700
      [   88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000
      [   88.814506] FS:  00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000
      [   88.816337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0
      [   88.819328] Call Trace:
      [   88.820123]  tls_push_data+0x628/0x6a0 [tls]
      [   88.821283]  ? remove_wait_queue+0x20/0x60
      [   88.822383]  ? n_tty_read+0x683/0x910
      [   88.823363]  tls_device_sendmsg+0x53/0xa0 [tls]
      [   88.824505]  sock_sendmsg+0x36/0x50
      [   88.825492]  sock_write_iter+0x87/0x100
      [   88.826521]  __vfs_write+0x127/0x1b0
      [   88.827499]  vfs_write+0xad/0x1b0
      [   88.828454]  ksys_write+0x52/0xc0
      [   88.829378]  do_syscall_64+0x5b/0x180
      [   88.830369]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   88.831603] RIP: 0033:0x7fcad1451680
      
      [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [ 1248.472564] #PF error: [normal kernel read fault]
      [ 1248.473790] PGD 0 P4D 0
      [ 1248.474642] Oops: 0000 [#1] SMP PTI
      [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G           OE 5.0.0-rc4+ #3
      [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls]
      [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c
      1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213
      [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006
      [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286
      [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1
      [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000
      [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000
      [ 1248.494923] FS:  0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000
      [ 1248.496847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0
      [ 1248.500136] Call Trace:
      [ 1248.500998]  ? tcp_check_oom+0xd0/0xd0
      [ 1248.502106]  tls_sk_proto_close+0x127/0x1e0 [tls]
      [ 1248.503411]  inet_release+0x3c/0x60
      [ 1248.504530]  __sock_release+0x3d/0xb0
      [ 1248.505611]  sock_close+0x11/0x20
      [ 1248.506612]  __fput+0xb4/0x220
      [ 1248.507559]  task_work_run+0x88/0xa0
      [ 1248.508617]  do_exit+0x2cb/0xbc0
      [ 1248.509597]  ? core_sys_select+0x17a/0x280
      [ 1248.510740]  do_group_exit+0x39/0xb0
      [ 1248.511789]  get_signal+0x1d0/0x630
      [ 1248.512823]  do_signal+0x36/0x620
      [ 1248.513822]  exit_to_usermode_loop+0x5c/0xc6
      [ 1248.515003]  do_syscall_64+0x157/0x180
      [ 1248.516094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1248.517456] RIP: 0033:0x7fb398bd3f53
      [ 1248.518537] Code: Bad RIP value.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94850257
  7. 25 February 2019 (1 commit)
    • tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg · 2b794c40
      Authored by Vakul Garg
      The patch enables returning the record 'type' in msghdr for records
      that are retrieved with MSG_PEEK in recvmsg. Further, it prevents
      records peeked from the socket from getting clubbed with any other
      record of a different type when records are subsequently dequeued
      from strparser.
      
      For each record, we now retain its type in the sk_buff's control
      buffer cb[]. Inside the control buffer, the record's full length and
      offset are already stored by strparser in 'struct strp_msg'. We store
      the record type after 'struct strp_msg' inside 'struct tls_msg'. For
      TLS 1.2, the type is stored just after record dequeue. For TLS 1.3,
      the type is stored after the record has been decrypted.
      
      Inside process_rx_list(), before processing a non-data record, we
      check that the record type can be returned to the user application.
      If not, the decrypted records in the TLS context's rx_list are left
      there without consuming any data.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b794c40
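      The cb[] layout described above can be modeled in a few lines of standalone C. The structs here are simplified stand-ins for the kernel definitions, and SKB_CB_SIZE mirrors the 48-byte size of skb->cb[].

```c
#include <stddef.h>

/* Standalone model: strparser keeps the record's full length and
 * offset in 'struct strp_msg' at the start of the control buffer,
 * and the TLS record type sits right after it in 'struct tls_msg'.
 * Simplified stand-ins, not the kernel structs. */
#define SKB_CB_SIZE 48  /* skb->cb[] is 48 bytes upstream */

struct strp_msg { int full_len; int offset; };
struct tls_msg  { struct strp_msg rxm; unsigned char control; };

/* Store/fetch the record type in a raw cb[] buffer via offsetof,
 * which avoids any alignment assumptions on the buffer. */
static void tls_set_record_type(char *cb, unsigned char type)
{
    cb[offsetof(struct tls_msg, control)] = (char)type;
}

static unsigned char tls_get_record_type(const char *cb)
{
    return (unsigned char)cb[offsetof(struct tls_msg, control)];
}
```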
  8. 20 February 2019 (1 commit)
  9. 02 February 2019 (4 commits)
  10. 29 January 2019 (1 commit)
    • net: tls: Save iv in tls_rec for async crypto requests · 32eb67b9
      Authored by Dave Watson
      aead_request_set_crypt() takes an IV pointer, and we change the IV
      soon after setting it. Some async crypto algorithms don't save the
      IV, so we need to save it in the tls_rec for async requests.
      
      Found by hardcoding x64 aesni to use the async crypto manager (to
      test the async codepath); however, I don't think this combination can
      happen in the wild. Presumably other hardware offloads will need this
      fix, but there have been no user reports.
      
      Fixes: a42055e8 ("Add support for async encryption of records...")
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      32eb67b9
  11. 18 January 2019 (1 commit)
    • tls: Fix recvmsg() to be able to peek across multiple records · 692d7b5d
      Authored by Vakul Garg
      This fixes recvmsg() to be able to peek across multiple TLS records.
      Without this patch, the TLS selftest case
      'recv_peek_large_buf_mult_recs' fails. Each TLS receive context now
      maintains an 'rx_list' to retain incoming skbs carrying TLS records.
      If a TLS record needs to be retained, e.g. for the peek case or when
      the buffer passed to recvmsg() is shorter than the decrypted record,
      it is added to 'rx_list'. Additionally, records are added to
      'rx_list' if the crypto operation runs in async mode. Records are
      dequeued from 'rx_list' after the decrypted data is consumed by
      copying it into the buffer passed to recvmsg(). If the MSG_PEEK flag
      is used in recvmsg(), records are not consumed or removed from
      'rx_list'.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      692d7b5d
  12. 21 December 2018 (1 commit)
    • bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c
      Authored by John Fastabend
      A sockmap program that redirects through a kTLS ULP enabled socket
      will not work correctly because the ULP layer is skipped. This
      fixes the behavior to call through the ULP layer on redirect to
      ensure any operations required on the data stream at the ULP layer
      continue to be applied.
      
      To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
      calling the BPF layer on a redirected message. This is
      required to avoid calling the BPF layer multiple times (possibly
      recursively) which is not the current/expected behavior without
      ULPs. In the future we may add a redirect flag if users _do_
      want the policy applied again but this would need to work for both
      ULP and non-ULP sockets and be opt-in to avoid breaking existing
      programs.
      
      Also, to avoid polluting the flag space with an internal flag, we
      reuse the flag space by overlapping MSG_SENDPAGE_NOPOLICY with
      MSG_WAITFORONE. Here WAITFORONE is specific to the recv path and
      SENDPAGE_NOPOLICY is only used in sendpage hooks. The last thing to
      verify is that the user-space API masks the flag correctly, so that
      it cannot be set by the user. (Note this needs to be true regardless,
      because we already have internal flags in use that user space should
      not be able to set.) For completeness, there are two UAPI paths into
      sendpage: sendfile and splice.
      
      In the sendfile case, the function do_sendfile() zeroes the flags,
      
      ./fs/read_write.c:
       static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
      		   	    size_t count, loff_t max)
       {
         ...
         fl = 0;
      #if 0
         /*
          * We need to debate whether we can enable this or not. The
          * man page documents EAGAIN return for the output at least,
          * and the application is arguably buggy if it doesn't expect
          * EAGAIN on a non-blocking file descriptor.
          */
          if (in.file->f_flags & O_NONBLOCK)
      	fl = SPLICE_F_NONBLOCK;
      #endif
          file_start_write(out.file);
          retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
       }
      
      In the splice case the pipe_to_sendpage "actor" is used which
      masks flags with SPLICE_F_MORE.
      
      ./fs/splice.c:
       static int pipe_to_sendpage(struct pipe_inode_info *pipe,
      			    struct pipe_buffer *buf, struct splice_desc *sd)
       {
         ...
         more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
         ...
       }
      
      This confirms, as expected, that internal flags are in fact internal
      to the socket side.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      0608c69c
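      The flag-overlap argument can be checked with a small standalone sketch. The numeric values mirror the uapi constants, and pipe_to_sendpage_flags() is a hypothetical model of the splice actor's masking, not the kernel function.

```c
/* Standalone sketch of the flag reuse described above: the internal
 * sendpage-only flag shares a bit with the recv-only MSG_WAITFORONE,
 * and the sendpage entry points mask user-supplied flags so the
 * internal bit can never arrive from user space. Values mirror the
 * uapi numbers but are redefined here for a self-contained check. */
#define MSG_WAITFORONE        0x10000
#define MSG_SENDPAGE_NOPOLICY 0x10000  /* internal; overlaps WAITFORONE */
#define MSG_MORE              0x8000

/* Model of the splice path: only SPLICE_F_MORE survives as MSG_MORE,
 * as in the pipe_to_sendpage "actor" quoted above. */
#define SPLICE_F_MORE 0x04

static int pipe_to_sendpage_flags(unsigned int splice_flags)
{
    return (splice_flags & SPLICE_F_MORE) ? MSG_MORE : 0;
}
```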
  13. 15 December 2018 (1 commit)
    • net/tls: sleeping function from invalid context · df9d4a17
      Authored by Atul Gupta
      HW unhash takes a mutex for registered TLS devices, which causes a
      sleep when called from tcp_set_state() for TCP_CLOSE. Release the
      lock and re-acquire it after the function call, with a refcount
      increment/decrement around it. A kref and an fp release are defined
      for tls_device to ensure the device is not released outside the lock.
      
      BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:748
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7
      INFO: lockdep is turned off.
      CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W  O
      Call Trace:
       <IRQ>
       dump_stack+0x5e/0x8b
       ___might_sleep+0x222/0x260
       __mutex_lock+0x5c/0xa50
       ? vprintk_emit+0x1f3/0x440
       ? kmem_cache_free+0x22d/0x2a0
       ? tls_hw_unhash+0x2f/0x80
       ? printk+0x52/0x6e
       ? tls_hw_unhash+0x2f/0x80
       tls_hw_unhash+0x2f/0x80
       tcp_set_state+0x5f/0x180
       tcp_done+0x2e/0xe0
       tcp_rcv_state_process+0x92c/0xdd3
       ? lock_acquire+0xf5/0x1f0
       ? tcp_v4_rcv+0xa7c/0xbe0
       ? tcp_v4_do_rcv+0x70/0x1e0
       Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
       Signed-off-by: David S. Miller <davem@davemloft.net>
      df9d4a17
  14. 16 October 2018 (2 commits)
    • tls: replace poll implementation with read hook · 924ad65e
      Authored by John Fastabend
      Instead of re-implementing the poll routine, use the poll callback to
      trigger reads from kTLS: we reuse the stream_memory_read callback,
      which is simpler and achieves the same result. This helps align
      sockmap and kTLS so we can more easily embed BPF in kTLS.
      
      Joint work with Daniel.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      924ad65e
    • tls: convert to generic sk_msg interface · d829e9c4
      Authored by Daniel Borkmann
      Convert kTLS over to make use of the sk_msg interface for plaintext
      and encrypted scattergather data, so it reuses all the sk_msg
      helpers and data structures; in a second step, this enables gluing
      it to BPF.
      
      This also allows removing quite a few open-coded helpers which are
      covered by the sk_msg API. Recent changes in kTLS, 80ece6a0
      ("tls: Remove redundant vars from tls record structure") and
      4e6d4720 ("tls: Add support for inplace records encryption"),
      changed the data-path handling a bit; while we've kept the latter
      optimization intact, we had to undo the former change to better
      fit the sk_msg model, hence sg_aead_in and sg_aead_out have
      been brought back and are linked into the sk_msg sgs. Now the kTLS
      record contains a msg_plaintext and a msg_encrypted sk_msg each.
      
      In the original code, zerocopy_from_iter() was used in both the TX
      and RX paths. For the strparser skb-based RX path, we've left
      zerocopy_from_iter() in decrypt_internal() mostly untouched, meaning
      it has been moved into tls_setup_from_iter() with the charging logic
      removed (as it is not used from RX). Given the RX path is not based
      on sk_msg objects, we haven't pursued setting up a dummy sk_msg to
      call into sk_msg_zerocopy_from_iter(), but it could be an option to
      pursue in a later step.
      
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d829e9c4
  15. 03 October 2018 (1 commit)
    • tls: Add support for inplace records encryption · 4e6d4720
      Authored by Vakul Garg
      Presently, for the non-zero-copy case, separate pages are allocated
      for storing the plaintext and encrypted text of records. These pages
      are stored in the sg_plaintext_data and sg_encrypted_data
      scatterlists inside the record structure. Further, sg_plaintext_data
      and sg_encrypted_data are passed to the crypto APIs for record
      encryption. Allocating separate pages for plaintext and encrypted
      text is inefficient from both a memory and a performance point of
      view.
      
      This patch adds support for in-place encryption of records. For the
      non-zero-copy case, we reuse the pages from the sg_encrypted_data
      scatterlist to copy the application's plaintext data. For the
      movement of pages from the sg_encrypted_data to the sg_plaintext_data
      scatterlist, we introduce a new function, move_to_plaintext_sg(),
      which adds pages into sg_plaintext_data from sg_encrypted_data.
      
      tls_do_encryption() is modified to pass the same scatterlist as both
      source and destination into aead_request_set_crypt() if in-place
      crypto has been enabled. A new variable, 'inplace_crypto', has been
      introduced in the record structure to signify whether the same
      scatterlist can be used. By default, inplace_crypto is enabled in
      get_rec(). If zero-copy is used (i.e. plaintext data is not copied),
      inplace_crypto is set to '0'.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Reviewed-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e6d4720
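      As a rough illustration of the src == dst idea (not the kernel API), here is a toy model in which XOR with a fixed byte stands in for the real AEAD; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the in-place path described above: when
 * inplace_crypto is set, "encryption" runs with source and
 * destination being the same buffer, so no separate plaintext pages
 * are needed. XOR stands in for the real cipher; everything here is
 * illustrative, not the kernel API. */
struct toy_rec { int inplace_crypto; };

static void toy_encrypt(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] ^ 0x5a;  /* stand-in cipher */
}

static void toy_do_encryption(struct toy_rec *rec,
                              const uint8_t *plain, uint8_t *enc, size_t len)
{
    if (rec->inplace_crypto)
        toy_encrypt(enc, enc, len);   /* same buffer as src and dst */
    else
        toy_encrypt(enc, plain, len); /* separate plaintext pages */
}
```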
  16. 30 September 2018 (1 commit)
    • tls: Remove redundant vars from tls record structure · 80ece6a0
      Authored by Vakul Garg
      Structure 'tls_rec' contains sg_aead_in and sg_aead_out, which point
      to an aad_space and then chain the sg_plaintext_data and
      sg_encrypted_data scatterlists respectively. Rather than using
      chained scatterlists for plaintext and encrypted data in aead_req,
      it is more efficient to store aad_space in the first index of
      sg_encrypted_data and sg_plaintext_data themselves and get rid of
      sg_aead_in, sg_aead_out and the further chaining.
      
      This requires increasing the size of the sg_encrypted_data and
      sg_plaintext_data arrays by 1 to accommodate the entry for
      aad_space. The code which uses sg_encrypted_data and
      sg_plaintext_data has been modified to skip the first index, as it
      points to aad_space.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80ece6a0
  17. 25 September 2018 (1 commit)
    • net/tls: Fixed race condition in async encryption · 9932a29a
      Authored by Vakul Garg
      On processors with multi-engine crypto accelerators, it is possible
      that multiple records get encrypted in parallel and their encryption
      completions are notified on different CPUs of a multicore processor.
      This leads to a situation where tls_encrypt_done() starts executing
      in parallel on different cores. In the current implementation,
      encrypted records are queued to tx_ready_list in tls_encrypt_done().
      This requires additions to the linked list 'tx_ready_list' to be
      protected. As tls_encrypt_done() could be executing in irq context,
      it is not possible to protect the linked-list addition with a lock.
      
      To fix the problem, we remove the linked-list addition from the irq
      context. We do tx_ready_list addition/removal from application
      context only, and get rid of possible concurrent access to the
      linked list. Before starting encryption on a record, we add it to
      the tail of tx_ready_list. To prevent tls_tx_records() from
      transmitting it, we mark the record with a new flag 'tx_ready' in
      'struct tls_rec'. When record encryption completes,
      tls_encrypt_done() only has to set the 'tx_ready' flag to true; no
      linked-list add operation is required.
      
      The changed logic brings some other side benefits. Since records are
      always submitted for encryption in TLS sequence-number order, the
      tx_ready_list always remains sorted, and adding new records to it
      does not require traversing the linked list.
      
      Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to
      'tx_list'. This is because now some of the records at the tail are
      not ready to transmit.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9932a29a
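      The tx_ready scheme can be modeled in a short standalone sketch: the completion handler only flips a flag, and the transmit loop stops at the first record that is not yet ready. 'struct rec' and the helpers are illustrative stand-ins, not kernel code.

```c
#include <stddef.h>

/* Minimal single-threaded model of the scheme described above:
 * records are appended to the list from application context before
 * encryption starts; the completion handler performs no list
 * manipulation, only a flag update. */
struct rec { int tx_ready; struct rec *next; };

/* Completion handler: just flips the flag, safe from irq context. */
static void encrypt_done(struct rec *r) { r->tx_ready = 1; }

/* Transmit every leading ready record; return how many were sent. */
static int tx_records(struct rec *head)
{
    int sent = 0;
    for (struct rec *r = head; r && r->tx_ready; r = r->next)
        sent++;  /* the real code unlinks r and hands it to TCP */
    return sent;
}
```

      Because records enter the list in TLS sequence order, stopping at the first not-ready record preserves in-order transmission even when completions arrive out of order.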
  18. 22 September 2018 (1 commit)
    • net/tls: Add support for async encryption of records for performance · a42055e8
      Authored by Vakul Garg
      In the current implementation, TLS records are encrypted and
      transmitted serially. The implementation waits until previously
      submitted user data has been encrypted, and only then starts
      transmitting the record. This encrypt-one-record-at-a-time approach
      is inefficient when asynchronous crypto accelerators are used. For
      each record, there are overheads of interrupts, driver softIRQ
      scheduling, etc. Also, the crypto accelerator sits idle most of the
      time while an encrypted record's pages are handed over to the TCP
      stack for transmission.
      
      This patch enables encryption of multiple records in parallel when an
      async-capable crypto accelerator is present in the system. This is
      achieved by allowing the user-space application to send more data
      using sendmsg() even while previously issued data is being processed
      by the crypto accelerator. This requires returning control back to
      the user-space application after submitting the encryption request to
      the accelerator. This also means that the zero-copy mode of
      encryption cannot be used with an async accelerator, as we must be
      done with the user-space application's buffer before returning from
      sendmsg().
      
      There can be multiple records in flight to/from the accelerator. Each
      record is represented by 'struct tls_rec', which is used to store the
      memory pages for the record.
      
      After the records are encrypted, they are added in a linked list called
      tx_ready_list which contains encrypted tls records sorted as per tls
      sequence number. The records from tx_ready_list are transmitted using a
      newly introduced function called tls_tx_records(). The tx_ready_list is
      polled for any record ready to be transmitted in sendmsg(), sendpage()
      after initiating encryption of new tls records. This achieves parallel
      encryption and transmission of records when async accelerator is
      present.
      
      There could be situations where the crypto accelerator completes
      encryption later than the polling of tx_ready_list by
      sendmsg()/sendpage(). Therefore we need a deferred work context to
      be able to transmit records from tx_ready_list. The deferred work
      context gets scheduled if applications are not sending much data
      through the socket. If applications issue sendmsg()/sendpage() in
      quick succession, the scheduling of tx_work_handler gets cancelled,
      as the tx_ready_list would be polled from the application's context
      itself. This saves the scheduling overhead of deferred work.
      
      The patch also brings a side benefit: we are able to get rid of the
      concept of a CLOSED record. This is because records, once closed,
      are either encrypted and then placed into tx_ready_list, or, if
      encryption fails, the socket error is set. This simplifies the
      kernel TLS send path. However, since tls_device.c still uses them,
      the macros and accessor functions for CLOSED records have been
      retained.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a42055e8
  19. 17 September 2018 (1 commit)
    • tls: async support causes out-of-bounds access in crypto APIs · 7a3dd8c8
      Authored by John Fastabend
      When async support was added it needed to access the sk from the async
      callback to report errors up the stack. The patch tried to use space
      after the aead request struct by directly setting the reqsize field in
      aead_request. This is an internal field that should not be used
      outside the crypto APIs. It is used by the crypto code to define extra
      space for private structures used in the crypto context. Users of the
      API then use crypto_aead_reqsize() and add the returned amount of
      bytes to the end of the request memory allocation before posting the
      request to encrypt/decrypt APIs.
      
      So this breaks (with a general protection fault, and a KASAN error
      if enabled) because the request sent to decrypt is shorter than
      required, causing out-of-bounds errors in the crypto API. It also
      seems unlikely the sk is even valid by the time it gets to the
      callback, because of a memset in the crypto layer.
      
      Anyway, fix this by holding the sk in the skb->sk field when the
      callback is set up; because the skb is already passed through to the
      callback handler via a void*, we can access it in the handler. Then,
      in the handler, we need to be careful to NULL the pointer again
      before kfree_skb. I added comments on both the setup (in
      tls_do_decryption) and where we clear it in the crypto callback
      handler tls_decrypt_done(). After this, the selftests pass again and
      the KASAN errors/warnings are fixed.
      
      Fixes: 94524d8f ("net/tls: Add support for async decryption of tls records")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Vakul Garg <Vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a3dd8c8
  20. 14 September 2018 (1 commit)
  21. 02 September 2018 (1 commit)
    • net/tls: Add support for async decryption of tls records · 94524d8f
      Authored by Vakul Garg
      When TLS records are decrypted using asynchronous accelerators such
      as the NXP CAAM engine, the crypto APIs return -EINPROGRESS.
      Presently, on getting -EINPROGRESS, TLS record processing stops
      until the crypto accelerator finishes and returns the result. This
      incurs a context switch and is not an efficient way of using crypto
      accelerators. Crypto accelerators work efficiently when they are
      queued with multiple crypto jobs without having to wait for the
      previous ones to complete.
      
      The patch submits multiple crypto requests without having to wait
      for previous ones to complete. This has been implemented for records
      which are decrypted in zero-copy mode. At the end of recvmsg(), we
      wait for all the asynchronous decryption requests to complete.
      
      The references to records which have been sent for async decryption are
      dropped. For cases where record decryption is not possible in zero-copy
      mode, asynchronous decryption is not used and we wait for decryption
      crypto api to complete.
      
      For crypto requests executing in async fashion, the memory for
      aead_request, sglists, skb, etc. is freed from the decryption
      completion handler. The decryption completion handler wakes up the
      sleeping user context when recvmsg() flags that it has finished
      sending all the decryption requests and there are no more decryption
      requests pending to be completed.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Reviewed-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94524d8f
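      The completion accounting can be sketched as a single-threaded model. The kernel uses atomic_t and a wait queue; this standalone version, with hypothetical names, just models the wake condition.

```c
/* Single-threaded model of the accounting described above:
 * recvmsg() increments 'decrypt_pending' per async request
 * submitted, each completion decrements it, and the user context
 * is woken only once recvmsg() has flagged 'async_notify' (done
 * submitting) and the count has dropped to zero. */
struct rx_ctx {
    int decrypt_pending;  /* async decrypts still in flight */
    int async_notify;     /* recvmsg() finished submitting */
    int woken;            /* stand-in for waking the sleeper */
};

static void submit_decrypt(struct rx_ctx *c) { c->decrypt_pending++; }

static void decrypt_done(struct rx_ctx *c)
{
    if (--c->decrypt_pending == 0 && c->async_notify)
        c->woken = 1;  /* last completion wakes recvmsg() */
}
```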
  22. 13 August 2018 (1 commit)
    • net/tls: Combined memory allocation for decryption request · 0b243d00
      Authored by Vakul Garg
      Preparing a decryption request requires several memory chunks
      (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request
      to an accelerator, the buffers read by the accelerator must be
      DMA-able and must not come from the stack. The buffers for aad and
      iv could each be kmalloced separately, but that is inefficient.
      This patch does a combined allocation for preparing the decryption
      request and then segments it into aead_req || sgin || sgout || iv ||
      aad.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b243d00
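      The carve-up can be sketched in standalone C. The sizes are illustrative placeholders (the kernel computes them from the cipher parameters and scatterlist entry counts), and the struct is a hypothetical stand-in for the kernel's pointer bookkeeping.

```c
#include <stdlib.h>
#include <stdint.h>

/* Sketch of the single-allocation layout described above: one heap
 * block is carved into aead_req || sgin || sgout || iv || aad.
 * Illustrative model, not the kernel code. */
struct decrypt_mem {
    uint8_t *aead_req, *sgin, *sgout, *iv, *aad;
    uint8_t *base;  /* the one block to free */
};

static int alloc_decrypt_mem(struct decrypt_mem *m, size_t req_sz,
                             size_t sgin_sz, size_t sgout_sz,
                             size_t iv_sz, size_t aad_sz)
{
    m->base = malloc(req_sz + sgin_sz + sgout_sz + iv_sz + aad_sz);
    if (!m->base)
        return -1;
    m->aead_req = m->base;              /* segment the one block */
    m->sgin     = m->aead_req + req_sz;
    m->sgout    = m->sgin + sgin_sz;
    m->iv       = m->sgout + sgout_sz;
    m->aad      = m->iv + iv_sz;
    return 0;
}
```

      One allocation and one free replace five, and all segments are heap memory, satisfying the DMA-ability requirement the commit mentions.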
  23. 16 July 2018 (4 commits)
    • tls: Add rx inline crypto offload · 4799ac81
      Authored by Boris Pismenny
      This patch completes the generic infrastructure to offload TLS crypto to a
      network device. It enables the kernel to skip decryption and
      authentication of some skbs marked as decrypted by the NIC. In the fast
      path, all packets received are decrypted by the NIC and the performance
      is comparable to plain TCP.
      
      This infrastructure doesn't require a TCP offload engine. Instead,
      the NIC only decrypts packets that contain the expected TCP sequence
      number. Out-of-order TCP packets are passed up unmodified. As a
      result, in the worst case a received TLS record consists of both
      plaintext and ciphertext packets. These partially decrypted records
      must be re-encrypted, only to be decrypted again.
      
      The notable differences between SW KTLS Rx and this offload are as
      follows:
      1. Partial decryption - Software must handle the case of a TLS record
      that was only partially decrypted by HW. This can happen due to packet
      reordering.
      2. Resynchronization - tls_read_size calls the device driver to
      resynchronize HW after HW lost track of TLS record framing in
      the TCP stream.
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4799ac81
    • tls: Split tls_sw_release_resources_rx · 39f56e1a
      Authored by Boris Pismenny
      This patch splits tls_sw_release_resources_rx into two functions:
      one which releases all inner software TLS structures, and another
      that also frees the containing structure.
      
      In TLS_DEVICE we will need to release the software structures
      without freeing the containing structure, which contains other
      information.
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      39f56e1a
    • tls: Split decrypt_skb to two functions · dafb67f3
      Authored by Boris Pismenny
      Previously, decrypt_skb also updated the TLS context.
      Now, decrypt_skb only decrypts the payload using the current
      context, while decrypt_skb_update also updates the state.
      
      Later, in the tls_device Rx flow, we will use decrypt_skb directly.
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dafb67f3
    • B
      tls: Refactor tls_offload variable names · d80a1b9d
      Boris Pismenny 提交于
      For symmetry, we rename tls_offload_context to
      tls_offload_context_tx before we add tls_offload_context_rx.
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d80a1b9d
  24. 29 Jun, 2018 1 commit
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds authored
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  25. 12 Jun, 2018 1 commit
    • D
      tls: fix NULL pointer dereference on poll · f6fadff3
      Daniel Borkmann authored
      While hacking on kTLS, I ran into the following panic from an
      unprivileged netserver / netperf TCP session:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
        Oops: 0010 [#1] SMP KASAN PTI
        CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        RIP: 0010:          (null)
        Code: Bad RIP value.
        RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
        RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
        RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
        RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
        R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
        R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
        FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         ? tls_sw_poll+0xa4/0x160 [tls]
         ? sock_poll+0x20a/0x680
         ? do_select+0x77b/0x11a0
         ? poll_schedule_timeout.constprop.12+0x130/0x130
         ? pick_link+0xb00/0xb00
         ? read_word_at_a_time+0x13/0x20
         ? vfs_poll+0x270/0x270
         ? deref_stack_reg+0xad/0xe0
         ? __read_once_size_nocheck.constprop.6+0x10/0x10
        [...]
      
      Debugging further, it turns out that calling into ctx->sk_poll() is
      invalid since sk_poll itself is NULL which was saved from the original
      TCP socket in order for tls_sw_poll() to invoke it.
      
      Looks like the recent conversion from the poll to the poll_mask
      callback, started in 15252423 ("net: add support for ->poll_mask
      in proto_ops"), missed converting kTLS as well: TCP's ->poll was
      converted to ->poll_mask in commit 2c7d3dac ("net/tcp: convert to
      ->poll_mask"), so kTLS wrongly saved the old ->poll, which is now
      NULL.
      
      Convert kTLS over to use ->poll_mask instead. Also, use the proper
      EPOLLIN | EPOLLRDNORM bits, as tcp_poll_mask() does, instead of
      the POLLIN | POLLRDNORM that were mangled here.
      
      Fixes: 2c7d3dac ("net/tcp: convert to ->poll_mask")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Watson <davejwatson@fb.com>
      Tested-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6fadff3
  26. 18 May, 2018 1 commit
  27. 02 May, 2018 1 commit