  1. 22 Jul, 2019 4 commits
  2. 09 Jul, 2019 1 commit
  3. 02 Jul, 2019 1 commit
  4. 24 Jun, 2019 1 commit
    • net/tls: fix page double free on TX cleanup · 9354544c
      Dirk van der Merwe committed
      Commit 94850257 ("tls: Fix tls_device handling of partial records")
      introduced a new path to clean up partial records during sk_proto_close.
      This path does not handle the SW kTLS tx_list cleanup.
      
      The new path is unnecessary, though, since the free_resources calls for
      both the SW and offload paths already clean up a partial record.
      
      The visible effect is the following warning, but this bug also causes
      a page double free.
      
          WARNING: CPU: 7 PID: 4000 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
          RIP: 0010:sk_stream_kill_queues+0x103/0x110
          RSP: 0018:ffffb6df87e07bd0 EFLAGS: 00010206
          RAX: 0000000000000000 RBX: ffff8c21db4971c0 RCX: 0000000000000007
          RDX: ffffffffffffffa0 RSI: 000000000000001d RDI: ffff8c21db497270
          RBP: ffff8c21db497270 R08: ffff8c29f4748600 R09: 000000010020001a
          R10: ffffb6df87e07aa0 R11: ffffffff9a445600 R12: 0000000000000007
          R13: 0000000000000000 R14: ffff8c21f03f2900 R15: ffff8c21f03b8df0
          Call Trace:
           inet_csk_destroy_sock+0x55/0x100
           tcp_close+0x25d/0x400
           ? tcp_check_oom+0x120/0x120
           tls_sk_proto_close+0x127/0x1c0
           inet_release+0x3c/0x60
           __sock_release+0x3d/0xb0
           sock_close+0x11/0x20
           __fput+0xd8/0x210
           task_work_run+0x84/0xa0
           do_exit+0x2dc/0xb90
           ? release_sock+0x43/0x90
           do_group_exit+0x3a/0xa0
           get_signal+0x295/0x720
           do_signal+0x36/0x610
           ? SYSC_recvfrom+0x11d/0x130
           exit_to_usermode_loop+0x69/0xb0
           do_syscall_64+0x173/0x180
           entry_SYSCALL_64_after_hwframe+0x3d/0xa2
          RIP: 0033:0x7fe9b9abc10d
          RSP: 002b:00007fe9b19a1d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
          RAX: fffffffffffffe00 RBX: 0000000000000006 RCX: 00007fe9b9abc10d
          RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fe948003430
          RBP: 00007fe948003410 R08: 00007fe948003430 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000246 R12: 00005603739d9080
          R13: 00007fe9b9ab9f90 R14: 00007fe948003430 R15: 0000000000000000
      
      Fixes: 94850257 ("tls: Fix tls_device handling of partial records")
      Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 Jun, 2019 5 commits
  6. 07 Jun, 2019 3 commits
  7. 05 Jun, 2019 3 commits
  8. 28 Apr, 2019 3 commits
  9. 11 Apr, 2019 2 commits
  10. 21 Mar, 2019 1 commit
    • net/tls: Add support of AES128-CCM based ciphers · f295b3ae
      Vakul Garg committed
      Add support for AES128-CCM based record encryption. AES128-CCM is
      similar to AES128-GCM; both have the same salt/iv/mac sizes. The
      notable difference is that when invoking an AES128-CCM operation,
      the salt||nonce (which is passed as the IV) has to be prefixed
      with the hardcoded value '2'. Further, the CCM implementation in the
      kernel requires the IV passed in crypto_aead_request() to be a full
      16 bytes. Therefore, the record structure 'struct tls_rec' has been
      modified to reserve 16 bytes for the IV. This works for both GCM- and
      CCM-based ciphers.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
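The IV layout described in this commit message can be sketched in plain C as follows. This is a minimal user-space illustration, not the kernel code: the function name `build_ccm_iv` and the macro names are hypothetical, the 4-byte salt and 8-byte nonce sizes are assumed from the AES128-GCM sizes the message says CCM shares, and leaving the trailing bytes zeroed is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CCM_IV_SIZE 16 /* full IV buffer size named in the commit message */
#define CCM_B0_BYTE  2 /* the hardcoded prefix value '2' */
#define SALT_SIZE    4 /* assumed: same as AES128-GCM */
#define NONCE_SIZE   8 /* assumed: same as AES128-GCM */

/* Hypothetical helper: lay out prefix || salt || nonce in a 16-byte buffer.
 * Bytes after the nonce are left as zero (an assumption of this sketch). */
static void build_ccm_iv(uint8_t out[CCM_IV_SIZE],
                         const uint8_t salt[SALT_SIZE],
                         const uint8_t nonce[NONCE_SIZE])
{
    memset(out, 0, CCM_IV_SIZE);
    out[0] = CCM_B0_BYTE;
    memcpy(out + 1, salt, SALT_SIZE);
    memcpy(out + 1 + SALT_SIZE, nonce, NONCE_SIZE);
}
```

A GCM record would use only salt || nonce; reserving 16 bytes in the record lets both layouts share the same storage.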
  11. 04 Mar, 2019 2 commits
    • tls: Fix write space handling · 7463d3a2
      Boris Pismenny committed
      The TLS device cannot use the sw context. This patch restores the original
      tls device write space handler and moves the sw/device specific portions
      to the relevant files.
      
      We also remove the write_space call for the tls_sw flow, because it
      handles partial records in its delayed tx work handler.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: Fix tls_device handling of partial records · 94850257
      Boris Pismenny committed
      Clean up the handling of partial records while fixing a bug where the
      tls_push_pending_closed_record function uses the software tls
      context instead of the hardware context.
      
      The bug resulted in the following crash:
      [   88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [   88.793271] #PF error: [normal kernel read fault]
      [   88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0
      [   88.795958] Oops: 0000 [#1] SMP PTI
      [   88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3
      [   88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [   88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls]
      [   88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd
      4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [   88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213
      [   88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000
      [   88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980
      [   88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700
      [   88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      [   88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000
      [   88.814506] FS:  00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000
      [   88.816337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0
      [   88.819328] Call Trace:
      [   88.820123]  tls_push_data+0x628/0x6a0 [tls]
      [   88.821283]  ? remove_wait_queue+0x20/0x60
      [   88.822383]  ? n_tty_read+0x683/0x910
      [   88.823363]  tls_device_sendmsg+0x53/0xa0 [tls]
      [   88.824505]  sock_sendmsg+0x36/0x50
      [   88.825492]  sock_write_iter+0x87/0x100
      [   88.826521]  __vfs_write+0x127/0x1b0
      [   88.827499]  vfs_write+0xad/0x1b0
      [   88.828454]  ksys_write+0x52/0xc0
      [   88.829378]  do_syscall_64+0x5b/0x180
      [   88.830369]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   88.831603] RIP: 0033:0x7fcad1451680
      
      [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [ 1248.472564] #PF error: [normal kernel read fault]
      [ 1248.473790] PGD 0 P4D 0
      [ 1248.474642] Oops: 0000 [#1] SMP PTI
      [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G           OE 5.0.0-rc4+ #3
      [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls]
      [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c
      1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39
      c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00
      [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213
      [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006
      [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286
      [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1
      [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000
      [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000
      [ 1248.494923] FS:  0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000
      [ 1248.496847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0
      [ 1248.500136] Call Trace:
      [ 1248.500998]  ? tcp_check_oom+0xd0/0xd0
      [ 1248.502106]  tls_sk_proto_close+0x127/0x1e0 [tls]
      [ 1248.503411]  inet_release+0x3c/0x60
      [ 1248.504530]  __sock_release+0x3d/0xb0
      [ 1248.505611]  sock_close+0x11/0x20
      [ 1248.506612]  __fput+0xb4/0x220
      [ 1248.507559]  task_work_run+0x88/0xa0
      [ 1248.508617]  do_exit+0x2cb/0xbc0
      [ 1248.509597]  ? core_sys_select+0x17a/0x280
      [ 1248.510740]  do_group_exit+0x39/0xb0
      [ 1248.511789]  get_signal+0x1d0/0x630
      [ 1248.512823]  do_signal+0x36/0x620
      [ 1248.513822]  exit_to_usermode_loop+0x5c/0xc6
      [ 1248.515003]  do_syscall_64+0x157/0x180
      [ 1248.516094]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1248.517456] RIP: 0033:0x7fb398bd3f53
      [ 1248.518537] Code: Bad RIP value.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 25 Feb, 2019 1 commit
    • tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg · 2b794c40
      Vakul Garg committed
      This patch enables returning 'type' in msghdr for records that are
      retrieved with MSG_PEEK in recvmsg. It also prevents records peeked
      from the socket from being merged with records of a different
      type when they are subsequently dequeued from strparser.
      
      For each record, we now retain its type in the sk_buff's control buffer
      cb[]. Inside the control buffer, the record's full length and offset are
      already stored by strparser in 'struct strp_msg'. We store the record type
      after 'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is
      stored just after record dequeue. For tls1.3, the type is stored after
      the record has been decrypted.
      
      Inside process_rx_list(), before processing a non-data record, we check
      that we are able to return the record type to the user
      application. If not, the decrypted records in the tls context's rx_list
      are left there without consuming any data.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
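The rule that peeked records of different types must not be merged can be illustrated with a small user-space sketch. It is not the kernel's process_rx_list(); `struct peek_rec` and `peek_same_type` are hypothetical names, and the record list is a plain array rather than an skb queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a decrypted record kept on an rx list. */
struct peek_rec {
    unsigned char type; /* TLS content type, e.g. 23 = application data */
    const char *data;
    size_t len;
};

/* Copy payload from consecutive records that share the type of the first
 * record, and stop at the first record of a different type. Returns the
 * number of bytes copied and reports the type through type_out. */
static size_t peek_same_type(const struct peek_rec *list, size_t n,
                             char *buf, size_t buflen,
                             unsigned char *type_out)
{
    size_t copied = 0;

    if (n == 0)
        return 0;
    *type_out = list[0].type;
    for (size_t i = 0; i < n && copied < buflen; i++) {
        size_t chunk = list[i].len;

        if (list[i].type != *type_out)
            break; /* never merge records of different types */
        if (chunk > buflen - copied)
            chunk = buflen - copied;
        memcpy(buf + copied, list[i].data, chunk);
        copied += chunk;
    }
    return copied;
}
```

With two application-data records followed by an alert record, a single call returns only the application data and its type; the alert stays for a later call.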
  13. 20 Feb, 2019 1 commit
  14. 02 Feb, 2019 4 commits
  15. 29 Jan, 2019 1 commit
    • net: tls: Save iv in tls_rec for async crypto requests · 32eb67b9
      Dave Watson committed
      aead_request_set_crypt takes an iv pointer, and we change the iv
      soon after setting it.  Some async crypto algorithms don't save the iv,
      so we need to save it in the tls_rec for async requests.
      
      Found by hardcoding x64 aesni to use async crypto manager (to test the async
      codepath), however I don't think this combination can happen in the wild.
      Presumably other hardware offloads will need this fix, but there have been
      no user reports.
      
      Fixes: a42055e8 ("Add support for async encryption of records...")
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
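The reason for saving the IV per record can be sketched in user-space C. This is an illustration only: `struct tx_ctx`, `struct async_rec`, and the helper names are hypothetical, and the 8-byte IV length is an assumption. The point is that a request holding a pointer to the shared IV would see it change, while a request pointing at a per-record copy does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IV_LEN 8 /* assumed IV length for this sketch */

/* Shared transmit state: the IV is advanced right after each record. */
struct tx_ctx {
    uint8_t iv[IV_LEN];
};

/* Hypothetical record: keeps its own IV copy for the pending request. */
struct async_rec {
    uint8_t iv_data[IV_LEN]; /* per-record copy */
    const uint8_t *req_iv;   /* what the (async) request will read later */
};

/* Increment the IV as a big-endian counter. */
static void advance_iv(struct tx_ctx *ctx)
{
    for (int i = IV_LEN - 1; i >= 0; i--)
        if (++ctx->iv[i] != 0)
            break;
}

/* Snapshot the IV into the record before the shared copy changes. */
static void queue_record(struct tx_ctx *ctx, struct async_rec *rec)
{
    memcpy(rec->iv_data, ctx->iv, IV_LEN);
    rec->req_iv = rec->iv_data;
    advance_iv(ctx);
}
```

If `req_iv` pointed at `ctx->iv` instead, a request completing after `advance_iv()` would read the next record's IV.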
  16. 18 Jan, 2019 1 commit
    • tls: Fix recvmsg() to be able to peek across multiple records · 692d7b5d
      Vakul Garg committed
      This fixes recvmsg() to be able to peek across multiple tls records.
      Without this patch, the tls selftests test case
      'recv_peek_large_buf_mult_recs' fails. Each tls receive context now
      maintains an 'rx_list' to retain incoming skbs carrying tls records. If a
      tls record needs to be retained, e.g. for the peek case or when
      the buffer passed to recvmsg() is smaller than the decrypted
      record, it is added to 'rx_list'. Additionally, records are
      added to 'rx_list' if the crypto operation runs in async mode. The
      records are dequeued from 'rx_list' after the decrypted data is consumed
      by copying it into the buffer passed to recvmsg(). If the MSG_PEEK
      flag is used in recvmsg(), records are not consumed or removed
      from the 'rx_list'.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
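The peek-versus-consume behaviour of the 'rx_list' can be sketched with an array-backed list in user-space C. The types and the function `rx_copy` are hypothetical; the kernel works on skbs, not on string buffers. The sketch shows the two cases the message names: a peek leaves the list untouched, and a short read leaves the partially consumed record on the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decrypted record retained on the list. */
struct rx_rec {
    const char *data;
    size_t len;
    size_t off; /* bytes already consumed from this record */
};

struct rx_list {
    struct rx_rec recs[4];
    size_t head;  /* index of the first record not fully consumed */
    size_t count; /* number of records stored */
};

/* Copy up to buflen bytes across records. With peek set, nothing is
 * consumed; otherwise offsets advance and finished records are dequeued. */
static size_t rx_copy(struct rx_list *l, char *buf, size_t buflen, int peek)
{
    size_t copied = 0;
    size_t i = l->head;

    while (i < l->count && copied < buflen) {
        struct rx_rec *r = &l->recs[i];
        size_t avail = r->len - r->off;
        size_t room = buflen - copied;
        size_t chunk = avail < room ? avail : room;

        memcpy(buf + copied, r->data + r->off, chunk);
        copied += chunk;
        if (!peek) {
            r->off += chunk;
            if (r->off == r->len)
                l->head = i + 1; /* record fully consumed: dequeue it */
        }
        if (chunk < avail)
            break; /* buffer is full, record stays on the list */
        i++;
    }
    return copied;
}
```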
  17. 21 Dec, 2018 1 commit
    • bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c
      John Fastabend committed
      A sockmap program that redirects through a kTLS ULP enabled socket
      will not work correctly because the ULP layer is skipped. This
      fixes the behavior to call through the ULP layer on redirect to
      ensure any operations required on the data stream at the ULP layer
      continue to be applied.
      
      To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
      calling the BPF layer on a redirected message. This is
      required to avoid calling the BPF layer multiple times (possibly
      recursively) which is not the current/expected behavior without
      ULPs. In the future we may add a redirect flag if users _do_
      want the policy applied again but this would need to work for both
      ULP and non-ULP sockets and be opt-in to avoid breaking existing
      programs.
      
      Also, to avoid polluting the flag space with an internal flag, we
      reuse the flag space by overlapping MSG_SENDPAGE_NOPOLICY with
      MSG_WAITFORONE. Here WAITFORONE is specific to the recv path and
      SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing
      to verify is that the user space API is masked correctly to ensure the
      flag cannot be set by the user. (Note this needs to be true regardless,
      because we already have internal flags in use that user space
      should not be able to set.) But for completeness, we have two UAPI
      paths into sendpage: sendfile and splice.
      
      In the sendfile case the function do_sendfile() zeroes flags,
      
      ./fs/read_write.c:
       static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
      		   	    size_t count, loff_t max)
       {
         ...
         fl = 0;
      #if 0
         /*
          * We need to debate whether we can enable this or not. The
          * man page documents EAGAIN return for the output at least,
          * and the application is arguably buggy if it doesn't expect
          * EAGAIN on a non-blocking file descriptor.
          */
          if (in.file->f_flags & O_NONBLOCK)
      	fl = SPLICE_F_NONBLOCK;
      #endif
          file_start_write(out.file);
          retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
       }
      
      In the splice case the pipe_to_sendpage "actor" is used which
      masks flags with SPLICE_F_MORE.
      
      ./fs/splice.c:
       static int pipe_to_sendpage(struct pipe_inode_info *pipe,
      			    struct pipe_buffer *buf, struct splice_desc *sd)
       {
         ...
         more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
         ...
       }
      
      This confirms what we expect: internal flags are in fact internal
      to the socket side.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
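The masking argument for the splice path can be checked with a small user-space sketch. The macro names carry a `_V` suffix to mark them as local stand-ins; their values are the ones I recall from the Linux headers and should be treated as assumptions, and `splice_to_msg_flags` is a hypothetical helper that mirrors the single line quoted from pipe_to_sendpage().

```c
#include <assert.h>

/* Local stand-ins for kernel constants (values assumed from Linux headers). */
#define MSG_MORE_V              0x8000
#define MSG_WAITFORONE_V        0x10000 /* recv path only */
#define MSG_SENDPAGE_NOPOLICY_V 0x10000 /* internal, sendpage hooks only */
#define SPLICE_F_MORE_V         0x04

/* Mirrors the quoted line: only SPLICE_F_MORE survives, mapped to MSG_MORE.
 * Every other bit the caller passed is dropped. */
static int splice_to_msg_flags(unsigned int splice_flags)
{
    return (splice_flags & SPLICE_F_MORE_V) ? MSG_MORE_V : 0;
}
```

Whatever bits user space sets in the splice flags, the resulting message flags are either 0 or MSG_MORE, so the bit shared by the two overlapping flags cannot be injected through this path.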
  18. 15 Dec, 2018 1 commit
    • net/tls: sleeping function from invalid context · df9d4a17
      Atul Gupta committed
      HW unhash within a mutex for registered tls devices causes a sleep
      when called from tcp_set_state for TCP_CLOSE. Release the lock before
      the function call and re-acquire it afterwards, incrementing and
      decrementing a ref count around the call. Define a kref and a release
      function pointer for tls_device to ensure the device is not released
      outside the lock.
      
      BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:748
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7
      INFO: lockdep is turned off.
      CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W  O
      Call Trace:
       <IRQ>
       dump_stack+0x5e/0x8b
       ___might_sleep+0x222/0x260
       __mutex_lock+0x5c/0xa50
       ? vprintk_emit+0x1f3/0x440
       ? kmem_cache_free+0x22d/0x2a0
       ? tls_hw_unhash+0x2f/0x80
       ? printk+0x52/0x6e
       ? tls_hw_unhash+0x2f/0x80
       tls_hw_unhash+0x2f/0x80
       tcp_set_state+0x5f/0x180
       tcp_done+0x2e/0xe0
       tcp_rcv_state_process+0x92c/0xdd3
       ? lock_acquire+0xf5/0x1f0
       ? tcp_v4_rcv+0xa7c/0xbe0
       ? tcp_v4_do_rcv+0x70/0x1e0
      Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
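The lock-drop pattern described in this commit message can be sketched single-threaded in user-space C. Everything here is hypothetical: the lock is modelled as a flag so the sketch can check that the callback runs unlocked, and a plain integer stands in for the kref. It shows the shape of the fix, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device entry with a reference count and an unhash hook. */
struct dev {
    int refcount;
    void (*unhash)(struct dev *d);
};

static int lock_held;          /* models the device-list lock */
static int saw_lock_held = -1; /* recorded by the sample callback */

static void dev_lock(void)   { assert(!lock_held); lock_held = 1; }
static void dev_unlock(void) { assert(lock_held);  lock_held = 0; }

/* Sample callback: records whether it ran with the lock held. */
static void record_unhash(struct dev *d)
{
    (void)d;
    saw_lock_held = lock_held;
}

/* Walk the list under the lock, but pin each device and drop the lock
 * around the callback, then re-acquire it and unpin the device. */
static void unhash_all(struct dev **devs, size_t n)
{
    dev_lock();
    for (size_t i = 0; i < n; i++) {
        if (!devs[i]->unhash)
            continue;
        devs[i]->refcount++; /* keep the device alive while unlocked */
        dev_unlock();
        devs[i]->unhash(devs[i]);
        dev_lock();
        devs[i]->refcount--; /* drop the pin under the lock */
    }
    dev_unlock();
}
```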
  19. 16 Oct, 2018 2 commits
    • tls: replace poll implementation with read hook · 924ad65e
      John Fastabend committed
      Instead of re-implementing the poll routine and using the poll callback to
      trigger a read from kTLS, we reuse the stream_memory_read callback,
      which is simpler and achieves the same. This helps to align sockmap
      and kTLS so we can more easily embed BPF in kTLS.
      
      Joint work with Daniel.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • tls: convert to generic sk_msg interface · d829e9c4
      Daniel Borkmann committed
      Convert kTLS over to make use of the sk_msg interface for plaintext and
      encrypted scatter-gather data, so it reuses all the sk_msg helpers
      and data structures, which later on, in a second step, makes it possible
      to glue this to BPF.
      
      This also allows removing quite a few open-coded helpers which
      are covered by the sk_msg API. Recent changes in kTLS, 80ece6a0
      ("tls: Remove redundant vars from tls record structure") and
      4e6d4720 ("tls: Add support for inplace records encryption"),
      changed the data path handling a bit; while we've kept the latter
      optimization intact, we had to undo the former change to better
      fit the sk_msg model, hence sg_aead_in and sg_aead_out have
      been brought back and are linked into the sk_msg sgs. Now the kTLS
      record contains a msg_plaintext and a msg_encrypted sk_msg each.
      
      In the original code, zerocopy_from_iter() was used from the
      TX path as well as the RX path. For the strparser skb-based RX path,
      we've left the zerocopy_from_iter() in decrypt_internal() mostly
      untouched, meaning it has been moved into tls_setup_from_iter()
      with the charging logic removed (as it is not used from RX). Given the RX
      path is not based on sk_msg objects, we haven't pursued setting up a
      dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it
      could be an option to pursue in a later step.
      
      Joint work with John.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  20. 03 Oct, 2018 1 commit
    • tls: Add support for inplace records encryption · 4e6d4720
      Vakul Garg committed
      Presently, for the non-zero-copy case, separate pages are allocated for
      storing the plaintext and encrypted text of records. These pages are stored
      in the sg_plaintext_data and sg_encrypted_data scatterlists inside the
      record structure. Further, sg_plaintext_data & sg_encrypted_data are passed
      to the crypto APIs for record encryption. Allocating separate pages for
      plaintext and encrypted text is inefficient in terms of both memory
      and performance.
      
      This patch adds support for inplace encryption of records. For the
      non-zero-copy case, we reuse the pages from the sg_encrypted_data
      scatterlist to copy the application's plaintext data. For the movement of
      pages from the sg_encrypted_data to the sg_plaintext_data scatterlist, we
      introduce a new function move_to_plaintext_sg(). This function adds pages
      into sg_plaintext_data from the sg_encrypted_data scatterlist.
      
      tls_do_encryption() is modified to pass the same scatterlist as both
      source and destination into aead_request_set_crypt() if inplace crypto
      has been enabled. A new variable 'inplace_crypto' has been introduced in
      the record structure to signify whether the same scatterlist can be used.
      By default, inplace_crypto is enabled in get_rec(). If zero-copy is
      used (i.e. plaintext data is not copied), inplace_crypto is set to '0'.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Reviewed-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
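The choice between inplace and out-of-place encryption can be sketched in user-space C. This is a toy under stated assumptions: a byte-wise XOR stands in for the AEAD operation, fixed buffers stand in for scatterlists, and all names except `inplace_crypto` are hypothetical. It only shows how the flag selects the source buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_MAX 64

/* Hypothetical record: one buffer for output, optional external source. */
struct toy_rec {
    uint8_t encrypted[REC_MAX]; /* destination, also source when inplace */
    const uint8_t *plaintext;   /* user memory in the zero-copy case */
    size_t len;
    int inplace_crypto;
};

/* Toy stand-in for the cipher; safe when dst and src are the same buffer. */
static void toy_crypt(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i] ^ 0x5a;
}

/* inplace_crypto is enabled by default, as the message says of get_rec(). */
static void rec_init(struct toy_rec *r)
{
    memset(r, 0, sizeof *r);
    r->inplace_crypto = 1;
}

/* Copy case: plaintext is copied into the pages that will hold ciphertext. */
static void rec_copy_in(struct toy_rec *r, const uint8_t *user, size_t len)
{
    if (len > REC_MAX)
        len = REC_MAX;
    memcpy(r->encrypted, user, len);
    r->plaintext = r->encrypted;
    r->len = len;
}

/* Zero-copy case: reference user memory directly, so disable inplace. */
static void rec_zerocopy(struct toy_rec *r, const uint8_t *user, size_t len)
{
    if (len > REC_MAX)
        len = REC_MAX;
    r->plaintext = user;
    r->len = len;
    r->inplace_crypto = 0;
}

/* Use the same buffer as source and destination only when inplace is set. */
static void rec_encrypt(struct toy_rec *r)
{
    const uint8_t *src = r->inplace_crypto ? r->encrypted : r->plaintext;

    toy_crypt(r->encrypted, src, r->len);
}
```

Both paths produce the same output; the zero-copy path must not encrypt in place because that would overwrite the caller's buffer.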
  21. 30 Sep, 2018 1 commit
    • tls: Remove redundant vars from tls record structure · 80ece6a0
      Vakul Garg committed
      Structure 'tls_rec' contains sg_aead_in and sg_aead_out, which point
      to an aad_space and then chain the scatterlists sg_plaintext_data and
      sg_encrypted_data respectively. Rather than using chained scatterlists
      for plaintext and encrypted data in aead_req, it is more efficient to store
      aad_space in sg_encrypted_data and sg_plaintext_data themselves at the
      first index and get rid of sg_aead_in, sg_aead_out and further chaining.
      
      This requires increasing the size of the sg_encrypted_data &
      sg_plaintext_data arrays by 1 to accommodate the entry for aad_space. The
      code which uses sg_encrypted_data and sg_plaintext_data has been modified
      to skip the first index as it points to aad_space.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
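The array layout described in this commit message, with the first index reserved for aad_space, can be sketched in user-space C. The struct and helper names are hypothetical, a simple address/length pair stands in for a scatterlist entry, and the 13-byte AAD size and fragment limit are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGS 4
#define AAD_SIZE  13 /* assumed AAD size for this sketch */

/* Simplified stand-in for a scatterlist entry. */
struct sg_entry {
    const void *addr;
    size_t len;
};

/* Array is one entry larger than MAX_FRAGS: index 0 holds aad_space. */
struct sg_rec {
    uint8_t aad_space[AAD_SIZE];
    struct sg_entry sg_plaintext_data[MAX_FRAGS + 1];
    size_t num_frags; /* payload fragments, not counting index 0 */
};

static void sg_rec_init(struct sg_rec *r)
{
    r->sg_plaintext_data[0].addr = r->aad_space;
    r->sg_plaintext_data[0].len = AAD_SIZE;
    r->num_frags = 0;
}

/* Payload fragments start at index 1. Returns 0 on success, -1 if full. */
static int sg_rec_add_frag(struct sg_rec *r, const void *addr, size_t len)
{
    if (r->num_frags == MAX_FRAGS)
        return -1;
    r->sg_plaintext_data[1 + r->num_frags].addr = addr;
    r->sg_plaintext_data[1 + r->num_frags].len = len;
    r->num_frags++;
    return 0;
}

/* Payload-only code skips the first index because it points to aad_space. */
static size_t sg_rec_payload_len(const struct sg_rec *r)
{
    size_t total = 0;

    for (size_t i = 1; i <= r->num_frags; i++)
        total += r->sg_plaintext_data[i].len;
    return total;
}
```

The AEAD request can then take the array from index 0 as one contiguous list, without a separate chained head entry.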