1. 25 2月, 2019 1 次提交
    • V
      tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg · 2b794c40
      Vakul Garg 提交于
      The patch enables returning 'type' in msghdr for records that are
      retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked
      from socket from getting clubbed with any other record of different
      type when records are subsequently dequeued from strparser.
      
      For each record, we now retain its type in sk_buff's control buffer
      cb[]. Inside control buffer, record's full length and offset are already
      stored by strparser in 'struct strp_msg'. We store record type after
      'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is
      stored just after record dequeue. For tls1.3, the type is stored after
      record has been decrypted.
      
      Inside process_rx_list(), before processing a non-data record, we check
      that we must be able to return back the record type to the user
      application. If not, the decrypted records in tls context's rx_list is
      left there without consuming any data.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b794c40
  2. 20 2月, 2019 1 次提交
  3. 02 2月, 2019 4 次提交
  4. 29 1月, 2019 1 次提交
    • D
      net: tls: Save iv in tls_rec for async crypto requests · 32eb67b9
      Dave Watson 提交于
      aead_request_set_crypt takes an iv pointer, and we change the iv
      soon after setting it.  Some async crypto algorithms don't save the iv,
      so we need to save it in the tls_rec for async requests.
      
      Found by hardcoding x64 aesni to use async crypto manager (to test the async
      codepath), however I don't think this combination can happen in the wild.
      Presumably other hardware offloads will need this fix, but there have been
      no user reports.
      
      Fixes: a42055e8 ("Add support for async encryption of records...")
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32eb67b9
  5. 18 1月, 2019 1 次提交
    • V
      tls: Fix recvmsg() to be able to peek across multiple records · 692d7b5d
      Vakul Garg 提交于
      This fixes recvmsg() to be able to peek across multiple tls records.
      Without this patch, the tls's selftests test case
      'recv_peek_large_buf_mult_recs' fails. Each tls receive context now
      maintains a 'rx_list' to retain incoming skb carrying tls records. If a
      tls record needs to be retained e.g. for peek case or for the case when
      the buffer passed to recvmsg() has a length smaller than decrypted
      record length, then it is added to 'rx_list'. Additionally, records are
      added in 'rx_list' if the crypto operation runs in async mode. The
      records are dequeued from 'rx_list' after the decrypted data is consumed
      by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK
      flag is used in recvmsg(), then records are not consumed or removed
      from the 'rx_list'.
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      692d7b5d
  6. 21 12月, 2018 1 次提交
    • J
      bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c
      John Fastabend 提交于
      A sockmap program that redirects through a kTLS ULP enabled socket
      will not work correctly because the ULP layer is skipped. This
      fixes the behavior to call through the ULP layer on redirect to
      ensure any operations required on the data stream at the ULP layer
      continue to be applied.
      
      To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
      calling the BPF layer on a redirected message. This is
      required to avoid calling the BPF layer multiple times (possibly
      recursively) which is not the current/expected behavior without
      ULPs. In the future we may add a redirect flag if users _do_
      want the policy applied again but this would need to work for both
      ULP and non-ULP sockets and be opt-in to avoid breaking existing
      programs.
      
      Also to avoid polluting the flag space with an internal flag we
      reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with
      MSG_WAITFORONE. Here WAITFORONE is specific to recv path and
      SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing
      to verify is user space API is masked correctly to ensure the flag
      can not be set by user. (Note this needs to be true regardless
      because we have internal flags already in-use that user space
      should not be able to set). But for completeness we have two UAPI
      paths into sendpage, sendfile and splice.
      
      In the sendfile case the function do_sendfile() zero's flags,
      
      ./fs/read_write.c:
       static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
      		   	    size_t count, loff_t max)
       {
         ...
         fl = 0;
      #if 0
         /*
          * We need to debate whether we can enable this or not. The
          * man page documents EAGAIN return for the output at least,
          * and the application is arguably buggy if it doesn't expect
          * EAGAIN on a non-blocking file descriptor.
          */
          if (in.file->f_flags & O_NONBLOCK)
      	fl = SPLICE_F_NONBLOCK;
      #endif
          file_start_write(out.file);
          retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
       }
      
      In the splice case the pipe_to_sendpage "actor" is used which
      masks flags with SPLICE_F_MORE.
      
      ./fs/splice.c:
       static int pipe_to_sendpage(struct pipe_inode_info *pipe,
      			    struct pipe_buffer *buf, struct splice_desc *sd)
       {
         ...
         more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
         ...
       }
      
      Confirming what we expect that internal flags  are in fact internal
      to socket side.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      0608c69c
  7. 15 12月, 2018 1 次提交
    • A
      net/tls: sleeping function from invalid context · df9d4a17
      Atul Gupta 提交于
      HW unhash within mutex for registered tls devices cause sleep
      when called from tcp_set_state for TCP_CLOSE. Release lock and
      re-acquire after function call with ref count incr/dec.
      defined kref and fp release for tls_device to ensure device
      is not released outside lock.
      
      BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:748
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7
      INFO: lockdep is turned off.
      CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W  O
      Call Trace:
       <IRQ>
       dump_stack+0x5e/0x8b
       ___might_sleep+0x222/0x260
       __mutex_lock+0x5c/0xa50
       ? vprintk_emit+0x1f3/0x440
       ? kmem_cache_free+0x22d/0x2a0
       ? tls_hw_unhash+0x2f/0x80
       ? printk+0x52/0x6e
       ? tls_hw_unhash+0x2f/0x80
       tls_hw_unhash+0x2f/0x80
       tcp_set_state+0x5f/0x180
       tcp_done+0x2e/0xe0
       tcp_rcv_state_process+0x92c/0xdd3
       ? lock_acquire+0xf5/0x1f0
       ? tcp_v4_rcv+0xa7c/0xbe0
       ? tcp_v4_do_rcv+0x70/0x1e0
      Signed-off-by: NAtul Gupta <atul.gupta@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df9d4a17
  8. 16 10月, 2018 2 次提交
    • J
      tls: replace poll implementation with read hook · 924ad65e
      John Fastabend 提交于
      Instead of re-implementing poll routine use the poll callback to
      trigger read from kTLS, we reuse the stream_memory_read callback
      which is simpler and achieves the same. This helps to align sockmap
      and kTLS so we can more easily embed BPF in kTLS.
      
      Joint work with Daniel.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      924ad65e
    • D
      tls: convert to generic sk_msg interface · d829e9c4
      Daniel Borkmann 提交于
      Convert kTLS over to make use of sk_msg interface for plaintext and
      encrypted scattergather data, so it reuses all the sk_msg helpers
      and data structure which later on in a second step enables to glue
      this to BPF.
      
      This also allows to remove quite a bit of open coded helpers which
      are covered by the sk_msg API. Recent changes in kTLs 80ece6a0
      ("tls: Remove redundant vars from tls record structure") and
      4e6d4720 ("tls: Add support for inplace records encryption")
      changed the data path handling a bit; while we've kept the latter
      optimization intact, we had to undo the former change to better
      fit the sk_msg model, hence the sg_aead_in and sg_aead_out have
      been brought back and are linked into the sk_msg sgs. Now the kTLS
      record contains a msg_plaintext and msg_encrypted sk_msg each.
      
      In the original code, the zerocopy_from_iter() has been used out
      of TX but also RX path. For the strparser skb-based RX path,
      we've left the zerocopy_from_iter() in decrypt_internal() mostly
      untouched, meaning it has been moved into tls_setup_from_iter()
      with charging logic removed (as not used from RX). Given RX path
      is not based on sk_msg objects, we haven't pursued setting up a
      dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it
      could be an option to prusue in a later step.
      
      Joint work with John.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d829e9c4
  9. 03 10月, 2018 1 次提交
    • V
      tls: Add support for inplace records encryption · 4e6d4720
      Vakul Garg 提交于
      Presently, for non-zero copy case, separate pages are allocated for
      storing plaintext and encrypted text of records. These pages are stored
      in sg_plaintext_data and sg_encrypted_data scatterlists inside record
      structure. Further, sg_plaintext_data & sg_encrypted_data are passed
      to cryptoapis for record encryption. Allocating separate pages for
      plaintext and encrypted text is inefficient from both required memory
      and performance point of view.
      
      This patch adds support of inplace encryption of records. For non-zero
      copy case, we reuse the pages from sg_encrypted_data scatterlist to
      copy the application's plaintext data. For the movement of pages from
      sg_encrypted_data to sg_plaintext_data scatterlists, we introduce a new
      function move_to_plaintext_sg(). This function add pages into
      sg_plaintext_data from sg_encrypted_data scatterlists.
      
      tls_do_encryption() is modified to pass the same scatterlist as both
      source and destination into aead_request_set_crypt() if inplace crypto
      has been enabled. A new ariable 'inplace_crypto' has been introduced in
      record structure to signify whether the same scatterlist can be used.
      By default, the inplace_crypto is enabled in get_rec(). If zero-copy is
      used (i.e. plaintext data is not copied), inplace_crypto is set to '0'.
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Reviewed-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e6d4720
  10. 30 9月, 2018 1 次提交
    • V
      tls: Remove redundant vars from tls record structure · 80ece6a0
      Vakul Garg 提交于
      Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point
      to a aad_space and then chain scatterlists sg_plaintext_data,
      sg_encrypted_data respectively. Rather than using chained scatterlists
      for plaintext and encrypted data in aead_req, it is efficient to store
      aad_space in sg_encrypted_data and sg_plaintext_data itself in the
      first index and get rid of sg_aead_in, sg_aead_in and further chaining.
      
      This requires increasing size of sg_encrypted_data & sg_plaintext_data
      arrarys by 1 to accommodate entry for aad_space. The code which uses
      sg_encrypted_data and sg_plaintext_data has been modified to skip first
      index as it points to aad_space.
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80ece6a0
  11. 25 9月, 2018 1 次提交
    • V
      net/tls: Fixed race condition in async encryption · 9932a29a
      Vakul Garg 提交于
      On processors with multi-engine crypto accelerators, it is possible that
      multiple records get encrypted in parallel and their encryption
      completion is notified to different cpus in multicore processor. This
      leads to the situation where tls_encrypt_done() starts executing in
      parallel on different cores. In current implementation, encrypted
      records are queued to tx_ready_list in tls_encrypt_done(). This requires
      addition to linked list 'tx_ready_list' to be protected. As
      tls_decrypt_done() could be executing in irq content, it is not possible
      to protect linked list addition operation using a lock.
      
      To fix the problem, we remove linked list addition operation from the
      irq context. We do tx_ready_list addition/removal operation from
      application context only and get rid of possible multiple access to
      the linked list. Before starting encryption on the record, we add it to
      the tail of tx_ready_list. To prevent tls_tx_records() from transmitting
      it, we mark the record with a new flag 'tx_ready' in 'struct tls_rec'.
      When record encryption gets completed, tls_encrypt_done() has to only
      update the 'tx_ready' flag to true & linked list add operation is not
      required.
      
      The changed logic brings some other side benefits. Since the records
      are always submitted in tls sequence number order for encryption, the
      tx_ready_list always remains sorted and addition of new records to it
      does not have to traverse the linked list.
      
      Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to
      'tx_list'. This is because now, the some of the records at the tail are
      not ready to transmit.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption")
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9932a29a
  12. 22 9月, 2018 1 次提交
    • V
      net/tls: Add support for async encryption of records for performance · a42055e8
      Vakul Garg 提交于
      In current implementation, tls records are encrypted & transmitted
      serially. Till the time the previously submitted user data is encrypted,
      the implementation waits and on finish starts transmitting the record.
      This approach of encrypt-one record at a time is inefficient when
      asynchronous crypto accelerators are used. For each record, there are
      overheads of interrupts, driver softIRQ scheduling etc. Also the crypto
      accelerator sits idle most of time while an encrypted record's pages are
      handed over to tcp stack for transmission.
      
      This patch enables encryption of multiple records in parallel when an
      async capable crypto accelerator is present in system. This is achieved
      by allowing the user space application to send more data using sendmsg()
      even while previously issued data is being processed by crypto
      accelerator. This requires returning the control back to user space
      application after submitting encryption request to accelerator. This
      also means that zero-copy mode of encryption cannot be used with async
      accelerator as we must be done with user space application buffer before
      returning from sendmsg().
      
      There can be multiple records in flight to/from the accelerator. Each of
      the record is represented by 'struct tls_rec'. This is used to store the
      memory pages for the record.
      
      After the records are encrypted, they are added in a linked list called
      tx_ready_list which contains encrypted tls records sorted as per tls
      sequence number. The records from tx_ready_list are transmitted using a
      newly introduced function called tls_tx_records(). The tx_ready_list is
      polled for any record ready to be transmitted in sendmsg(), sendpage()
      after initiating encryption of new tls records. This achieves parallel
      encryption and transmission of records when async accelerator is
      present.
      
      There could be situation when crypto accelerator completes encryption
      later than polling of tx_ready_list by sendmsg()/sendpage(). Therefore
      we need a deferred work context to be able to transmit records from
      tx_ready_list. The deferred work context gets scheduled if applications
      are not sending much data through the socket. If the applications issue
      sendmsg()/sendpage() in quick succession, then the scheduling of
      tx_work_handler gets cancelled as the tx_ready_list would be polled from
      application's context itself. This saves scheduling overhead of deferred
      work.
      
      The patch also brings some side benefit. We are able to get rid of the
      concept of CLOSED record. This is because the records once closed are
      either encrypted and then placed into tx_ready_list or if encryption
      fails, the socket error is set. This simplifies the kernel tls
      sendpath. However since tls_device.c is still using macros, accessory
      functions for CLOSED records have been retained.
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a42055e8
  13. 17 9月, 2018 1 次提交
    • J
      tls: async support causes out-of-bounds access in crypto APIs · 7a3dd8c8
      John Fastabend 提交于
      When async support was added it needed to access the sk from the async
      callback to report errors up the stack. The patch tried to use space
      after the aead request struct by directly setting the reqsize field in
      aead_request. This is an internal field that should not be used
      outside the crypto APIs. It is used by the crypto code to define extra
      space for private structures used in the crypto context. Users of the
      API then use crypto_aead_reqsize() and add the returned amount of
      bytes to the end of the request memory allocation before posting the
      request to encrypt/decrypt APIs.
      
      So this breaks (with general protection fault and KASAN error, if
      enabled) because the request sent to decrypt is shorter than required
      causing the crypto API out-of-bounds errors. Also it seems unlikely the
      sk is even valid by the time it gets to the callback because of memset
      in crypto layer.
      
      Anyways, fix this by holding the sk in the skb->sk field when the
      callback is set up and because the skb is already passed through to
      the callback handler via void* we can access it in the handler. Then
      in the handler we need to be careful to NULL the pointer again before
      kfree_skb. I added comments on both the setup (in tls_do_decryption)
      and when we clear it from the crypto callback handler
      tls_decrypt_done(). After this selftests pass again and fixes KASAN
      errors/warnings.
      
      Fixes: 94524d8f ("net/tls: Add support for async decryption of tls records")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: NVakul Garg <Vakul.garg@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a3dd8c8
  14. 14 9月, 2018 1 次提交
  15. 02 9月, 2018 1 次提交
    • V
      net/tls: Add support for async decryption of tls records · 94524d8f
      Vakul Garg 提交于
      When tls records are decrypted using asynchronous acclerators such as
      NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on
      getting -EINPROGRESS, the tls record processing stops till the time the
      crypto accelerator finishes off and returns the result. This incurs a
      context switch and is not an efficient way of accessing the crypto
      accelerators. Crypto accelerators work efficient when they are queued
      with multiple crypto jobs without having to wait for the previous ones
      to complete.
      
      The patch submits multiple crypto requests without having to wait for
      for previous ones to complete. This has been implemented for records
      which are decrypted in zero-copy mode. At the end of recvmsg(), we wait
      for all the asynchronous decryption requests to complete.
      
      The references to records which have been sent for async decryption are
      dropped. For cases where record decryption is not possible in zero-copy
      mode, asynchronous decryption is not used and we wait for decryption
      crypto api to complete.
      
      For crypto requests executing in async fashion, the memory for
      aead_request, sglists and skb etc is freed from the decryption
      completion handler. The decryption completion handler wakesup the
      sleeping user context when recvmsg() flags that it has done sending
      all the decryption requests and there are no more decryption requests
      pending to be completed.
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Reviewed-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94524d8f
  16. 13 8月, 2018 1 次提交
    • V
      net/tls: Combined memory allocation for decryption request · 0b243d00
      Vakul Garg 提交于
      For preparing decryption request, several memory chunks are required
      (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to
      an accelerator, it is required that the buffers which are read by the
      accelerator must be dma-able and not come from stack. The buffers for
      aad and iv can be separately kmalloced each, but it is inefficient.
      This patch does a combined allocation for preparing decryption request
      and then segments into aead_req || sgin || sgout || iv || aad.
      Signed-off-by: NVakul Garg <vakul.garg@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b243d00
  17. 16 7月, 2018 4 次提交
    • B
      tls: Add rx inline crypto offload · 4799ac81
      Boris Pismenny 提交于
      This patch completes the generic infrastructure to offload TLS crypto to a
      network device. It enables the kernel to skip decryption and
      authentication of some skbs marked as decrypted by the NIC. In the fast
      path, all packets received are decrypted by the NIC and the performance
      is comparable to plain TCP.
      
      This infrastructure doesn't require a TCP offload engine. Instead, the
      NIC only decrypts packets that contain the expected TCP sequence number.
      Out-Of-Order TCP packets are provided unmodified. As a result, at the
      worst case a received TLS record consists of both plaintext and ciphertext
      packets. These partially decrypted records must be reencrypted,
      only to be decrypted.
      
      The notable differences between SW KTLS Rx and this offload are as
      follows:
      1. Partial decryption - Software must handle the case of a TLS record
      that was only partially decrypted by HW. This can happen due to packet
      reordering.
      2. Resynchronization - tls_read_size calls the device driver to
      resynchronize HW after HW lost track of TLS record framing in
      the TCP stream.
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4799ac81
    • B
      tls: Split tls_sw_release_resources_rx · 39f56e1a
      Boris Pismenny 提交于
      This patch splits tls_sw_release_resources_rx into two functions one
      which releases all inner software tls structures and another that also
      frees the containing structure.
      
      In TLS_DEVICE we will need to release the software structures without
      freeeing the containing structure, which contains other information.
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39f56e1a
    • B
      tls: Split decrypt_skb to two functions · dafb67f3
      Boris Pismenny 提交于
      Previously, decrypt_skb also updated the TLS context.
      Now, decrypt_skb only decrypts the payload using the current context,
      while decrypt_skb_update also updates the state.
      
      Later, in the tls_device Rx flow, we will use decrypt_skb directly.
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dafb67f3
    • B
      tls: Refactor tls_offload variable names · d80a1b9d
      Boris Pismenny 提交于
      For symmetry, we rename tls_offload_context to
      tls_offload_context_tx before we add tls_offload_context_rx.
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d80a1b9d
  18. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  19. 12 6月, 2018 1 次提交
    • D
      tls: fix NULL pointer dereference on poll · f6fadff3
      Daniel Borkmann 提交于
      While hacking on kTLS, I ran into the following panic from an
      unprivileged netserver / netperf TCP session:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
        Oops: 0010 [#1] SMP KASAN PTI
        CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        RIP: 0010:          (null)
        Code: Bad RIP value.
        RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
        RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
        RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
        RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
        R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
        R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
        FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         ? tls_sw_poll+0xa4/0x160 [tls]
         ? sock_poll+0x20a/0x680
         ? do_select+0x77b/0x11a0
         ? poll_schedule_timeout.constprop.12+0x130/0x130
         ? pick_link+0xb00/0xb00
         ? read_word_at_a_time+0x13/0x20
         ? vfs_poll+0x270/0x270
         ? deref_stack_reg+0xad/0xe0
         ? __read_once_size_nocheck.constprop.6+0x10/0x10
        [...]
      
      Debugging further, it turns out that calling into ctx->sk_poll() is
      invalid since sk_poll itself is NULL which was saved from the original
      TCP socket in order for tls_sw_poll() to invoke it.
      
      Looks like the recent conversion from poll to poll_mask callback started
      in 15252423 ("net: add support for ->poll_mask in proto_ops") missed
      to eventually convert kTLS, too: TCP's ->poll was converted over to the
      ->poll_mask in commit 2c7d3dac ("net/tcp: convert to ->poll_mask")
      and therefore kTLS wrongly saved the ->poll old one which is now NULL.
      
      Convert kTLS over to use ->poll_mask instead. Also instead of POLLIN |
      POLLRDNORM use the proper EPOLLIN | EPOLLRDNORM bits as the case in
      tcp_poll_mask() as well that is mangled here.
      
      Fixes: 2c7d3dac ("net/tcp: convert to ->poll_mask")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Watson <davejwatson@fb.com>
      Tested-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6fadff3
  20. 18 5月, 2018 1 次提交
  21. 02 5月, 2018 1 次提交
  22. 01 5月, 2018 2 次提交
    • I
      net/tls: Add generic NIC offload infrastructure · e8f69799
      Ilya Lesokhin 提交于
      This patch adds a generic infrastructure to offload TLS crypto to a
      network device. It enables the kernel TLS socket to skip encryption
      and authentication operations on the transmit side of the data path.
      Leaving those computationally expensive operations to the NIC.
      
      The NIC offload infrastructure builds TLS records and pushes them to
      the TCP layer just like the SW KTLS implementation and using the same
      API.
      TCP segmentation is mostly unaffected. Currently the only exception is
      that we prevent mixed SKBs where only part of the payload requires
      offload. In the future we are likely to add a similar restriction
      following a change cipher spec record.
      
      The notable differences between SW KTLS and NIC offloaded TLS
      implementations are as follows:
      1. The offloaded implementation builds "plaintext TLS record", those
      records contain plaintext instead of ciphertext and place holder bytes
      instead of authentication tags.
      2. The offloaded implementation maintains a mapping from TCP sequence
      number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
      TLS socket, we can use the tls NIC offload infrastructure to obtain
      enough context to encrypt the payload of the SKB.
      A TLS record is released when the last byte of the record is ack'ed,
      this is done through the new icsk_clean_acked callback.
      
      The infrastructure should be extendable to support various NIC offload
      implementations.  However it is currently written with the
      implementation below in mind:
      The NIC assumes that packets from each offloaded stream are sent as
      plaintext and in-order. It keeps track of the TLS records in the TCP
      stream. When a packet marked for offload is transmitted, the NIC
      encrypts the payload in-place and puts authentication tags in the
      relevant place holders.
      
      The responsibility for handling out-of-order packets (i.e. TCP
      retransmission, qdisc drops) falls on the netdev driver.
      
      The netdev driver keeps track of the expected TCP SN from the NIC's
      perspective.  If the next packet to transmit matches the expected TCP
      SN, the driver advances the expected TCP SN, and transmits the packet
      with TLS offload indication.
      
      If the next packet to transmit does not match the expected TCP SN. The
      driver calls the TLS layer to obtain the TLS record that includes the
      TCP of the packet for transmission. Using this TLS record, the driver
      posts a work entry on the transmit queue to reconstruct the NIC TLS
      state required for the offload of the out-of-order packet. It updates
      the expected TCP SN accordingly and transmits the now in-order packet.
      The same queue is used for packet transmission and TLS context
      reconstruction to avoid the need for flushing the transmit queue before
      issuing the context reconstruction request.
      Signed-off-by: NIlya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NAviad Yehezkel <aviadye@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8f69799
    • B
      net/tls: Split conf to rx + tx · f66de3ee
      Boris Pismenny 提交于
      In TLS inline crypto, we can have one direction in software
      and another in hardware. Thus, we split the TLS configuration to separate
      structures for receive and transmit.
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f66de3ee
  23. 01 4月, 2018 1 次提交
  24. 24 3月, 2018 4 次提交
    • D
      tls: RX path for ktls · c46234eb
      Dave Watson 提交于
      Add rx path for tls software implementation.
      
      recvmsg, splice_read, and poll implemented.
      
      An additional sockopt TLS_RX is added, with the same interface as
      TLS_TX.  Either TLX_RX or TLX_TX may be provided separately, or
      together (with two different setsockopt calls with appropriate keys).
      
      Control messages are passed via CMSG in a similar way to transmit.
      If no cmsg buffer is passed, then only application data records
      will be passed to userspace, and EIO is returned for other types of
      alerts.
      
      EBADMSG is passed for decryption errors, and EMSGSIZE is passed for
      framing too big, and EBADMSG for framing too small (matching openssl
      semantics). EINVAL is returned for TLS versions that do not match the
      original setsockopt call.  All are unrecoverable.
      
      strparser is used to parse TLS framing.   Decryption is done directly
      in to userspace buffers if they are large enough to support it, otherwise
      sk_cow_data is called (similar to ipsec), and buffers are decrypted in
      place and copied.  splice_read always decrypts in place, since no
      buffers are provided to decrypt in to.
      
      sk_poll is overridden, and only returns POLLIN if a full TLS message is
      received.  Otherwise we wait for strparser to finish reading a full frame.
      Actual decryption is only done during recvmsg or splice_read calls.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c46234eb
    • D
      tls: Refactor variable names · 58371585
      Dave Watson 提交于
      Several config variables are prefixed with tx, drop the prefix
      since these will be used for both tx and rx.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58371585
    • D
      tls: Pass error code explicitly to tls_err_abort · f4a8e43f
      Dave Watson 提交于
      Pass EBADMSG explicitly to tls_err_abort.  Receive path will
      pass additional codes - EMSGSIZE if framing is larger than max
      TLS record size, EINVAL if TLS version mismatch.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4a8e43f
    • D
      tls: Move cipher info to a separate struct · dbe42559
      Dave Watson 提交于
      Separate tx crypto parameters to a separate cipher_context struct.
      The same parameters will be used for rx using the same struct.
      
      tls_advance_record_sn is modified to only take the cipher info.
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbe42559
  25. 31 1月, 2018 1 次提交
  26. 16 1月, 2018 1 次提交
  27. 15 11月, 2017 1 次提交
    • D
      uapi: fix linux/tls.h userspace compilation error · b9f3eb49
      Dmitry V. Levin 提交于
      Move inclusion of a private kernel header <net/tcp.h>
      from uapi/linux/tls.h to its only user - net/tls.h,
      to fix the following linux/tls.h userspace compilation error:
      
      /usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory
      
      As to this point uapi/linux/tls.h was totaly unusuable for userspace,
      cleanup this header file further by moving other redundant includes
      to net/tls.h.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Cc: <stable@vger.kernel.org> # v4.13+
      Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9f3eb49
  28. 14 11月, 2017 2 次提交