1. 16 Jan 2020, 3 commits
    • bpf: Sockmap/tls, fix pop data with SK_DROP return code · 7361d448
      Committed by John Fastabend
      When the user returns SK_DROP we need to reset the number of copied
      bytes to indicate to the user that the bytes were dropped and not
      sent. If we don't reset the copied arg, sendmsg will return as if
      those bytes were copied, giving the user a positive return value.
      
      This works as expected today except in the case where the user also
      pops bytes. In the pop case sg.size is reduced, but we don't correctly
      account for this when the copied byte count is reset. The popped bytes
      are not accounted for and we return a small positive value, potentially
      confusing the user.
      
      The reason this happens is a typo where we do the wrong comparison
      when accounting for popped bytes. In this fix notice the if/else is
      not needed, and that we have a similar problem if we push data, except
      it's not visible to the user because if delta is larger than sg.size
      we return a negative value, so it appears as an error regardless.
      
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com
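      The accounting above can be illustrated with a small, self-contained C
      sketch. It is not the kernel patch; sg_size, copied and delta are local
      names invented for this example, and the only point it makes is that the
      popped bytes (delta) must be folded into the reset on SK_DROP.

      #include <stdio.h>

      static long sendmsg_result(long queued, long popped, int drop)
      {
          long sg_size_before = queued;           /* bytes queued into the msg */
          long sg_size_after  = queued - popped;  /* after bpf_msg_pop_data()  */
          long copied = queued;                   /* what sendmsg would report */

          /* delta is how much the BPF program shrank the message */
          long delta = sg_size_before - sg_size_after;

          if (drop) {
              /* on SK_DROP, undo both the bytes still in the record and
               * the bytes the program popped, so nothing counts as sent */
              copied -= sg_size_after + delta;
          }
          return copied;
      }

      int main(void)
      {
          /* 100 bytes sent, 30 popped, verdict SK_DROP: report 0, not 30 */
          printf("%ld\n", sendmsg_result(100, 30, 1));
          return 0;
      }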
    • bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining · 9aaaa568
      Committed by John Fastabend
      It's possible, through a set of push, pop, and apply helper calls, to
      construct an skmsg, which is just a ring of scatterlist elements, with
      the start value larger than the end value. For example,
      
            end       start
        |_0_|_1_| ... |_n_|_n+1_|
      
      Here end points at 1 and start points at n, so the valid elements are
      the set {n, n+1, 0, 1}.
      
      Currently, because we don't build the correct chain, only {n, n+1} will
      be sent. This adds a check and an sg_chain call to correctly submit the
      above to the crypto and TLS send path.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com
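      A standalone illustration of the wrapped ring described above. In the
      real code the ring is made of scatterlist elements and the fix adds an
      sg_chain() call; here a plain int array stands in for the scatterlist,
      so only the wrap-around walk is shown.

      #include <stdio.h>

      #define RING_SIZE 6

      /* Visit the valid elements of a ring whose live region runs from
       * 'start' (inclusive) to 'end' (exclusive), possibly wrapping. */
      static void visit_valid(const int *ring, int start, int end)
      {
          if (start <= end) {
              for (int i = start; i < end; i++)        /* simple case */
                  printf("elem %d = %d\n", i, ring[i]);
              return;
          }
          /* wrapped case: the tail run first, then chain back to the
           * head run -- analogous to the extra sg_chain() in the fix */
          for (int i = start; i < RING_SIZE; i++)
              printf("elem %d = %d\n", i, ring[i]);
          for (int i = 0; i < end; i++)
              printf("elem %d = %d\n", i, ring[i]);
      }

      int main(void)
      {
          int ring[RING_SIZE] = { 10, 11, 12, 13, 14, 15 };

          /* start = 4, end = 2: valid set is {4, 5, 0, 1}, matching the
           * {n, n+1, 0, 1} example in the commit message */
          visit_valid(ring, 4, 2);
          return 0;
      }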
    • bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt buf · d468e477
      Committed by John Fastabend
      It is possible, using the push helper, to build a plaintext buffer that
      is larger than the allocated encrypt buffer. When this record is pushed
      to the crypto layers this can result in a NULL pointer dereference,
      because the crypto API expects the encrypt buffer to be large enough to
      fit the plaintext buffer. Kernel splat below.
      
      To resolve this, catch the cases where it can happen and split the
      buffer into two records to send individually. Unfortunately, there is
      still one case to handle, where the split creates a zero-sized buffer.
      In this case we merge the buffers and unmark the split. This happens
      when apply is zero and the user pushed data beyond the encrypt buffer.
      This fixes the original case as well, because the split allocated an
      encrypt buffer larger than the plaintext buffer and the merge simply
      moves the pointers around, so we now have a reference to the new
      (larger) encrypt buffer.

      Perhaps it's not ideal, but it seems the best solution for a fixes
      branch and avoids having to handle the two cases, (a) an apply that
      needs a split and (b) the non-apply case, differently. These are edge
      cases anyway, so optimizing them seems unnecessary unless someone wants
      to do so later in the -next branches.
      
      [  306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [...]
      [  306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
      [...]
      [  306.770350] Call Trace:
      [  306.770956]  scatterwalk_map_and_copy+0x6c/0x80
      [  306.772026]  gcm_enc_copy_hash+0x4b/0x50
      [  306.772925]  gcm_hash_crypt_remain_continue+0xef/0x110
      [  306.774138]  gcm_hash_crypt_continue+0xa1/0xb0
      [  306.775103]  ? gcm_hash_crypt_continue+0xa1/0xb0
      [  306.776103]  gcm_hash_assoc_remain_continue+0x94/0xa0
      [  306.777170]  gcm_hash_assoc_continue+0x9d/0xb0
      [  306.778239]  gcm_hash_init_continue+0x8f/0xa0
      [  306.779121]  gcm_hash+0x73/0x80
      [  306.779762]  gcm_encrypt_continue+0x6d/0x80
      [  306.780582]  crypto_gcm_encrypt+0xcb/0xe0
      [  306.781474]  crypto_aead_encrypt+0x1f/0x30
      [  306.782353]  tls_push_record+0x3b9/0xb20 [tls]
      [  306.783314]  ? sk_psock_msg_verdict+0x199/0x300
      [  306.784287]  bpf_exec_tx_verdict+0x3f2/0x680 [tls]
      [  306.785357]  tls_sw_sendmsg+0x4a3/0x6a0 [tls]
      
      test_sockmap test signature to trigger the bug:
      
      [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
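      A hedged sketch of the split/merge decision described above. The struct,
      helper and the 'fits' parameter are invented for illustration; only the
      shape of the logic (split when the plaintext outgrows the encrypt
      buffer, merge back when the split piece would be empty) follows the
      commit message.

      #include <stdio.h>

      struct rec_sketch {
          long plaintext;   /* bytes queued by the user / push helper */
          long enc_cap;     /* size of the allocated encrypt buffer   */
      };

      /* Return how many bytes the first record should carry.  'fits' is how
       * many of the plaintext bytes the current encrypt buffer can actually
       * cover (0 when everything was pushed past its end with apply == 0). */
      static long prepare_record(struct rec_sketch *rec, long fits)
      {
          if (rec->plaintext <= rec->enc_cap)
              return rec->plaintext;               /* fits, no split needed */

          if (fits == 0) {
              /* zero-sized split: merge instead and give the single
               * record a large-enough encrypt buffer */
              rec->enc_cap = rec->plaintext;
              return rec->plaintext;
          }
          /* normal split: send what fits now, the rest goes into a
           * second record with its own (larger) encrypt buffer */
          return fits;
      }

      int main(void)
      {
          struct rec_sketch a = { .plaintext = 20000, .enc_cap = 16384 };
          struct rec_sketch b = { .plaintext = 20000, .enc_cap = 16384 };

          printf("normal split, first record: %ld\n", prepare_record(&a, 16384));
          printf("zero split -> merge:        %ld\n", prepare_record(&b, 0));
          return 0;
      }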
  2. 07 Dec 2019, 1 commit
  3. 29 Nov 2019, 4 commits
  4. 20 Nov 2019, 1 commit
  5. 07 Nov 2019, 2 commits
    • net/tls: add a TX lock · 79ffe608
      Committed by Jakub Kicinski
      TLS TX needs to release and re-acquire the socket lock if the send
      buffer fills up.

      The TLS SW TX path currently depends on only allowing one thread to
      enter the function, by abusing sk_write_pending. If another writer is
      already waiting for memory, no new ones are allowed in.
      
      This has two problems:
       - writers don't wake other threads up when they leave the kernel,
         meaning that this scheme works for a single extra thread (a second
         application thread or the delayed work) because memory becoming
         available will send a wake-up request, but as Mallesham and
         Pooja report, with a larger number of threads it leads to threads
         being put to sleep indefinitely;
       - the delayed work does not get _scheduled_ but it may _run_ when
         other writers are present, leading to crashes, as writers don't
         expect state to change under their feet (the same records get
         pushed and freed multiple times); it's hard to reliably bail from
         the work, however, because the mere presence of a writer does not
         guarantee that the writer will push pending records before exiting.
      
      Ensuring wakeups always happen would make the code basically open-code
      a mutex. Just use a mutex.
      
      The TLS HW TX path does not have any locking (not even the
      sk_write_pending hack), yet it uses a per-socket sg_tx_data
      array to push records.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com>
      Reported-by: Pooja Trivedi <poojatrivedi@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
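      A userspace analogue of the change above, using pthreads to stand in
      for the kernel mutex: every writer simply serializes on one lock
      instead of relying on sk_write_pending to keep extra writers out. This
      is an illustration of the locking model, not the kernel code.

      #include <pthread.h>
      #include <stdio.h>

      static pthread_mutex_t tx_lock = PTHREAD_MUTEX_INITIALIZER;
      static long bytes_queued;

      static void *writer(void *arg)
      {
          long n = (long)arg;

          /* every sender (application thread or delayed work in the real
           * code) takes the same lock, so records can never be pushed or
           * freed underneath another writer */
          pthread_mutex_lock(&tx_lock);
          bytes_queued += n;
          pthread_mutex_unlock(&tx_lock);
          return NULL;
      }

      int main(void)
      {
          pthread_t t[4];

          for (long i = 0; i < 4; i++)
              pthread_create(&t[i], NULL, writer, (void *)100L);
          for (int i = 0; i < 4; i++)
              pthread_join(t[i], NULL);
          printf("queued %ld bytes\n", bytes_queued);   /* always 400 */
          return 0;
      }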
    • net/tls: don't pay attention to sk_write_pending when pushing partial records · 02b1fa07
      Committed by Jakub Kicinski
      sk_write_pending being non-zero does not guarantee that a partial
      record will be pushed. If the thread waiting for memory times out,
      the pending record may get stuck.

      In the case of tls_device there is no path where a partial record is
      set and a writer is present in the first place. A partial record is
      set only in tls_push_sg(), and tls_push_sg() will return an error
      immediately. All tls_device callers of tls_push_sg() will return (and
      not wait for memory) if it failed.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 07 Oct 2019, 3 commits
  7. 06 Oct 2019, 1 commit
  8. 05 Sep 2019, 1 commit
  9. 22 Jul 2019, 3 commits
  10. 08 Jul 2019, 1 commit
  11. 13 Jun 2019, 1 commit
    • net: tls, correctly account for copied bytes with multiple sk_msgs · 648ee6ce
      Committed by John Fastabend
      tls_sw_do_sendpage needs to return the total number of bytes sent
      regardless of how many sk_msgs are allocated. Unfortunately, copied
      (the value we return up the stack) is zeroed before each new sk_msg is
      allocated, so we only return the copied size of the last sk_msg used.
      
      The caller (splice, etc.) of sendpage will then believe only part
      of its data was sent and will send the missing chunks again. However,
      because the data actually was sent, the receiver will get multiple
      copies of the same data.
      
      To reproduce this, do multiple sendfile calls with a length close to
      the max record size. This will in turn call splice/sendpage; sendpage
      may use multiple sk_msgs in this case and then return the incorrect
      number of bytes. This will cause splice to resend, creating duplicate
      data on the receiver. Andre created a C program that can easily
      generate this case, so we will push a similar selftest for this to
      bpf-next shortly.
      
      The fix is to _not_ zero the copied field, so that the total number of
      sent bytes is returned.
      Reported-by: Steinar H. Gunderson <steinar+kernel@gunderson.no>
      Reported-by: Andre Tomt <andre@tomt.net>
      Tested-by: Andre Tomt <andre@tomt.net>
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
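      A minimal sketch of the accounting fix: 'copied' must accumulate across
      every sk_msg used rather than being reset per message. The loop below
      is an illustration with an invented per-message capacity, not the
      tls_sw code itself.

      #include <stdio.h>

      #define MAX_PER_MSG 16384      /* stand-in for the per-sk_msg capacity */

      static long do_sendpage(long total)
      {
          long copied = 0;               /* NOT reset inside the loop */

          while (total > 0) {
              long chunk = total > MAX_PER_MSG ? MAX_PER_MSG : total;

              /* ... allocate an sk_msg and transmit 'chunk' bytes ... */
              copied += chunk;           /* accumulate across sk_msgs */
              total  -= chunk;
          }
          return copied;                 /* total bytes actually sent */
      }

      int main(void)
      {
          /* two sk_msgs are needed; the caller must still see 20000,
           * otherwise splice() retransmits the "missing" part */
          printf("%ld\n", do_sendpage(20000));
          return 0;
      }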
  12. 12 Jun 2019, 3 commits
    • net/tls: add kernel-driven TLS RX resync · f953d33b
      Committed by Jakub Kicinski
      A TLS offload device may lose sync with the TCP stream if packets
      arrive out of order.  Drivers can currently request a resync at
      a specific TCP sequence number.  When a record is found starting
      at that sequence number, the kernel will inform the device of the
      corresponding record number.
      
      This requires the device to constantly scan the stream for a
      known pattern (constant bytes of the header) after sync is lost.
      
      This patch adds an alternative approach which is entirely under
      the control of the kernel.  The kernel tracks records it had to fully
      decrypt, even though the TLS socket is in TLS_HW mode.  If multiple
      records did not have any decrypted parts, it's a pretty strong
      indication that the device is out of sync.
      
      We choose the min number of fully encrypted records to be 2,
      which should hopefully be more than will get retransmitted at
      a time.
      
      After the kernel decides the device is out of sync it schedules a
      resync request.  If the TCP socket is empty the resync is performed
      immediately.  If the socket is not empty we leave it to the record
      parser to resync when the next record comes in.
      
      Before resyncing in the message parser we peek at the TCP socket and
      don't attempt the sync if the socket already has some of the
      next record queued.
      
      On resync failure (encrypted data continues to flow in) we
      retry with exponential backoff, up to once every 128 records
      (with a 16k record that's at most once every 2M of data).
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
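      A hedged, standalone sketch of the resync policy described above. The
      struct and counters are invented; only the thresholds (two fully
      software-decrypted records, backoff capped at 128 records) come from
      the commit message.

      #include <stdbool.h>
      #include <stdio.h>

      struct resync_sketch {
          unsigned int sw_decrypted;   /* consecutive fully-decrypted records */
          unsigned int backoff;        /* records to wait between attempts    */
          unsigned int wait;           /* countdown to the next attempt       */
      };

      /* Called once per parsed record; returns true when a resync request
       * should be scheduled now. */
      static bool record_done(struct resync_sketch *s, bool device_decrypted)
      {
          if (device_decrypted) {
              s->sw_decrypted = 0;         /* device is in sync again */
              s->backoff = 1;
              s->wait = 0;
              return false;
          }

          if (++s->sw_decrypted < 2)       /* need 2 fully-encrypted records */
              return false;

          if (s->wait) {                   /* a previous attempt failed; wait */
              s->wait--;
              return false;
          }

          /* request a resync now; if encrypted data keeps flowing in, the
           * next attempt is twice as far away, capped at every 128 records */
          if (s->backoff < 128)
              s->backoff *= 2;
          s->wait = s->backoff;
          return true;
      }

      int main(void)
      {
          struct resync_sketch s = { .sw_decrypted = 0, .backoff = 1, .wait = 0 };
          int requests = 0;

          for (int i = 0; i < 1000; i++)   /* 1000 records, none device-decrypted */
              requests += record_done(&s, false);
          printf("resync requests: %d\n", requests);
          return 0;
      }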
    • net/tls: rename handle_device_resync() · fe58a5a0
      Committed by Jakub Kicinski
      The name handle_device_resync() doesn't describe the function very
      well. The function checks whether a resync should be issued upon
      parsing of a new record.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: pass record number as a byte array · 89fec474
      Committed by Jakub Kicinski
      The TLS offload code casts the record number to a u64.  The buffer
      should be aligned to 8 bytes, but it's actually a __be64, and
      the rest of the TLS code treats it as a big-endian integer.  Make the
      offload callbacks take a byte array; drivers can make the
      choice to do the ugly cast if they want to.

      Prepare for copying the record number onto the stack by
      defining a constant for the max size of the byte array.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
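      A small sketch of the byte-array interface described above. The names
      (REC_SEQ_MAX, resync_cb) are illustrative, not the kernel's; the point
      is that the callback receives the big-endian record number as bytes,
      and a driver may still do the conversion itself.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define REC_SEQ_MAX 8   /* TLS record sequence numbers are 64 bits */

      /* driver callback receives the big-endian sequence number as bytes */
      static void resync_cb(const uint8_t rcd_sn[REC_SEQ_MAX])
      {
          /* a driver that really wants a host-order u64 can still do the
           * (ugly) conversion itself */
          uint64_t sn = 0;
          for (int i = 0; i < REC_SEQ_MAX; i++)
              sn = (sn << 8) | rcd_sn[i];
          printf("record %llu\n", (unsigned long long)sn);
      }

      int main(void)
      {
          uint8_t rcd_sn[REC_SEQ_MAX] = { 0, 0, 0, 0, 0, 0, 0, 5 };
          uint8_t on_stack[REC_SEQ_MAX];

          /* copy onto the stack first, bounded by the fixed max size */
          memcpy(on_stack, rcd_sn, REC_SEQ_MAX);
          resync_cb(on_stack);
          return 0;
      }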
  13. 05 Jun 2019, 2 commits
  14. 27 May 2019, 2 commits
  15. 10 May 2019, 1 commit
  16. 21 Apr 2019, 1 commit
  17. 11 Apr 2019, 1 commit
  18. 30 Mar 2019, 1 commit
  19. 22 Mar 2019, 1 commit
  20. 21 Mar 2019, 1 commit
    • net/tls: Add support of AES128-CCM based ciphers · f295b3ae
      Committed by Vakul Garg
      Added support for AES128-CCM based record encryption. AES128-CCM is
      similar to AES128-GCM; both have the same salt/IV/MAC sizes. The
      notable difference between the two is that when invoking an AES128-CCM
      operation, the salt||nonce (which is passed as the IV) has to be
      prefixed with the hardcoded value '2'. Further, the CCM implementation
      in the kernel requires the IV passed in crypto_aead_request() to be a
      full 16 bytes. Therefore, the record structure 'struct tls_rec' has
      been modified to reserve 16 bytes for the IV. This works for both GCM
      and CCM based ciphers.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
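      A hedged sketch of the IV layout described above, assuming the usual
      TLS AES-128-CCM parameters (4-byte salt, 8-byte per-record nonce). The
      buffer layout and names are illustrative, not copied from the kernel
      sources.

      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      #define IV_BUF_LEN 16   /* the kernel CCM code wants a full 16-byte IV */
      #define SALT_LEN    4
      #define NONCE_LEN   8

      static void build_ccm_iv(uint8_t iv[IV_BUF_LEN],
                               const uint8_t salt[SALT_LEN],
                               const uint8_t nonce[NONCE_LEN])
      {
          memset(iv, 0, IV_BUF_LEN);
          iv[0] = 2;                                    /* hardcoded CCM prefix */
          memcpy(iv + 1, salt, SALT_LEN);               /* salt                 */
          memcpy(iv + 1 + SALT_LEN, nonce, NONCE_LEN);  /* per-record nonce     */
          /* the remaining bytes stay zero so the buffer is a full 16 bytes */
      }

      int main(void)
      {
          uint8_t salt[SALT_LEN]   = { 0xde, 0xad, 0xbe, 0xef };
          uint8_t nonce[NONCE_LEN] = { 0, 0, 0, 0, 0, 0, 0, 1 };
          uint8_t iv[IV_BUF_LEN];

          build_ccm_iv(iv, salt, nonce);
          for (int i = 0; i < IV_BUF_LEN; i++)
              printf("%02x", iv[i]);
          printf("\n");
          return 0;
      }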
  21. 04 Mar 2019, 3 commits
  22. 25 Feb 2019, 1 commit
    • tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg · 2b794c40
      Committed by Vakul Garg
      The patch enables returning the record 'type' in msghdr for records
      that are retrieved with MSG_PEEK in recvmsg. Further, it prevents
      records peeked from the socket from getting clubbed with any other
      record of a different type when records are subsequently dequeued from
      strparser.
      
      For each record, we now retain its type in the sk_buff's control
      buffer cb[]. Inside the control buffer, the record's full length and
      offset are already stored by strparser in 'struct strp_msg'. We store
      the record type after 'struct strp_msg', inside 'struct tls_msg'. For
      TLS 1.2, the type is stored just after record dequeue. For TLS 1.3,
      the type is stored after the record has been decrypted.
      
      Inside process_rx_list(), before processing a non-data record, we
      check that we will be able to return the record type to the user
      application. If not, the decrypted records in the tls context's
      rx_list are left there without consuming any data.
      
      Fixes: 692d7b5d ("tls: Fix recvmsg() to be able to peek across multiple records")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
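      A hedged sketch of the control-buffer layout described above. The
      struct names mirror the description (strp_msg followed by tls_msg
      inside the skb's cb[]), but the field types and the 48-byte size are
      illustrative stand-ins, not the kernel definitions.

      #include <stdint.h>
      #include <stdio.h>

      struct strp_msg_sketch {        /* length/offset kept by strparser */
          int full_len;
          int offset;
      };

      struct tls_msg_sketch {         /* record type stored right after it */
          uint8_t control;            /* TLS record type, e.g. 23 = app data */
      };

      /* model an sk_buff cb[] area: a small per-skb scratch space */
      struct skb_sketch {
          char cb[48];
      };

      static struct tls_msg_sketch *tls_msg(struct skb_sketch *skb)
      {
          /* tls_msg lives immediately after strp_msg inside cb[] */
          return (struct tls_msg_sketch *)(skb->cb + sizeof(struct strp_msg_sketch));
      }

      int main(void)
      {
          struct skb_sketch skb = { { 0 } };

          tls_msg(&skb)->control = 22;    /* e.g. a handshake record */
          printf("record type %u\n", tls_msg(&skb)->control);
          return 0;
      }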
  23. 20 Feb 2019, 1 commit
  24. 13 Feb 2019, 1 commit
    • net/tls: Do not use async crypto for non-data records · c0ab4732
      Committed by Vakul Garg
      The addition of TLS 1.3 support broke the TLS 1.2 handshake when an
      async crypto accelerator is used. This is because the record type for
      non-data records is not propagated to the user application. Also, when
      async decryption happens, the decryption does not stop when two
      different types of records get dequeued and submitted for decryption.
      To address this, we decrypt TLS 1.2 non-data records in a synchronous
      way. We check whether the record we just processed has the same type
      as the previous one before checking for the async condition and
      jumping to dequeue the next record.
      
      Fixes: 130b392c ("net: tls: Add tls 1.3 support")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
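      A small sketch of the decision described above: TLS 1.2 non-data
      records are forced onto the synchronous decrypt path so their type is
      not lost among async completions. The constants and names are
      illustrative, not the kernel's.

      #include <stdbool.h>
      #include <stdio.h>

      #define TLS_1_3            0x0304
      #define RECORD_TYPE_DATA   23      /* application data */

      static bool decrypt_async_ok(int tls_version, int record_type)
      {
          /* TLS 1.3 hides the real type until after decryption, so the
           * check only applies to TLS 1.2: anything that is not
           * application data (handshake, alert, ...) goes synchronous */
          if (tls_version != TLS_1_3 && record_type != RECORD_TYPE_DATA)
              return false;
          return true;
      }

      int main(void)
      {
          printf("1.2 handshake async? %d\n", decrypt_async_ok(0x0303, 22)); /* 0 */
          printf("1.2 app data  async? %d\n", decrypt_async_ok(0x0303, 23)); /* 1 */
          return 0;
      }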