1. 27 Jul 2022, 1 commit
  2. 18 Jul 2022, 2 commits
    • net/tls: Fix race in TLS device down flow · f08d8c1b
      Committed by Tariq Toukan
      Socket destruction flow and tls_device_down function sync against each
      other using tls_device_lock and the context refcount, to guarantee the
      device resources are freed via tls_dev_del() by the end of
      tls_device_down.
      
      In the following unfortunate flow, this won't happen:
      - refcount is decreased to zero in tls_device_sk_destruct.
      - tls_device_down starts, skips the context as refcount is zero, going
        all the way until it flushes the gc work, and returns without freeing
        the device resources.
      - only then, tls_device_queue_ctx_destruction is called, queues the gc
        work and frees the context's device resources.
      
      Solve this by decreasing the refcount in the socket's destruction
      flow under tls_device_lock, for perfect synchronization. This does
      not slow down the common destructor flow, in which the refcount is
      decreased and the spinlock is acquired anyway.
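
      A minimal sketch of the resulting destructor-side flow (names
      follow the description above; an illustration, not the literal
      patch):

          /* Decrement the refcount under tls_device_lock so that
           * tls_device_down can never observe a zero refcount for a
           * context whose destruction work has not been queued yet.
           */
          static void tls_device_queue_ctx_destruction(struct tls_context *ctx)
          {
                  unsigned long flags;

                  spin_lock_irqsave(&tls_device_lock, flags);
                  if (!refcount_dec_and_test(&ctx->refcount)) {
                          /* tls_device_down still holds a reference */
                          spin_unlock_irqrestore(&tls_device_lock, flags);
                          return;
                  }
                  list_move_tail(&ctx->list, &tls_device_gc_list);
                  spin_unlock_irqrestore(&tls_device_lock, flags);
                  schedule_work(&tls_device_gc_work);
          }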
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f08d8c1b
    • tls: rx: read the input skb from ctx->recv_pkt · 541cc48b
      Committed by Jakub Kicinski
      Callers always pass ctx->recv_pkt into decrypt_skb_update(),
      and it propagates it to its callees. This may give someone
      the false impression that those functions can accept any valid
      skb containing a TLS record. That's not the case: the record
      sequence number is read from the context, and they can only
      take the next record coming out of the strp.
      
      Let the functions get the skb from the context instead of
      passing it in. This will also make it cleaner to return
      a different skb than ctx->recv_pkt as the decrypted one
      later on.
      
      Since we're touching the definition of decrypt_skb_update()
      use this as an opportunity to rename it.
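
      Roughly, the interface change looks like this (a sketch with
      argument lists simplified; tls_rx_one_record as the new name is
      an assumption based on the rename mentioned above):

          /* before: the one legal skb was passed in explicitly */
          err = decrypt_skb_update(sk, ctx->recv_pkt, &darg);

          /* after: the function fetches ctx->recv_pkt itself, making it
           * explicit that it only handles the next record from the strp */
          err = tls_rx_one_record(sk, &darg);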
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      541cc48b
  3. 15 Jul 2022, 1 commit
  4. 09 Jul 2022, 1 commit
    • tls: create an internal header · 58790314
      Committed by Jakub Kicinski
      include/net/tls.h is getting a little long, and is probably hard
      for driver authors to navigate. Split out the internals into a
      header which will live under net/tls/. While at it, move some
      static inlines with a single user into the source files, add
      a few tls_ prefixes and fix the spelling of 'proccess'.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      58790314
  5. 19 May 2022, 1 commit
    • tls: Add opt-in zerocopy mode of sendfile() · c1318b39
      Committed by Boris Pismenny
      TLS device offload copies sendfile data to a bounce buffer before
      transmitting. This allows maintaining a valid MAC on TLS records
      when the file contents change and part of a TLS record has to be
      retransmitted at the TCP level.
      
      In many common use cases (like serving static files over HTTPS) the file
      contents are not changed on the fly. In many use cases breaking the
      connection is totally acceptable if the file is changed during
      transmission, because it would be received corrupted in any case.
      
      This commit optimizes performance for such use cases by providing
      a new optional mode of TLS sendfile() in which the extra copy is
      skipped. Removing this copy improves performance significantly, as
      TLS and TCP sendfile then perform the same operations, and the only
      overhead is TLS header/trailer insertion.
      
      The new mode can only be enabled with the new socket option named
      TLS_TX_ZEROCOPY_SENDFILE, on a per-socket basis. It preserves
      backwards compatibility with existing applications that rely on the
      copying behavior.
      
      The new mode is safe, meaning that unsolicited modifications of the
      file being sent can't break the integrity of the kernel. The worst
      thing that can happen is sending a corrupted TLS record, which is in
      any case not forbidden when using regular TCP sockets.
      
      Sockets other than TLS device offload are not affected by the new socket
      option. The actual status of zerocopy sendfile can be queried with
      sock_diag.
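
      A minimal user-space sketch of opting in (assumes a kTLS socket
      already configured via SOL_TLS; error handling trimmed):

          #include <stdio.h>
          #include <sys/socket.h>
          #include <linux/tls.h>

          /* Request zerocopy sendfile on this socket only; sockets
           * without TLS device offload are unaffected by the option. */
          int one = 1;
          if (setsockopt(fd, SOL_TLS, TLS_TX_ZEROCOPY_SENDFILE,
                         &one, sizeof(one)))
                  perror("setsockopt(TLS_TX_ZEROCOPY_SENDFILE)");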
      
      Performance numbers in a single-core test with 24 HTTPS streams on
      nginx, under 100% CPU load:
      
      * non-zerocopy: 33.6 Gbit/s
      * zerocopy: 79.92 Gbit/s
      
      CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
      Signed-off-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c1318b39
  6. 13 May 2022, 1 commit
    • tls: Fix context leak on tls_device_down · 3740651b
      Committed by Maxim Mikityanskiy
      The commit cited below claims to fix a use-after-free condition after
      tls_device_down. Apparently, the description wasn't fully accurate. The
      context stayed alive, but ctx->netdev became NULL, and the offload was
      torn down without a proper fallback, so a bug was present, but a
      different kind of bug.
      
      Due to misunderstanding of the issue, the original patch dropped the
      refcount_dec_and_test line for the context to avoid the alleged
      premature deallocation. That line has to be restored, because it matches
      the refcount_inc_not_zero from the same function, otherwise the contexts
      that survived tls_device_down are leaked.
      
      This patch fixes the described issue by restoring refcount_dec_and_test.
      After this change, there is no leak anymore, and the fallback to
      software kTLS still works.
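
      Schematically, the pairing this change restores inside
      tls_device_down (a sketch of the invariant, not the literal diff):

          /* every reference taken here ... */
          if (refcount_inc_not_zero(&ctx->refcount))
                  list_move(&ctx->list, &list);

          /* ... must be dropped after the fallback is deployed and the
           * driver resources are released, or the context leaks: */
          if (refcount_dec_and_test(&ctx->refcount))
                  tls_device_free_ctx(ctx);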
      
      Fixes: c55dcdd4 ("net/tls: Fix use-after-free after the TLS device goes down and up")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3740651b
  7. 28 Apr 2022, 1 commit
  8. 08 Apr 2022, 2 commits
  9. 22 Mar 2022, 1 commit
    • net/tls: optimize judgement processes in tls_set_device_offload() · b1a6f56b
      Committed by Ziyang Xuan
      When TLS TX/RX offload is configured via setsockopt(), HW offload
      takes priority. However, tls_set_device_offload() only checks
      whether the netdevice supports NETIF_F_HW_TLS_TX late in the
      process, after several memory allocations have already been made.
      If the netdevice turns out not to support NETIF_F_HW_TLS_TX, that
      memory has to be released before returning an error, which is
      redundant work.
      
      Move the NETIF_F_HW_TLS_TX check forward, and move the
      start_marker_record and offload_ctx allocations slightly later, as
      sketched below. This gives a simpler error handling path.
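
      Schematically (a sketch of the reordered flow; the real function
      has more steps):

          netdev = get_netdev_for_sock(sk);
          if (!netdev)
                  return -EINVAL;

          /* check capability first, before any allocation ... */
          if (!(netdev->features & NETIF_F_HW_TLS_TX)) {
                  rc = -EOPNOTSUPP;
                  goto release_netdev;
          }

          /* ... so start_marker_record and offload_ctx are only
           * allocated once the device is known to support offload */
          start_marker_record = kmalloc(sizeof(*start_marker_record),
                                        GFP_KERNEL);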
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b1a6f56b
  10. 08 Jun 2021, 1 commit
  11. 02 Jun 2021, 2 commits
    • net/tls: Fix use-after-free after the TLS device goes down and up · c55dcdd4
      Committed by Maxim Mikityanskiy
      When a netdev with active TLS offload goes down, tls_device_down is
      called to stop the offload and tear down the TLS context. However, the
      socket stays alive, and it still points to the TLS context, which is now
      deallocated. If a netdev goes up, while the connection is still active,
      and the data flow resumes after a number of TCP retransmissions, it will
      lead to a use-after-free of the TLS context.
      
      This commit addresses this bug by keeping the context alive until its
      normal destruction, and implements the necessary fallbacks, so that the
      connection can resume in software (non-offloaded) kTLS mode.
      
      On the TX side tls_sw_fallback is used to encrypt all packets. The
      RX side already has all the necessary fallbacks, because receiving
      non-decrypted packets is supported. The one thing still needed on
      the RX side is to block resync requests, which are normally
      produced after receiving non-decrypted packets.
      
      The necessary synchronization is implemented for a graceful teardown:
      first the fallbacks are deployed, then the driver resources are released
      (it used to be possible to have a tls_dev_resync after tls_dev_del).
      
      A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
      mode. It's used to skip the RX resync logic completely, as it becomes
      useless, and some objects may be released (for example, resync_async,
      which is allocated and freed by the driver).
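
      Schematically, on the RX side (a sketch; the flag lives in the
      context's flags word):

          /* resync handlers bail out once the connection has fallen
           * back to software kTLS, where resync is meaningless */
          if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags)))
                  return;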
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c55dcdd4
    • net/tls: Replace TLS_RX_SYNC_RUNNING with RCU · 05fc8b6c
      Committed by Maxim Mikityanskiy
      RCU synchronization is guaranteed to finish in finite time, unlike a
      busy loop that polls a flag. This patch is a preparation for the bugfix
      in the next patch, where the same synchronize_net() call will also be
      used to sync with the TX datapath.
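
      The pattern, in brief (a sketch of the replacement; the polled
      flag is gone):

          /* writer side, e.g. tls_device_down: unpublish the state,
           * then wait for all RCU readers instead of busy-polling
           * TLS_RX_SYNC_RUNNING */
          synchronize_net();

          /* reader side, e.g. the resync path: */
          rcu_read_lock();
          netdev = READ_ONCE(ctx->netdev);
          if (netdev)
                  netdev->tlsdev_ops->tls_dev_resync(netdev, sk, seq,
                                                     rcd_sn, direction);
          rcu_read_unlock();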
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05fc8b6c
  12. 28 Apr 2021, 1 commit
  13. 25 Mar 2021, 1 commit
  14. 19 Jan 2021, 2 commits
  15. 02 Dec 2020, 1 commit
  16. 28 Nov 2020, 1 commit
  17. 26 Nov 2020, 1 commit
  18. 18 Nov 2020, 1 commit
  19. 10 Oct 2020, 1 commit
    • net/tls: sendfile fails with ktls offload · ea1dd3e9
      Committed by Rohit Maheshwari
      When sendpage is called with more data to follow, the 'more' flag
      in tls_push_data() gets set, which in turn sets
      pending_open_record_frags. But when no data is left in the file and
      tls_push_data() is called for the last time,
      pending_open_record_frags does not get reset. When a 2-byte
      encrypted alert later arrives via sendmsg, the code first checks
      pending_open_record_frags; since it is still set, it creates a
      record with 0 data bytes to encrypt, so the record length is only
      prepend_size + tag_size, which causes problems.
      
      Set/reset pending_open_record_frags based on the 'more' bit, as
      sketched below.
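
      The gist of the fix in tls_push_data() (a sketch; flag computation
      as described above):

          bool more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE);

          /* an open record stays pending only while the caller signals
           * that more data follows; the final push clears the flag */
          tls_ctx->pending_open_record_frags = more;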
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ea1dd3e9
  20. 12 Aug 2020, 1 commit
  21. 28 Jun 2020, 2 commits
  22. 28 May 2020, 1 commit
  23. 22 Mar 2020, 1 commit
    • net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE · d5bee737
      Committed by Jakub Sitnicki
      sockmap performs lockless writes to sk->sk_prot on the following paths:
      
      tcp_bpf_{recvmsg|sendmsg} / sock_map_unref
        sk_psock_put
          sk_psock_drop
            sk_psock_restore_proto
              WRITE_ONCE(sk->sk_prot, proto)
      
      To prevent load/store tearing [1], and to make tooling aware of intentional
      shared access [2], we need to annotate other sites that access sk_prot with
      READ_ONCE/WRITE_ONCE macros.
      
      Change done with Coccinelle using the following semantic patch:
      
      @@
      expression E;
      identifier I;
      struct sock *sk;
      identifier sk_prot =~ "^sk_prot$";
      @@
      (
       E =
      -sk->sk_prot
      +READ_ONCE(sk->sk_prot)
      |
      -sk->sk_prot = E
      +WRITE_ONCE(sk->sk_prot, E)
      |
      -sk->sk_prot
      +READ_ONCE(sk->sk_prot)
       ->I
      )
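
      At a typical call site the result looks like this (illustrative):

          /* pairs with sockmap's lockless WRITE_ONCE(sk->sk_prot, ...) */
          ctx->sk_proto = READ_ONCE(sk->sk_prot);

          WRITE_ONCE(sk->sk_prot, p);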
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5bee737
  24. 20 Feb 2020, 1 commit
    • net/tls: Fix to avoid getting invalid tls record · 06f5201c
      Committed by Rohit Maheshwari
      The current code doesn't check whether the TCP sequence number
      starts at (or after) the 1st record's start sequence number; it
      only checks whether the sequence number falls before the 1st
      record's end sequence number. This problem can always occur in the
      retransmit case: if the record a requested sequence number belongs
      to has already been deleted, tls_get_record will walk the list and,
      per that check, test only whether the sequence number is before the
      end of the 1st record, which is always true, so it always returns
      the 1st record when it should return NULL.
      
      As part of the fix, examine a record only if the sequence number
      actually lies within the list; otherwise return NULL. One more
      check is added: the driver looks for the start marker record to
      handle TCP packets that precede the TLS offload start sequence
      number, so return the 1st record if it is the TLS start marker and
      the sequence number is before the 1st record's starting sequence
      number.
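
      The added range check, schematically (a sketch over the record
      fields; info points at the 1st record):

          /* seq precedes the 1st record: only the TLS-offload start
           * marker may match such retransmitted sequence numbers */
          if (before(seq, info->end_seq - info->len))
                  return tls_record_is_start_marker(info) ? info : NULL;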
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06f5201c
  25. 20 Dec 2019, 1 commit
  26. 07 Dec 2019, 1 commit
  27. 07 Nov 2019, 2 commits
    • net/tls: add a TX lock · 79ffe608
      Committed by Jakub Kicinski
      TLS TX needs to release and re-acquire the socket lock if send buffer
      fills up.
      
      TLS SW TX path currently depends on only allowing one thread to enter
      the function by the abuse of sk_write_pending. If another writer is
      already waiting for memory no new ones are allowed in.
      
      This has two problems:
       - writers don't wake other threads up when they leave the kernel;
         meaning that this scheme works for single extra thread (second
         application thread or delayed work) because memory becoming
         available will send a wake up request, but as Mallesham and
         Pooja report with larger number of threads it leads to threads
         being put to sleep indefinitely;
       - the delayed work does not get _scheduled_ but it may _run_ when
         other writers are present leading to crashes as writers don't
         expect state to change under their feet (same records get pushed
         and freed multiple times); it's hard to reliably bail from the
         work, however, because the mere presence of a writer does not
         guarantee that the writer will push pending records before exiting.
      
      Ensuring wakeups always happen will make the code basically open
      code a mutex. Just use a mutex.
      
      The TLS HW TX path does not have any locking (not even the
      sk_write_pending hack), yet it uses a per-socket sg_tx_data
      array to push records.
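
      In outline (a sketch; tx_lock is the mutex this change adds to
      struct tls_context, and do_tls_sendmsg is an illustrative
      stand-in for the locked send path):

          /* both the SW and the device TX paths serialize on tx_lock */
          mutex_lock(&tls_ctx->tx_lock);
          lock_sock(sk);
          ret = do_tls_sendmsg(sk, msg, size);  /* may sleep for memory */
          release_sock(sk);
          mutex_unlock(&tls_ctx->tx_lock);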
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com>
      Reported-by: Pooja Trivedi <poojatrivedi@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79ffe608
    • net/tls: don't pay attention to sk_write_pending when pushing partial records · 02b1fa07
      Committed by Jakub Kicinski
      sk_write_pending being non-zero does not guarantee that the partial
      record will be pushed. If the thread waiting for memory times out,
      the pending record may get stuck.
      
      In the case of tls_device there is no path where a partial record
      is set and a writer is present in the first place. A partial record
      is set only in tls_push_sg(), and tls_push_sg() will return an
      error immediately. All tls_device callers of tls_push_sg() will
      return (and not wait for memory) if it failed.
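
      Schematically, the push path now keys off the record state itself
      (a sketch; helper names as in include/net/tls.h):

          /* push whenever a partial record exists, instead of trusting
           * sk_write_pending != 0 to mean a writer will finish it */
          if (tls_is_partially_sent_record(ctx))
                  tls_push_partial_record(sk, ctx, flags);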
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02b1fa07
  28. 07 Oct 2019, 3 commits
  29. 06 Oct 2019, 3 commits
  30. 08 Sep 2019, 1 commit