1. 10 June 2022, 1 commit
  2. 19 May 2022, 1 commit
    • tls: Add opt-in zerocopy mode of sendfile() · c1318b39
      Authored by Boris Pismenny
      TLS device offload copies sendfile data to a bounce buffer before
      transmitting. This makes it possible to maintain a valid MAC on TLS
      records when the file contents change and part of a TLS record has to
      be retransmitted at the TCP level.
      
      In many common use cases (like serving static files over HTTPS) the file
      contents are not changed on the fly, and breaking the connection is
      perfectly acceptable if the file is changed during transmission, because
      it would be received corrupted in any case.
      
      This commit optimizes performance for such use cases by providing a new
      optional mode of TLS sendfile() in which the extra copy is skipped.
      Removing this copy improves performance significantly: TLS and TCP
      sendfile perform the same operations, and the only overhead is TLS
      header/trailer insertion.
      
      The new mode can only be enabled with the new socket option named
      TLS_TX_ZEROCOPY_SENDFILE, on a per-socket basis. It preserves backwards
      compatibility with existing applications that rely on the copying
      behavior.
      
      The new mode is safe, meaning that unsolicited modifications of the file
      being sent can't break the integrity of the kernel. The worst thing that
      can happen is sending a corrupted TLS record, which is in any case not
      forbidden when using regular TCP sockets.
      
      Sockets that don't use TLS device offload are not affected by the new
      socket option. The actual status of zerocopy sendfile can be queried
      with sock_diag.
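      A minimal userspace sketch of opting in (assumptions: the TLS ULP and TX
      crypto info are already installed on the socket, the constant is the one
      added to the uapi header by this commit, and enable_zc_sendfile() is an
      illustrative helper, not an upstream API):

        /* Sketch: opt a device-offloaded kTLS socket into zerocopy sendfile().
         * Assumes setsockopt(SOL_TCP, TCP_ULP, "tls") and
         * setsockopt(SOL_TLS, TLS_TX, ...) were already done. */
        #include <linux/tls.h>
        #include <sys/socket.h>
        #include <sys/sendfile.h>

        #ifndef SOL_TLS
        #define SOL_TLS 282             /* from include/linux/socket.h */
        #endif

        static int enable_zc_sendfile(int sock, int file_fd, size_t len)
        {
                int one = 1;

                if (setsockopt(sock, SOL_TLS, TLS_TX_ZEROCOPY_SENDFILE,
                               &one, sizeof(one)) < 0)
                        return -1;      /* e.g. not a TLS device-offload socket */

                /* The file must stay unchanged while in flight; otherwise the
                 * record MAC can become invalid and the peer drops the connection. */
                return sendfile(sock, file_fd, NULL, len) < 0 ? -1 : 0;
        }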
      
      Performance numbers in a single-core test with 24 HTTPS streams on
      nginx, under 100% CPU load:
      
      * non-zerocopy: 33.6 Gbit/s
      * zerocopy: 79.92 Gbit/s
      
      CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
      Signed-off-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      c1318b39
  3. 22 March 2022, 1 commit
  4. 26 November 2021, 1 commit
    • tls: fix replacing proto_ops · f3911f73
      Authored by Jakub Kicinski
      We replace proto_ops whenever TLS is configured for RX. But our
      replacement also overrides sendpage_locked, which will crash
      unless TX is also configured. Similarly, we plug both of those
      in for TLS_HW (NIC crypto offload) even though TLS_HW has a completely
      different implementation for TX.
      
      Last but not least, we always plug in something based on inet_stream_ops
      even though a few of the callbacks differ for IPv6 (getname, release,
      bind).
      
      Use a callback building method similar to what we do for struct proto.
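      A rough sketch of that building pattern (illustrative, with simplified
      names; the point is that ops start from the correct per-family base and
      only the callbacks a given TX/RX configuration actually implements are
      overridden):

        /* Sketch: build proto_ops per (family, TX config, RX config),
         * mirroring the existing struct proto scheme. Names are illustrative. */
        static struct proto_ops tls_ops[TLS_NUM_PROTS][TLS_NUM_CONFIG][TLS_NUM_CONFIG];

        static void build_proto_ops(const struct proto_ops *ipv4_base,
                                    const struct proto_ops *ipv6_base)
        {
                /* Start from the right base so IPv6-specific callbacks
                 * (getname, release, bind, ...) are preserved. */
                tls_ops[TLSV4][TLS_BASE][TLS_BASE] = *ipv4_base;
                tls_ops[TLSV6][TLS_BASE][TLS_BASE] = *ipv6_base;

                /* TX configured: sendpage_locked is safe to override. */
                tls_ops[TLSV4][TLS_SW][TLS_BASE] = tls_ops[TLSV4][TLS_BASE][TLS_BASE];
                tls_ops[TLSV4][TLS_SW][TLS_BASE].sendpage_locked = tls_sw_sendpage_locked;

                /* RX configured: override only the receive-side callbacks. */
                tls_ops[TLSV4][TLS_BASE][TLS_SW] = tls_ops[TLSV4][TLS_BASE][TLS_BASE];
                tls_ops[TLSV4][TLS_BASE][TLS_SW].splice_read = tls_sw_splice_read;

                /* ... and so on for the remaining TX/RX/HW combinations. */
        }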
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Fixes: d4ffb02d ("net/tls: enable sk_msg redirect to tls socket egress")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f3911f73
  5. 27 October 2021, 1 commit
  6. 25 October 2021, 1 commit
  7. 16 September 2021, 1 commit
  8. 02 June 2021, 1 commit
    • net/tls: Fix use-after-free after the TLS device goes down and up · c55dcdd4
      Authored by Maxim Mikityanskiy
      When a netdev with active TLS offload goes down, tls_device_down is
      called to stop the offload and tear down the TLS context. However, the
      socket stays alive, and it still points to the TLS context, which is now
      deallocated. If the netdev goes back up while the connection is still
      active, and the data flow resumes after a number of TCP retransmissions,
      this leads to a use-after-free of the TLS context.
      
      This commit addresses this bug by keeping the context alive until its
      normal destruction, and implements the necessary fallbacks, so that the
      connection can resume in software (non-offloaded) kTLS mode.
      
      On the TX side tls_sw_fallback is used to encrypt all packets. The RX
      side already has all the necessary fallbacks, because receiving
      non-decrypted packets is supported. The only thing needed on the RX side
      is to block resync requests, which are normally produced after receiving
      non-decrypted packets.
      
      The necessary synchronization is implemented for a graceful teardown:
      first the fallbacks are deployed, then the driver resources are released
      (it used to be possible to have a tls_dev_resync after tls_dev_del).
      
      A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
      mode. It's used to skip the RX resync logic completely, as it becomes
      useless, and some objects may be released (for example, resync_async,
      which is allocated and freed by the driver).
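      A minimal sketch of how such a flag short-circuits resync (illustrative:
      the handler name is made up, but the bit test against the tls_context
      flags reflects the approach described above):

        /* Sketch: once the device is gone, SW fallback handles RX, so resync
         * requests are pointless and driver-owned resync state may be freed. */
        static void rx_resync_request(struct sock *sk)
        {
                struct tls_context *tls_ctx = tls_get_ctx(sk);

                if (test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags))
                        return;         /* degraded: skip resync entirely */

                /* ... normal device resync handling ... */
        }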
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c55dcdd4
  9. 28 November 2020, 1 commit
  10. 14 October 2020, 1 commit
  11. 02 September 2020, 1 commit
  12. 29 July 2020, 1 commit
  13. 25 July 2020, 1 commit
  14. 16 April 2020, 1 commit
    • net: tls: Avoid assigning 'const' pointer to non-const pointer · 9a893949
      Authored by Will Deacon
      tls_build_proto() uses WRITE_ONCE() to assign a 'const' pointer to a
      'non-const' pointer. Cleanups to the implementation of WRITE_ONCE() mean
      that this will give rise to a compiler warning, just like a plain old
      assignment would do:
      
        | net/tls/tls_main.c: In function ‘tls_build_proto’:
        | ./include/linux/compiler.h:229:30: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        | net/tls/tls_main.c:640:4: note: in expansion of macro ‘smp_store_release’
        |   640 |    smp_store_release(&saved_tcpv6_prot, prot);
        |       |    ^~~~~~~~~~~~~~~~~
      
      Drop the const qualifier from the local 'prot' variable, as it isn't
      needed.
      
      Cc: Boris Pismenny <borisp@mellanox.com>
      Cc: Aviad Yehezkel <aviadye@mellanox.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Will Deacon <will@kernel.org>
      9a893949
  15. 09 April 2020, 1 commit
    • net/tls: fix const assignment warning · f691a25c
      Authored by Arnd Bergmann
      Building with some experimental patches, I came across a warning
      in the tls code:
      
      include/linux/compiler.h:215:30: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        215 |  *(volatile typeof(x) *)&(x) = (val);  \
            |                              ^
      net/tls/tls_main.c:650:4: note: in expansion of macro 'smp_store_release'
        650 |    smp_store_release(&saved_tcpv4_prot, prot);
      
      This appears to be a legitimate warning about assigning a const pointer
      into the non-const 'saved_tcpv4_prot' global. Annotate both the ipv4 and
      ipv6 pointers as 'const' to make the code internally consistent.
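      Roughly, the resulting pattern looks like this (a sketch based on the
      warning above; save_base_prot() is an illustrative helper, not the
      actual tls_build_proto()):

        /* Sketch: with the cached pointers declared const, storing the const
         * source pointer no longer triggers -Wdiscarded-qualifiers. */
        static const struct proto *saved_tcpv4_prot;
        static const struct proto *saved_tcpv6_prot;

        static void save_base_prot(struct sock *sk)
        {
                const struct proto *prot = READ_ONCE(sk->sk_prot);

                if (sk->sk_family == AF_INET)
                        smp_store_release(&saved_tcpv4_prot, prot);
                else
                        smp_store_release(&saved_tcpv6_prot, prot);
        }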
      
      Fixes: 5bb4c45d ("net/tls: Read sk_prot once when building tls proto ops")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f691a25c
  16. 22 March 2020, 3 commits
  17. 22 February 2020, 1 commit
    • net, sk_msg: Annotate lockless access to sk_prot on clone · b8e202d1
      Authored by Jakub Sitnicki
      The sk_msg and ULP frameworks override the protocol callbacks pointer in
      sk->sk_prot, while TCP accesses it locklessly when cloning the listening
      socket, that is, with neither sk_lock nor sk_callback_lock held.
      
      Once we enable use of listening sockets with sockmap (and hence sk_msg),
      there will be shared access to sk->sk_prot if a socket is cloned while
      it is being inserted into or deleted from the sockmap on another CPU:
      
      Read side:
      
      tcp_v4_rcv
        sk = __inet_lookup_skb(...)
        tcp_check_req(sk)
          inet_csk(sk)->icsk_af_ops->syn_recv_sock
            tcp_v4_syn_recv_sock
              tcp_create_openreq_child
                inet_csk_clone_lock
                  sk_clone_lock
                    READ_ONCE(sk->sk_prot)
      
      Write side:
      
      sock_map_ops->map_update_elem
        sock_map_update_elem
          sock_map_update_common
            sock_map_link_no_progs
              tcp_bpf_init
                tcp_bpf_update_sk_prot
                  sk_psock_update_proto
                    WRITE_ONCE(sk->sk_prot, ops)
      
      sock_map_ops->map_delete_elem
        sock_map_delete_elem
          __sock_map_delete
           sock_map_unref
             sk_psock_put
               sk_psock_drop
                 sk_psock_restore_proto
                   tcp_update_ulp
                     WRITE_ONCE(sk->sk_prot, proto)
      
      Mark the shared access with READ_ONCE/WRITE_ONCE annotations.
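      A minimal sketch of the annotation pattern (illustrative fragments with
      made-up helper names, not the exact upstream hunks):

        /* Read side: the clone path reads sk_prot without sk_lock held. */
        static struct proto *clone_read_prot(struct sock *sk)
        {
                return READ_ONCE(sk->sk_prot);
        }

        /* Write side: sockmap/ULP swap the callbacks in from another CPU. */
        static void psock_write_prot(struct sock *sk, struct proto *ops)
        {
                WRITE_ONCE(sk->sk_prot, ops);
        }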
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
      b8e202d1
  18. 16 January 2020, 1 commit
  19. 07 December 2019, 1 commit
  20. 29 November 2019, 1 commit
  21. 20 November 2019, 1 commit
  22. 07 November 2019, 1 commit
    • net/tls: add a TX lock · 79ffe608
      Authored by Jakub Kicinski
      TLS TX needs to release and re-acquire the socket lock if the send
      buffer fills up.
      
      The TLS SW TX path currently depends on only allowing one thread to
      enter the function, by abusing sk_write_pending: if another writer is
      already waiting for memory, no new ones are allowed in.
      
      This has two problems:
       - writers don't wake other threads up when they leave the kernel;
         meaning that this scheme works for a single extra thread (a second
         application thread or the delayed work), because memory becoming
         available will send a wake-up request; but, as Mallesham and
         Pooja report, with a larger number of threads it leads to threads
         being put to sleep indefinitely;
       - the delayed work does not get _scheduled_ but it may _run_ when
         other writers are present, leading to crashes as writers don't
         expect state to change under their feet (the same records get pushed
         and freed multiple times); it's hard to reliably bail out of the
         work, however, because the mere presence of a writer does not
         guarantee that the writer will push pending records before exiting.
      
      Ensuring wakeups always happen would make the code essentially
      open-code a mutex. Just use a mutex.
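      A simplified sketch of what that looks like (illustrative: the wrapper
      body is condensed and tls_sw_do_sendmsg() is a hypothetical name for the
      original body, but tx_lock stands for the per-context mutex this commit
      introduces in struct tls_context):

        int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
        {
                struct tls_context *tls_ctx = tls_get_ctx(sk);
                int ret;

                /* Unlike the socket lock, tx_lock stays held across waits for
                 * send memory, so a second writer simply sleeps here instead
                 * of racing the first writer or the delayed tx work. */
                mutex_lock(&tls_ctx->tx_lock);
                lock_sock(sk);
                ret = tls_sw_do_sendmsg(sk, msg, size); /* illustrative: original body */
                release_sock(sk);
                mutex_unlock(&tls_ctx->tx_lock);
                return ret;
        }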
      
      The TLS HW TX path does not have any locking (not even the
      sk_write_pending hack), yet it uses a per-socket sg_tx_data
      array to push records.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com>
      Reported-by: Pooja Trivedi <poojatrivedi@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79ffe608
  23. 06 October 2019, 2 commits
  24. 05 October 2019, 6 commits
  25. 05 September 2019, 2 commits
  26. 01 September 2019, 2 commits
  27. 16 August 2019, 1 commit
  28. 10 August 2019, 1 commit
  29. 06 August 2019, 1 commit
  30. 22 July 2019, 1 commit
    • bpf: sockmap/tls, close can race with map free · 95fa1454
      Authored by John Fastabend
      When a map free is called and, in parallel, a socket is closed, there
      are two paths that can potentially reset the socket prot ops: the bpf
      close() path and the map free path. This creates ambiguity over which
      prot ops should be used on the socket close side.
      
      If the map_free side completes first, we want to call the original
      lowest-level ops. However, if the tls path runs first, we want to call
      the sockmap ops. Additionally, there was no locking around prot updates
      in the TLS code paths, so the prot ops could be changed multiple times,
      once from the TLS path and again from the sockmap side, potentially
      leaving the ops pointed at either TLS or sockmap after the psock and/or
      tls context had already been destroyed.
      
      To fix this race, first, only update the ops inside the callback lock so
      that TLS, sockmap and the lowest level all agree on the prot state.
      Second, add a ULP update() callback so that lower layers can inform the
      upper layer when they are being removed, allowing the upper layer to
      reset its prot ops.
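      A sketch of that update() hook (illustrative: the tls_update() body is
      simplified, but the shape follows the description above, letting TLS
      rebase onto whatever prot the lower layer restores):

        /* New ULP hook in struct tcp_ulp_ops (sketch):
         *   void (*update)(struct sock *sk, struct proto *p);
         * sockmap calls it while restoring sk->sk_prot under the callback lock. */
        static void tls_update(struct sock *sk, struct proto *p)
        {
                struct tls_context *ctx = tls_get_ctx(sk);

                if (likely(ctx))
                        ctx->sk_proto = p;      /* remember the new base ops */
                else
                        sk->sk_prot = p;        /* TLS ctx already torn down */
        }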
      
      This gets us close to allowing sockmap and tls to be stacked in
      arbitrary order, but that patch is saved for the *next trees.
      
      v4:
       - make sure we don't free things for device;
       - remove the checks which swap the callbacks back
         only if TLS is at the top.
      
      Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
      Fixes: 02c558b2 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      95fa1454