1. 02 Nov 2022, 1 commit
  2. 26 Oct 2022, 2 commits
  3. 14 Jan 2022, 1 commit
  4. 15 Jun 2021, 1 commit
    • net/tls: Fix use-after-free after the TLS device goes down and up · aa3905c0
      Committed by Maxim Mikityanskiy
      stable inclusion
      from stable-5.10.43
      commit f1d4184f128dede82a59a841658ed40d4e6d3aa2
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit c55dcdd4 ]
      
      When a netdev with active TLS offload goes down, tls_device_down is
      called to stop the offload and tear down the TLS context. However, the
      socket stays alive, and it still points to the TLS context, which is now
      deallocated. If the netdev goes back up while the connection is still
      active and the data flow resumes after a number of TCP retransmissions,
      the result is a use-after-free of the TLS context.
      
      This commit addresses this bug by keeping the context alive until its
      normal destruction, and implements the necessary fallbacks, so that the
      connection can resume in software (non-offloaded) kTLS mode.
      
      On the TX side, tls_sw_fallback is used to encrypt all packets. The RX
      side already has all the necessary fallbacks, because receiving
      non-decrypted packets is supported; the only thing still needed is to
      block resync requests, which are normally produced after receiving
      non-decrypted packets.
      
      The necessary synchronization is implemented for a graceful teardown:
      first the fallbacks are deployed, then the driver resources are released
      (it used to be possible to have a tls_dev_resync after tls_dev_del).
      
      A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
      mode. It's used to skip the RX resync logic completely, as it becomes
      useless, and some objects may be released (for example, resync_async,
      which is allocated and freed by the driver).
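      A minimal sketch of the degraded-mode gating described above; the flag
      name TLS_RX_DEV_DEGRADED comes from this commit, while the surrounding
      helper is abbreviated and partly illustrative:

        /* RX resync entry point (abbreviated). Once the device is gone and
         * the socket has fallen back to software kTLS, resync requests are
         * useless, so bail out before touching any driver state. */
        static void tls_device_rx_resync_request(struct sock *sk, __be32 seq)
        {
                struct tls_context *tls_ctx = tls_get_ctx(sk);

                if (unlikely(test_bit(TLS_RX_DEV_DEGRADED, &tls_ctx->flags)))
                        return; /* fallback mode: skip resync completely */

                /* ... normal resync handling toward the driver ... */
        }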
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  5. 14 Oct 2020, 1 commit
  6. 02 Sep 2020, 1 commit
  7. 29 Jul 2020, 1 commit
  8. 25 Jul 2020, 1 commit
  9. 16 Apr 2020, 1 commit
    • net: tls: Avoid assigning 'const' pointer to non-const pointer · 9a893949
      Committed by Will Deacon
      tls_build_proto() uses WRITE_ONCE() to assign a 'const' pointer to a
      'non-const' pointer. Cleanups to the implementation of WRITE_ONCE() mean
      that this will give rise to a compiler warning, just like a plain old
      assignment would do:
      
        | net/tls/tls_main.c: In function ‘tls_build_proto’:
        | ./include/linux/compiler.h:229:30: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        | net/tls/tls_main.c:640:4: note: in expansion of macro ‘smp_store_release’
        |   640 |    smp_store_release(&saved_tcpv6_prot, prot);
        |       |    ^~~~~~~~~~~~~~~~~
      
      Drop the const qualifier from the local 'prot' variable, as it isn't
      needed.
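      A minimal before/after sketch of the change described above; the
      function body is abbreviated:

        static void tls_build_proto(struct sock *sk)
        {
                /* Before: the 'const' qualifier clashed with the non-const
                 * saved_tcpv6_prot once the WRITE_ONCE() cleanups landed:
                 *
                 *     const struct proto *prot = READ_ONCE(sk->sk_prot);
                 *
                 * After: drop the qualifier, it isn't needed locally. */
                struct proto *prot = READ_ONCE(sk->sk_prot);

                /* ... */
                smp_store_release(&saved_tcpv6_prot, prot);
        }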
      
      Cc: Boris Pismenny <borisp@mellanox.com>
      Cc: Aviad Yehezkel <aviadye@mellanox.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Will Deacon <will@kernel.org>
  10. 09 Apr 2020, 1 commit
    • net/tls: fix const assignment warning · f691a25c
      Committed by Arnd Bergmann
      Building with some experimental patches, I came across a warning
      in the tls code:
      
      include/linux/compiler.h:215:30: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        215 |  *(volatile typeof(x) *)&(x) = (val);  \
            |                              ^
      net/tls/tls_main.c:650:4: note: in expansion of macro 'smp_store_release'
        650 |    smp_store_release(&saved_tcpv4_prot, prot);
      
      This appears to be a legitimate warning about assigning a const pointer
      into the non-const 'saved_tcpv4_prot' global. Annotate both the ipv4 and
      ipv6 pointers 'const' to make the code internally consistent.
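      A minimal sketch of the annotation described above, as suggested by
      the commit text:

        /* Both saved pointers become const, matching the const pointer
         * being stored, so smp_store_release() no longer discards a
         * qualifier. */
        static const struct proto *saved_tcpv4_prot;
        static const struct proto *saved_tcpv6_prot;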
      
      Fixes: 5bb4c45d ("net/tls: Read sk_prot once when building tls proto ops")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 22 Mar 2020, 3 commits
  12. 22 Feb 2020, 1 commit
    • net, sk_msg: Annotate lockless access to sk_prot on clone · b8e202d1
      Committed by Jakub Sitnicki
      The sk_msg and ULP frameworks override the protocol callbacks pointer in
      sk->sk_prot, while TCP accesses it locklessly when cloning the listening
      socket, that is, with neither sk_lock nor sk_callback_lock held.
      
      Once we enable the use of listening sockets with sockmap (and hence
      sk_msg), there will be shared access to sk->sk_prot if a socket is being
      cloned while it is inserted into or deleted from the sockmap on another
      CPU:
      
      Read side:
      
      tcp_v4_rcv
        sk = __inet_lookup_skb(...)
        tcp_check_req(sk)
          inet_csk(sk)->icsk_af_ops->syn_recv_sock
            tcp_v4_syn_recv_sock
              tcp_create_openreq_child
                inet_csk_clone_lock
                  sk_clone_lock
                    READ_ONCE(sk->sk_prot)
      
      Write side:
      
      sock_map_ops->map_update_elem
        sock_map_update_elem
          sock_map_update_common
            sock_map_link_no_progs
              tcp_bpf_init
                tcp_bpf_update_sk_prot
                  sk_psock_update_proto
                    WRITE_ONCE(sk->sk_prot, ops)
      
      sock_map_ops->map_delete_elem
        sock_map_delete_elem
          __sock_map_delete
           sock_map_unref
             sk_psock_put
               sk_psock_drop
                 sk_psock_restore_proto
                   tcp_update_ulp
                     WRITE_ONCE(sk->sk_prot, proto)
      
      Mark the shared access with READ_ONCE/WRITE_ONCE annotations.
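      A minimal sketch of the annotations described above; the enclosing
      functions appear in the call chains and are elided here:

        /* Read side, e.g. under sk_clone_lock(): no locks are held, so
         * the pointer must be loaded exactly once. */
        struct proto *prot = READ_ONCE(sk->sk_prot);

        /* Write side, e.g. in sk_psock_update_proto() or tcp_update_ulp():
         * a marked store pairs with the lockless reader. */
        WRITE_ONCE(sk->sk_prot, ops);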
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200218171023.844439-2-jakub@cloudflare.com
  13. 16 Jan 2020, 1 commit
  14. 07 Dec 2019, 1 commit
  15. 29 Nov 2019, 1 commit
  16. 20 Nov 2019, 1 commit
  17. 07 Nov 2019, 1 commit
    • net/tls: add a TX lock · 79ffe608
      Committed by Jakub Kicinski
      TLS TX needs to release and re-acquire the socket lock if send buffer
      fills up.
      
      The TLS SW TX path currently depends on allowing only one thread to
      enter the function, by abusing sk_write_pending. If another writer is
      already waiting for memory, no new ones are allowed in.
      
      This has two problems:
       - writers don't wake other threads up when they leave the kernel,
         meaning that this scheme works for a single extra thread (a second
         application thread or delayed work), because memory becoming
         available will send a wake-up request; but, as Mallesham and
         Pooja report, with a larger number of threads it leads to threads
         being put to sleep indefinitely;
       - the delayed work does not get _scheduled_ but it may _run_ when
         other writers are present, leading to crashes, as writers don't
         expect state to change under their feet (the same records get pushed
         and freed multiple times); it's hard to reliably bail from the
         work, however, because the mere presence of a writer does not
         guarantee that the writer will push pending records before exiting.
      
      Ensuring wakeups always happen would make the code basically open-code
      a mutex. Just use a mutex.
      
      The TLS HW TX path does not have any locking (not even the
      sk_write_pending hack), yet it uses a per-socket sg_tx_data
      array to push records.
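      A minimal sketch of the mutex described above; the field placement
      and the sendmsg excerpt are simplified:

        struct tls_context {
                /* ... */
                struct mutex tx_lock; /* serializes writers on the TX path */
                /* ... */
        };

        /* In the SW TX path (e.g. tls_sw_sendmsg()), take the mutex ahead
         * of the socket lock instead of abusing sk_write_pending: */
        mutex_lock(&tls_ctx->tx_lock);
        lock_sock(sk);
        /* ... encrypt and push records; may sleep waiting for memory ... */
        release_sock(sk);
        mutex_unlock(&tls_ctx->tx_lock);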
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Reported-by: Mallesham Jatharakonda <mallesh537@gmail.com>
      Reported-by: Pooja Trivedi <poojatrivedi@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 06 Oct 2019, 2 commits
  19. 05 Oct 2019, 6 commits
  20. 05 Sep 2019, 2 commits
  21. 01 Sep 2019, 2 commits
  22. 16 Aug 2019, 1 commit
  23. 10 Aug 2019, 1 commit
  24. 06 Aug 2019, 1 commit
  25. 22 Jul 2019, 5 commits
    • bpf: sockmap/tls, close can race with map free · 95fa1454
      Committed by John Fastabend
      When a map free is called and, in parallel, a socket is closed, we have
      two paths that can potentially reset the socket prot ops: the bpf
      close() path and the map free path. This creates a problem in deciding
      which prot ops should be used from the socket close side.
      
      If the map_free side completes first, we want to call the original
      lowest-level ops. However, if the tls path runs first, we want to call
      the sockmap ops. Additionally, there was no locking around prot updates
      in the TLS code paths, so the prot ops could be changed multiple times,
      once from the TLS path and again from the sockmap side, potentially
      leaving the ops pointed at either TLS or sockmap when the psock and/or
      tls context had already been destroyed.
      
      To fix this race, first only update the ops inside the callback lock,
      so that TLS, sockmap and the lowest level all agree on the prot state.
      Second, add a ULP callback update() so that lower layers can inform the
      upper layer when they are being removed, allowing the upper layer to
      reset its prot ops (see the sketch after the notes below).
      
      This gets us close to allowing sockmap and tls to be stacked in
      arbitrary order, but that patch is saved for the *next trees.
      
      v4:
       - make sure we don't free things for device;
       - remove the checks which swap the callbacks back
         only if TLS is at the top.
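      A minimal sketch of the update() callback described above; the
      tls_update() body is simplified from the approach the commit text
      describes:

        struct tcp_ulp_ops {
                /* ... */
                /* the lower layer (e.g. sockmap) notifies the ULP that it
                 * is being removed, letting it re-point its saved prot ops */
                void (*update)(struct sock *sk, struct proto *p);
        };

        static void tls_update(struct sock *sk, struct proto *p)
        {
                struct tls_context *ctx = tls_get_ctx(sk);

                if (likely(ctx))
                        ctx->sk_proto = p; /* TLS keeps delegating to 'p' */
                else
                        sk->sk_prot = p;   /* no TLS context: restore directly */
        }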
      
      Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com
      Fixes: 02c558b2 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/tls: fix transition through disconnect with close · 32857cf5
      Committed by John Fastabend
      It is possible (via shutdown()) for TCP socks to go through the
      TCP_CLOSE state via tcp_disconnect() without actually calling
      tcp_close(), which would in turn call the tls close callback. Because
      of this, a user could disconnect a socket and then put it in a LISTEN
      state, which would break our assumption that sockets are always in the
      ESTABLISHED state.
      
      More directly: because close() can call unhash(), and unhash() is
      implemented by sockmap, if a sockmap socket has TLS enabled we can
      incorrectly destroy the psock from unhash() and then call its close
      handler again. But because the psock (the sockmap socket
      representation) is already destroyed, we call the close handler in
      sk->prot. However, in some cases (the TLS BASE/BASE case) this will
      still point at the sockmap close handler, resulting in a circular call
      and the crash reported by syzbot.
      
      To fix both of the above issues, implement the unhash() routine for TLS
      (sketched below).
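      A minimal sketch of the unhash() routine described above; the resource
      teardown is elided and only loosely follows the notes below:

        static void tls_sk_proto_unhash(struct sock *sk)
        {
                struct tls_context *ctx = tls_get_ctx(sk);
                struct proto *sk_proto = ctx->sk_proto;

                /* detach the TLS callbacks first, so a subsequent close()
                 * cannot loop back into a sockmap/TLS handler */
                sk->sk_prot = sk_proto;
                /* ... release TX/RX resources, schedule deferred ctx free ... */
                sk_proto->unhash(sk);
        }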
      
      v4:
       - add note about tls offload still needing the fix;
       - move sk_proto to the cold cache line;
       - split TX context free into "release" and "free",
         otherwise the GC work itself is in already freed
         memory;
       - move TX before RX for consistency;
       - reuse tls_ctx_free();
       - schedule the GC work after we're done with context
         to avoid UAF;
       - don't set the unhash in all modes, all modes "inherit"
         TLS_BASE's callbacks anyway;
       - disable the unhash hook for TLS_HW.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/tls: remove sock unlock/lock around strp_done() · 313ab004
      Committed by John Fastabend
      The tls close() callback currently drops the sock lock to call
      strp_done(). Split up the RX cleanup into stopping the strparser and
      releasing most resources, syncing the strparser, and finally freeing
      the context.
      
      To avoid the need for a strp_done() call on the cleanup path of device
      offload, make sure we don't arm the strparser until we are sure init
      will be successful.
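      A minimal sketch of the resulting close-path ordering; the helper
      names reflect the split described above but are partly illustrative:

        lock_sock(sk);
        /* stop the strparser and release most RX resources under the lock */
        tls_sw_release_resources_rx(sk);
        release_sock(sk);
        /* sync the strparser without holding the sock lock ... */
        tls_sw_strparser_done(ctx);
        /* ... then free the RX context last */
        tls_sw_free_ctx_rx(ctx);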
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/tls: remove close callback sock unlock/lock around TX work flush · f87e62d4
      Committed by John Fastabend
      The tls close() callback currently drops the sock lock, makes a
      cancel_delayed_work_sync() call, and then relocks the sock.
      
      By restructuring the code we can avoid dropping the lock and then
      reacquiring it. To simplify this, we do the following:
      
       tls_sk_proto_close
       set_bit(CLOSING)
       set_bit(SCHEDULE)
       cancel_delayed_work_sync() <- cancel workqueue
       lock_sock(sk)
       ...
       release_sock(sk)
       strp_done()
      
      Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by
      any workqueue items, e.g. one that happens to be scheduled and run
      between when we set the SCHEDULE bit and cancel the work. Then, because
      the SCHEDULE bit stays set, no new work will be scheduled (see the
      sketch below).
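      A minimal sketch of why this ordering works, as read from the
      description above; the flag and field names are illustrative:

        static void tx_work_handler(struct work_struct *work)
        {
                /* ... look up ctx from the work item ... */

                /* a closing socket owns the SCHEDULE bit: leave it set so
                 * no new work can be queued, and do nothing ourselves */
                if (test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))
                        return;

                clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask);
                /* ... push pending records ... */
        }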
      
      Tested with net selftests and bpf selftests.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
    • net/tls: don't call tls_sk_proto_close for hw record offload · ac78fc14
      Committed by Jakub Kicinski
      The deprecated TOE offload doesn't actually do anything in
      tls_sk_proto_close(): all TLS code is skipped and the context is not
      freed. Remove the callback to make it easier to refactor
      tls_sk_proto_close().
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>