1. 02 May, 2019 (1 commit)
    • net/tls: fix refcount adjustment in fallback · e97f0bc7
      Committed by Jakub Kicinski
      [ Upstream commit 9188d5ca454fd665145904267e726e9e8d122f5c ]
      
      Unlike atomic_add(), refcount_add() does not deal well
      with a negative argument.  TLS fallback code reallocates
      the skb and is very likely to shrink the truesize, leading to:
      
      [  189.513254] WARNING: CPU: 5 PID: 0 at lib/refcount.c:81 refcount_add_not_zero_checked+0x15c/0x180
       Call Trace:
        refcount_add_checked+0x6/0x40
        tls_enc_skb+0xb93/0x13e0 [tls]
      
      Once wmem_allocated count saturates the application can no longer
      send data on the socket.  This is similar to Eric's fixes for GSO,
      TCP:
      commit 7ec318fe ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()")
      and UDP:
      commit 575b65bc ("udp: avoid refcount_t saturation in __udp_gso_segment()").
      
      Unlike the GSO case, for TLS fallback it's likely that the skb has
      shrunk, so the "likely" annotation is the other way around (likely
      branch being "sub").
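
The add-versus-sub choice described above can be sketched in user space. This is only an illustration: `toy_refcount`, its saturation value and `toy_account_truesize()` are hypothetical stand-ins, not the kernel's `refcount_t` implementation or the code in `tls_enc_skb()`.

```c
#include <assert.h>
#include <limits.h>

/* Toy stand-in for a saturating reference counter (assumption: saturates
 * at UINT_MAX; the kernel's refcount_t differs in detail). */
struct toy_refcount {
    unsigned int refs;
};

#define TOY_REFCOUNT_SATURATED UINT_MAX

/* Add n, saturating instead of wrapping. A negative int converted to
 * unsigned becomes a huge n and lands in the saturation branch. */
static void toy_refcount_add(struct toy_refcount *r, unsigned int n)
{
    if (r->refs > TOY_REFCOUNT_SATURATED - n)
        r->refs = TOY_REFCOUNT_SATURATED;
    else
        r->refs += n;
}

/* Subtract n; the caller must not drop below zero. */
static void toy_refcount_sub(struct toy_refcount *r, unsigned int n)
{
    assert(n <= r->refs);
    r->refs -= n;
}

/* Apply a signed truesize change by choosing add or sub explicitly,
 * so the counter never sees a negative argument. */
static void toy_account_truesize(struct toy_refcount *wmem,
                                 int old_truesize, int new_truesize)
{
    int delta = new_truesize - old_truesize;

    if (delta < 0)          /* expected case for the TLS fallback: skb shrank */
        toy_refcount_sub(wmem, (unsigned int)-delta);
    else if (delta > 0)
        toy_refcount_add(wmem, (unsigned int)delta);
}
```

Passing a negative delta straight to `toy_refcount_add()` saturates the counter, which mirrors the failure mode the commit message describes.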
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 13 January, 2019 (1 commit)
  3. 10 January, 2019 (1 commit)
    • net/tls: allocate tls context using GFP_ATOMIC · f624d95c
      Committed by Ganesh Goudar
      [ Upstream commit c6ec179a0082e2e76e3a72050c2b99d3d0f3da3f ]
      
      create_ctx can be called from atomic context, hence use
      GFP_ATOMIC instead of GFP_KERNEL.
      
      [  395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421
      [  395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl
      [  395.996564] 2 locks held by openssl/16254:
      [  396.010492]  #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0
      [  396.029838]  #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280
      [  396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G           O      4.20.0-rc6+ #25
      [  396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017
      [  396.083537] Call Trace:
      [  396.096265]  dump_stack+0x5e/0x8b
      [  396.109876]  ___might_sleep+0x216/0x250
      [  396.123940]  kmem_cache_alloc_trace+0x1b0/0x240
      [  396.138800]  create_ctx+0x1f/0x60
      [  396.152504]  tls_init+0xbd/0x280
      [  396.166135]  tcp_set_ulp+0x191/0x2d0
      [  396.180035]  ? tcp_set_ulp+0x2c/0x2d0
      [  396.193960]  do_tcp_setsockopt.isra.44+0x148/0x9a0
      [  396.209013]  __sys_setsockopt+0x7c/0xe0
      [  396.223054]  __x64_sys_setsockopt+0x20/0x30
      [  396.237378]  do_syscall_64+0x4a/0x180
      [  396.251200]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context")
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 17 September, 2018 (1 commit)
    • tls: fix currently broken MSG_PEEK behavior · 50c6b58a
      Committed by Daniel Borkmann
      In kTLS MSG_PEEK behavior is currently failing, strace example:
      
        [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
        [pid  2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
        [pid  2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2430] listen(4, 10)               = 0
        [pid  2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
        [pid  2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2430] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
        [pid  2430] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2430] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2430] close(4)                    = 0
        [pid  2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
        [pid  2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
        [pid  2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64
      
      As can be seen from strace, there are two TLS records sent,
      i) 'test_read_peek' and ii) '_mult_recs\0' where we end up
      peeking 'test_read_peektest_read_peektest'. This is clearly
      wrong, and what happens is that given peek cannot call into
      tls_sw_advance_skb() to unpause strparser and proceed with
      the next skb, we end up looping over the current one, copying
      the 'test_read_peek' over and over into the user provided
      buffer.
      
      Here, we can only peek into the currently held skb (the current,
      full TLS record), as otherwise we would have to hold all the
      original skb(s) (depending on the peek depth) in a separate queue
      when unpausing strparser to process the next records. The minimally
      intrusive fix is to return only up to the current record's size
      (which is likely what c46234eb ("tls: RX path for ktls") originally
      intended as well). Thus, after the patch we properly peek the first
      record:
      
        [pid  2046] wait4(2075,  <unfinished ...>
        [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
        [pid  2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4
        [pid  2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2075] listen(4, 10)               = 0
        [pid  2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
        [pid  2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
        [pid  2075] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2075] setsockopt(3, 0x11a /* SOL_?? */, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5
        [pid  2075] setsockopt(5, SOL_TCP, 0x1f /* TCP_??? */, [7564404], 4) = 0
        [pid  2075] setsockopt(5, 0x11a /* SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0
        [pid  2075] close(4)                    = 0
        [pid  2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14
        [pid  2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11
        [pid  2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14
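
The clamping rule (a peek returns at most the current record) can be sketched as follows. `toy_record` and `toy_peek_current_record()` are hypothetical names for illustration; the real receive path works on skbs and strparser state rather than flat buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the one decrypted TLS record currently held. */
struct toy_record {
    const char *data;
    size_t len;
};

/* Copy at most len bytes, but never past the end of the current record.
 * Returns the number of bytes placed in buf. */
static size_t toy_peek_current_record(const struct toy_record *cur,
                                      char *buf, size_t len)
{
    size_t chunk = len < cur->len ? len : cur->len;

    assert(cur->data != NULL);
    memcpy(buf, cur->data, chunk);
    return chunk;
}
```

With a 14-byte record and a 64-byte buffer this returns 14, matching the `recvfrom(..., MSG_PEEK, ...) = 14` result in the trace above.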
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 14 September, 2018 (3 commits)
  6. 09 September, 2018 (1 commit)
    • net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC · 52ea992c
      Committed by Vakul Garg
      tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
      function sk_alloc_sg(). If the number of SG entries hits
      MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
      the current SG index to '0'. This leads to tls_push_record() being
      called with 'sg_encrypted_num_elem = 0', which later causes a
      kernel crash. To fix this, set the number of SG elements to the number
      of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
      returns -ENOSPC.
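
A toy model of the fix, assuming only the behavior stated above (the allocator resets the index to 0 when the array fills up). `TOY_MAX_FRAGS`, `toy_alloc_sg()` and `toy_alloc_encrypted()` are hypothetical; they are not the kernel's `sk_alloc_sg()` signature.

```c
#include <assert.h>
#include <errno.h>

#define TOY_MAX_FRAGS 4   /* hypothetical stand-in for MAX_SKB_FRAGS */

/* Claim `want` entries. When the array is full, reset *cur to 0 and
 * return -ENOSPC, as the commit message describes. */
static int toy_alloc_sg(int *cur, int want)
{
    assert(want >= 0);
    while (want > 0) {
        if (*cur == TOY_MAX_FRAGS) {
            *cur = 0;
            return -ENOSPC;
        }
        (*cur)++;
        want--;
    }
    return 0;
}

/* Caller-side fix: on -ENOSPC every slot is in use, so report the
 * full array size instead of the reset index. */
static int toy_alloc_encrypted(int *num_elem, int want)
{
    int rc = toy_alloc_sg(num_elem, want);

    if (rc == -ENOSPC)
        *num_elem = TOY_MAX_FRAGS;
    return rc;
}
```

Without the correction in `toy_alloc_encrypted()`, the caller would continue with an element count of 0 even though the array is full.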
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 23 August, 2018 (1 commit)
    • tls: possible hang when do_tcp_sendpages hits sndbuf is full case · 67db7cd2
      Committed by John Fastabend
      Currently, the lower protocol's sk_write_space handler is not called if
      TLS is sending a scatterlist via tls_push_sg. However, normally
      tls_push_sg calls do_tcp_sendpages, which may be under memory pressure,
      that in turn may trigger a wait via sk_wait_event. Typically, this
      happens when the in-flight bytes exceed the sndbuf size. In the normal
      case when enough ACKs are received sk_write_space() will be called and
      the sk_wait_event will be woken up allowing it to send more data
      and/or return to the user.
      
      But in the TLS case, because the sk_write_space() handler does not
      wake up the waiters, the above send will wait until the sndtimeo is
      exceeded. By default this is MAX_SCHEDULE_TIMEOUT, so it looks like a
      hang to the user (especially this impatient user). To fix this, pass
      the sk_write_space event to the lower layer's sk_write_space handler,
      which in the TCP case will wake any pending events.
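
The callback chaining can be sketched with plain function pointers. All names here (`toy_sock`, `toy_tls_install()`, and so on) are hypothetical and only model the idea of saving the lower layer's handler and forwarding the event to it.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sock;
typedef void (*toy_write_space_fn)(struct toy_sock *sk);

struct toy_sock {
    toy_write_space_fn sk_write_space;    /* handler the stack calls */
    toy_write_space_fn saved_write_space; /* lower layer's original handler */
    int tls_events;                       /* events seen by the TLS layer */
    int lower_wakeups;                    /* events that reached the lower layer */
};

/* Stands in for the TCP handler that wakes senders blocked on sndbuf. */
static void toy_tcp_write_space(struct toy_sock *sk)
{
    sk->lower_wakeups++;
}

/* TLS handler: do its own work, then forward to the saved handler. */
static void toy_tls_write_space(struct toy_sock *sk)
{
    sk->tls_events++;
    assert(sk->saved_write_space != NULL);
    sk->saved_write_space(sk);
}

/* Interpose the TLS handler while remembering the original one. */
static void toy_tls_install(struct toy_sock *sk)
{
    sk->saved_write_space = sk->sk_write_space;
    sk->sk_write_space = toy_tls_write_space;
}
```

If `toy_tls_write_space()` skipped the forwarding call, `lower_wakeups` would stay at 0, which is the analogue of the sender never being woken.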
      
      I observed the above while integrating sockmap and ktls. It
      initially appeared as test_sockmap (modified to use ktls) occasionally
      hanging. To reliably reproduce this reduce the sndbuf size and stress
      the tls layer by sending many 1B sends. This results in every byte
      needing a header and each byte individually being sent to the crypto
      layer.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  8. 17 August, 2018 (1 commit)
    • tcp, ulp: add alias for all ulp modules · 037b0b86
      Committed by Daniel Borkmann
      Let's not turn the TCP ULP lookup into an arbitrary module loader as
      we only intend to load ULP modules through this mechanism, not other
      unrelated kernel modules:
      
        [root@bar]# cat foo.c
        #include <sys/types.h>
        #include <sys/socket.h>
        #include <linux/tcp.h>
        #include <linux/in.h>
      
        int main(void)
        {
            int sock = socket(PF_INET, SOCK_STREAM, 0);
            setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
            return 0;
        }
      
        [root@bar]# gcc foo.c -O2 -Wall
        [root@bar]# lsmod | grep sctp
        [root@bar]# ./a.out
        [root@bar]# lsmod | grep sctp
        sctp                 1077248  4
        libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
        [root@bar]#
      
      Fix it by adding module alias to TCP ULP modules, so probing module
      via request_module() will be limited to tcp-ulp-[name]. The existing
      modules like kTLS will load fine given tcp-ulp-tls alias, but others
      will fail to load:
      
        [root@bar]# lsmod | grep sctp
        [root@bar]# ./a.out
        [root@bar]# lsmod | grep sctp
        [root@bar]#
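
The effect of the alias prefix can be sketched in user space. `toy_ulp_module_alias()` and `TOY_ULP_NAME_MAX` are hypothetical; the only detail taken from the text above is the `tcp-ulp-[name]` pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_ULP_NAME_MAX 16   /* assumed upper bound on a ULP name */

/* Build the module name to probe for a requested ULP. Returns 0 on
 * success and -1 if the name or the output buffer is unsuitable. */
static int toy_ulp_module_alias(const char *ulp_name, char *out, size_t out_len)
{
    int n;

    assert(ulp_name != NULL && out != NULL);
    if (strlen(ulp_name) >= TOY_ULP_NAME_MAX)
        return -1;

    n = snprintf(out, out_len, "tcp-ulp-%s", ulp_name);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

A request for "sctp" now probes for "tcp-ulp-sctp", a name the sctp module does not provide, so the unrelated module is no longer pulled in, while "tls" resolves to "tcp-ulp-tls".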
      
      Sockmap is not affected by this since it's either built-in or not.
      
      Fixes: 734942cc ("tcp: ULP infrastructure")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  9. 13 August, 2018 (1 commit)
    • net/tls: Combined memory allocation for decryption request · 0b243d00
      Committed by Vakul Garg
      To prepare a decryption request, several memory chunks are required
      (aead_req, sgin, sgout, iv, aad). To submit the decrypt request to
      an accelerator, the buffers read by the accelerator must be DMA-able
      and must not come from the stack. The buffers for aad and iv could
      each be kmalloc'ed separately, but that is inefficient.
      This patch does a single combined allocation for the decryption request
      and then segments into aead_req || sgin || sgout || iv || aad.
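
A user-space sketch of carving one allocation into the five regions in the order given above. `toy_dec_req` and its helpers are hypothetical, the sizes are caller-supplied, and alignment is deliberately ignored by using byte pointers; a real layout would have to respect the alignment of each structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One backing buffer, five views into it. */
struct toy_dec_req {
    void *mem;
    unsigned char *aead_req;
    unsigned char *sgin;
    unsigned char *sgout;
    unsigned char *iv;
    unsigned char *aad;
};

/* Allocate once and segment as aead_req || sgin || sgout || iv || aad.
 * Returns 0 on success, -1 if the allocation fails. */
static int toy_dec_req_alloc(struct toy_dec_req *r, size_t req_sz,
                             size_t sgin_sz, size_t sgout_sz,
                             size_t iv_sz, size_t aad_sz)
{
    unsigned char *p = malloc(req_sz + sgin_sz + sgout_sz + iv_sz + aad_sz);

    if (p == NULL)
        return -1;

    r->mem = p;
    r->aead_req = p;  p += req_sz;
    r->sgin = p;      p += sgin_sz;
    r->sgout = p;     p += sgout_sz;
    r->iv = p;        p += iv_sz;
    r->aad = p;
    return 0;
}

/* A single free releases every region. */
static void toy_dec_req_free(struct toy_dec_req *r)
{
    assert(r != NULL);
    free(r->mem);
    r->mem = NULL;
}
```

The design point is that one allocation and one free replace five of each, and none of the regions lives on the stack.
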
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 06 August, 2018 (1 commit)
  11. 03 August, 2018 (1 commit)
  12. 02 August, 2018 (1 commit)
  13. 31 July, 2018 (1 commit)
    • net/tls: Use socket data_ready callback on record availability · ad13acce
      Committed by Vakul Garg
      On receipt of a complete tls record, use socket's saved data_ready
      callback instead of state_change callback. In function tls_queue(),
      the TLS record is queued in encrypted state, and decryption
      happens inline when tls_sw_recvmsg() or tls_sw_splice_read() is invoked.
      So it should be ok to notify the waiting context about the availability
      of data as soon as we could collect a full TLS record. For new data
      availability notification, sk_data_ready callback is more appropriate.
      It points to sock_def_readable() which wakes up specifically for EPOLLIN
      event. This is in contrast to the socket callback sk_state_change which
      points to sock_def_wakeup() which issues a wakeup unconditionally
      (without event mask).
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 29 July, 2018 (2 commits)
  15. 27 July, 2018 (2 commits)
  16. 21 July, 2018 (2 commits)
  17. 17 July, 2018 (1 commit)
    • tls: Stricter error checking in zerocopy sendmsg path · 32da1221
      Committed by Dave Watson
      In the zerocopy sendmsg() path, there are error checks to revert
      the zerocopy if we get any error code.  syzkaller has discovered
      that tls_push_record can return -ECONNRESET, which is fatal, and
      happens after the point at which it is safe to revert the iter,
      as we've already passed the memory to do_tcp_sendpages.
      
      Previously this code could return -ENOMEM and we would want to
      revert the iter, but AFAIK this no longer returns ENOMEM after
      a447da7d ("tls: fix use-after-free in tls_push_record"),
      so we fail for all error codes.
      
      Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com
      Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 16 July, 2018 (6 commits)
  19. 13 July, 2018 (1 commit)
  20. 03 July, 2018 (1 commit)
  21. 29 June, 2018 (1 commit)
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Committed by Linus Torvalds
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 27 June, 2018 (1 commit)
  23. 24 June, 2018 (1 commit)
  24. 16 June, 2018 (2 commits)
    • tls: fix waitall behavior in tls_sw_recvmsg · 06030dba
      Committed by Daniel Borkmann
      Current behavior in tls_sw_recvmsg() is to wait for incoming tls
      messages and copy up to exactly len bytes of data that the user
      provided. This is problematic: if no packet is currently queued in
      strparser, we keep waiting until one has been processed and pushed
      into the tls receive layer for tls_wait_data() to wake up and push
      the decrypted bits to user space. Given that after tls decryption
      we're back at streaming data, use the sock_rcvlowat() hint from the
      tcp socket instead. Retain current behavior with the MSG_WAITALL
      flag and otherwise use the hint target for breaking the loop and
      returning to application. This is done if currently no ctx->recv_pkt
      is ready, otherwise continue to process it from our strparser
      backlog.
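
The target computation can be sketched as below. `toy_rcv_target()` is a hypothetical user-space function modelled on my understanding of what `sock_rcvlowat()` returns (the full length with MSG_WAITALL, otherwise the low-water mark capped at the length, and never less than 1); treat those exact semantics as an assumption.

```c
#include <assert.h>

/* Minimum number of bytes to collect before returning to the caller.
 * rcvlowat: the socket's low-water mark
 * waitall:  nonzero if MSG_WAITALL was passed
 * len:      size of the user buffer */
static int toy_rcv_target(int rcvlowat, int waitall, int len)
{
    int v;

    assert(len >= 0);
    v = waitall ? len : (rcvlowat < len ? rcvlowat : len);
    return v ? v : 1;
}
```

With the default low-water mark of 1, a 4096-byte read can return as soon as one record has been copied, whereas MSG_WAITALL keeps the old wait-for-everything behavior.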
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tls: fix use-after-free in tls_push_record · a447da7d
      Committed by Daniel Borkmann
      syzkaller managed to trigger a use-after-free in tls like the
      following:
      
        BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
        Write of size 1 at addr ffff88037aa08000 by task a.out/2317
      
        CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        Call Trace:
         dump_stack+0x71/0xab
         print_address_description+0x6a/0x280
         kasan_report+0x258/0x380
         ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_sw_push_pending_record+0x2e/0x40 [tls]
         tls_sk_proto_close+0x3fe/0x710 [tls]
         ? tcp_check_oom+0x4c0/0x4c0
         ? tls_write_space+0x260/0x260 [tls]
         ? kmem_cache_free+0x88/0x1f0
         inet_release+0xd6/0x1b0
         __sock_release+0xc0/0x240
         sock_close+0x11/0x20
         __fput+0x22d/0x660
         task_work_run+0x114/0x1a0
         do_exit+0x71a/0x2780
         ? mm_update_next_owner+0x650/0x650
         ? handle_mm_fault+0x2f5/0x5f0
         ? __do_page_fault+0x44f/0xa50
         ? mm_fault_error+0x2d0/0x2d0
         do_group_exit+0xde/0x300
         __x64_sys_exit_group+0x3a/0x50
         do_syscall_64+0x9a/0x300
         ? page_fault+0x8/0x30
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happened through fault injection where aead_req allocation in
      tls_do_encryption() eventually failed and we returned -ENOMEM from
      the function. Turns out that the use-after-free is triggered from
      tls_sw_sendmsg() in the second tls_push_record(). The error then
      triggers a jump to waiting for memory in sk_stream_wait_memory()
      resp. returning immediately in case of MSG_DONTWAIT. What follows is
      the trim_both_sgl(sk, orig_size), which drops elements from the sg
      list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
      when the socket is being closed, where tls_sk_proto_close() callback
      is invoked. The tls_complete_pending_work() will figure that there's
      a pending closed tls record to be flushed and thus calls into the
      tls_push_pending_closed_record() from there. ctx->push_pending_record()
      is called from the latter, which is the tls_sw_push_pending_record()
      from sw path. This again calls into tls_push_record(). And here the
      tls_fill_prepend() will panic since the buffer address has been freed
      earlier via trim_both_sgl(). One way to fix it is to move the aead
      request allocation out of tls_do_encryption() and earlier, into
      tls_push_record(). This means we don't prep the tls header and advance
      state to TLS_PENDING_CLOSED_RECORD before the allocation, which could
      potentially fail, has happened. That fixes the issue on my side.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
      Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 12 June, 2018 (1 commit)
    • tls: fix NULL pointer dereference on poll · f6fadff3
      Committed by Daniel Borkmann
      While hacking on kTLS, I ran into the following panic from an
      unprivileged netserver / netperf TCP session:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        PGD 800000037f378067 P4D 800000037f378067 PUD 3c0e61067 PMD 0
        Oops: 0010 [#1] SMP KASAN PTI
        CPU: 1 PID: 2289 Comm: netserver Not tainted 4.17.0+ #139
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        RIP: 0010:          (null)
        Code: Bad RIP value.
        RSP: 0018:ffff88036abcf740 EFLAGS: 00010246
        RAX: dffffc0000000000 RBX: ffff88036f5f6800 RCX: 1ffff1006debed26
        RDX: ffff88036abcf920 RSI: ffff8803cb1a4f00 RDI: ffff8803c258c280
        RBP: ffff8803c258c280 R08: ffff8803c258c280 R09: ffffed006f559d48
        R10: ffff88037aacea43 R11: ffffed006f559d49 R12: ffff8803c258c280
        R13: ffff8803cb1a4f20 R14: 00000000000000db R15: ffffffffc168a350
        FS:  00007f7e631f4700(0000) GS:ffff8803d1c80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffffffffd6 CR3: 00000003ccf64005 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         ? tls_sw_poll+0xa4/0x160 [tls]
         ? sock_poll+0x20a/0x680
         ? do_select+0x77b/0x11a0
         ? poll_schedule_timeout.constprop.12+0x130/0x130
         ? pick_link+0xb00/0xb00
         ? read_word_at_a_time+0x13/0x20
         ? vfs_poll+0x270/0x270
         ? deref_stack_reg+0xad/0xe0
         ? __read_once_size_nocheck.constprop.6+0x10/0x10
        [...]
      
      Debugging further, it turns out that calling into ctx->sk_poll() is
      invalid since sk_poll itself is NULL which was saved from the original
      TCP socket in order for tls_sw_poll() to invoke it.
      
      Looks like the recent conversion from the poll to the poll_mask callback,
      started in 15252423 ("net: add support for ->poll_mask in proto_ops"),
      never got around to converting kTLS as well: TCP's ->poll was converted
      over to ->poll_mask in commit 2c7d3dac ("net/tcp: convert to ->poll_mask"),
      and therefore kTLS wrongly saved the old ->poll, which is now NULL.

      Convert kTLS over to use ->poll_mask instead. Also, instead of POLLIN |
      POLLRDNORM, use the proper EPOLLIN | EPOLLRDNORM bits for the mask that
      is mangled here, as is the case in tcp_poll_mask() as well.
      
      Fixes: 2c7d3dac ("net/tcp: convert to ->poll_mask")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Watson <davejwatson@fb.com>
      Tested-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 07 June, 2018 (1 commit)
    • strparser: Add __strp_unpause and use it in ktls. · 7170e604
      Committed by Doron Roberts-Kedes
      strp_unpause queues strp_work in order to parse any messages that
      arrived while the strparser was paused. However, the process invoking
      strp_unpause could eagerly parse a buffered message itself if it held
      the sock lock.
      
      __strp_unpause is an alternative to strp_unpause that avoids the scheduling
      overhead that results when a receiving thread unpauses the strparser
      and waits for the next message to be delivered by the workqueue thread.
      
      This patch more than doubled the IOPS achieved in a benchmark of NBD
      traffic encrypted using ktls.
      Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 18 May, 2018 (1 commit)
  28. 11 May, 2018 (1 commit)
  29. 08 May, 2018 (1 commit)