1. 24 7月, 2018 1 次提交
  2. 11 4月, 2018 1 次提交
  3. 24 2月, 2018 1 次提交
  4. 22 2月, 2018 1 次提交
  5. 17 2月, 2018 2 次提交
    • S
      rds: zerocopy Tx support. · 0cebacce
      Sowmini Varadhan 提交于
      If the MSG_ZEROCOPY flag is specified with rds_sendmsg(), and,
      if the SO_ZEROCOPY socket option has been set on the PF_RDS socket,
      application pages sent down with rds_sendmsg() are pinned.
      
      The pinning uses the accounting infrastructure added by
      Commit a91dbff5 ("sock: ulimit on MSG_ZEROCOPY pages")
      
      The payload bytes in the message may not be modified for the
      duration that the message has been pinned. A multi-threaded
      application using this infrastructure may thus need to be notified
      about send-completion so that it can free/reuse the buffers
      passed to rds_sendmsg(). Notification of send-completion will
      identify each message-buffer by a cookie that the application
      must specify as ancillary data to rds_sendmsg().
      The ancillary data in this case has cmsg_level == SOL_RDS
      and cmsg_type == RDS_CMSG_ZCOPY_COOKIE.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0cebacce
    • S
      rds: hold a sock ref from rds_message to the rds_sock · ea8994cb
      Sowmini Varadhan 提交于
      The existing model holds a reference from the rds_sock to the
      rds_message, but the rds_message does not itself hold a sock_put()
      on the rds_sock. Instead the m_rs field in the rds_message is
      assigned when the message is queued on the sock, and nulled when
      the message is dequeued from the sock.
      
      We want to be able to notify userspace when the rds_message
      is actually freed (from rds_message_purge(), after the refcounts
      to the rds_message go to 0). At the time that rds_message_purge()
      is called, the message is no longer on the rds_sock retransmit
      queue. Thus the explicit reference for the m_rs is needed to
      send a notification that will signal to userspace that
      it is now safe to free/reuse any pages that may have
      been pinned down for zerocopy.
      
      This patch manages the m_rs assignment in the rds_message with
      the necessary refcount book-keeping.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea8994cb
  6. 09 2月, 2018 1 次提交
    • S
      rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and... · ebeeb1ad
      Sowmini Varadhan 提交于
      rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management
      
      An rds_connection can get added during netns deletion between lines 528
      and 529 of
      
        506 static void rds_tcp_kill_sock(struct net *net)
        :
        /* code to pull out all the rds_connections that should be destroyed */
        :
        528         spin_unlock_irq(&rds_tcp_conn_lock);
        529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
        530                 rds_conn_destroy(tc->t_cpath->cp_conn);
      
      Such an rds_connection would miss out the rds_conn_destroy()
      loop (that cancels all pending work) and (if it was scheduled
      after netns deletion) could trigger the use-after-free.
      
      A similar race-window exists for the module unload path
      in rds_tcp_exit -> rds_tcp_destroy_conns
      
      Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
      by checking check_net() before enqueuing new work or adding new
      connections.
      
      Concurrency with module-unload is handled by maintaining a module
      specific flag that is set at the start of the module exit function,
      and must be checked before enqueuing new work or adding new connections.
      
      This commit refactors existing RDS_DESTROY_PENDING checks added by
      commit 3db6e0d1 ("rds: use RCU to synchronize work-enqueue with
      connection teardown") and consolidates all the concurrency checks
      listed above into the function rds_destroy_pending().
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebeeb1ad
  7. 06 1月, 2018 1 次提交
  8. 27 12月, 2017 1 次提交
    • A
      RDS: Check cmsg_len before dereferencing CMSG_DATA · 14e138a8
      Avinash Repaka 提交于
      RDS currently doesn't check if the length of the control message is
      large enough to hold the required data, before dereferencing the control
      message data. This results in following crash:
      
      BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013
      [inline]
      BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90
      net/rds/send.c:1066
      Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157
      
      CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       rds_rdma_bytes net/rds/send.c:1013 [inline]
       rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
       sock_sendmsg_nosec net/socket.c:628 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:638
       ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
       __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
       SYSC_sendmmsg net/socket.c:2139 [inline]
       SyS_sendmmsg+0x35/0x60 net/socket.c:2134
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x43fe49
      RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
      RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
      R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000
      
      To fix this, we verify that the cmsg_len is large enough to hold the
      data to be read, before proceeding further.
      Reported-by: Nsyzbot <syzkaller-bugs@googlegroups.com>
      Signed-off-by: NAvinash Repaka <avinash.repaka@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14e138a8
  9. 08 9月, 2017 1 次提交
  10. 06 9月, 2017 1 次提交
  11. 21 7月, 2017 1 次提交
  12. 22 6月, 2017 1 次提交
  13. 17 6月, 2017 1 次提交
  14. 03 1月, 2017 4 次提交
  15. 18 11月, 2016 1 次提交
    • S
      RDS: TCP: Track peer's connection generation number · 905dd418
      Sowmini Varadhan 提交于
      The RDS transport has to be able to distinguish between
      two types of failure events:
      (a) when the transport fails (e.g., TCP connection reset)
          but the RDS socket/connection layer on both sides stays
          the same
      (b) when the peer's RDS layer itself resets (e.g., due to module
          reload or machine reboot at the peer)
      In case (a) both sides must reconnect and continue the RDS messaging
      without any message loss or disruption to the message sequence numbers,
      and this is achieved by rds_send_path_reset().
      
      In case (b) we should reset all rds_connection state to the
      new incarnation of the peer. Examples of state that needs to
      be reset are next expected rx sequence number from, or messages to be
      retransmitted to, the new incarnation of the peer.
      
      To achieve this, the RDS handshake probe added as part of
      commit 5916e2c1 ("RDS: TCP: Enable multipath RDS for TCP")
      is enhanced so that sender and receiver of the RDS ping-probe
      will add a generation number as part of the RDS_EXTHDR_GEN_NUM
      extension header. Each peer stores local and remote generation
      numbers as part of each rds_connection. Changes in generation
      number will be detected via incoming handshake probe ping
      request or response and will allow the receiver to reset rds_connection
      state.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      905dd418
  16. 16 7月, 2016 1 次提交
  17. 02 7月, 2016 1 次提交
  18. 15 6月, 2016 10 次提交
  19. 08 6月, 2016 1 次提交
  20. 25 11月, 2015 1 次提交
  21. 19 10月, 2015 1 次提交
    • S
      RDS: fix rds-ping deadlock over TCP transport · 7b4b0009
      santosh.shilimkar@oracle.com 提交于
      Sowmini found hang with rds-ping while testing RDS over TCP. Its
      a corner case and doesn't happen always. The issue is not reproducible
      with IB transport. Its clear from below dump why we see it with RDS TCP.
      
       [<ffffffff8153b7e5>] do_tcp_setsockopt+0xb5/0x740
       [<ffffffff8153bec4>] tcp_setsockopt+0x24/0x30
       [<ffffffff814d57d4>] sock_common_setsockopt+0x14/0x20
       [<ffffffffa096071d>] rds_tcp_xmit_prepare+0x5d/0x70 [rds_tcp]
       [<ffffffffa093b5f7>] rds_send_xmit+0xd7/0x740 [rds]
       [<ffffffffa093bda2>] rds_send_pong+0x142/0x180 [rds]
       [<ffffffffa0939d34>] rds_recv_incoming+0x274/0x330 [rds]
       [<ffffffff810815ae>] ? ttwu_queue+0x11e/0x130
       [<ffffffff814dcacd>] ? skb_copy_bits+0x6d/0x2c0
       [<ffffffffa0960350>] rds_tcp_data_recv+0x2f0/0x3d0 [rds_tcp]
       [<ffffffff8153d836>] tcp_read_sock+0x96/0x1c0
       [<ffffffffa0960060>] ? rds_tcp_recv_init+0x40/0x40 [rds_tcp]
       [<ffffffff814d6a90>] ? sock_def_write_space+0xa0/0xa0
       [<ffffffffa09604d1>] rds_tcp_data_ready+0xa1/0xf0 [rds_tcp]
       [<ffffffff81545249>] tcp_data_queue+0x379/0x5b0
       [<ffffffffa0960cdb>] ? rds_tcp_write_space+0xbb/0x110 [rds_tcp]
       [<ffffffff81547fd2>] tcp_rcv_established+0x2e2/0x6e0
       [<ffffffff81552602>] tcp_v4_do_rcv+0x122/0x220
       [<ffffffff81553627>] tcp_v4_rcv+0x867/0x880
       [<ffffffff8152e0b3>] ip_local_deliver_finish+0xa3/0x220
      
      This happens because rds_send_xmit() chain wants to take
      sock_lock which is already taken by tcp_v4_rcv() on its
      way to rds_tcp_data_ready(). Commit db6526dc ("RDS: use
      rds_send_xmit() state instead of RDS_LL_SEND_FULL") which
      was trying to opportunistically finish the send request
      in same thread context.
      
      But because of above recursive lock hang with RDS TCP,
      the send work from rds_send_pong() needs to deferred to
      worker to avoid lock up. Given RDS ping is more of connectivity
      test than performance critical path, its should be ok even
      for transport like IB.
      Reported-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b4b0009
  22. 06 10月, 2015 3 次提交
  23. 26 8月, 2015 3 次提交