1. 06 10月, 2015 2 次提交
    • S
      RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULL · db6526dc
      Santosh Shilimkar 提交于
      In Transport indepedent rds_sendmsg(), we shouldn't make decisions based
      on RDS_LL_SEND_FULL which is used to manage the ring for RDMA based
      transports. We can safely issue rds_send_xmit() and the using its
      return value take decision on deferred work. This will also fix
      the scenario where at times we are seeing connections stuck with
      the LL_SEND_FULL bit getting set and never cleared.
      
      We kick krdsd after any time we see -ENOMEM or -EAGAIN from the
      ring allocation code.
      Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      db6526dc
    • S
      RDS: defer the over_batch work to send worker · 4bebdd7a
      Santosh Shilimkar 提交于
      Current process gives up if its send work over the batch limit.
      The work queue will get  kicked to finish off any other requests.
      This fixes remainder condition from commit 443be0e5 ("RDS: make
      sure not to loop forever inside rds_send_xmit").
      
      The restart condition is only for the case where we reached to
      over_batch code for some other reason so just retrying again
      before giving up.
      
      While at it, make sure we use already available 'send_batch_count'
      parameter instead of magic value. The batch count threshold value
      of 1024 came via commit 443be0e5 ("RDS: make sure not to loop
      forever inside rds_send_xmit"). The idea is to process as big a
      batch as we can but at the same time we don't hold other waiting
      processes for send. Hence back-off after the send_batch_count
      limit (1024) to avoid soft-lock ups.
      Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      4bebdd7a
  2. 01 10月, 2015 4 次提交
  3. 10 9月, 2015 1 次提交
    • S
      RDS: verify the underlying transport exists before creating a connection · 74e98eb0
      Sasha Levin 提交于
      There was no verification that an underlying transport exists when creating
      a connection, this would cause dereferencing a NULL ptr.
      
      It might happen on sockets that weren't properly bound before attempting to
      send a message, which will cause a NULL ptr deref:
      
      [135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
      [135546.051270] Modules linked in:
      [135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
      [135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
      [135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
      [135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
      [135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
      [135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
      [135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
      [135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
      [135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
      [135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
      [135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
      [135546.064723] Stack:
      [135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
      [135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
      [135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
      [135546.068629] Call Trace:
      [135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
      [135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
      [135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
      [135546.071981] rds_sendmsg (net/rds/send.c:1058)
      [135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
      [135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
      [135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
      [135546.076349] ? __might_fault (mm/memory.c:3795)
      [135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
      [135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
      [135546.078856] SYSC_sendto (net/socket.c:1657)
      [135546.079596] ? SYSC_connect (net/socket.c:1628)
      [135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
      [135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
      [135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
      [135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
      [135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
      [135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74e98eb0
  4. 06 9月, 2015 1 次提交
  5. 31 8月, 2015 4 次提交
  6. 26 8月, 2015 21 次提交
  7. 08 8月, 2015 2 次提交
  8. 04 8月, 2015 1 次提交
  9. 15 7月, 2015 1 次提交
    • W
      rds: rds_ib_device.refcount overflow · 4fabb594
      Wengang Wang 提交于
      Fixes: 3e0249f9 ("RDS/IB: add refcount tracking to struct rds_ib_device")
      
      There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
      failed(mr pool running out). this lead to the refcount overflow.
      
      A complain in line 117(see following) is seen. From vmcore:
      s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
      That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
      to return ERR_PTR(-EAGAIN).
      
      115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
      116 {
      117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
      118         if (atomic_dec_and_test(&rds_ibdev->refcount))
      119                 queue_work(rds_wq, &rds_ibdev->free_work);
      120 }
      
      fix is to drop refcount when rds_ib_alloc_fmr failed.
      Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      4fabb594
  10. 04 7月, 2015 1 次提交
  11. 22 6月, 2015 1 次提交
  12. 13 6月, 2015 1 次提交