1. 22 Jun, 2015 1 commit
  2. 13 Jun, 2015 1 commit
  3. 02 Jun, 2015 1 commit
    • rds: re-entry of rds_ib_xmit/rds_iw_xmit · d655a9fb
      Wengang Wang committed
      The BUG_ON at line 452/453 is triggered in function rds_send_xmit.
      
       441                         while (ret) {
       442                                 tmp = min_t(int, ret, sg->length -
       443                                                       conn->c_xmit_data_off);
       444                                 conn->c_xmit_data_off += tmp;
       445                                 ret -= tmp;
       446                                 if (conn->c_xmit_data_off == sg->length) {
       447                                         conn->c_xmit_data_off = 0;
       448                                         sg++;
       449                                         conn->c_xmit_sg++;
       450                                         if (ret != 0 && conn->c_xmit_sg == rm->data.op_nents)
       451                                                 printk(KERN_ERR "conn %p rm %p sg %p ret %d\n", conn, rm, sg, ret);
       452                                         BUG_ON(ret != 0 &&
       453                                                conn->c_xmit_sg == rm->data.op_nents);
       454                                 }
       455                         }
      
      It complains that the total length sent is bigger than what we wanted to send.
      
      rds_ib_xmit() returns a wrong value on the second and later entries for the
      same rds_message.
      
      The sg and off passed by rds_send_xmit to rds_ib_xmit are based on
      scatterlist.offset/length, but rds_ib_xmit acts on
      scatterlist.dma_address/dma_length. When dma_length is larger than length
      there is a problem: for the 2nd and later entries of rds_ib_xmit for the same
      rds_message, at least one of the following two is wrong:
      
      1) the scatterlist to start with; the chosen one can be far beyond the correct
         one.
      2) the offset to start with within the scatterlist.
      
      Fix:
      Add op_dmasg and op_dmaoff to the rm_data_op structure, indicating the scatterlist
      and the offset within it to start with for rds_ib_xmit, respectively. op_dmasg
      and op_dmaoff are initialized to zero when the DMA mapping is done the first
      time the message is seen, and are updated when filling send slots.
      
      The same applies to rds_iw_xmit too.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
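The mismatch described in d655a9fb can be illustrated in userspace. This is a minimal sketch, not the kernel code: `struct seg`, `struct cursor`, and `advance()` are hypothetical names, and the sample lengths are invented. Only the idea of keeping a separate DMA-side resume position (as op_dmasg/op_dmaoff do) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry: after DMA mapping, the
 * DMA-side length may differ from the CPU-side length. */
struct seg {
    unsigned int length;      /* like scatterlist.length */
    unsigned int dma_length;  /* like scatterlist.dma_length */
};

/* A resume position: which entry, and how far into it. */
struct cursor {
    size_t idx;
    unsigned int off;
};

/* Advance a cursor by `bytes`. Entries are measured with dma_length when
 * use_dma is nonzero, and with length otherwise. */
static void advance(struct cursor *c, const struct seg *sg, size_t n,
                    unsigned int bytes, int use_dma)
{
    while (bytes > 0) {
        assert(c->idx < n);  /* caller must not run past the list */
        unsigned int len = use_dma ? sg[c->idx].dma_length : sg[c->idx].length;
        unsigned int room = len - c->off;
        unsigned int step = bytes < room ? bytes : room;

        c->off += step;
        bytes -= step;
        if (c->off == len) {  /* entry exhausted: move to the next one */
            c->idx++;
            c->off = 0;
        }
    }
}
```

With entries of length 4096 but a first dma_length of 8192, sending 6000 bytes leaves the CPU-side cursor in entry 1 while the DMA-side cursor is still in entry 0. Resuming a DMA walk from the CPU-side position is the kind of error the commit describes.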
  4. 01 Jun, 2015 3 commits
  5. 19 May, 2015 1 commit
  6. 11 May, 2015 1 commit
  7. 10 May, 2015 2 commits
    • net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP socket. · c82ac7e6
      Sowmini Varadhan committed
      When the peer of an RDS-TCP connection restarts, a reconnect
      attempt should only be made from the active side  of the TCP
      connection, i.e. the side that has a transient TCP port
      number. Do not add the passive side of the TCP connection
      to the c_hash_node and thus avoid triggering rds_queue_reconnect()
      for passive rds connections.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/rds: RDS-TCP: Always create a new rds_sock for an incoming connection. · f711a6ae
      Sowmini Varadhan committed
      When running RDS over TCP, the active (client) side connects to the
      listening ("passive") side at the RDS_TCP_PORT.  After the connection
      is established, if the client side reboots (potentially without even
      sending a FIN) the server still has a TCP socket in the established
      state.  If the server side now gets a new SYN from the client
      with a different client port, TCP will create a new socket-pair, but
      the RDS layer will incorrectly pull up the old rds_connection (which
      is still associated with the stale t_sock and RDS socket state).
      
      This patch corrects this behavior by having rds_tcp_accept_one()
      always create a new connection for an incoming TCP SYN.
      The rds and tcp state associated with the old socket-pair is cleaned
      up via the rds_tcp_state_change() callback which would typically be
      invoked in most cases when the client-TCP sends a FIN on TCP restart,
      triggering a transition to CLOSE_WAIT state. In the rarer event of client
      death without a FIN, TCP_KEEPALIVE probes on the socket will detect
      the stale socket, and the TCP transition to CLOSE state will trigger
      the RDS state cleanup.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 05 May, 2015 1 commit
    • net/rds: Fix new sparse warning · e2783717
      David Ahern committed
      c0adf54a introduced new sparse warnings:
        CHECK   /home/dahern/kernels/linux.git/net/rds/ib_cm.c
      net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
      net/rds/ib_cm.c:191:34:    expected unsigned long long [unsigned] [usertype] dp_ack_seq
      net/rds/ib_cm.c:191:34:    got restricted __be64 <noident>
      net/rds/ib_cm.c:194:51: warning: cast to restricted __be64
      
      The temporary variable for sequence number should have been declared as __be64
      rather than u64. Make it so.
      Signed-off-by: David Ahern <david.ahern@oracle.com>
      Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 04 May, 2015 1 commit
    • net/rds: fix unaligned memory access · c0adf54a
      shamir rabinovitch committed
      rdma_conn_param private data is copied using memcpy after headers such
      as cma_hdr (see cma_resolve_ib_udp for an example), so the start of the
      private data is aligned to the end of the structure that comes before it. If
      that structure ends with a u32, the start of the private data will be only
      4-byte aligned. Structures that use u8/u16/u32/u64 are naturally aligned,
      but if the structure start is not 8-byte aligned, none of its u64 members
      will be aligned. To solve this issue we must use special macros that allow
      unaligned access to those members.
      
      Addresses the following kernel log seen when attempting to use RDMA:
      
      Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma]
      Acked-by: Chien Yen <chien.yen@oracle.com>
      Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
      [Minor tweaks for top of tree by:]
      Signed-off-by: David Ahern <david.ahern@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
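The two commits above (c0adf54a and e2783717) concern reading a big-endian 64-bit sequence number that may sit at a 4-byte-aligned address. The following is a portable userspace sketch of that kind of read, not the kernel change itself: `load_be64_unaligned()` is a hypothetical helper, whereas the kernel has its own unaligned-access and byte-order macros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 64-bit value from a possibly unaligned address.
 * Copying the bytes out first avoids an unaligned 64-bit load, and
 * assembling them by shifts avoids any dependence on host endianness. */
static uint64_t load_be64_unaligned(const void *p)
{
    unsigned char b[8];
    uint64_t v = 0;

    assert(p != NULL);
    memcpy(b, p, sizeof(b));
    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];  /* most significant byte first */
    return v;
}
```

Keeping the wire value in a distinct big-endian type until it is converted, as the sparse fix does with __be64, lets static checkers catch a missed conversion.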
  10. 09 Apr, 2015 2 commits
    • RDS: make sure not to loop forever inside rds_send_xmit · 443be0e5
      Sowmini Varadhan committed
      If a determined set of concurrent senders keep the send queue full,
      we can loop forever inside rds_send_xmit.  This fix has two parts.
      
      First we are dropping out of the while(1) loop after we've processed a
      large batch of messages.
      
      Second we add a generation number that gets bumped each time the
      xmit bit lock is acquired.  If someone else has jumped in and
      made progress in the queue, we skip our goto restart.
      
      Original patch by Chris Mason.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
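The two parts of the 443be0e5 fix (a batch limit and a generation number bumped under the xmit lock) can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel implementation: `struct conn`, `send_xmit()`, and the `SEND_BATCH` value are hypothetical, and "sending" is modelled as decrementing a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SEND_BATCH 4  /* hypothetical limit, kept small for tracing */

struct conn {
    atomic_bool in_xmit;    /* stands in for the xmit bit lock */
    atomic_ulong send_gen;  /* bumped each time the lock is acquired */
    atomic_int queued;      /* messages waiting to be sent */
};

/* Send up to one batch per lock hold; returns how many messages this
 * call sent. */
static int send_xmit(struct conn *c)
{
    int sent = 0;

    assert(c != NULL);
    for (;;) {
        if (atomic_exchange(&c->in_xmit, true))
            return sent;  /* someone else is already transmitting */
        unsigned long gen = atomic_fetch_add(&c->send_gen, 1) + 1;

        int batch = 0;
        while (batch < SEND_BATCH && atomic_load(&c->queued) > 0) {
            atomic_fetch_sub(&c->queued, 1);
            batch++;
        }
        sent += batch;
        atomic_store(&c->in_xmit, false);

        if (atomic_load(&c->queued) == 0)
            return sent;  /* nothing left to do */
        if (batch >= SEND_BATCH)
            return sent;  /* part one: stop after a full batch */
        if (atomic_load(&c->send_gen) != gen)
            return sent;  /* part two: another sender took over */
        /* Work arrived after we dropped the lock and nobody else
         * picked it up: restart. */
    }
}
```

In this sketch the caller decides when to call `send_xmit()` again after a full batch. How the kernel reschedules the remaining work is not stated in the commit message and is left out here.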
    • RDS: only use passive connections when addresses match · 1789b2c0
      Sowmini Varadhan committed
      Passive connections were added for the case where one loopback IB
      connection between identical addresses needs another connection to store
      the second QP.  Unfortunately, they were also created in the case where
      the addresses differ and we already have both QPs.
      
      This led to a message reordering bug.
      
      - two different IB interfaces and addresses on a machine: A B
      - traffic is sent from A to B
      - connection from A-B is created, connect request sent
      - listening accepts connect request, B-A is created
      - traffic flows, next_rx is incremented
      - unacked messages exist on the retrans list
      - connection A-B is shut down, new connect request sent
      - listen sees existing loopback B-A, creates new passive B-A
      - retrans messages are sent and delivered because of 0 next_rx
      
      The problem is that the second connection request saw the previously
      existing parent connection.  Instead of using it, and using the existing
      next_rx_seq state for the traffic between those IPs, it mistakenly
      thought that it had to create a passive connection.
      
      We fix this by only using passive connections in the special case where
      laddr and faddr match.  In this case we'll only ever have one parent
      sending connection requests and one passive connection created as the
      listening path sees the existing parent connection which initiated the
      request.
      
      Original patch by Zach Brown
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 12 Mar, 2015 1 commit
    • rds: avoid potential stack overflow · f862e07c
      Arnd Bergmann committed
      The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
      on the stack in order to pass a pair of addresses. This happens to just
      fit within the 1024 byte stack size warning limit on x86, but just
      exceeds that limit on ARM, which gives us this warning:
      
      net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      As the use of this large variable is basically bogus, we can rearrange
      the code to not do that. Instead of passing an rds socket into
      rds_iw_get_device, we now just pass the two addresses that we have
      available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
      to create two address structures on the stack there.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 03 Mar, 2015 1 commit
  13. 12 Feb, 2015 1 commit
    • rds: rds_cong_queue_updates needs to defer the congestion update transmission · 80ad0d4a
      Sowmini Varadhan committed
      When the RDS transport is TCP, we cannot inline the call to rds_send_xmit
      from rds_cong_queue_update because
      (a) we are already holding the sock_lock in the recv path, and
          will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock
          lock
      (b) cong_queue_update does an irqsave on the rds_cong_lock, and this
          will trigger warnings (for a good reason) from functions called
          out of sock_lock.
      
      This patch reverts the change introduced by
      2fa57129 ("RDS: Bypass workqueue when queueing cong updates").
      
      The patch has been verified for both RDS/TCP as well as RDS/RDMA
      to ensure that there are no regressions for either transport:
       - for verification of RDS/TCP, a client-server unit test was used,
         with the server blocked in gdb and thus unable to drain its rcvbuf,
         eventually triggering an RDS congestion update.
       - for RDS/RDMA, the standard IB regression tests were used
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 08 Feb, 2015 2 commits
  15. 05 Feb, 2015 1 commit
  16. 16 Dec, 2014 1 commit
  17. 11 Dec, 2014 1 commit
  18. 10 Dec, 2014 1 commit
  19. 24 Nov, 2014 2 commits
  20. 15 Oct, 2014 1 commit
  21. 04 Oct, 2014 3 commits
    • net/rds: fix possible double free on sock tear down · 593cbb3e
      Herton R. Krzesinski committed
      I got a report of a double free happening in the RDS slab cache. One
      suspicion was that somewhere we were doing a sock_hold/sock_put
      on an already freed sock. Thus after providing a kernel with the
      following change:
      
       static inline void sock_hold(struct sock *sk)
       {
      -       atomic_inc(&sk->sk_refcnt);
      +       if (!atomic_inc_not_zero(&sk->sk_refcnt))
      +               WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n",
      +                       sk, sk->sk_family);
       }
      
      The warning successfully triggered:
      
      Trying to hold sock already gone: ffff81f6dda61280 (family: 21)
      WARNING: at include/net/sock.h:350 sock_hold()
      Call Trace:
      <IRQ>  [<ffffffff8adac135>] :rds:rds_send_remove_from_sock+0xf0/0x21b
      [<ffffffff8adad35c>] :rds:rds_send_drop_acked+0xbf/0xcf
      [<ffffffff8addf546>] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc
      [<ffffffff8009899a>] tasklet_action+0x8f/0x12b
      [<ffffffff800125a2>] __do_softirq+0x89/0x133
      [<ffffffff8005f30c>] call_softirq+0x1c/0x28
      [<ffffffff8006e644>] do_softirq+0x2c/0x7d
      [<ffffffff8006e4d4>] do_IRQ+0xee/0xf7
      [<ffffffff8005e625>] ret_from_intr+0x0/0xa
      <EOI>
      
      Looking at the call chain above, the only way I think this would be
      possible is if somewhere we already released the same socket->sock which
      is assigned to the rds_message at rds_send_remove_from_sock. Which seems
      only possible to happen after the tear down done on rds_release.
      
      rds_release properly calls rds_send_drop_to to drop the socket from any
      rds_message, and some proper synchronization is in place to avoid race
      with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see
      a very narrow window where it may be possible we touch a sock already
      released: when rds_release races with rds_send_drop_acked, we check
      RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this
      specific case we don't clear rm->m_rs. In this case, it seems we could
      then go on in rds_send_drop_to and, after it returns, the sock is freed
      by the last sock_put in rds_release while we are concurrently in
      rds_send_remove_from_sock; then at some point in the loop in
      rds_send_remove_from_sock we process an rds_message which didn't have
      rm->m_rs unset for a freed sock, and a sock_hold on a sock
      already gone in rds_release can happen.
      
      This hopefully addresses the condition described above and avoids a double
      free on the "second last" sock_put. In addition, I removed the comment about
      socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to
      in rds_release and we should have things properly serialized there, thus
      I can't see the comment being accurate there.
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
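The debugging change in 593cbb3e relies on the "increment unless zero" idiom to detect a hold on an object whose last reference is already gone. A userspace analogue using C11 atomics might look like the sketch below. `ref_get_unless_zero()` is a hypothetical name, and this is not the kernel's atomic_inc_not_zero() implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Take a reference only if the object is still alive (count > 0).
 * Returns false when the count has already dropped to zero, in which
 * case the caller must not touch the object. */
static bool ref_get_unless_zero(atomic_int *refcnt)
{
    assert(refcnt != NULL);
    int old = atomic_load(refcnt);

    while (old != 0) {
        /* On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A plain increment would bring the dead object's count from 0 back to 1, and the matching put would then free it a second time. That is the double free the commit describes.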
    • net/rds: do proper house keeping if connection fails in rds_tcp_conn_connect · eb74cc97
      Herton R. Krzesinski committed
      I see two problems if we consider the sock->ops->connect attempt to fail in
      rds_tcp_conn_connect. The first issue is that for example we don't remove the
      previously added rds_tcp_connection item to rds_tcp_tc_list at
      rds_tcp_set_callbacks, which means that on a next reconnect attempt for the
      same rds_connection, when rds_tcp_conn_connect is called we can again call
      rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list,
      leading to list corruption: to avoid this just make sure we
      properly call rds_tcp_restore_callbacks before we exit. The second issue
      is that we should also release the sock properly, by setting sock = NULL
      only if we are returning without error.
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
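The two problems fixed by eb74cc97 (callbacks left registered after a failed connect, and the socket not released on the error path) follow a common cleanup pattern. The sketch below shows that pattern in userspace. All names here (`fake_sock`, `set_callbacks()`, `restore_callbacks()`, `conn_connect()`) are hypothetical stand-ins, and the connect result is passed in rather than produced by a real socket.

```c
#include <assert.h>
#include <stdlib.h>

struct fake_sock { int unused; };  /* stand-in for a kernel socket */

static int registered;  /* how many times callbacks are installed */

static void set_callbacks(void)     { registered++; }
static void restore_callbacks(void) { registered--; }

/* connect_result simulates the connect call: 0 on success, a negative
 * errno-style value on failure. */
static int conn_connect(int connect_result, struct fake_sock **out)
{
    struct fake_sock *sock;
    int ret;

    assert(out != NULL);
    sock = malloc(sizeof(*sock));
    if (!sock)
        return -1;

    set_callbacks();
    ret = connect_result;
    if (ret) {
        restore_callbacks();  /* undo the registration before exiting */
        goto out;
    }
    *out = sock;
    sock = NULL;  /* ownership transferred only on success */
out:
    free(sock);   /* releases the socket on failure; free(NULL) is a no-op */
    return ret;
}
```

Repeated failed attempts leave `registered` at zero, so a later retry cannot register the same connection twice.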
  22. 28 Aug, 2014 1 commit
  23. 27 Aug, 2014 1 commit
  24. 31 May, 2014 2 commits
  25. 19 May, 2014 1 commit
  26. 10 May, 2014 1 commit
  27. 18 Apr, 2014 1 commit
  28. 12 Apr, 2014 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller committed
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      a read of freed memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no implementation of this callback actually uses
      the length argument.  And since nobody actually cared about its
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
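The hazard described in 676d2369 is that a packet must not be dereferenced once it has been queued, because the consumer may already have freed it. A minimal userspace sketch of the post-fix shape, with the callback taking only the socket, is shown below. `struct pkt`, `struct rx_sock`, `count_wakeup()`, and `deliver()` are hypothetical names and not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 8

struct pkt { size_t len; };

struct rx_sock {
    struct pkt *queue[QUEUE_MAX];
    size_t qlen;
    int wakeups;
    /* The callback takes only the socket, so no caller is tempted to
     * read a field of the packet after queueing it. */
    void (*data_ready)(struct rx_sock *sk);
};

/* Example callback: just records that the consumer was notified. */
static void count_wakeup(struct rx_sock *sk) { sk->wakeups++; }

/* Queue a packet and notify the consumer. Once queued, the packet
 * belongs to the consumer, so p is never dereferenced afterwards. */
static void deliver(struct rx_sock *sk, struct pkt *p)
{
    assert(sk != NULL && sk->qlen < QUEUE_MAX);
    sk->queue[sk->qlen++] = p;
    sk->data_ready(sk);
}
```

A consumer that needs the length can read it from the packet it dequeues, at a point where it owns the packet.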
  29. 01 Apr, 2014 1 commit
  30. 19 Jan, 2014 1 commit
  31. 18 Jan, 2014 1 commit
    • net: rds: fix per-cpu helper usage · c196403b
      Gerald Schaefer committed
      commit ae4b46e9 "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
      handling for rds. chpfirst is the result of __this_cpu_read(), so it is
      an absolute pointer and not __percpu. Therefore, __this_cpu_write()
      should not operate on chpfirst, but rather on cache->percpu->first, just
      like __this_cpu_read() did before.
      
      Cc: <stable@vger.kernel.org> # 3.8+
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>