1. 28 6月, 2014 3 次提交
    • J
      tipc: introduce message evaluation function · 5a379074
      Jon Paul Maloy 提交于
      When a message arrives in a node and finds no destination
      socket, we may need to drop it, reject it, or forward it after
      a secondary destination lookup. The latter two cases currently
      results in a code path that is perceived as complex, because it
      follows a deep call chain via obscure functions such as
      net_route_named_msg() and net_route_msg().
      
      We now introduce a function, tipc_msg_eval(), that takes the
      decision about whether such a message should be rejected or
      forwarded, but leaves it to the caller to actually perform
      the indicated action.
      
      If the decision is 'reject', it is still the task of the recently
      introduced function tipc_msg_reverse() to take the final decision
      about whether the message is rejectable or not. In the latter case
      it drops the message.
      
      As a result of this change, we can finally eliminate the function
      net_route_named_msg(), and hence become independent of net_route_msg().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a379074
    • J
      tipc: separate building and sending of rejected messages · 8db1bae3
      Jon Paul Maloy 提交于
      The way we build and send rejected message is currenty perceived as
      hard to follow, partly because we let the transmission go via deep
      call chains through functions such as tipc_reject_msg() and
      net_route_msg().
      
      We want to remove those functions, and make the call sequences shallower
      and simpler. For this purpose, we separate building and sending of
      rejected messages. We build the reject message using the new function
      tipc_msg_reverse(), and let the transmission go via the newly introduced
      tipc_link_xmit2() function, as all transmission eventually will do. We
      also ensure that all calls to tipc_link_xmit2() are made outside
      port_lock/bh_lock_sock.
      
      Finally, we replace all calls to tipc_reject_msg() with the two new
      calls at all locations in the code that we want to keep. The remaining
      calls are made from code that we are planning to remove, along with
      tipc_reject_msg() itself.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8db1bae3
    • J
      tipc: use negative error return values in functions · e4de5fab
      Jon Paul Maloy 提交于
      In some places, TIPC functions returns positive integers as return
      codes. This goes against standard Linux coding practice, and may
      even cause problems in some cases.
      
      We now change the return values of the functions filter_rcv()
      and filter_connect() to become signed integers, and return
      negative error codes when needed. The codes we use in these
      particular cases are still TIPC specific, since they are both
      part of the TIPC API and have no correspondence in errno.h
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4de5fab
  2. 12 6月, 2014 1 次提交
  3. 25 5月, 2014 1 次提交
  4. 15 5月, 2014 3 次提交
    • J
      tipc: merge port message reception into socket reception function · 9816f061
      Jon Paul Maloy 提交于
      In order to reduce complexity and save a call level during message
      reception at port/socket level, we remove the function tipc_port_rcv()
      and merge its functionality into tipc_sk_rcv().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9816f061
    • J
      tipc: compensate for double accounting in socket rcv buffer · 4f4482dc
      Jon Paul Maloy 提交于
      The function net/core/sock.c::__release_sock() runs a tight loop
      to move buffers from the socket backlog queue to the receive queue.
      
      As a security measure, sk_backlog.len of the receiving socket
      is not set to zero until after the loop is finished, i.e., until
      the whole backlog queue has been transferred to the receive queue.
      During this transfer, the data that has already been moved is counted
      both in the backlog queue and the receive queue, hence giving an
      incorrect picture of the available queue space for new arriving buffers.
      
      This leads to unnecessary rejection of buffers by sk_add_backlog(),
      which in TIPC leads to unnecessarily broken connections.
      
      In this commit, we compensate for this double accounting by adding
      a counter that keeps track of it. The function socket.c::backlog_rcv()
      receives buffers one by one from __release_sock(), and adds them to the
      socket receive queue. If the transfer is successful, it increases a new
      atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
      transferred buffer. If a new buffer arrives during this transfer and
      finds the socket busy (owned), we attempt to add it to the backlog.
      However, when sk_add_backlog() is called, we adjust the 'limit'
      parameter with the value of the new counter, so that the risk of
      inadvertent rejection is eliminated.
      
      It should be noted that this change does not invalidate the original
      purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
      upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
      doesn't respect the send window) keeps pumping in buffers to
      sk_add_backlog(), he will eventually reach an upper limit,
      (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
      to the backlog, and the connection will be broken. Ordinary, well-
      behaved senders will never reach this buffer limit at all.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f4482dc
    • J
      tipc: decrease connection flow control window · 6163a194
      Jon Paul Maloy 提交于
      Memory overhead when allocating big buffers for data transfer may
      be quite significant. E.g., truesize of a 64 KB buffer turns out
      to be 132 KB, 2 x the requested size.
      
      This invalidates the "worst case" calculation we have been
      using to determine the default socket receive buffer limit,
      which is based on the assumption that 1024x64KB = 67MB buffers
      may be queued up on a socket.
      
      Since TIPC connections cannot survive hitting the buffer limit,
      we have to compensate for this overhead.
      
      We do that in this commit by dividing the fix connection flow
      control window from 1024 (2*512) messages to 512 (2*256). Since
      older version nodes send out acks at 512 message intervals,
      compatibility with such nodes is guaranteed, although performance
      may be non-optimal in such cases.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6163a194
  5. 27 4月, 2014 1 次提交
  6. 12 4月, 2014 1 次提交
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  7. 08 4月, 2014 1 次提交
  8. 13 3月, 2014 6 次提交
  9. 07 3月, 2014 1 次提交
  10. 22 2月, 2014 1 次提交
    • Y
      tipc: remove all enabled flags from all tipc components · 9fe7ed47
      Ying Xue 提交于
      When tipc module is inserted, many tipc components are initialized
      one by one. During the initialization period, if one of them is
      failed, tipc_core_stop() will be called to stop all components
      whatever corresponding components are created or not. To avoid to
      release uncreated ones, relevant components have to add necessary
      enabled flags indicating whether they are created or not.
      
      But in the initialization stage, if one component is unsuccessfully
      created, we will just destroy successfully created components before
      the failed component instead of all components. All enabled flags
      defined in components, in turn, become redundant. Additionally it's
      also unnecessary to identify whether table.types is NULL in
      tipc_nametbl_stop() because name stable has been definitely created
      successfully when tipc_nametbl_stop() is called.
      
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fe7ed47
  11. 19 2月, 2014 1 次提交
    • Y
      tipc: align tipc function names with common naming practice in the network · 247f0f3c
      Ying Xue 提交于
      Rename the following functions, which are shorter and more in line
      with common naming practice in the network subsystem.
      
      tipc_bclink_send_msg->tipc_bclink_xmit
      tipc_bclink_recv_pkt->tipc_bclink_rcv
      tipc_disc_recv_msg->tipc_disc_rcv
      tipc_link_send_proto_msg->tipc_link_proto_xmit
      link_recv_proto_msg->tipc_link_proto_rcv
      link_send_sections_long->tipc_link_iovec_long_xmit
      tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
      tipc_link_send_sync->tipc_link_sync_xmit
      tipc_link_recv_sync->tipc_link_sync_rcv
      tipc_link_send_buf->__tipc_link_xmit
      tipc_link_send->tipc_link_xmit
      tipc_link_send_names->tipc_link_names_xmit
      tipc_named_recv->tipc_named_rcv
      tipc_link_recv_bundle->tipc_link_bundle_rcv
      tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
      link_send_long_buf->tipc_link_frag_xmit
      
      tipc_multicast->tipc_port_mcast_xmit
      tipc_port_recv_mcast->tipc_port_mcast_rcv
      tipc_port_reject_sections->tipc_port_iovec_reject
      tipc_port_recv_proto_msg->tipc_port_proto_rcv
      tipc_connect->tipc_port_connect
      __tipc_connect->__tipc_port_connect
      __tipc_disconnect->__tipc_port_disconnect
      tipc_disconnect->tipc_port_disconnect
      tipc_shutdown->tipc_port_shutdown
      tipc_port_recv_msg->tipc_port_rcv
      tipc_port_recv_sections->tipc_port_iovec_rcv
      
      release->tipc_release
      accept->tipc_accept
      bind->tipc_bind
      get_name->tipc_getname
      poll->tipc_poll
      send_msg->tipc_sendmsg
      send_packet->tipc_send_packet
      send_stream->tipc_send_stream
      recv_msg->tipc_recvmsg
      recv_stream->tipc_recv_stream
      connect->tipc_connect
      listen->tipc_listen
      shutdown->tipc_shutdown
      setsockopt->tipc_setsockopt
      getsockopt->tipc_getsockopt
      
      Above changes have no impact on current users of the functions.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      247f0f3c
  12. 19 1月, 2014 1 次提交
  13. 17 1月, 2014 5 次提交
    • Y
      tipc: standardize recvmsg routine · 9bbb4ecc
      Ying Xue 提交于
      Standardize the behaviour of waiting for events in TIPC recvmsg()
      so that all variables of socket or port structures are protected
      within socket lock, allowing the process of calling recvmsg() to
      be woken up at appropriate time.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9bbb4ecc
    • Y
      tipc: standardize sendmsg routine of connected socket · 391a6dd1
      Ying Xue 提交于
      Standardize the behaviour of waiting for events in TIPC send_packet()
      so that all variables of socket or port structures are protected within
      socket lock, allowing the process of calling sendmsg() to be woken up
      at appropriate time.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      391a6dd1
    • Y
      tipc: standardize sendmsg routine of connectionless socket · 3f40504f
      Ying Xue 提交于
      Comparing the behaviour of how to wait for events in TIPC sendmsg()
      with other stacks, the TIPC implementation might be perceived as
      different, and sometimes even incorrect. For instance, sk_sleep()
      and tport->congested variables associated with socket are exposed
      without socket lock protection while wait_event_interruptible_timeout()
      accesses them. So standardizing it with similar implementation
      in other stacks can help us correct these errors which the process
      of calling sendmsg() cannot be woken up event if an expected event
      arrive at socket or improperly woken up although the wake condition
      doesn't match.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f40504f
    • Y
      tipc: standardize accept routine · 6398e23c
      Ying Xue 提交于
      Comparing the behaviour of how to wait for events in TIPC accept()
      with other stacks, the TIPC implementation might be perceived as
      different, and sometimes even incorrect. As sk_sleep() and
      sk->sk_receive_queue variables associated with socket are not
      protected by socket lock, the process of calling accept() may be
      woken up improperly or sometimes cannot be woken up at all. After
      standardizing it with inet_csk_wait_for_connect routine, we can
      get benefits including: avoiding 'thundering herd' phenomenon,
      adding a timeout mechanism for accept(), coping with a pending
      signal, and having sk_sleep() and sk->sk_receive_queue being
      always protected within socket lock scope and so on.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6398e23c
    • Y
      tipc: standardize connect routine · 78eb3a53
      Ying Xue 提交于
      Comparing the behaviour of how to wait for events in TIPC connect()
      with other stacks, the TIPC implementation might be perceived as
      different, and sometimes even incorrect. For instance, as both
      sock->state and sk_sleep() are directly fed to
      wait_event_interruptible_timeout() as its arguments, and socket lock
      has to be released before we call wait_event_interruptible_timeout(),
      the two variables associated with socket are exposed out of socket
      lock protection, thereby probably getting stale values so that the
      process of calling connect() cannot be woken up exactly even if
      correct event arrives or it is woken up improperly even if the wake
      condition is not satisfied in practice. Therefore, standardizing its
      behaviour with sk_stream_wait_connect routine can avoid these risks.
      
      Additionally the implementation of connect routine is simplified as a
      whole, allowing it to return correct values in all different cases.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78eb3a53
  14. 02 1月, 2014 1 次提交
  15. 30 12月, 2013 1 次提交
    • Y
      tipc: fix deadlock during socket release · 84602761
      Ying Xue 提交于
      A deadlock might occur if name table is withdrawn in socket release
      routine, and while packets are still being received from bearer.
      
             CPU0                       CPU1
      T0:   recv_msg()               release()
      T1:   tipc_recv_msg()          tipc_withdraw()
      T2:   [grab node lock]         [grab port lock]
      T3:   tipc_link_wakeup_ports() tipc_nametbl_withdraw()
      T4:   [grab port lock]*        named_cluster_distribute()
      T5:   wakeupdispatch()         tipc_link_send()
      T6:                            [grab node lock]*
      
      The opposite order of holding port lock and node lock on above two
      different paths may result in a deadlock. If socket lock instead of
      port lock is used to protect port instance in tipc_withdraw(), the
      reverse order of holding port lock and node lock will be eliminated,
      as a result, the deadlock is killed as well.
      Reported-by: NLars Everbrand <lars.everbrand@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84602761
  16. 17 12月, 2013 3 次提交
  17. 21 11月, 2013 1 次提交
    • H
      net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Hannes Frederic Sowa 提交于
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so, because all the recvmsg handlers must
      cope with the case a plain read() is called on them. read() also sets
      msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3d33426
  18. 19 10月, 2013 2 次提交
  19. 31 8月, 2013 1 次提交
    • E
      tipc: set sk_err correctly when connection fails · 2c8d8518
      Erik Hugne 提交于
      Should a connect fail, if the publication/server is unavailable or
      due to some other error, a positive value will be returned and errno
      is never set. If the application code checks for an explicit zero
      return from connect (success) or a negative return (failure), it
      will not catch the error and subsequent send() calls will fail as
      shown from the strace snippet below.
      
      socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
      connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
      sendto(3, "test", 4, 0, NULL, 0)        = -1 EPIPE (Broken pipe)
      
      The reason for this behaviour is that TIPC wrongly inverts error
      codes set in sk_err.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c8d8518
  20. 18 6月, 2013 5 次提交