1. 24 8月, 2014 4 次提交
    • J
      tipc: eliminate port_connect()/port_disconnect() functions · dadebc00
      Jon Paul Maloy 提交于
      tipc_port_connect()/tipc_port_disconnect() are remnants of the obsolete
      native API. Their only task is to grab port_lock and call the functions
      __tipc_port_connect()/__tipc_port_disconnect() respectively, which will
      perform the actual state change.
      
      Since socket/port exection now is single-threaded the use of port_lock
      is not needed any more, so we can safely replace the two functions with
      their lock-free counterparts.
      
      In this commit, we remove the two functions. Furthermore, the contents
      of __tipc_port_disconnect() is so trivial that we choose to eliminate
      that function too, expanding its functionality into tipc_shutdown().
      __tipc_port_connect() is simplified, moved to socket.c, and given the
      more correct name tipc_sk_finish_conn(). Finally, we eliminate the
      function auto_connect(), and expand its contents into filter_connect().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dadebc00
    • J
      tipc: eliminate function tipc_port_shutdown() · 80e44c22
      Jon Paul Maloy 提交于
      tipc_port_shutdown() is a remnant from the now obsolete native
      interface. As such it grabs port_lock in order to protect itself
      from concurrent BH processing.
      
      However, after the recent changes to the port/socket upcalls, sockets
      are now basically single-threaded, and all execution, except the read-only
      tipc_sk_timer(), is executing within the protection of lock_sock(). So
      the use of port_lock is not needed here.
      
      In this commit we eliminate the whole function, and merge it into its
      only caller, tipc_shutdown().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80e44c22
    • J
      tipc: clean up socket timer function · 57289015
      Jon Paul Maloy 提交于
      The last remaining BH upcall to the socket, apart for the message
      reception function tipc_sk_rcv(), is the timer function.
      
      We prefer to let this function continue executing in BH, since it only
      does read-acces to semi-permanent data, but we make three changes to it:
      
      1) We introduce a bh_lock_sock()/bh_unlock_sock() inside the scope
         of port_lock.  This is a preparation for replacing port_lock with
         bh_lock_sock() at the locations where it is still used.
      
      2) We move the function from port.c to socket.c, as a further step
         of eliminating the port code level altogether.
      
      3) We let it make use of the newly introduced tipc_msg_create()
         function. This enables us to get rid of three context specific
         functions (port_create_self_abort_msg() etc.) in port.c
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57289015
    • J
      tipc: use pseudo message to wake up sockets after link congestion · 50100a5e
      Jon Paul Maloy 提交于
      The current link implementation keeps a linked list of blocked ports/
      sockets that is populated when there is link congestion. The purpose
      of this is to let the link know which users to wake up when the
      congestion abates.
      
      This adds unnecessary complexity to the data structure and the code,
      since it forces us to involve the link each time we want to delete
      a socket. It also forces us to grab the spinlock port_lock within
      the scope of node_lock. We want to get rid of this direct dependence,
      as well as the deadlock hazard resulting from the usage of port_lock.
      
      In this commit, we instead let the link keep list of a "wakeup" pseudo
      messages for use in such situations. Those messages are sent to the
      pending sockets via the ordinary message reception path, and wake up
      the socket's owner when they are received.
      
      This enables us to get rid of the 'waiting_ports' linked lists in struct
      tipc_port that manifest this direct reference. As a consequence, we can
      eliminate another BH entry into the socket, and hence the need to grab
      port_lock. This is a further step in our effort to remove port_lock
      altogether.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50100a5e
  2. 17 8月, 2014 1 次提交
  3. 30 7月, 2014 1 次提交
  4. 21 7月, 2014 1 次提交
  5. 17 7月, 2014 3 次提交
  6. 16 7月, 2014 1 次提交
  7. 09 7月, 2014 1 次提交
  8. 28 6月, 2014 8 次提交
    • J
      tipc: simplify connection congestion handling · 60120526
      Jon Paul Maloy 提交于
      As a consequence of the recently introduced serialized access
      to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa
      ("tipc: same receive code path for connection protocol and data
      messages") we can make a number of simplifications in the
      detection and handling of connection congestion situations.
      
      - We don't need to keep two counters, one for sent messages and one
        for acked messages. There is no longer any risk for races between
        acknowledge messages arriving in BH and data message sending
        running in user context. So we merge this into one counter,
        'sent_unacked', which is incremented at sending and subtracted
        from at acknowledge reception.
      
      - We don't need to set the 'congested' field in tipc_port to
        true before we sent the message, and clear it when sending
        is successful. (As a matter of fact, it was never necessary;
        the field was set in link_schedule_port() before any wakeup
        could arrive anyway.)
      
      - We keep the conditions for link congestion and connection connection
        congestion separated. There would otherwise be a risk that an arriving
        acknowledge message may wake up a user sleeping because of link
        congestion.
      
      - We can simplify reception of acknowledge messages.
      
      We also make some cosmetic/structural changes:
      
      - We rename the 'congested' field to the more correct 'link_cong´.
      
      - We rename 'conn_unacked' to 'rcv_unacked'
      
      - We move the above mentioned fields from struct tipc_port to
        struct tipc_sock.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60120526
    • J
      tipc: clean up connection protocol reception function · ac0074ee
      Jon Paul Maloy 提交于
      We simplify the code for receiving connection probes, leveraging the
      recently introduced tipc_msg_reverse() function. We also stick to
      the principle of sending a possible response message directly from
      the calling (tipc_sk_rcv or backlog_rcv) functions, hence making
      the call chain shallower and easier to follow.
      
      We make one small protocol change here, allowed according to
      the spec. If a protocol message arrives from a remote socket that
      is not the one we are connected to, we are currently generating a
      connection abort message and send it to the source. This behavior
      is unnecessary, and might even be a security risk, so instead we
      now choose to only ignore the message. The consequnce for the sender
      is that he will need longer time to discover his mistake (until the
      next timeout), but this is an extreme corner case, and may happen
      anyway under other circumstances, so we deem this change acceptable.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac0074ee
    • J
      tipc: same receive code path for connection protocol and data messages · ec8a2e56
      Jon Paul Maloy 提交于
      As a preparation to eliminate port_lock we need to bring reception
      of connection protocol messages under proper protection of bh_lock_sock
      or socket owner.
      
      We fix this by letting those messages follow the same code path as
      incoming data messages.
      
      As a side effect of this change, the last reference to the function
      net_route_msg() disappears, and we can eliminate that function.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec8a2e56
    • J
      tipc: connection oriented transport uses new send functions · 4ccfe5e0
      Jon Paul Maloy 提交于
      We move the message sending across established connections
      to use the message preparation and send functions introduced
      earlier in this series. We now do the message preparation
      and call to the link send function directly from the socket,
      instead of going via the port layer.
      
      As a consequence of this change, the functions tipc_send(),
      tipc_port_iovec_rcv(), tipc_port_iovec_reject() and tipc_reject_msg()
      become unreferenced and can be eliminated from port.c. For the same
      reason, the functions tipc_link_xmit_fast(), tipc_link_iovec_xmit_long()
      and tipc_link_iovec_fast() can be eliminated from link.c.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ccfe5e0
    • J
      tipc: RDM/DGRAM transport uses new fragmenting and sending functions · e2dafe87
      Jon Paul Maloy 提交于
      We merge the code for sending port name and port identity addressed
      messages into the corresponding send functions in socket.c, and start
      using the new fragmenting and transmit functions we just have introduced.
      
      This saves a call level and quite a few code lines, as well as making
      this part of the code easier to follow. As a consequence, the functions
      tipc_send2name() and tipc_send2port() in port.c can be removed.
      
      For practical reasons, we break out the code for sending multicast messages
      from tipc_sendmsg() and move it into a separate function, tipc_sendmcast(),
      but we do not yet convert it into using the new build/send functions.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2dafe87
    • J
      tipc: introduce message evaluation function · 5a379074
      Jon Paul Maloy 提交于
      When a message arrives in a node and finds no destination
      socket, we may need to drop it, reject it, or forward it after
      a secondary destination lookup. The latter two cases currently
      results in a code path that is perceived as complex, because it
      follows a deep call chain via obscure functions such as
      net_route_named_msg() and net_route_msg().
      
      We now introduce a function, tipc_msg_eval(), that takes the
      decision about whether such a message should be rejected or
      forwarded, but leaves it to the caller to actually perform
      the indicated action.
      
      If the decision is 'reject', it is still the task of the recently
      introduced function tipc_msg_reverse() to take the final decision
      about whether the message is rejectable or not. In the latter case
      it drops the message.
      
      As a result of this change, we can finally eliminate the function
      net_route_named_msg(), and hence become independent of net_route_msg().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a379074
    • J
      tipc: separate building and sending of rejected messages · 8db1bae3
      Jon Paul Maloy 提交于
      The way we build and send rejected message is currenty perceived as
      hard to follow, partly because we let the transmission go via deep
      call chains through functions such as tipc_reject_msg() and
      net_route_msg().
      
      We want to remove those functions, and make the call sequences shallower
      and simpler. For this purpose, we separate building and sending of
      rejected messages. We build the reject message using the new function
      tipc_msg_reverse(), and let the transmission go via the newly introduced
      tipc_link_xmit2() function, as all transmission eventually will do. We
      also ensure that all calls to tipc_link_xmit2() are made outside
      port_lock/bh_lock_sock.
      
      Finally, we replace all calls to tipc_reject_msg() with the two new
      calls at all locations in the code that we want to keep. The remaining
      calls are made from code that we are planning to remove, along with
      tipc_reject_msg() itself.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8db1bae3
    • J
      tipc: use negative error return values in functions · e4de5fab
      Jon Paul Maloy 提交于
      In some places, TIPC functions returns positive integers as return
      codes. This goes against standard Linux coding practice, and may
      even cause problems in some cases.
      
      We now change the return values of the functions filter_rcv()
      and filter_connect() to become signed integers, and return
      negative error codes when needed. The codes we use in these
      particular cases are still TIPC specific, since they are both
      part of the TIPC API and have no correspondence in errno.h
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4de5fab
  9. 12 6月, 2014 1 次提交
  10. 25 5月, 2014 1 次提交
  11. 15 5月, 2014 3 次提交
    • J
      tipc: merge port message reception into socket reception function · 9816f061
      Jon Paul Maloy 提交于
      In order to reduce complexity and save a call level during message
      reception at port/socket level, we remove the function tipc_port_rcv()
      and merge its functionality into tipc_sk_rcv().
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9816f061
    • J
      tipc: compensate for double accounting in socket rcv buffer · 4f4482dc
      Jon Paul Maloy 提交于
      The function net/core/sock.c::__release_sock() runs a tight loop
      to move buffers from the socket backlog queue to the receive queue.
      
      As a security measure, sk_backlog.len of the receiving socket
      is not set to zero until after the loop is finished, i.e., until
      the whole backlog queue has been transferred to the receive queue.
      During this transfer, the data that has already been moved is counted
      both in the backlog queue and the receive queue, hence giving an
      incorrect picture of the available queue space for new arriving buffers.
      
      This leads to unnecessary rejection of buffers by sk_add_backlog(),
      which in TIPC leads to unnecessarily broken connections.
      
      In this commit, we compensate for this double accounting by adding
      a counter that keeps track of it. The function socket.c::backlog_rcv()
      receives buffers one by one from __release_sock(), and adds them to the
      socket receive queue. If the transfer is successful, it increases a new
      atomic counter 'tipc_sock::dupl_rcvcnt' with 'truesize' of the
      transferred buffer. If a new buffer arrives during this transfer and
      finds the socket busy (owned), we attempt to add it to the backlog.
      However, when sk_add_backlog() is called, we adjust the 'limit'
      parameter with the value of the new counter, so that the risk of
      inadvertent rejection is eliminated.
      
      It should be noted that this change does not invalidate the original
      purpose of zeroing 'sk_backlog.len' after the full transfer. We set an
      upper limit for dupl_rcvcnt, so that if a 'wild' sender (i.e., one that
      doesn't respect the send window) keeps pumping in buffers to
      sk_add_backlog(), he will eventually reach an upper limit,
      (2 x TIPC_CONN_OVERLOAD_LIMIT). After that, no messages can be added
      to the backlog, and the connection will be broken. Ordinary, well-
      behaved senders will never reach this buffer limit at all.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f4482dc
    • J
      tipc: decrease connection flow control window · 6163a194
      Jon Paul Maloy 提交于
      Memory overhead when allocating big buffers for data transfer may
      be quite significant. E.g., truesize of a 64 KB buffer turns out
      to be 132 KB, 2 x the requested size.
      
      This invalidates the "worst case" calculation we have been
      using to determine the default socket receive buffer limit,
      which is based on the assumption that 1024x64KB = 67MB buffers
      may be queued up on a socket.
      
      Since TIPC connections cannot survive hitting the buffer limit,
      we have to compensate for this overhead.
      
      We do that in this commit by dividing the fix connection flow
      control window from 1024 (2*512) messages to 512 (2*256). Since
      older version nodes send out acks at 512 message intervals,
      compatibility with such nodes is guaranteed, although performance
      may be non-optimal in such cases.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6163a194
  12. 27 4月, 2014 1 次提交
  13. 12 4月, 2014 1 次提交
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  14. 08 4月, 2014 1 次提交
  15. 13 3月, 2014 6 次提交
  16. 07 3月, 2014 1 次提交
  17. 22 2月, 2014 1 次提交
    • Y
      tipc: remove all enabled flags from all tipc components · 9fe7ed47
      Ying Xue 提交于
      When tipc module is inserted, many tipc components are initialized
      one by one. During the initialization period, if one of them is
      failed, tipc_core_stop() will be called to stop all components
      whatever corresponding components are created or not. To avoid to
      release uncreated ones, relevant components have to add necessary
      enabled flags indicating whether they are created or not.
      
      But in the initialization stage, if one component is unsuccessfully
      created, we will just destroy successfully created components before
      the failed component instead of all components. All enabled flags
      defined in components, in turn, become redundant. Additionally it's
      also unnecessary to identify whether table.types is NULL in
      tipc_nametbl_stop() because name stable has been definitely created
      successfully when tipc_nametbl_stop() is called.
      
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fe7ed47
  18. 19 2月, 2014 1 次提交
    • Y
      tipc: align tipc function names with common naming practice in the network · 247f0f3c
      Ying Xue 提交于
      Rename the following functions, which are shorter and more in line
      with common naming practice in the network subsystem.
      
      tipc_bclink_send_msg->tipc_bclink_xmit
      tipc_bclink_recv_pkt->tipc_bclink_rcv
      tipc_disc_recv_msg->tipc_disc_rcv
      tipc_link_send_proto_msg->tipc_link_proto_xmit
      link_recv_proto_msg->tipc_link_proto_rcv
      link_send_sections_long->tipc_link_iovec_long_xmit
      tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
      tipc_link_send_sync->tipc_link_sync_xmit
      tipc_link_recv_sync->tipc_link_sync_rcv
      tipc_link_send_buf->__tipc_link_xmit
      tipc_link_send->tipc_link_xmit
      tipc_link_send_names->tipc_link_names_xmit
      tipc_named_recv->tipc_named_rcv
      tipc_link_recv_bundle->tipc_link_bundle_rcv
      tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
      link_send_long_buf->tipc_link_frag_xmit
      
      tipc_multicast->tipc_port_mcast_xmit
      tipc_port_recv_mcast->tipc_port_mcast_rcv
      tipc_port_reject_sections->tipc_port_iovec_reject
      tipc_port_recv_proto_msg->tipc_port_proto_rcv
      tipc_connect->tipc_port_connect
      __tipc_connect->__tipc_port_connect
      __tipc_disconnect->__tipc_port_disconnect
      tipc_disconnect->tipc_port_disconnect
      tipc_shutdown->tipc_port_shutdown
      tipc_port_recv_msg->tipc_port_rcv
      tipc_port_recv_sections->tipc_port_iovec_rcv
      
      release->tipc_release
      accept->tipc_accept
      bind->tipc_bind
      get_name->tipc_getname
      poll->tipc_poll
      send_msg->tipc_sendmsg
      send_packet->tipc_send_packet
      send_stream->tipc_send_stream
      recv_msg->tipc_recvmsg
      recv_stream->tipc_recv_stream
      connect->tipc_connect
      listen->tipc_listen
      shutdown->tipc_shutdown
      setsockopt->tipc_setsockopt
      getsockopt->tipc_getsockopt
      
      Above changes have no impact on current users of the functions.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      247f0f3c
  19. 19 1月, 2014 1 次提交
  20. 17 1月, 2014 2 次提交