1. 06 2月, 2015 1 次提交
    • J
      tipc: reduce usage of context info in socket and link · c5898636
      Jon Paul Maloy 提交于
      The most common usage of namespace information is when we fetch the
      own node addess from the net structure. This leads to a lot of
      passing around of a parameter of type 'struct net *' between
      functions just to make them able to obtain this address.
      
      However, in many cases this is unnecessary. The own node address
      is readily available as a member of both struct tipc_sock and
      tipc_link, and can be fetched from there instead.
      The fact that the vast majority of functions in socket.c and link.c
      anyway are maintaining a pointer to their respective base structures
      makes this option even more compelling.
      
      In this commit, we introduce the inline functions tsk_own_node()
      and link_own_node() to make it easy for functions to fetch the node
      address from those structs instead of having to pass along and
      dereference the namespace struct.
      
      In particular, we make calls to the msg_xx() functions in msg.{h,c}
      context independent by directly passing them the own node address
      as parameter when needed. Those functions should be regarded as
      leaves in the code dependency tree, and it is hence desirable to
      keep them namspace unaware.
      
      Apart from a potential positive effect on cache behavior, these
      changes make it easier to introduce the changes that will follow
      later in this series.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5898636
  2. 05 2月, 2015 2 次提交
    • J
      tipc: avoid stale link after aborted failover · 7d24dcdb
      Jon Paul Maloy 提交于
      During link failover it may happen that the remaining link goes
      down while it is still in the process of taking over traffic
      from a previously failed link. When this happens, we currently
      abort the failover procedure and reset the first failed link to
      non-failover mode, so that it will be ready to re-establish
      contact with its peer when it comes available.
      
      However, if the first link goes down because its bearer was manually
      disabled, it is not enough to reset it; it must also be deleted;
      which is supposed to happen when the failover procedure is finished.
      Otherwise it will remain a zombie link: attached to the owner node
      structure, in mode LINK_STOPPED, and permanently blocking any re-
      establishing of the link to the peer via the interface in question.
      
      We fix this by amending the failover abort procedure. Apart from
      resetting the link to non-failover state, we test if the link is
      also in LINK_STOPPED mode. If so, we delete it, using the conditional
      tipc_link_delete() function introduced in the previous commit.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d24dcdb
    • J
      tipc: add reference count to struct tipc_link · 2d72d495
      Jon Paul Maloy 提交于
      When a bearer is disabled, all pertaining links will be reset and
      deleted. However, if there is a second active link towards a killed
      link's destination, the delete has to be postponed until the failover
      is finished. During this interval, we currently put the link in zombie
      mode, i.e., we take it out of traffic, delete its timer, but leave it
      attached to the owner node structure until all missing packets have
      been received.  When this is done, we detach the link from its node
      and delete it, assuming that the synchronous timer deletion that was
      initiated earlier in a different thread has finished.
      
      This is unsafe, as the failover may finish before del_timer_sync()
      has returned in the other thread.
      
      We fix this by adding an atomic reference counter of type kref in
      struct tipc_link. The counter keeps track of the references kept
      to the link by the owner node and the timer. We then do a conditional
      delete, based on the reference counter, both after the failover has
      been finished and when the timer expires, if applicable. Whoever
      comes last, will actually delete the link. This approach also implies
      that we can make the deletion of the timer asynchronous.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d72d495
  3. 13 1月, 2015 5 次提交
  4. 27 11月, 2014 7 次提交
  5. 22 11月, 2014 4 次提交
  6. 24 8月, 2014 1 次提交
    • J
      tipc: use pseudo message to wake up sockets after link congestion · 50100a5e
      Jon Paul Maloy 提交于
      The current link implementation keeps a linked list of blocked ports/
      sockets that is populated when there is link congestion. The purpose
      of this is to let the link know which users to wake up when the
      congestion abates.
      
      This adds unnecessary complexity to the data structure and the code,
      since it forces us to involve the link each time we want to delete
      a socket. It also forces us to grab the spinlock port_lock within
      the scope of node_lock. We want to get rid of this direct dependence,
      as well as the deadlock hazard resulting from the usage of port_lock.
      
      In this commit, we instead let the link keep list of a "wakeup" pseudo
      messages for use in such situations. Those messages are sent to the
      pending sockets via the ordinary message reception path, and wake up
      the socket's owner when they are received.
      
      This enables us to get rid of the 'waiting_ports' linked lists in struct
      tipc_port that manifest this direct reference. As a consequence, we can
      eliminate another BH entry into the socket, and hence the need to grab
      port_lock. This is a further step in our effort to remove port_lock
      altogether.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50100a5e
  7. 17 7月, 2014 3 次提交
  8. 28 6月, 2014 1 次提交
    • J
      tipc: introduce send functions for chained buffers in link · 4f1688b2
      Jon Paul Maloy 提交于
      The current link implementation provides several different transmit
      functions, depending on the characteristics of the message to be
      sent: if it is an iovec or an sk_buff, if it needs fragmentation or
      not, if the caller holds the node_lock or not. The permutation of
      these options gives us an unwanted amount of unnecessarily complex
      code.
      
      As a first step towards simplifying the send path for all messages,
      we introduce two new send functions at link level, tipc_link_xmit2()
      and __tipc_link_xmit2(). The former looks up a link to the message
      destination, and if one is found, it grabs the node lock and calls
      the second function, which works exclusively inside the node lock
      protection. If no link is found, and the destination is on the same
      node, it delivers the message directly to the local destination
      socket.
      
      The new functions take a buffer chain where all packet headers are
      already prepared, and the correct MTU has been used. These two
      functions will later replace all other link-level transmit functions.
      
      The functions are not backwards compatible, so we have added them
      as new functions with temporary names. They are tested, but have no
      users yet. Those will be added later in this series.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f1688b2
  9. 15 5月, 2014 1 次提交
  10. 06 5月, 2014 1 次提交
  11. 23 4月, 2014 1 次提交
    • Y
      tipc: decouple the relationship between bearer and link · 7a2f7d18
      Ying Xue 提交于
      Currently on both paths of message transmission and reception, the
      read lock of tipc_net_lock must be held before bearer is accessed,
      while the write lock of tipc_net_lock has to be taken before bearer
      is configured. Although it can ensure that bearer is always valid on
      the two data paths, link and bearer is closely bound together.
      
      So as the part of effort of removing tipc_net_lock, the locking
      policy of bearer protection will be adjusted as below: on the two
      data paths, RCU is used, and on the configuration path of bearer,
      RTNL lock is applied.
      
      Now RCU just covers the path of message reception. To make it possible
      to protect the path of message transmission with RCU, link should not
      use its stored bearer pointer to access bearer, but it should use the
      bearer identity of its attached bearer as index to get bearer instance
      from bearer_list array, which can help us decouple the relationship
      between bearer and link. As a result, bearer on the path of message
      transmission can be safely protected by RCU when we access bearer_list
      array within RCU lock protection.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Tested-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a2f7d18
  12. 19 2月, 2014 1 次提交
    • Y
      tipc: align tipc function names with common naming practice in the network · 247f0f3c
      Ying Xue 提交于
      Rename the following functions, which are shorter and more in line
      with common naming practice in the network subsystem.
      
      tipc_bclink_send_msg->tipc_bclink_xmit
      tipc_bclink_recv_pkt->tipc_bclink_rcv
      tipc_disc_recv_msg->tipc_disc_rcv
      tipc_link_send_proto_msg->tipc_link_proto_xmit
      link_recv_proto_msg->tipc_link_proto_rcv
      link_send_sections_long->tipc_link_iovec_long_xmit
      tipc_link_send_sections_fast->tipc_link_iovec_xmit_fast
      tipc_link_send_sync->tipc_link_sync_xmit
      tipc_link_recv_sync->tipc_link_sync_rcv
      tipc_link_send_buf->__tipc_link_xmit
      tipc_link_send->tipc_link_xmit
      tipc_link_send_names->tipc_link_names_xmit
      tipc_named_recv->tipc_named_rcv
      tipc_link_recv_bundle->tipc_link_bundle_rcv
      tipc_link_dup_send_queue->tipc_link_dup_queue_xmit
      link_send_long_buf->tipc_link_frag_xmit
      
      tipc_multicast->tipc_port_mcast_xmit
      tipc_port_recv_mcast->tipc_port_mcast_rcv
      tipc_port_reject_sections->tipc_port_iovec_reject
      tipc_port_recv_proto_msg->tipc_port_proto_rcv
      tipc_connect->tipc_port_connect
      __tipc_connect->__tipc_port_connect
      __tipc_disconnect->__tipc_port_disconnect
      tipc_disconnect->tipc_port_disconnect
      tipc_shutdown->tipc_port_shutdown
      tipc_port_recv_msg->tipc_port_rcv
      tipc_port_recv_sections->tipc_port_iovec_rcv
      
      release->tipc_release
      accept->tipc_accept
      bind->tipc_bind
      get_name->tipc_getname
      poll->tipc_poll
      send_msg->tipc_sendmsg
      send_packet->tipc_send_packet
      send_stream->tipc_send_stream
      recv_msg->tipc_recvmsg
      recv_stream->tipc_recv_stream
      connect->tipc_connect
      listen->tipc_listen
      shutdown->tipc_shutdown
      setsockopt->tipc_setsockopt
      getsockopt->tipc_getsockopt
      
      Above changes have no impact on current users of the functions.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      247f0f3c
  13. 14 2月, 2014 6 次提交
    • J
      tipc: delay delete of link when failover is needed · 7d33939f
      Jon Paul Maloy 提交于
      When a bearer is disabled, all its attached links are deleted.
      Ideally, we should do link failover to redundant links on other bearers,
      if there are any, in such cases. This would be consistent with current
      behavior when a link is reset, but not deleted. However, due to the
      complexity involved, and the (wrongly) perceived low demand for this
      feature, it was never implemented until now.
      
      We mark the doomed link for deletion with a new flag, but wait until the
      failover process is finished before we actually delete it. With the
      improved link tunnelling/failover code introduced earlier in this commit
      series, it is now easy to identify a spot in the code where the failover
      is finished and it is safe to delete the marked link. Moreover, the test
      for the flag and the deletion can be done synchronously, and outside the
      most time critical data path.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d33939f
    • Y
      tipc: remove 'links' list from tipc_bearer struct · c61dd61d
      Ying Xue 提交于
      In our ongoing effort to simplify the TIPC locking structure,
      we see a need to remove the linked list for tipc_links
      in the bearer. This can be explained as follows.
      
      Currently, we have three different ways to access a link,
      via three different lists/tables:
      
      1: Via a node hash table:
         Used by the time-critical outgoing/incoming data paths.
         (e.g. link_send_sections_fast() and tipc_recv_msg() ):
      
      grab net_lock(read)
         find node from node hash table
         grab node_lock
             select link
             grab bearer_lock
                send_msg()
             release bearer_lock
         release node lock
      release net_lock
      
      2: Via a global linked list for nodes:
         Used by configuration commands (link_cmd_set_value())
      
      grab net_lock(read)
         find node and link from global node list (using link name)
         grab node_lock
             update link
         release node lock
      release net_lock
      
      (Same locking order as above. No problem.)
      
      3: Via the bearer's linked link list:
         Used by notifications from interface (e.g. tipc_disable_bearer() )
      
      grab net_lock(write)
         grab bearer_lock
            get link ptr from bearer's link list
            get node from link
            grab node_lock
               delete link
            release node lock
         release bearer_lock
      release net_lock
      
      (Different order from above, but works because we grab the
      outer net_lock in write mode first, excluding all other access.)
      
      The first major goal in our simplification effort is to get rid
      of the "big" net_lock, replacing it with rcu-locks when accessing
      the node list and node hash array. This will come in a later patch
      series.
      
      But to get there we first need to rewrite access methods ##2 and 3,
      since removal of net_lock would introduce three major problems:
      
      a) In access method #2, we access the link before taking the
         protecting node_lock. This will not work once net_lock is gone,
         so we will have to change the access order. We will deal with
         this in a later commit in this series, "tipc: add node lock
         protection to link found by link_find_link()".
      
      b) When the outer protection from net_lock is gone, taking
         bearer_lock and node_lock in opposite order of method 1) and 2)
         will become an obvious deadlock hazard. This is fixed in the
         commit ("tipc: remove bearer_lock from tipc_bearer struct")
         later in this series.
      
      c) Similar to what is described in problem a), access method #3
         starts with using a link pointer that is unprotected by node_lock,
         in order to via that pointer find the correct node struct and
         lock it. Before we remove net_lock, this access order must be
         altered. This is what we do with this commit.
      
      We can avoid introducing problem problem c) by even here using the
      global node list to find the node, before accessing its links. When
      we loop though the node list we use the own bearer identity as search
      criteria, thus easily finding the links that are associated to the
      resetting/disabling bearer. It should be noted that although this
      method is somewhat slower than the current list traversal, it is in
      no way time critical. This is only about resetting or deleting links,
      something that must be considered relatively infrequent events.
      
      As a bonus, we can get rid of the mutual pointers between links and
      bearers. After this commit, pointer dependency go in one direction
      only: from the link to the bearer.
      
      This commit pre-empts introduction of problem c) as described above.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c61dd61d
    • Y
      tipc: redefine 'started' flag in struct link to bitmap · 135daee6
      Ying Xue 提交于
      Currently, the 'started' field in struct tipc_link represents only a
      binary state, 'started' or 'not started'. We need it to represent
      more link execution states in the coming commits in this series.
      Hence, we rename the field to 'flags', and define the current
      started/non-started state to be represented by the LSB bit of
      that field.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      135daee6
    • Y
      tipc: move code for deleting links from bearer.c to link.c · 8d8439b6
      Ying Xue 提交于
      We break out the code for deleting attached links in the
      function bearer_disable(), and define a new function named
      tipc_link_delete_list() to do this job.
      
      This commit incurs no functional changes, but makes the code of
      function bearer_disable() cleaner. It is also a preparation
      for a more important change to the bearer code, in a subsequent
      commit in this series.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d8439b6
    • Y
      tipc: move code for resetting links from bearer.c to link.c · e0ca2c30
      Ying Xue 提交于
      We break out the code for resetting attached links in the
      function tipc_reset_bearer(), and define a new function named
      tipc_link_reset_list() to do this job.
      
      This commit incurs no functional changes, but makes the code
      of function tipc_reset_bearer() cleaner. It is also a preparation
      for a more important change to the bearer code, in a subsequent
      commit in this series.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0ca2c30
    • J
      tipc: stricter behavior of message reassembly function · 03b92017
      Jon Paul Maloy 提交于
      The function tipc_link_recv_fragment(struct sk_buff **buf) currently
      leaves the value of the input buffer pointer undefined when it returns,
      except when the return code indicates that the reassembly is complete.
      This despite the fact that it always consumes the input buffer.
      
      Here, we enforce a stricter behavior by this function, ensuring that
      the returned buffer pointer is non-NULL if and only if the reassembly
      is complete. This makes it possible to test for the buffer pointer as
      criteria for successful reassembly.
      
      We also rename the function to tipc_link_frag_rcv(), which is both
      shorter and more in line with common naming practice in the network
      subsystem.
      
      Apart from the new name, these changes have no impact on current
      users of the function, but makes it more practical for use in some
      planned future commits.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03b92017
  14. 08 1月, 2014 2 次提交
  15. 05 1月, 2014 1 次提交
  16. 11 12月, 2013 1 次提交
  17. 08 11月, 2013 1 次提交
    • E
      tipc: message reassembly using fragment chain · 40ba3cdf
      Erik Hugne 提交于
      When the first fragment of a long data data message is received on a link, a
      reassembly buffer large enough to hold the data from this and all subsequent
      fragments of the message is allocated. The payload of each new fragment is
      copied into this buffer upon arrival. When the last fragment is received, the
      reassembled message is delivered upwards to the port/socket layer.
      
      Not only is this an inefficient approach, but it may also cause bursts of
      reassembly failures in low memory situations. since we may fail to allocate
      the necessary large buffer in the first place. Furthermore, after 100 subsequent
      such failures the link will be reset, something that in reality aggravates the
      situation.
      
      To remedy this problem, this patch introduces a different approach. Instead of
      allocating a big reassembly buffer, we now append the arriving fragments
      to a reassembly chain on the link, and deliver the whole chain up to the
      socket layer once the last fragment has been received. This is safe because
      the retransmission layer of a TIPC link always delivers packets in strict
      uninterrupted order, to the reassembly layer as to all other upper layers.
      Hence there can never be more than one fragment chain pending reassembly at
      any given time in a link, and we can trust (but still verify) that the
      fragments will be chained up in the correct order.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40ba3cdf
  18. 19 10月, 2013 1 次提交