1. 26 Feb, 2016 2 commits
    • tipc: fix crash during node removal · d25a0125
      Jon Paul Maloy committed
      When the TIPC module is unloaded, we have identified a race condition
      that allows a node reference counter to go to zero and the node instance
      to be freed before the node timer has finished accessing it. This
      leads to occasional crashes, especially in multi-namespace environments.
      
      The scenario goes as follows:
      
      CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
      
      1:                                          if(!mod_timer())
      2: if (del_timer())
      3:   tipc_node_put()                                        // ref -> 1
      4: tipc_node_put()                                          // ref -> 0
      5:   kfree_rcu(node);
      6:                                               tipc_node_get(node)
      7:                                               // BOOM!
      
      We now clean up this functionality as follows:
      
      1) We remove the node pointer from the node lookup table before we
         attempt deactivating the timer. This way, we reduce the risk that
         tipc_node_find() may obtain a valid pointer to an instance marked
         for deletion; a harmless but undesirable situation.
      
      2) We use del_timer_sync() instead of del_timer() to safely deactivate
         the node timer without any risk that it might be reactivated by the
         timeout handler. There is no risk of deadlock here, since the two
         functions never touch the same spinlocks.
      
      3) We remove a pointless tipc_node_get() + tipc_node_put() from the
         timeout handler.
      Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate risk of finding to-be-deleted node instance · b170997a
      Jon Paul Maloy committed
      Although we have never seen it happen, we have identified the
      following problematic scenario when nodes are stopped and deleted:
      
      CPU0:                            CPU1:
      
      tipc_node_xxx()                                   //ref == 1
         tipc_node_put()                                //ref -> 0
                                       tipc_node_find() // node still in table
             tipc_node_delete()
               list_del_rcu(&n->list)
                                       tipc_node_get()  //ref -> 1, bad
               kfree_rcu()
      
                                       tipc_node_put() //ref to 0 again.
                                       kfree_rcu()     // BOOM!
      
      We fix this by introducing use of the conditional kref_get_unless_zero()
      instead of kref_get() in the function tipc_node_find(). This eliminates
      any risk of post-mortem access.
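
      The conditional get described above can be illustrated in userspace. The
      following is a minimal sketch using C11 atomics, not the kernel's kref
      code; struct node and the helper names are hypothetical stand-ins. The
      point is that a lookup may only take a reference while the counter is
      still nonzero, so an instance whose last reference is gone is never
      revived.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted node; not struct tipc_node. */
struct node {
    atomic_int refcnt;
};

/* Conditional get: take a reference only while the count is nonzero. */
static bool node_get_if_not_zero(struct node *n)
{
    int old = atomic_load(&n->refcnt);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(&n->refcnt, &old, old + 1))
            return true;
    }
    return false; /* count already hit zero: the object is being freed */
}

/* Drop a reference; returns true if the caller released the last one. */
static bool node_put(struct node *n)
{
    return atomic_fetch_sub(&n->refcnt, 1) == 1;
}
```

      A lookup function in this style would treat a false return exactly like
      "node not found", which is what closes the window shown in the scenario.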
      Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 Feb, 2016 2 commits
  3. 06 Feb, 2016 1 commit
    • tipc: fix link attribute propagation bug · d01332f1
      Richard Alpe committed
      Changing certain link attributes (link tolerance and link priority)
      from the TIPC management tool is supposed to automatically take
      effect at both endpoints of the affected link.
      
      Currently the media address is not instantiated for the link and is
      used uninstantiated when crafting protocol messages designated for the
      peer endpoint. This means that changing a link property currently
      results in the property being changed on the local machine, while the
      protocol message designated for the peer gets lost, resulting in a
      property discrepancy between the endpoints.
      
      In this patch we resolve this by using the media address from the
      link entry and using the bearer transmit function to send it. Hence,
      we can now eliminate the redundant function tipc_link_prot_xmit() and
      the redundant field tipc_link::media_addr.
      
      Fixes: 2af5ae37 (tipc: clean up unused code and structures)
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Reported-by: Jason Hu <huzhijiang@gmail.com>
      Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 Dec, 2015 1 commit
  5. 21 Nov, 2015 7 commits
    • tipc: eliminate remnants of Hungarian notation · 1a90632d
      Jon Paul Maloy committed
      The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
      has been significantly reduced over the last couple of years.
      
      We now root out the last traces of this practice.
      There are no functional changes in this commit.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: narrow down interface towards struct tipc_link · 38206d59
      Jon Paul Maloy committed
      We move the definition of struct tipc_link from link.h to link.c in
      order to minimize its exposure to the rest of the code.
      
      When needed, we define new functions to make it possible for external
      entities to access and set data in the link.
      
      Apart from the above, there are no functional changes.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: narrow down exposure of struct tipc_node · 5be9c086
      Jon Paul Maloy committed
      In our effort to have less code and include dependencies between
      entities such as node, link and bearer, we try to narrow down
      the exposed interface towards the node as much as possible.
      
      In this commit, we move the definition of struct tipc_node, along
      with many of its associated function declarations, from node.h to
      node.c. We also move some function definitions from link.c and
      name_distr.c to node.c, since they access fields in struct tipc_node
      that should not be externally visible. The moved functions are renamed
      according to their new location, and made static whenever possible.
      
      There are no functional changes in this commit.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: convert node lock to rwlock · 5405ff6e
      Jon Paul Maloy committed
      According to the node FSM a node in state SELF_UP_PEER_UP cannot
      change state inside a lock context, except when a TUNNEL_PROTOCOL
      (SYNCH or FAILOVER) packet arrives. However, the node's individual
      links may still change state.
      
      Since each link now is protected by its own spinlock, we finally have
      the conditions in place to convert the node spinlock to an rwlock_t.
      If the node state and arriving packet type are right, we can let the
      link directly receive the packet under protection of its own spinlock
      and the node lock in read mode. In all other cases we use the node
      lock in write mode. This enables full concurrent execution between
      parallel links during steady-state traffic situations, i.e., 99+ %
      of the time.
      
      This commit implements this change.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce per-link spinlock · 2312bf61
      Jon Paul Maloy committed
      As a preparation to allow parallel links to work more independently
      from each other we introduce a per-link spinlock, to be stored in the
      struct tipc_node's link entry area. Since the node lock still is a regular
      spinlock there is no increase in parallelism at this stage.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: reduce code dependency between binding table and node layer · 1d7e1c25
      Jon Paul Maloy committed
      The file name_distr.c currently contains three functions,
      named_cluster_distribute(), tipc_publ_subscribe() and
      tipc_publ_unsubscribe() that all directly access fields in
      struct tipc_node. We want to eliminate such dependencies, so
      we move those functions to the file node.c and rename them to
      tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
      respectively.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: small cleanup of function tipc_node_check_state() · 5c10e979
      Jon Paul Maloy committed
      The function tipc_node_check_state() contains the core logic
      for handling link synchronization and failover. For this reason,
      it is important to keep it as comprehensible as possible.
      
      In this commit, we make three small cleanups.
      
      1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
         packet confirms that the peer has lost contact, there will be no
         further action in this function. To make this clearer, we return
         from the function directly after the state change.
      
      2) Since commit 0f8b8e28 ("tipc: eliminate risk of stalled
         link synchronization") only the logically first TUNNEL_PROTO/SYNCH
         packet can alter the link state and set the synch point,
         independently of arrival order. Hence, there is no longer any
         need to adjust the synch value in case such packets arrive
         out of order. We remove this adjustment.
      
      3) It is the intention that any message arriving on any of the links
         may trigger a check for and possible termination of a node SYNCH state.
         A redundant and unnoticed check for tipc_link_is_synching() obviously
         defeats this purpose, with the effect that only packets arriving on the
         synching link may currently end the synch state. We remove this check.
         This change will further shorten the synchronization period between
         parallel links.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 25 Oct, 2015 1 commit
  7. 24 Oct, 2015 7 commits
    • tipc: clean up unused code and structures · 2af5ae37
      Jon Paul Maloy committed
      After the previous changes in this series, we can now remove some
      unused code and structures in the broadcast, link aggregation
      and link code.
      
      There are no functional changes in this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: ensure binding table initial distribution is sent via first link · c49a0a84
      Jon Paul Maloy committed
      Correct synchronization of the broadcast link at first contact between
      two nodes is dependent on the assumption that the binding table "bulk"
      update passes via the same link as the initial broadcast synchronization
      message, i.e., via the first link that is established.
      
      This is not guaranteed in the current implementation. If two links
      come up very close to each other in time, the "bulk" may quite well
      pass via the second link, and hence void the guarantee of a correct
      initial synchronization before the broadcast link is opened.
      
      This commit makes two small changes to strengthen this guarantee.
      
      1) We let the second established link occupy slot 1 of the
         "active_links" array, while the first link will retain slot 0.
         (This is in reality a cosmetic change, we could just as well keep
          the current, opposite order)
      
      2) We let the name distributor always use link selector/slot 0 when
         it sends its binding table updates.
      
      The extra traffic bias on the first link caused by this change should
      be negligible, since binding table updates constitute a very small
      fraction of the total traffic.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate link's reference to owner node · c72fa872
      Jon Paul Maloy committed
      With the recent commit series, we have established a one-way dependency
      between the link aggregation (struct tipc_node) instances and their
      pertaining tipc_link instances. This has enabled quite significant code
      and structure simplifications.
      
      In this commit, we eliminate the field 'owner', which points to an
      instance of struct tipc_node, from struct tipc_link, and replace it with
      a pointer to struct net, which is the only external reference now needed
      by a link instance.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: simplify bearer level broadcast · b06b281e
      Jon Paul Maloy committed
      Until now, we have been keeping track of the exact set of broadcast
      destinations through the helper structure tipc_node_map. This leads us to
      have to maintain a whole infrastructure for supporting this, including
      a pseudo-bearer and a number of functions to manipulate both the bearers
      and the node map correctly. Apart from the complexity, this approach is
      also limiting, as struct tipc_node_map only can support cluster local
      broadcast if we want to avoid it becoming excessively large. We want to
      eliminate this limitation, in order to enable introduction of scoped
      multicast in the future.
      
      A closer analysis reveals that it is unnecessary maintaining this "full
      set" overview; it is sufficient to keep a counter per bearer, indicating
      how many nodes can be reached via this bearer at the moment. The protocol
      is now robust enough to handle transitional discrepancies between the
      nominal number of reachable destinations, as expected by the broadcast
      protocol itself, and the number which is actually reachable at the
      moment. The initial broadcast synchronization, in conjunction with the
      retransmission mechanism, ensures that all packets will eventually be
      acknowledged by the correct set of destinations.
      
      This commit introduces these changes.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: let broadcast packet reception use new link receive function · 52666986
      Jon Paul Maloy committed
      The code path for receiving broadcast packets is currently distinct
      from the unicast path. This leads to unnecessary code and data
      duplication, something that can be avoided with some effort.
      
      We now introduce separate per-peer tipc_link instances for handling
      broadcast packet reception. Each receive link keeps a pointer to the
      common, single, broadcast link instance, and can hence handle release
      and retransmission of send buffers as if they belonged to the own
      instance.
      
      Furthermore, we let each unicast link instance keep a reference to both
      the pertaining broadcast receive link, and to the common send link.
      This makes it possible for the unicast links to easily access data for
      broadcast link synchronization, as well as for carrying acknowledges for
      received broadcast packets.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce capability bit for broadcast synchronization · fd556f20
      Jon Paul Maloy committed
      Until now, we have tried to support both the newer, dedicated broadcast
      synchronization mechanism along with the older, less safe, RESET_MSG/
      ACTIVATE_MSG based one. The latter method has turned out to be a hazard
      in a highly dynamic cluster, so we find it safer to disable it completely
      when we find that the former mechanism is supported by the peer node.
      
      For this purpose, we now introduce a new capability bit,
      TIPC_BCAST_SYNCH, to inform any peer nodes that dedicated broadcast
      synchronization is supported by the present node. The new bit is conveyed
      between peers in the 'capabilities' field of neighbor discovery messages.
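
      As a rough illustration of how such a capability bit can drive the
      choice of synchronization method, here is a minimal sketch. The bit
      position, the CAP_BCAST_SYNCH macro, the enum and the function name are
      all assumptions made for the example; the commit message does not give
      the actual value of TIPC_BCAST_SYNCH or the code that tests it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define CAP_BCAST_SYNCH (1u << 1)

enum bc_sync_method {
    BC_SYNC_LEGACY,    /* RESET_MSG/ACTIVATE_MSG based synchronization */
    BC_SYNC_DEDICATED  /* newer, dedicated broadcast synchronization */
};

/* Pick the method from the capabilities the peer advertised at discovery. */
static enum bc_sync_method choose_bc_sync(uint16_t peer_caps)
{
    if (peer_caps & CAP_BCAST_SYNCH)
        return BC_SYNC_DEDICATED; /* legacy path is disabled completely */
    return BC_SYNC_LEGACY;
}
```

      Peers that never set the bit keep using the older method, so the change
      stays backward compatible.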
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make link implementation independent from struct tipc_bearer · 0e05498e
      Jon Paul Maloy committed
      In reality, the link implementation is already independent from
      struct tipc_bearer, in that it doesn't store any reference to it.
      However, we still pass on a pointer to a bearer instance in the
      function tipc_link_create(), just to have it extract some
      initialization information from it.
      
      In later commits, we need to create instances of tipc_link without
      having any associated struct tipc_bearer. To facilitate this, we
      want to extract the initialization data already in the creator
      function in node.c, before calling tipc_link_create(), and pass
      this info on as individual parameters in the call.
      
      This commit introduces this change.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 16 Oct, 2015 3 commits
    • tipc: update node FSM when peer RESET message is received · c8199300
      Jon Paul Maloy committed
      The change made in the previous commit revealed a small flaw in the way
      the node FSM is updated. When the function tipc_node_link_down() is
      called for the last link to a node, we should check whether this was
      caused by a local reset or by a received RESET message from the peer.
      In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to
      the node FSM, so that it is ready to re-establish contact. If this is
      not done, the peer node will sometimes have to go through a second
      establish cycle before the link becomes stable.
      
      We fix this in this commit by conditionally issuing the mentioned
      event in the function tipc_node_link_down(). We also move LINK_RESET
      FSM event away from the link_reset() function and into the caller
      function, partially because it is easier to follow the code when state
      changes are gathered at a limited number of locations, partially
      because there will be cases in future commits where we don't want the
      link to go to RESET mode when link_reset() is called.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: send out RESET immediately when link goes down · 282b3a05
      Jon Paul Maloy committed
      When a link is taken down because of a node local event, such as
      disabling of a bearer or an interface, we currently leave it to the
      peer node to discover the broken communication. The default time for
      such failure discovery is 1.5-2 seconds.
      
      If we instead allow the terminating link endpoint to send out a RESET
      message at the moment it is reset, we can achieve the impression that
      both endpoints are going down instantly. Since this is a very common
      scenario, we find it worthwhile to make this small modification.
      
      Apart from letting the link produce the said message, we also have to
      ensure that the interface is able to transmit it before TIPC is
      detached. We do this by performing the disabling of a bearer in three
      steps:
      
      1) Disable reception of TIPC packets from the interface in question.
      2) Take down the links, while allowing them to send out a RESET message.
      3) Disable transmission of TIPC packets on the interface.
      
      Apart from this, we now have to react to the NETDEV_GOING_DOWN event,
      instead of the NETDEV_DOWN event as we currently do, to ensure that such
      transmission is possible during the teardown phase.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: delay ESTABLISH state event when link is established · 73f646ce
      Jon Paul Maloy committed
      Link establishing, just like link teardown, is a non-atomic action, in
      the sense that discovering that conditions are right to establish a link,
      and the actual adding of the link to one of the node's send slots is done
      in two different lock contexts. The link FSM is designed to help bridging
      the gap between the two contexts in a safe manner.
      
      We have now discovered a weakness in the implementation of this FSM.
      Because we directly let the link go from state LINK_ESTABLISHING to
      state LINK_ESTABLISHED already in the first lock context, we are unable
      to distinguish between a fully established link, i.e., a link that has
      been added to its slot, and a link that has not yet reached the second
      lock context. It may hence happen that a manual intervention, e.g., when
      disabling an interface, causes the function tipc_node_link_down() to try
      removing the link from the node slots, decrementing its active link
      counter etc, although the link was never added there in the first place.
      
      We solve this by delaying the actual state change until we reach the
      second lock context, inside the function tipc_node_link_up(). This
      makes it possible for potential callers of __tipc_node_link_down() to
      know if they should proceed or not, and the problem is solved.
      
      Unfortunately, the situation described above also has a second problem.
      Since there by necessity is a tipc_node_link_up() call pending once
      the node lock has been released, we must defuse that call by setting
      the link back from LINK_ESTABLISHING to LINK_RESET state. This forces
      us to make a slight modification to the link FSM, which will now look
      as follows.
      
       +------------------------------------+
       |RESET_EVT                           |
       |                                    |
       |                             +--------------+
       |           +-----------------|   SYNCHING   |-----------------+
       |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
       |           |                  A            |                  |
       |           |                  |            |                  |
       |           |                  |            |                  |
       |           |                  |SYNCH_      |SYNCH_            |
       |           |                  |BEGIN_EVT   |END_EVT           |
       |           |                  |            |                  |
       |           V                  |            V                  V
       |    +-------------+          +--------------+          +------------+
       |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
       |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
       |           |        EVT        |    A         RESET_EVT       |
       |           |                   |    |                         |
       |           |  +----------------+    |                         |
       |  RESET_EVT|  |RESET_EVT            |                         |
       |           |  |                     |                         |
       |           |  |                     |ESTABLISH_EVT            |
       |           |  |  +-------------+    |                         |
       |           |  |  | RESET_EVT   |    |                         |
       |           |  |  |             |    |                         |
       |           V  V  V             |    |                         |
       |    +-------------+          +--------------+        RESET_EVT|
       +--->|    RESET    |--------->| ESTABLISHING |<----------------+
            +-------------+ PEER_    +--------------+
             |           A  RESET_EVT       |
             |           |                  |
             |           |                  |
             |FAILOVER_  |FAILOVER_         |FAILOVER_
             |BEGIN_EVT  |END_EVT           |BEGIN_EVT
             |           |                  |
             V           |                  |
            +-------------+                 |
            | FAILINGOVER |<----------------+
            +-------------+
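
      The transitions in the diagram above can be transcribed into a small
      transition function. This is a sketch that mirrors only the arrows drawn
      in the diagram; the shortened ST_/EV_ names are illustrative, and the
      choice to leave the state unchanged for any event the diagram does not
      show is an assumption, not something the commit message specifies.

```c
#include <assert.h>

/* Illustrative names for the states and events shown in the diagram. */
enum link_state {
    ST_RESET, ST_RESETTING, ST_PEER_RESET, ST_ESTABLISHING,
    ST_ESTABLISHED, ST_SYNCHING, ST_FAILINGOVER
};

enum link_evt {
    EV_RESET, EV_PEER_RESET, EV_FAILURE, EV_ESTABLISH,
    EV_SYNCH_BEGIN, EV_SYNCH_END, EV_FAILOVER_BEGIN, EV_FAILOVER_END
};

/* Return the next state; events not drawn in the diagram keep the state. */
static enum link_state link_fsm_next(enum link_state s, enum link_evt e)
{
    switch (s) {
    case ST_RESET:
        if (e == EV_PEER_RESET)     return ST_ESTABLISHING;
        if (e == EV_FAILOVER_BEGIN) return ST_FAILINGOVER;
        break;
    case ST_RESETTING:
        if (e == EV_RESET)          return ST_RESET;
        break;
    case ST_PEER_RESET:
        if (e == EV_RESET)          return ST_ESTABLISHING;
        break;
    case ST_ESTABLISHING:
        if (e == EV_ESTABLISH)      return ST_ESTABLISHED;
        if (e == EV_RESET)          return ST_RESET; /* the new back edge */
        if (e == EV_FAILOVER_BEGIN) return ST_FAILINGOVER;
        break;
    case ST_ESTABLISHED:
        if (e == EV_SYNCH_BEGIN)    return ST_SYNCHING;
        if (e == EV_FAILURE)        return ST_RESETTING;
        if (e == EV_PEER_RESET)     return ST_PEER_RESET;
        if (e == EV_RESET)          return ST_RESET;
        break;
    case ST_SYNCHING:
        if (e == EV_SYNCH_END)      return ST_ESTABLISHED;
        if (e == EV_FAILURE)        return ST_RESETTING;
        if (e == EV_PEER_RESET)     return ST_PEER_RESET;
        if (e == EV_RESET)          return ST_RESET;
        break;
    case ST_FAILINGOVER:
        if (e == EV_FAILOVER_END)   return ST_RESET;
        break;
    }
    return s;
}
```

      The ESTABLISHING to RESET edge is the one that lets a pending
      tipc_node_link_up() call be defused: a link that was reset while still
      in ESTABLISHING never receives the ESTABLISH event in that state.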
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 14 Oct, 2015 1 commit
    • tipc: eliminate risk of stalled link synchronization · 0f8b8e28
      Jon Paul Maloy committed
      In commit 6e498158 ("tipc: move link synch and failover to link aggregation level")
      we introduced a new mechanism for performing link failover and
      synchronization. We have now detected a bug in this mechanism.
      
      During link synchronization we use the arrival of any packet on
      the tunnel link to trigger a check for whether it has reached the
      synchronization point or not. This has turned out to be too
      permissive, since it may cause an arriving non-last SYNCH packet to
      end the synch state, just to see the next SYNCH packet initiate a
      new synch state with a new, higher synch point. This is not fatal,
      but should be avoided, because it may significantly extend the
      synchronization period, while at the same time we are not allowed
      to send NACKs if packets are lost. In the worst case, a low-traffic
      user may see its traffic stall until a LINK_PROTOCOL state message
      triggers the link to leave synchronization state.
      
      At the same time, LINK_PROTOCOL packets which happen to have a (non-
      valid) sequence number lower than the tunnel link's rcv_nxt value will
      be consistently dropped, and will never be able to resolve the situation
      described above.
      
      We fix this by exempting LINK_PROTOCOL packets from the sequence number
      check, as they should be. We also reduce (but don't completely
      eliminate) the risk of entering multiple synchronization states by only
      allowing the (logically) first SYNCH packet to initiate a synchronization
      state. This works independently of actual packet arrival order.
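
      The first part of the fix, exempting LINK_PROTOCOL packets from the
      sequence number check, might be sketched as follows. This is a
      simplified model and not the kernel code: the 16-bit wrapping
      comparison, the function names and the boolean parameter are all
      assumptions introduced for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a precedes b in 16-bit wrapping sequence space (assumed width). */
static bool seqno_less(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(b - a);

    return diff != 0 && diff < 0x8000u;
}

/*
 * Decide whether a packet arriving on the tunnel link passes the sequence
 * number check. Protocol packets carry no valid sequence number, so they
 * bypass the check instead of being dropped as "old".
 */
static bool tunnel_pkt_passes_seqno_check(bool is_link_protocol,
                                          uint16_t seqno, uint16_t rcv_nxt)
{
    if (is_link_protocol)
        return true;
    return !seqno_less(seqno, rcv_nxt);
}
```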
      
      Fixes: commit 6e498158 ("tipc: move link synch and failover to link aggregation level")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 24 Aug, 2015 3 commits
    • tipc: fix stale link problem during synchronization · 2be80c2d
      Jon Paul Maloy committed
      Recent changes to the link synchronization mean that we can now just
      drop packets arriving on the synchronizing link before the synch point
      is reached. This has led to significant simplifications to the
      implementation, but also turns out to have a flip side that we need
      to consider.
      
      Under unlucky circumstances, the two endpoints may end up
      repeatedly dropping each other's packets, while immediately
      asking for retransmission of the same packets, just to drop
      them once more. This pattern will eventually be broken when
      the synch point is reached on the other link, but before that,
      the endpoints may have arrived at the retransmission limit
      (stale counter) that indicates that the link should be broken.
      We see this happen on rare occasions.
      
      The fix for this is to not ask for retransmissions when a link is in
      state LINK_SYNCHING. The fact that the link has reached this state
      means that it has already received the first SYNCH packet, and that it
      knows the synch point. Hence, it doesn't need any more packets until the
      other link has reached the synch point, whereafter it can go ahead and
      ask for the missing packets.
      
      However, because of the reduced traffic on the synching link that
      follows this change, it may now take longer to discover that the
      synch point has been reached. We compensate for this by letting all
      packets, on any of the links, trigger a check for synchronization
      termination. This is possible because the packets themselves don't
      contain any information that is needed for discovering this condition.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: interrupt link synchronization when a link goes down · 5ae2f8e6
      Jon Paul Maloy committed
      When we introduced the new link failover/synch mechanism
      in commit 6e498158
      ("tipc: move link synch and failover to link aggregation level"),
      we missed the case when the non-tunnel link goes down during the link
      synchronization period. In this case the tunnel link will remain in
      state LINK_SYNCHING, something leading to unpredictable behavior when
      the failover procedure is initiated.
      
      In this commit, we ensure that the node and remaining link go
      back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
      when one of the parallel links goes down. We also ensure that we don't
      re-enter synch mode if subsequent SYNCH packets arrive on the remaining
      link.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate risk of premature link setup during failover · 17b20630
      Jon Paul Maloy committed
      When a link goes down, and there is still a working link towards its
      destination node, a failover is initiated, and the failed link is not
      allowed to re-establish until that procedure is finished. To ensure
      this, the concerned link endpoints are set to state LINK_FAILINGOVER,
      and the node endpoints to NODE_FAILINGOVER during the failover period.
      
      However, if the link reset is due to a disabled bearer, the
      corresponding link endpoint is deleted, and only the node endpoint knows
      about the ongoing failover. Now, if the disabled bearer is re-enabled
      during the failover period, the discovery mechanism may create a new
      link endpoint that is ready to be established, even though this is not
      permitted. This situation may cause both the ongoing failover and any
      subsequent link synchronization to fail.
      
      In this commit, we ensure that a newly created link goes directly to
      state LINK_FAILINGOVER if the corresponding node state is
      NODE_FAILINGOVER. This eliminates the problem described above.
      
      Furthermore, we tighten the criteria for which packets are allowed
      to end a failover state in the function tipc_node_check_state().
      By checking that the receiving link is up and running, instead of just
      checking that it is not in failover mode, we eliminate the risk that
      protocol packets from the re-created link may cause the failover to
      be prematurely terminated.
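The two rules above can be illustrated with a minimal sketch. The enum values mirror state names from the commit text, but the helper functions, the default start state LINK_RESET, and the definition of "up and running" as ESTABLISHED or SYNCHING are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Reduced, hypothetical state sets named after the commit text. */
enum node_state { NODE_SELF_UP_PEER_UP, NODE_SYNCHING, NODE_FAILINGOVER };
enum link_state {
    LINK_RESET,
    LINK_ESTABLISHING,
    LINK_ESTABLISHED,
    LINK_SYNCHING,
    LINK_FAILINGOVER
};

/* Rule 1: a link created while the node is failing over starts in
 * LINK_FAILINGOVER; otherwise it starts in the assumed default state. */
static enum link_state initial_link_state(enum node_state ns)
{
    return ns == NODE_FAILINGOVER ? LINK_FAILINGOVER : LINK_RESET;
}

/* Old criterion: any link that is not itself failing over qualifies. */
static bool may_end_failover_old(enum link_state receiving_link)
{
    return receiving_link != LINK_FAILINGOVER;
}

/* Rule 2 (tightened): the receiving link must be up and running. */
static bool may_end_failover(enum link_state receiving_link)
{
    return receiving_link == LINK_ESTABLISHED ||
           receiving_link == LINK_SYNCHING;
}
```

A re-created link in LINK_ESTABLISHING passes the old check but fails the tightened one, which is the case the commit is guarding against.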
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17b20630
  11. 31 Jul, 2015 11 commits
    • J
      tipc: clean up link creation · 440d8963
      Jon Paul Maloy committed
      We simplify the link creation function tipc_link_create() and the way
      the link struct is connected to the node struct. In particular, we
      remove the duplicate initialization of some fields which are anyway set
      in tipc_link_reset().
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      440d8963
    • J
      tipc: remove implicit message delivery in node_unlock() · 23d8335d
      Jon Paul Maloy committed
      After the most recent changes, all access calls to a link which
      may entail addition of messages to the link's input queue are
      followed by an explicit call to tipc_sk_rcv(), using a reference
      to the correct queue.
      
      This means that the potentially hazardous implicit delivery, using
      tipc_node_unlock() in combination with a binary flag and a cached
      queue pointer, now has become redundant.
      
      This commit removes this implicit delivery mechanism both for regular
      data messages and for binding table update messages.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23d8335d
    • J
      tipc: make resetting of links non-atomic · 598411d7
      Jon Paul Maloy committed
      In order to facilitate future improvements to the locking structure, we
      want to make resetting and establishing of links non-atomic. I.e., the
      functions tipc_node_link_up() and tipc_node_link_down() should be called
      from outside the node lock context, and grab/release the node lock
      themselves. This requires that we can freeze the link state from the
      moment it is set to RESETTING or PEER_RESET in one lock context until
      it is set to RESET or ESTABLISHING in a later context. The recently
      introduced link FSM makes this possible, so we are now ready to introduce
      the above change.
      
      This commit implements this.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      598411d7
    • J
      tipc: move received discovery data evaluation inside node.c · cf148816
      Jon Paul Maloy committed
      The node lock is currently grabbed and released in the function
      tipc_disc_rcv() in the file discover.c. As a preparation for the next
      commits, we need to move this node lock handling, along with the code
      area it is covering, to node.c.
      
      This commit introduces this change.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf148816
    • J
      tipc: merge link->exec_mode and link->state into one FSM · 662921cd
      Jon Paul Maloy committed
      Until now, we have been handling link failover and synchronization
      by using an additional link state variable, "exec_mode". This variable
      is not independent of the link FSM state, which creates a risk of
      inconsistencies, in addition to cluttering the code.
      
      The conditions are now in place to define a new link FSM that covers
      all existing use cases, including failover and synchronization, and
      eliminate the "exec_mode" field altogether. The FSM must also support
      non-atomic resetting of links, which will be introduced later.
      
      The new link FSM is shown below, with 7 states and 8 events.
      Only events leading to state change are shown as edges.
      
      +------------------------------------+
      |RESET_EVT                           |
      |                                    |
      |                             +--------------+
      |           +-----------------|   SYNCHING   |-----------------+
      |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
      |           |                  A            |                  |
      |           |                  |            |                  |
      |           |                  |            |                  |
      |           |                  |SYNCH_      |SYNCH_            |
      |           |                  |BEGIN_EVT   |END_EVT           |
      |           |                  |            |                  |
      |           V                  |            V                  V
      |    +-------------+          +--------------+          +------------+
      |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
      |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
      |           |        EVT        |    A         RESET_EVT       |
      |           |                   |    |                         |
      |           |                   |    |                         |
      |           |    +--------------+    |                         |
      |  RESET_EVT|    |RESET_EVT          |ESTABLISH_EVT            |
      |           |    |                   |                         |
      |           |    |                   |                         |
      |           V    V                   |                         |
      |    +-------------+          +--------------+        RESET_EVT|
      +--->|    RESET    |--------->| ESTABLISHING |<----------------+
           +-------------+ PEER_    +--------------+
            |           A  RESET_EVT       |
            |           |                  |
            |           |                  |
            |FAILOVER_  |FAILOVER_         |FAILOVER_
            |BEGIN_EVT  |END_EVT           |BEGIN_EVT
            |           |                  |
            V           |                  |
           +-------------+                 |
           | FAILINGOVER |<----------------+
           +-------------+
      
      These changes are fully backwards compatible.
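The edges in the diagram above can be transcribed into a transition function along the following lines. This is a sketch read off the diagram, not the kernel implementation: the `LINK_` prefixes and the function name `link_fsm_next()` are hypothetical, and any event without an edge for the current state is assumed to leave the state unchanged.

```c
/* The 7 states from the diagram (prefix chosen for this sketch). */
enum link_state {
    LINK_RESET,
    LINK_RESETTING,
    LINK_PEER_RESET,
    LINK_ESTABLISHING,
    LINK_ESTABLISHED,
    LINK_SYNCHING,
    LINK_FAILINGOVER
};

/* The 8 events from the diagram. */
enum link_evt {
    LINK_RESET_EVT,
    LINK_FAILURE_EVT,
    LINK_PEER_RESET_EVT,
    LINK_ESTABLISH_EVT,
    LINK_SYNCH_BEGIN_EVT,
    LINK_SYNCH_END_EVT,
    LINK_FAILOVER_BEGIN_EVT,
    LINK_FAILOVER_END_EVT
};

/* Return the next state; events with no edge keep the current state. */
static enum link_state link_fsm_next(enum link_state s, enum link_evt e)
{
    switch (s) {
    case LINK_RESET:
        if (e == LINK_PEER_RESET_EVT)     return LINK_ESTABLISHING;
        if (e == LINK_FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
        break;
    case LINK_RESETTING:
        if (e == LINK_RESET_EVT)          return LINK_RESET;
        break;
    case LINK_PEER_RESET:
        if (e == LINK_RESET_EVT)          return LINK_ESTABLISHING;
        break;
    case LINK_ESTABLISHING:
        if (e == LINK_ESTABLISH_EVT)      return LINK_ESTABLISHED;
        if (e == LINK_FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
        break;
    case LINK_ESTABLISHED:
        if (e == LINK_SYNCH_BEGIN_EVT)    return LINK_SYNCHING;
        if (e == LINK_FAILURE_EVT)        return LINK_RESETTING;
        if (e == LINK_PEER_RESET_EVT)     return LINK_PEER_RESET;
        if (e == LINK_RESET_EVT)          return LINK_RESET;
        break;
    case LINK_SYNCHING:
        if (e == LINK_SYNCH_END_EVT)      return LINK_ESTABLISHED;
        if (e == LINK_FAILURE_EVT)        return LINK_RESETTING;
        if (e == LINK_PEER_RESET_EVT)     return LINK_PEER_RESET;
        if (e == LINK_RESET_EVT)          return LINK_RESET;
        break;
    case LINK_FAILINGOVER:
        if (e == LINK_FAILOVER_END_EVT)   return LINK_RESET;
        break;
    }
    return s;
}
```

Keeping failover and synchronization as ordinary states in one function means there is no second variable that can disagree with the FSM state, which is the inconsistency risk the commit message attributes to "exec_mode".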
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      662921cd
    • J
      tipc: move protocol message sending away from link FSM · 5045f7b9
      Jon Paul Maloy committed
      The implementation of the link FSM currently takes decisions about and
      sends out link protocol messages. This is unnecessary, since such
      actions are not the result of any link state change, and are even
      decided based on non-FSM state information ("silent_intv_cnt").
      
      We now move the sending of unicast link protocol messages to the
      function tipc_link_timeout(), and the initial broadcast synchronization
      message to tipc_node_link_up(). The latter is done because a link
      instance should not need to know whether it is the first or second
      link to a destination. Such information is now restricted to and
      handled by the link aggregation layer in node.c.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5045f7b9
    • J
      tipc: move link synch and failover to link aggregation level · 6e498158
      Jon Paul Maloy committed
      Link failover and synchronization have until now been handled by the
      links themselves, forcing them to have knowledge about and to access
      parallel links in order to make the two algorithms work correctly.
      
      In this commit, we move the control part of this functionality to the
      link aggregation level in node.c, which is the right location for this.
      As a result, the two algorithms become easier to follow, and the link
      implementation becomes simpler.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e498158
    • J
      tipc: extend node FSM · 66996b6c
      Jon Paul Maloy committed
      In the next commit, we will move link synch/failover orchestration to
      the link aggregation level. In order to do this, we first need to extend
      the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER,
      plus four new events to enter and leave those states.
      
      This commit introduces this change, without yet making use of it.
      The node FSM now looks as follows:
      
                                 +-----------------------------------------+
                                 |                            PEER_DOWN_EVT|
                                 |                                         |
        +------------------------+----------------+                        |
        |SELF_DOWN_EVT           |                |                        |
        |                        |                |                        |
        |              +-----------+          +-----------+                |
        |              |NODE_      |          |NODE_      |                |
        |   +----------|FAILINGOVER|<---------|SYNCHING   |------------+   |
        |   |SELF_     +-----------+ FAILOVER_+-----------+    PEER_   |   |
        |   |DOWN_EVT   |         A  BEGIN_EVT A         |     DOWN_EVT|   |
        |   |           |         |            |         |             |   |
        |   |           |         |            |         |             |   |
        |   |           |FAILOVER_|FAILOVER_   |SYNCH_   |SYNCH_       |   |
        |   |           |END_EVT  |BEGIN_EVT   |BEGIN_EVT|END_EVT      |   |
        |   |           |         |            |         |             |   |
        |   |           |         |            |         |             |   |
        |   |           |        +--------------+        |             |   |
        |   |           +------->|   SELF_UP_   |<-------+             |   |
        |   |   +----------------|   PEER_UP    |------------------+   |   |
        |   |   |SELF_DOWN_EVT   +--------------+     PEER_DOWN_EVT|   |   |
        |   |   |                   A          A                   |   |   |
        |   |   |                   |          |                   |   |   |
        |   |   |        PEER_UP_EVT|          |SELF_UP_EVT        |   |   |
        |   |   |                   |          |                   |   |   |
        V   V   V                   |          |                   V   V   V
      +------------+       +-----------+    +-----------+       +------------+
      |SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
      |PEER_LEAVING|<------|PEER_COMING|    |SELF_COMING|------>|SELF_LEAVING|
      +------------+ SELF_ +-----------+    +-----------+ PEER_ +------------+
             |       DOWN_EVT       A          A          DOWN_EVT     |
             |                      |          |                       |
             |                      |          |                       |
             |           SELF_UP_EVT|          |PEER_UP_EVT            |
             |                      |          |                       |
             |                      |          |                       |
             |PEER_DOWN_EVT       +--------------+        SELF_DOWN_EVT|
             +------------------->|  SELF_DOWN_  |<--------------------+
                                  |  PEER_DOWN   |
                                  +--------------+
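One way to encode the node FSM above is as a table of edges with a linear lookup, sketched below. State and event names are copied from the diagram labels; the struct, the table, and the function name `node_fsm_next()` are hypothetical, and events with no matching edge are assumed to leave the state unchanged.

```c
#include <stddef.h>

enum node_state {
    SELF_DOWN_PEER_DOWN,
    SELF_UP_PEER_COMING,
    PEER_UP_SELF_COMING,
    SELF_UP_PEER_UP,
    SELF_DOWN_PEER_LEAVING,
    PEER_DOWN_SELF_LEAVING,
    NODE_FAILINGOVER,
    NODE_SYNCHING
};

enum node_evt {
    SELF_UP_EVT,
    PEER_UP_EVT,
    SELF_DOWN_EVT,
    PEER_DOWN_EVT,
    FAILOVER_BEGIN_EVT,
    FAILOVER_END_EVT,
    SYNCH_BEGIN_EVT,
    SYNCH_END_EVT
};

struct node_edge {
    enum node_state from;
    enum node_evt evt;
    enum node_state to;
};

/* One row per edge in the diagram. */
static const struct node_edge node_edges[] = {
    { SELF_DOWN_PEER_DOWN,    SELF_UP_EVT,        SELF_UP_PEER_COMING },
    { SELF_DOWN_PEER_DOWN,    PEER_UP_EVT,        PEER_UP_SELF_COMING },
    { SELF_UP_PEER_COMING,    PEER_UP_EVT,        SELF_UP_PEER_UP },
    { SELF_UP_PEER_COMING,    SELF_DOWN_EVT,      SELF_DOWN_PEER_LEAVING },
    { PEER_UP_SELF_COMING,    SELF_UP_EVT,        SELF_UP_PEER_UP },
    { PEER_UP_SELF_COMING,    PEER_DOWN_EVT,      PEER_DOWN_SELF_LEAVING },
    { SELF_UP_PEER_UP,        SELF_DOWN_EVT,      SELF_DOWN_PEER_LEAVING },
    { SELF_UP_PEER_UP,        PEER_DOWN_EVT,      PEER_DOWN_SELF_LEAVING },
    { SELF_UP_PEER_UP,        FAILOVER_BEGIN_EVT, NODE_FAILINGOVER },
    { SELF_UP_PEER_UP,        SYNCH_BEGIN_EVT,    NODE_SYNCHING },
    { NODE_FAILINGOVER,       FAILOVER_END_EVT,   SELF_UP_PEER_UP },
    { NODE_FAILINGOVER,       SELF_DOWN_EVT,      SELF_DOWN_PEER_LEAVING },
    { NODE_FAILINGOVER,       PEER_DOWN_EVT,      PEER_DOWN_SELF_LEAVING },
    { NODE_SYNCHING,          SYNCH_END_EVT,      SELF_UP_PEER_UP },
    { NODE_SYNCHING,          FAILOVER_BEGIN_EVT, NODE_FAILINGOVER },
    { NODE_SYNCHING,          SELF_DOWN_EVT,      SELF_DOWN_PEER_LEAVING },
    { NODE_SYNCHING,          PEER_DOWN_EVT,      PEER_DOWN_SELF_LEAVING },
    { SELF_DOWN_PEER_LEAVING, PEER_DOWN_EVT,      SELF_DOWN_PEER_DOWN },
    { PEER_DOWN_SELF_LEAVING, SELF_DOWN_EVT,      SELF_DOWN_PEER_DOWN },
};

/* Return the next state; events with no edge keep the current state. */
static enum node_state node_fsm_next(enum node_state s, enum node_evt e)
{
    size_t i;

    for (i = 0; i < sizeof(node_edges) / sizeof(node_edges[0]); i++)
        if (node_edges[i].from == s && node_edges[i].evt == e)
            return node_edges[i].to;
    return s;
}
```

A table keeps the code in one-to-one correspondence with the drawn edges, which makes it easy to check against the diagram; a switch statement would be the more conventional choice in kernel code.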
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66996b6c
    • J
      tipc: reverse call order for link_reset()->node_link_down() · 655fb243
      Jon Paul Maloy committed
      In many cases the call order when a link is reset goes as follows:
      tipc_node_xx()->tipc_link_reset()->tipc_node_link_down()
      
      This is not the right order if we want the node to be in control,
      so in this commit we change the order to:
      tipc_node_xx()->tipc_node_link_down()->tipc_link_reset()
      
      The fact that tipc_link_reset() now is called from only one
      location with a well-defined state will also facilitate later
      simplifications of tipc_link_reset() and the link FSM.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      655fb243
    • J
      tipc: move all link_reset() calls to link aggregation level · 6144a996
      Jon Paul Maloy committed
      In line with our effort to let the node level have full control over
      its links, we want to move all link reset calls from link.c to node.c.
      Some of the calls can be moved by simply moving the calling function,
      when this is the right thing to do. For the remaining calls we use
      the now established technique of returning a TIPC_LINK_DOWN_EVT
      flag from tipc_link_rcv(), whereafter we perform the reset call when
      the call returns.
      
      This change serves as a preparation for the coming commits.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6144a996
    • J
      tipc: eliminate function tipc_link_activate() · cbeb83ca
      Jon Paul Maloy committed
      The function tipc_link_activate() is redundant, since it mostly performs
      settings that have already been done in a preceding tipc_link_reset().
      
      There are three exceptions to this:
      - The actual state change to TIPC_LINK_WORKING. This should anyway be done
        in the FSM, and not in a separate function.
      - Registration of the link with the bearer. This should be done by the
        node, since we don't want the link to have any knowledge about its
        specific bearer.
      - Call to tipc_node_link_up() for user access registration. With the new
        role distribution between link aggregation and link level this becomes
        the wrong call order; tipc_node_link_up() should instead be called
        directly as a result of a TIPC_LINK_UP event, hence by the node itself.
      
      This commit implements those changes.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbeb83ca
  12. 21 Jul, 2015 1 commit
    • J
      tipc: reduce locking scope during packet reception · d999297c
      Jon Paul Maloy committed
      We convert packet/message reception according to the same principle
      we have been using for message sending and timeout handling:
      
      We move the function tipc_rcv() to node.c, hence handling the initial
      packet reception at the link aggregation level. The function grabs
      the node lock, selects the receiving link, and accesses it via a new
      call tipc_link_rcv(). This function appends buffers to the input
      queue for delivery upwards, but it may also append outgoing packets
      to the xmit queue, just as we do during regular message sending. The
      latter will happen when buffers are forwarded from the link backlog,
      or when retransmission is requested.
      
      Upon return of this function, and after having released the node lock,
      tipc_rcv() delivers/transmits the contents of those queues, but it may
      also perform actions such as link activation or reset, as indicated by
      the return flags from the link.
      
      This reduces the number of cpu cycles spent inside the node spinlock,
      and reduces contention on that lock.
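The reception pattern described above can be sketched in a simplified, single-threaded form. Everything here is hypothetical scaffolding: a plain flag stands in for the node spinlock, integer packets and counters stand in for buffers and real delivery, and the names `node_rcv()`, `link_rcv()`, `node_link_down()`, and `LINK_DOWN_EVT` are chosen for this sketch rather than taken from the kernel.

```c
#include <stddef.h>

#define MAXQ 8
#define LINK_DOWN_EVT 0x1 /* hypothetical return flag from the link */

struct pktq { int pkt[MAXQ]; size_t len; };

struct node {
    int locked;    /* stand-in for the node spinlock */
    int link_up;
    int delivered; /* packets handed upwards, counted outside the lock */
    int sent;      /* packets transmitted, counted outside the lock */
};

static void node_lock(struct node *n)   { n->locked = 1; }
static void node_unlock(struct node *n) { n->locked = 0; }

static void q_add(struct pktq *q, int p)
{
    if (q->len < MAXQ)
        q->pkt[q->len++] = p;
}

/* Called with the node lock held. It only updates link state and the
 * caller-owned queues, and reports required actions as return flags. */
static int link_rcv(struct node *n, int pkt,
                    struct pktq *inputq, struct pktq *xmitq)
{
    if (!n->link_up)
        return 0;
    if (pkt < 0)
        return LINK_DOWN_EVT;      /* pretend: fatal protocol error */
    q_add(inputq, pkt);
    if (pkt % 2 == 0)
        q_add(xmitq, pkt + 1000);  /* pretend: backlog packet released */
    return 0;
}

/* Takes and releases the node lock itself. */
static void node_link_down(struct node *n)
{
    node_lock(n);
    n->link_up = 0;
    node_unlock(n);
}

static void node_rcv(struct node *n, int pkt)
{
    struct pktq inputq = { {0}, 0 };
    struct pktq xmitq = { {0}, 0 };
    int rc;

    node_lock(n);
    rc = link_rcv(n, pkt, &inputq, &xmitq);
    node_unlock(n);

    /* The expensive work happens after the lock has been released. */
    n->delivered += (int)inputq.len;
    n->sent += (int)xmitq.len;
    if (rc & LINK_DOWN_EVT)
        node_link_down(n);
}
```

The critical section shrinks to the link state update and two queue appends; delivery, transmission, and the reset are driven by the local queues and return flags once the lock is no longer held.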
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d999297c