1. 24 10月, 2015 3 次提交
    • J
      tipc: make struct tipc_link generic to support broadcast · c1ab3f1d
      Jon Paul Maloy 提交于
      Realizing that unicast is just a special case of broadcast, we also see
      that we can go in the other direction, i.e., that modest changes to the
      current unicast link can make it generic enough to support broadcast.
      
      The following changes are introduced here:
      
      - A new counter ("ackers") in struct tipc_link, to indicate how many
        peers need to ack a packet before it can be released.
      - A corresponding counter in the skb user area, to keep track of how
        many peers a are left to ack before a buffer can be released.
      - A new counter ("acked"), to keep persistent track of how far a peer
        has acked at the moment, i.e., where in the transmission queue to
        start updating buffers when the next ack arrives. This is to avoid
        double acknowledgements from a peer, with inadvertent relase of
        packets as a result.
      - A more generic tipc_link_retrans() function, where retransmit starts
        from a given sequence number, instead of the first packet in the
        transmision queue. This is to minimize the number of retransmitted
        packets on the broadcast media.
      
      When the new functionality is taken into use in the next commits,
      we expect it to have minimal effect on unicast mode performance.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1ab3f1d
    • J
      tipc: use explicit allocation of broadcast send link · 32301906
      Jon Paul Maloy 提交于
      The broadcast link instance (struct tipc_link) used for sending is
      currently aggregated into struct tipc_bclink. This means that we cannot
      use the regular tipc_link_create() function for initiating the link, but
      do instead have to initiate numerous fields directly from the
      bcast_init() function.
      
      We want to reduce dependencies between the broadcast functionality
      and the inner workings of tipc_link. In this commit, we introduce
      a new function tipc_bclink_create() to link.c, and allocate the
      instance of the link separately using this function.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32301906
    • J
      tipc: make link implementation independent from struct tipc_bearer · 0e05498e
      Jon Paul Maloy 提交于
      In reality, the link implementation is already independent from
      struct tipc_bearer, in that it doesn't store any reference to it.
      However, we still pass on a pointer to a bearer instance in the
      function tipc_link_create(), just to have it extract some
      initialization information from it.
      
      I later commits, we need to create instances of tipc_link without
      having any associated struct tipc_bearer. To facilitate this, we
      want to extract the initialization data already in the creator
      function in node.c, before calling tipc_link_create(), and pass
      this info on as individual parameters in the call.
      
      This commit introduces this change.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e05498e
  2. 16 10月, 2015 7 次提交
    • J
      tipc: update node FSM when peer RESET message is received · c8199300
      Jon Paul Maloy 提交于
      The change made in the previous commit revealed a small flaw in the way
      the node FSM is updated. When the function tipc_node_link_down() is
      called for the last link to a node, we should check whether this was
      caused by a local reset or by a received RESET message from the peer.
      In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to
      the node FSM, so that it is ready to re-establish contact. If this is
      not done, the peer node will sometimes have to go through a second
      establish cycle before the link becomes stable.
      
      We fix this in this commit by conditionally issuing the mentioned
      event in the function tipc_node_link_down(). We also move LINK_RESET
      FSM even away from the link_reset() function and into the caller
      function, partially because it is easier to follow the code when state
      changes are gathered at a limited number of locations, partially
      because there will be cases in future commits where we don't want the
      link to go RESET mode when link_reset() is called.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8199300
    • J
      tipc: send out RESET immediately when link goes down · 282b3a05
      Jon Paul Maloy 提交于
      When a link is taken down because of a node local event, such as
      disabling of a bearer or an interface, we currently leave it to the
      peer node to discover the broken communication. The default time for
      such failure discovery is 1.5-2 seconds.
      
      If we instead allow the terminating link endpoint to send out a RESET
      message at the moment it is reset, we can achieve the impression that
      both endpoints are going down instantly. Since this is a very common
      scenario, we find it worthwhile to make this small modification.
      
      Apart from letting the link produce the said message, we also have to
      ensure that the interface is able to transmit it before TIPC is
      detached. We do this by performing the disabling of a bearer in three
      steps:
      
      1) Disable reception of TIPC packets from the interface in question.
      2) Take down the links, while allowing them so send out a RESET message.
      3) Disable transmission of TIPC packets on the interface.
      
      Apart from this, we now have to react on the NETDEV_GOING_DOWN event,
      instead of as currently the NEDEV_DOWN event, to ensure that such
      transmission is possible during the teardown phase.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      282b3a05
    • J
      tipc: delay ESTABLISH state event when link is established · 73f646ce
      Jon Paul Maloy 提交于
      Link establishing, just like link teardown, is a non-atomic action, in
      the sense that discovering that conditions are right to establish a link,
      and the actual adding of the link to one of the node's send slots is done
      in two different lock contexts. The link FSM is designed to help bridging
      the gap between the two contexts in a safe manner.
      
      We have now discovered a weakness in the implementaton of this FSM.
      Because we directly let the link go from state LINK_ESTABLISHING to
      state LINK_ESTABLISHED already in the first lock context, we are unable
      to distinguish between a fully established link, i.e., a link that has
      been added to its slot, and a link that has not yet reached the second
      lock context. It may hence happen that a manual intervention, e.g., when
      disabling an interface, causes the function tipc_node_link_down() to try
      removing the link from the node slots, decrementing its active link
      counter etc, although the link was never added there in the first place.
      
      We solve this by delaying the actual state change until we reach the
      second lock context, inside the function tipc_node_link_up(). This
      makes it possible for potentail callers of __tipc_node_link_down() to
      know if they should proceed or not, and the problem is solved.
      
      Unforunately, the situation described above also has a second problem.
      Since there by necessity is a tipc_node_link_up() call pending once
      the node lock has been released, we must defuse that call by setting
      the link back from LINK_ESTABLISHING to LINK_RESET state. This forces
      us to make a slight modification to the link FSM, which will now look
      as follows.
      
       +------------------------------------+
       |RESET_EVT                           |
       |                                    |
       |                             +--------------+
       |           +-----------------|   SYNCHING   |-----------------+
       |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
       |           |                  A            |                  |
       |           |                  |            |                  |
       |           |                  |            |                  |
       |           |                  |SYNCH_      |SYNCH_            |
       |           |                  |BEGIN_EVT   |END_EVT           |
       |           |                  |            |                  |
       |           V                  |            V                  V
       |    +-------------+          +--------------+          +------------+
       |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
       |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
       |           |        EVT        |    A         RESET_EVT       |
       |           |                   |    |                         |
       |           |  +----------------+    |                         |
       |  RESET_EVT|  |RESET_EVT            |                         |
       |           |  |                     |                         |
       |           |  |                     |ESTABLISH_EVT            |
       |           |  |  +-------------+    |                         |
       |           |  |  | RESET_EVT   |    |                         |
       |           |  |  |             |    |                         |
       |           V  V  V             |    |                         |
       |    +-------------+          +--------------+        RESET_EVT|
       +--->|    RESET    |--------->| ESTABLISHING |<----------------+
            +-------------+ PEER_    +--------------+
             |           A  RESET_EVT       |
             |           |                  |
             |           |                  |
             |FAILOVER_  |FAILOVER_         |FAILOVER_
             |BEGIN_EVT  |END_EVT           |BEGIN_EVT
             |           |                  |
             V           |                  |
            +-------------+                 |
            | FAILINGOVER |<----------------+
            +-------------+
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73f646ce
    • J
      tipc: disallow packet duplicates in link deferred queue · 8306f99a
      Jon Paul Maloy 提交于
      After the previous commits, we are guaranteed that no packets
      of type LINK_PROTOCOL or with illegal sequence numbers will be
      attempted added to the link deferred queue. This makes it possible to
      make some simplifications to the sorting algorithm in the function
      tipc_skb_queue_sorted().
      
      We also alter the function so that it will drop packets if one with
      the same seqeunce number is already present in the queue. This is
      necessary because we have identified weird packet sequences, involving
      duplicate packets, where a legitimate in-sequence packet may advance to
      the head of the queue without being detected and de-queued.
      
      Finally, we make this function outline, since it will now be called only
      in exceptional cases.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8306f99a
    • J
      tipc: improve sequence number checking · 81204c49
      Jon Paul Maloy 提交于
      The sequence number of an incoming packet is currently only checked
      for less than, equality to, or bigger than the next expected number,
      meaning that the receive window in practice becomes one half sequence
      number cycle, or U16_MAX/2. This does not make sense, and may not even
      be safe if there are extreme delays in the network. Any packet sent by
      the peer during the ongoing cycle must belong inside his current send
      window, or should otherwise be dropped if possible.
      
      Since a link endpoint cannot know its peer's current send window, it
      has to base this sanity check on a worst-case assumption, i.e., that
      the peer is using a maximum sized window of 8191 packets. Using this
      assumption, we now add a check that the sequence number is not bigger
      than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks
      done, so that the receive window test is performed before the gap test.
      This way, we are guaranteed that no packet with illegal sequence numbers
      are ever added to the deferred queue.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81204c49
    • J
      tipc: simplify tipc_link_rcv() reception loop · f9aa358a
      Jon Paul Maloy 提交于
      Currently, all packets received in tipc_link_rcv() are unconditionally
      added to the packet deferred queue, whereafter that queue is walked and
      all its buffers evaluated for delivery. This is both non-optimal and
      and makes the queue sorting function unnecessary complex.
      
      This commit changes the loop so that an arrived packet is evaluated
      first, and added to the deferred queue only when a sequence number gap
      is discovered. A non-empty deferred queue is walked until it is empty
      or until its head's sequence number doesn't fit.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9aa358a
    • J
      tipc: limit usage of temporary skb list during packet reception · 9945e804
      Jon Paul Maloy 提交于
      During packet reception, the function tipc_link_rcv() adds its accepted
      packets to a temporary buffer queue, before finally splicing this queue
      into the lock protected input queue that will be delivered up to the
      socket layer. The purpose is to reduce potential contention on the input
      queue lock. However, since the vast majority of packets arrive in
      sequence, they will anyway be added one by one to the input queue, and
      the use of the temporary queue becomes a sub-optimization.
      
      The only case where this queue makes sense is when unpacking buffers
      from a bundle packet; here we want to avoid dozens of small buffers
      to be added individually to the lock-protected input queue in a tight
      loop.
      
      In this commit, we remove the general usage of the temporary queue,
      and keep it only for the packet unbundling case.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9945e804
  3. 24 8月, 2015 2 次提交
    • J
      tipc: fix stale link problem during synchronization · 2be80c2d
      Jon Paul Maloy 提交于
      Recent changes to the link synchronization means that we can now just
      drop packets arriving on the synchronizing link before the synch point
      is reached. This has lead to significant simplifications to the
      implementation, but also turns out to have a flip side that we need
      to consider.
      
      Under unlucky circumstances, the two endpoints may end up
      repeatedly dropping each other's packets, while immediately
      asking for retransmission of the same packets, just to drop
      them once more. This pattern will eventually be broken when
      the synch point is reached on the other link, but before that,
      the endpoints may have arrived at the retransmission limit
      (stale counter) that indicates that the link should be broken.
      We see this happen at rare occasions.
      
      The fix for this is to not ask for retransmissions when a link is in
      state LINK_SYNCHING. The fact that the link has reached this state
      means that it has already received the first SYNCH packet, and that it
      knows the synch point. Hence, it doesn't need any more packets until the
      other link has reached the synch point, whereafter it can go ahead and
      ask for the missing packets.
      
      However, because of the reduced traffic on the synching link that
      follows this change, it may now take longer to discover that the
      synch point has been reached. We compensate for this by letting all
      packets, on any of the links, trig a check for synchronization
      termination. This is possible because the packets themselves don't
      contain any information that is needed for discovering this condition.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2be80c2d
    • J
      tipc: interrupt link synchronization when a link goes down · 5ae2f8e6
      Jon Paul Maloy 提交于
      When we introduced the new link failover/synch mechanism
      in commit 6e498158
      ("tipc: move link synch and failover to link aggregation level"),
      we missed the case when the non-tunnel link goes down during the link
      synchronization period. In this case the tunnel link will remain in
      state LINK_SYNCHING, something leading to unpredictable behavior when
      the failover procedure is initiated.
      
      In this commit, we ensure that the node and remaining link goes
      back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
      when one of the parallel links goes down. We also ensure that we don't
      re-enter synch mode if subsequent SYNCH packets arrive on the remaining
      link.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ae2f8e6
  4. 31 7月, 2015 10 次提交
    • J
      tipc: clean up link creation · 440d8963
      Jon Paul Maloy 提交于
      We simplify the link creation function tipc_link_create() and the way
      the link struct it is connected to the node struct. In particular, we
      remove the duplicate initialization of some fields which are anyway set
      in tipc_link_reset().
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      440d8963
    • J
      tipc: use temporary, non-protected skb queue for bundle reception · 9073fb8b
      Jon Paul Maloy 提交于
      Currently, when we extract small messages from a message bundle, or
      when many messages have accumulated in the link arrival queue, those
      messages are added one by one to the lock protected link input queue.
      This may increase contention with the reader of that queue, in
      the function tipc_sk_rcv().
      
      This commit introduces a temporary, unprotected input queue in
      tipc_link_rcv() for such cases. Only when the arrival queue has been
      emptied, and the function is ready to return, does it splice the whole
      temporary queue into the real input queue.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9073fb8b
    • J
      tipc: remove implicit message delivery in node_unlock() · 23d8335d
      Jon Paul Maloy 提交于
      After the most recent changes, all access calls to a link which
      may entail addition of messages to the link's input queue are
      postpended by an explicit call to tipc_sk_rcv(), using a reference
      to the correct queue.
      
      This means that the potentially hazardous implicit delivery, using
      tipc_node_unlock() in combination with a binary flag and a cached
      queue pointer, now has become redundant.
      
      This commit removes this implicit delivery mechanism both for regular
      data messages and for binding table update messages.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23d8335d
    • J
      tipc: make resetting of links non-atomic · 598411d7
      Jon Paul Maloy 提交于
      In order to facilitate future improvements to the locking structure, we
      want to make resetting and establishing of links non-atomic. I.e., the
      functions tipc_node_link_up() and tipc_node_link_down() should be called
      from outside the node lock context, and grab/release the node lock
      themselves. This requires that we can freeze the link state from the
      moment it is set to RESETTING or PEER_RESET in one lock context until
      it is set to RESET or ESTABLISHING in a later context. The recently
      introduced link FSM makes this possible, so we are now ready to introduce
      the above change.
      
      This commit implements this.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      598411d7
    • J
      tipc: merge link->exec_mode and link->state into one FSM · 662921cd
      Jon Paul Maloy 提交于
      Until now, we have been handling link failover and synchronization
      by using an additional link state variable, "exec_mode". This variable
      is not independent of the link FSM state, something causing a risk of
      inconsistencies, apart from the fact that it clutters the code.
      
      The conditions are now in place to define a new link FSM that covers
      all existing use cases, including failover and synchronization, and
      eliminate the "exec_mode" field altogether. The FSM must also support
      non-atomic resetting of links, which will be introduced later.
      
      The new link FSM is shown below, with 7 states and 8 events.
      Only events leading to state change are shown as edges.
      
      +------------------------------------+
      |RESET_EVT                           |
      |                                    |
      |                             +--------------+
      |           +-----------------|   SYNCHING   |-----------------+
      |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
      |           |                  A            |                  |
      |           |                  |            |                  |
      |           |                  |            |                  |
      |           |                  |SYNCH_      |SYNCH_            |
      |           |                  |BEGIN_EVT   |END_EVT           |
      |           |                  |            |                  |
      |           V                  |            V                  V
      |    +-------------+          +--------------+          +------------+
      |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
      |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
      |           |        EVT        |    A         RESET_EVT       |
      |           |                   |    |                         |
      |           |                   |    |                         |
      |           |    +--------------+    |                         |
      |  RESET_EVT|    |RESET_EVT          |ESTABLISH_EVT            |
      |           |    |                   |                         |
      |           |    |                   |                         |
      |           V    V                   |                         |
      |    +-------------+          +--------------+        RESET_EVT|
      +--->|    RESET    |--------->| ESTABLISHING |<----------------+
           +-------------+ PEER_    +--------------+
            |           A  RESET_EVT       |
            |           |                  |
            |           |                  |
            |FAILOVER_  |FAILOVER_         |FAILOVER_
            |BEGIN_EVT  |END_EVT           |BEGIN_EVT
            |           |                  |
            V           |                  |
           +-------------+                 |
           | FAILINGOVER |<----------------+
           +-------------+
      
      These changes are fully backwards compatible.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      662921cd
    • J
      tipc: move protocol message sending away from link FSM · 5045f7b9
      Jon Paul Maloy 提交于
      The implementation of the link FSM currently takes decisions about and
      sends out link protocol messages. This is unnecessary, since such
      actions are not the result of any link state change, and are even
      decided based on non-FSM state information ("silent_intv_cnt").
      
      We now move the sending of unicast link protocol messages to the
      function tipc_link_timeout(), and the initial broadcast synchronization
      message to tipc_node_link_up(). The latter is done because a link
      instance should not need to know whether it is the first or second
      link to a destination. Such information is now restricted to and
      handled by the link aggregation layer in node.c
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5045f7b9
    • J
      tipc: move link synch and failover to link aggregation level · 6e498158
      Jon Paul Maloy 提交于
      Link failover and synchronization have until now been handled by the
      links themselves, forcing them to have knowledge about and to access
      parallel links in order to make the two algorithms work correctly.
      
      In this commit, we move the control part of this functionality to the
      link aggregation level in node.c, which is the right location for this.
      As a result, the two algorithms become easier to follow, and the link
      implementation becomes simpler.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e498158
    • J
      tipc: reverse call order for link_reset()->node_link_down() · 655fb243
      Jon Paul Maloy 提交于
      In many cases the call order when a link is reset goes as follows:
      tipc_node_xx()->tipc_link_reset()->tipc_node_link_down()
      
      This is not the right order if we want the node to be in control,
      so in this commit we change the order to:
      tipc_node_xx()->tipc_node_link_down()->tipc_link_reset()
      
      The fact that tipc_link_reset() now is called from only one
      location with a well-defined state will also facilitate later
      simplifications of tipc_link_reset() and the link FSM.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      655fb243
    • J
      tipc: move all link_reset() calls to link aggregation level · 6144a996
      Jon Paul Maloy 提交于
      In line with our effort to let the node level have full control over
      its links, we want to move all link reset calls from link.c to node.c.
      Some of the calls can be moved by simply moving the calling function,
      when this is the right thing to do. For the remaining calls we use
      the now established technique of returning a TIPC_LINK_DOWN_EVT
      flag from tipc_link_rcv(), whereafter we perform the reset call when
      the call returns.
      
      This change serves as a preparation for the coming commits.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6144a996
    • J
      tipc: eliminate function tipc_link_activate() · cbeb83ca
      Jon Paul Maloy 提交于
      The function tipc_link_activate() is redundant, since it mostly performs
      settings that have already been done in a preceding tipc_link_reset().
      
      There are three exceptions to this:
      - The actual state change to TIPC_LINK_WORKING. This should anyway be done
        in the FSM, and not in a separate function.
      - Registration of the link with the bearer. This should be done by the
        node, since we don't want the link to have any knowledge about its
        specific bearer.
      - Call to tipc_node_link_up() for user access registration. With the new
        role distribution between link aggregation and link level this becomes
        the wrong call order; tipc_node_link_up() should instead be called
        directly as a result of a TIPC_LINK_UP event, hence by the node itself.
      
      This commit implements those changes.
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbeb83ca
  5. 30 7月, 2015 1 次提交
    • J
      tipc: fix bug in broadcast synch message create function · 5a4c3552
      Jon Maloy 提交于
      In commit d999297c
      ("tipc: reduce locking scope during packet reception") we introduced
      a new function tipc_build_bcast_sync_msg(), which carries initial
      synchronization data between two nodes at first contact and at
      re-contact. In this function, we missed to add synchronization data,
      with the effect that the broadcast link endpoints will fail to
      synchronize correctly at re-contact between a running and a restarted
      node. All other cases work as intended.
      
      With this commit, we fix this bug.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a4c3552
  6. 22 7月, 2015 1 次提交
    • J
      tipc: fix compatibility bug · 16040894
      Jon Paul Maloy 提交于
      In commit d999297c
      ("tipc: reduce locking scope during packet reception") we introduced
      a new function tipc_link_proto_rcv(). This function contains a bug,
      so that it sometimes by error sends out a non-zero link priority value
      in created protocol messages.
      
      The bug may lead to an extra link reset at initial link establising
      with older nodes. This will never happen more than once, whereafter
      the link will work as intended.
      
      We fix this bug in this commit.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16040894
  7. 21 7月, 2015 11 次提交
    • J
      tipc: reduce locking scope during packet reception · d999297c
      Jon Paul Maloy 提交于
      We convert packet/message reception according to the same principle
      we have been using for message sending and timeout handling:
      
      We move the function tipc_rcv() to node.c, hence handling the initial
      packet reception at the link aggregation level. The function grabs
      the node lock, selects the receiving link, and accesses it via a new
      call tipc_link_rcv(). This function appends buffers to the input
      queue for delivery upwards, but it may also append outgoing packets
      to the xmit queue, just as we do during regular message sending. The
      latter will happen when buffers are forwarded from the link backlog,
      or when retransmission is requested.
      
      Upon return of this function, and after having released the node lock,
      tipc_rcv() delivers/tranmsits the contents of those queues, but it may
      also perform actions such as link activation or reset, as indicated by
      the return flags from the link.
      
      This reduces the number of cpu cycles spent inside the node spinlock,
      and reduces contention on that lock.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d999297c
    • J
      tipc: introduce node contact FSM · 1a20cc25
      Jon Paul Maloy 提交于
      The logics for determining when a node is permitted to establish
      and maintain contact with its peer node becomes non-trivial in the
      presence of multiple parallel links that may come and go independently.
      
      A known failure scenario is that one endpoint registers both its links
      to the peer lost, cleans up it binding table, and prepares for a table
      update once contact is re-establihed, while the other endpoint may
      see its links reset and re-established one by one, hence seeing
      no need to re-synchronize the binding table. To avoid this, a node
      must not allow re-establishing contact until it has confirmation that
      even the peer has lost both links.
      
      Currently, the mechanism for handling this consists of setting and
      resetting two state flags from different locations in the code. This
      solution is hard to understand and maintain. A closer analysis even
      reveals that it is not completely safe.
      
      In this commit we do instead introduce an FSM that keeps track of
      the conditions for when the node can establish and maintain links.
      It has six states and four events, and is strictly based on explicit
      knowledge about the own node's and the peer node's contact states.
      Only events leading to state change are shown as edges in the figure
      below.
      
                                   +--------------+
                                   | SELF_UP/     |
                 +---------------->| PEER_COMING  |-----------------+
          SELF_  |                 +--------------+                 |PEER_
          ESTBL_ |                        |                         |ESTBL_
          CONTACT|      SELF_LOST_CONTACT |                         |CONTACT
                 |                        v                         |
                 |                 +--------------+                 |
                 |      PEER_      | SELF_DOWN/   |     SELF_       |
                 |      LOST_   +--| PEER_LEAVING |<--+ LOST_       v
      +-------------+   CONTACT |  +--------------+   | CONTACT  +-----------+
      | SELF_DOWN/  |<----------+                     +----------| SELF_UP/  |
      | PEER_DOWN   |<----------+                     +----------| PEER_UP   |
      +-------------+   SELF_   |  +--------------+   | PEER_    +-----------+
                 |      LOST_   +--| SELF_LEAVING/|<--+ LOST_       A
                 |      CONTACT    | PEER_DOWN    |     CONTACT     |
                 |                 +--------------+                 |
                 |                         A                        |
          PEER_  |       PEER_LOST_CONTACT |                        |SELF_
          ESTBL_ |                         |                        |ESTBL_
          CONTACT|                 +--------------+                 |CONTACT
                 +---------------->| PEER_UP/     |-----------------+
                                   | SELF_COMING  |
                                   +--------------+
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a20cc25
    • J
      tipc: move link supervision timer to node level · 8a1577c9
      Jon Paul Maloy 提交于
      In our effort to move control of the links to the link aggregation
      layer, we move the perodic link supervision timer to struct tipc_node.
      The new timer is shared between all links belonging to the node, thus
      saving resources, while still kicking the FSM on both its pertaining
      links at each expiration.
      
      The current link timer and corresponding functions are removed.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a1577c9
    • J
      tipc: simplify link timer implementation · 333ef69e
      Jon Paul Maloy 提交于
      We create a second, simpler, link timer function, tipc_link_timeout().
      The new function  makes use of the new FSM function introduced in the
      previous commit, and just like it, takes a buffer queue as parameter.
      It returns an event bit field and potentially a link protocol packet
      to the caller.
      
      The existing timer function, link_timeout(), is still needed for a
      while, so we redesign it to become a wrapper around the new function.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      333ef69e
    • J
      tipc: improve link FSM implementation · 6ab30f9c
      Jon Paul Maloy 提交于
      The link FSM implementation is currently unnecessarily complex.
      It sometimes checks for conditional state outside the FSM data
      before deciding next state, and often performs actions directly
      inside the FSM logics.
      
      In this commit, we create a second, simpler FSM implementation,
      that as far as possible acts only on states and events that it is
      strictly defined for, and postpone any actions until it is finished
      with its decisions. It also returns an event flag field and an a
      buffer queue which may potentially contain a protocol message to
      be sent by the caller.
      
      Unfortunately, we cannot yet make the FSM "clean", in the sense
      that its decisions are only based on FSM state and event, and that
      state changes happen only here. That will have to wait until the
      activate/reset logics has been cleaned up in a future commit.
      
      We also rename the link states as follows:
      
      WORKING_WORKING -> TIPC_LINK_WORKING
      WORKING_UNKNOWN -> TIPC_LINK_PROBING
      RESET_UNKNOWN   -> TIPC_LINK_RESETTING
      RESET_RESET     -> TIPC_LINK_ESTABLISHING
      
      The existing FSM function, link_state_event(), is still needed for
      a while, so we redesign it to make use of the new function.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ab30f9c
    • J
      tipc: introduce new link protocol msg create function · 426cc2b8
      Jon Paul Maloy 提交于
      As a preparation for later changes, we introduce a new function
      tipc_link_build_proto_msg(). Instead of actually sending the created
      protocol message, it only creates it and adds it to the head of a
      skb queue provided by the caller.
      
      Since we still need the existing function tipc_link_protocol_xmit()
      for a while, we redesign it to make use of the new function.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      426cc2b8
    • J
      tipc: clean up definitions and usage of link flags · d3504c34
      Jon Paul Maloy 提交于
      The status flag LINK_STOPPED is not needed any more, since the
      mechanism for delayed deletion of links has been removed.
      Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
      because we can just as well start the link timer directly from
      inside tipc_link_create().
      
      We eliminate these flags in this commit.
      
      Instead of the above flags, we now introduce three new link modes,
      TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
      indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages
      the link is allowed to receive in this state. TIPC_LINK_BLOCKED also
      blocks timer-driven protocol messages to be sent out, and any change
      to the link FSM. Since the modes are mutually exclusive, we convert
      them to state values, and rename the 'flags' field in struct tipc_link
      to 'exec_mode'.
      
      Finally, we move the #defines for link FSM states and events from link.h
      into enums inside the file link.c, which is the real usage scope of
      these definitions.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3504c34
    • J
      tipc: make media xmit call outside node spinlock context · af9b028e
      Jon Paul Maloy 提交于
      Currently, message sending is performed through a deep call chain,
      where the node spinlock is grabbed and held during a significant
      part of the transmission time. This is clearly detrimental to
      overall throughput performance; it would be better if we could send
      the message after the spinlock has been released.
      
      In this commit, we do instead let the call revert on the stack after
      the buffer chain has been added to the transmission queue, whereafter
      clones of the buffers are transmitted to the device layer outside the
      spinlock scope.
      
      As a further step in our effort to separate the roles of the node
      and link entities we also move the function tipc_link_xmit() to
      node.c, and rename it to tipc_node_xmit().
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af9b028e
    • J
      tipc: change sk_buffer handling in tipc_link_xmit() · 22d85c79
      Jon Paul Maloy 提交于
      When the function tipc_link_xmit() is given a buffer list for
      transmission, it currently consumes the list both when transmission
      is successful and when it fails, except for the special case when
      it encounters link congestion.
      
      This behavior is inconsistent, and needs to be corrected if we want
      to avoid problems in later commits in this series.
      
      In this commit, we change this to let the function consume the list
      only when transmission is successful, and leave the list with the
      sender in all other cases. We also modifiy the socket code so that
      it adapts to this change, i.e., purges the list when a non-congestion
      error code is returned.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22d85c79
    • J
      tipc: move link input queue to tipc_node · d39bbd44
      Jon Paul Maloy 提交于
      At present, the link input queue and the name distributor receive
      queues are fields aggregated in struct tipc_link. This is a hazard,
      because a link might be deleted while a receiving socket still keeps
      reference to one of the queues.
      
      This commit fixes this bug. However, rather than adding yet another
      reference counter to the critical data path, we move the two queues
      to safe ground inside struct tipc_node, which is already protected, and
      let the link code only handle references to the queues. This is also
      in line with planned later changes in this area.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d39bbd44
    • J
      tipc: introduce link entry structure to struct tipc_node · 9d13ec65
      Jon Paul Maloy 提交于
      struct 'tipc_node' currently contains two arrays for link attributes,
      one for the link pointers, and one for the usable link MTUs.
      
      We now group those into a new struct 'tipc_link_entry', and intoduce
      one single array consisting of such enties. Apart from being a cosmetic
      improvement, this is a starting point for the strict master-slave
      relation between node and link that we will introduce in the following
      commits.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d13ec65
  8. 29 6月, 2015 1 次提交
    • J
      tipc: purge backlog queue counters when broadcast link is reset · 7d967b67
      Jon Paul Maloy 提交于
      In commit 1f66d161
      ("tipc: introduce starvation free send algorithm")
      we introduced a counter per priority level for buffers
      in the link backlog queue. We also introduced a new
      function tipc_link_purge_backlog(), to reset these
      counters to zero when the link is reset.
      
      Unfortunately, we missed to call this function when
      the broadcast link is reset, with the result that the
      values of these counters might be permanently skewed
      when new nodes are attached. This may in the worst case
      lead to permananent, but spurious, broadcast link
      congestion, where no broadcast packets can be sent at
      all.
      
      We fix this bug with this commit.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d967b67
  9. 27 5月, 2015 1 次提交
    • J
      tipc: fix bug in link protocol message create function · f3903bcc
      Jon Paul Maloy 提交于
      In commit dd3f9e70
      ("tipc: add packet sequence number at instant of transmission") we
      made a change with the consequence that packets in the link backlog
      queue don't contain valid sequence numbers.
      
      However, when we create a link protocol message, we still use the
      sequence number of the first packet in the backlog, if there is any,
      as "next_sent" indicator in the message. This may entail unnecessary
      retransissions or stale packet transmission when there is very low
      traffic on the link.
      
      This commit fixes this issue by only using the current value of
      tipc_link::snd_nxt as indicator.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3903bcc
  10. 15 5月, 2015 3 次提交
    • J
      tipc: add packet sequence number at instant of transmission · dd3f9e70
      Jon Paul Maloy 提交于
      Currently, the packet sequence number is updated and added to each
      packet at the moment a packet is added to the link backlog queue.
      This is wasteful, since it forces the code to traverse the send
      packet list packet by packet when adding them to the backlog queue.
      It would be better to just splice the whole packet list into the
      backlog queue when that is the right action to do.
      
      In this commit, we do this change. Also, since the sequence numbers
      cannot now be assigned to the packets at the moment they are added
      the backlog queue, we do instead calculate and add them at the moment
      of transmission, when the backlog queue has to be traversed anyway.
      We do this in the function tipc_link_push_packet().
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd3f9e70
    • J
      tipc: improve link congestion algorithm · f21e897e
      Jon Paul Maloy 提交于
      The link congestion algorithm used until now implies two problems.
      
      - It is too generous towards lower-level messages in situations of high
        load by giving "absolute" bandwidth guarantees to the different
        priority levels. LOW traffic is guaranteed 10%, MEDIUM is guaranted
        20%, HIGH is guaranteed 30%, and CRITICAL is guaranteed 40% of the
        available bandwidth. But, in the absence of higher level traffic, the
        ratio between two distinct levels becomes unreasonable. E.g. if there
        is only LOW and MEDIUM traffic on a system, the former is guaranteed
        1/3 of the bandwidth, and the latter 2/3. This again means that if
        there is e.g. one LOW user and 10 MEDIUM users, the  former will have
        33.3% of the bandwidth, and the others will have to compete for the
        remainder, i.e. each will end up with 6.7% of the capacity.
      
      - Packets of type MSG_BUNDLER are created at SYSTEM importance level,
        but only after the packets bundled into it have passed the congestion
        test for their own respective levels. Since bundled packets don't
        result in incrementing the level counter for their own importance,
        only occasionally for the SYSTEM level counter, they do in practice
        obtain SYSTEM level importance. Hence, the current implementation
        provides a gap in the congestion algorithm that in the worst case
        may lead to a link reset.
      
      We now refine the congestion algorithm as follows:
      
      - A message is accepted to the link backlog only if its own level
        counter, and all superior level counters, permit it.
      
      - The importance of a created bundle packet is set according to its
        contents. A bundle packet created from messges at levels LOW to
        CRITICAL is given importance level CRITICAL, while a bundle created
        from a SYSTEM level message is given importance SYSTEM. In the latter
        case only subsequent SYSTEM level messages are allowed to be bundled
        into it.
      
      This solves the first problem described above, by making the bandwidth
      guarantee relative to the total number of users at all levels; only
      the upper limit for each level remains absolute. In the example
      described above, the single LOW user would use 1/11th of the bandwidth,
      the same as each of the ten MEDIUM users, but he still has the same
      guarantee against starvation as the latter ones.
      
      The fix also solves the second problem. If the CRITICAL level is filled
      up by bundle packets of that level, no lower level packets will be
      accepted any more.
      Suggested-by: NGergely Kiss <gergely.kiss@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f21e897e
    • J
      tipc: simplify link supervision checkpointing · cd4eee3c
      Jon Paul Maloy 提交于
      We change the sequence number checkpointing that is performed
      by the timer in order to discover if the peer is active. Currently,
      we store a checkpoint of the next expected sequence number "rcv_nxt"
      at each timer expiration, and compare it to the current expected
      number at next timeout expiration. Instead, we now use the already
      existing field "silent_intv_cnt" for this task. We step the counter
      at each timeout expiration, and zero it at each valid received packet.
      If no valid packet has been received from the peer after "abort_limit"
      number of silent timer intervals, the link is declared faulty and reset.
      
      We also remove the multiple instances of timer activation from inside
      the FSM function "link_state_event()", and now do it at only one place;
      at the end of the timer function itself.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd4eee3c