1. 31 Jul, 2015 · 7 commits
    • tipc: make resetting of links non-atomic · 598411d7
      Committed by Jon Paul Maloy
      In order to facilitate future improvements to the locking structure, we
      want to make resetting and establishing of links non-atomic. I.e., the
      functions tipc_node_link_up() and tipc_node_link_down() should be called
      from outside the node lock context, and grab/release the node lock
      themselves. This requires that we can freeze the link state from the
      moment it is set to RESETTING or PEER_RESET in one lock context until
      it is set to RESET or ESTABLISHING in a later context. The recently
      introduced link FSM makes this possible, so we are now ready to introduce
      the above change.
      
      This commit implements this.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: merge link->exec_mode and link->state into one FSM · 662921cd
      Committed by Jon Paul Maloy
      Until now, we have been handling link failover and synchronization
      by using an additional link state variable, "exec_mode". This variable
      is not independent of the link FSM state, which creates a risk of
      inconsistencies and also clutters the code.
      
      The conditions are now in place to define a new link FSM that covers
      all existing use cases, including failover and synchronization, and
      eliminate the "exec_mode" field altogether. The FSM must also support
      non-atomic resetting of links, which will be introduced later.
      
      The new link FSM is shown below, with 7 states and 8 events.
      Only events leading to state change are shown as edges.
      
      +------------------------------------+
      |RESET_EVT                           |
      |                                    |
      |                             +--------------+
      |           +-----------------|   SYNCHING   |-----------------+
      |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
      |           |                  A            |                  |
      |           |                  |            |                  |
      |           |                  |            |                  |
      |           |                  |SYNCH_      |SYNCH_            |
      |           |                  |BEGIN_EVT   |END_EVT           |
      |           |                  |            |                  |
      |           V                  |            V                  V
      |    +-------------+          +--------------+          +------------+
      |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
      |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
      |           |        EVT        |    A         RESET_EVT       |
      |           |                   |    |                         |
      |           |                   |    |                         |
      |           |    +--------------+    |                         |
      |  RESET_EVT|    |RESET_EVT          |ESTABLISH_EVT            |
      |           |    |                   |                         |
      |           |    |                   |                         |
      |           V    V                   |                         |
      |    +-------------+          +--------------+        RESET_EVT|
      +--->|    RESET    |--------->| ESTABLISHING |<----------------+
           +-------------+ PEER_    +--------------+
            |           A  RESET_EVT       |
            |           |                  |
            |           |                  |
            |FAILOVER_  |FAILOVER_         |FAILOVER_
            |BEGIN_EVT  |END_EVT           |BEGIN_EVT
            |           |                  |
            V           |                  |
           +-------------+                 |
           | FAILINGOVER |<----------------+
           +-------------+
      
      These changes are fully backwards compatible.
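
The edges in the figure can be transcribed into a small transition function. What follows is a minimal sketch rather than the code in link.c: the enum and function names (`link_state`, `link_evt`, `link_fsm_next`) are hypothetical, and any event that is not drawn as an edge is assumed to leave the state unchanged.

```c
#include <assert.h>

/* Hypothetical names; the states and events are the ones in the figure. */
enum link_state {
	LINK_ESTABLISHED, LINK_ESTABLISHING, LINK_RESET, LINK_RESETTING,
	LINK_PEER_RESET, LINK_FAILINGOVER, LINK_SYNCHING, LINK_STATE_CNT
};

enum link_evt {
	ESTABLISH_EVT, PEER_RESET_EVT, FAILURE_EVT, RESET_EVT,
	FAILOVER_BEGIN_EVT, FAILOVER_END_EVT, SYNCH_BEGIN_EVT, SYNCH_END_EVT,
	LINK_EVT_CNT
};

/* Return the next state; events without an edge keep the current state. */
static enum link_state link_fsm_next(enum link_state s, enum link_evt e)
{
	assert(s < LINK_STATE_CNT && e < LINK_EVT_CNT);

	switch (s) {
	case LINK_RESETTING:
		if (e == RESET_EVT) return LINK_RESET;
		break;
	case LINK_RESET:
		if (e == PEER_RESET_EVT) return LINK_ESTABLISHING;
		if (e == FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
		break;
	case LINK_PEER_RESET:
		if (e == RESET_EVT) return LINK_ESTABLISHING;
		break;
	case LINK_FAILINGOVER:
		if (e == FAILOVER_END_EVT) return LINK_RESET;
		break;
	case LINK_ESTABLISHING:
		if (e == ESTABLISH_EVT) return LINK_ESTABLISHED;
		if (e == FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
		break;
	case LINK_ESTABLISHED:
		if (e == PEER_RESET_EVT) return LINK_PEER_RESET;
		if (e == FAILURE_EVT) return LINK_RESETTING;
		if (e == RESET_EVT) return LINK_RESET;
		if (e == SYNCH_BEGIN_EVT) return LINK_SYNCHING;
		break;
	case LINK_SYNCHING:
		if (e == PEER_RESET_EVT) return LINK_PEER_RESET;
		if (e == FAILURE_EVT) return LINK_RESETTING;
		if (e == RESET_EVT) return LINK_RESET;
		if (e == SYNCH_END_EVT) return LINK_ESTABLISHED;
		break;
	default:
		break;
	}
	return s;
}
```

Because the function only computes the next state and performs no side effects, a caller can hold a link in RESETTING or PEER_RESET across lock contexts and feed it RESET_EVT later.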
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: move protocol message sending away from link FSM · 5045f7b9
      Committed by Jon Paul Maloy
      The implementation of the link FSM currently takes decisions about and
      sends out link protocol messages. This is unnecessary, since such
      actions are not the result of any link state change, and are even
      decided based on non-FSM state information ("silent_intv_cnt").
      
      We now move the sending of unicast link protocol messages to the
      function tipc_link_timeout(), and the initial broadcast synchronization
      message to tipc_node_link_up(). The latter is done because a link
      instance should not need to know whether it is the first or second
      link to a destination. Such information is now restricted to and
      handled by the link aggregation layer in node.c.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: move link synch and failover to link aggregation level · 6e498158
      Committed by Jon Paul Maloy
      Link failover and synchronization have until now been handled by the
      links themselves, forcing them to have knowledge about and to access
      parallel links in order to make the two algorithms work correctly.
      
      In this commit, we move the control part of this functionality to the
      link aggregation level in node.c, which is the right location for this.
      As a result, the two algorithms become easier to follow, and the link
      implementation becomes simpler.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: reverse call order for link_reset()->node_link_down() · 655fb243
      Committed by Jon Paul Maloy
      In many cases the call order when a link is reset goes as follows:
      tipc_node_xx()->tipc_link_reset()->tipc_node_link_down()
      
      This is not the right order if we want the node to be in control,
      so in this commit we change the order to:
      tipc_node_xx()->tipc_node_link_down()->tipc_link_reset()
      
      The fact that tipc_link_reset() is now called from only one
      location with a well-defined state will also facilitate later
      simplifications of tipc_link_reset() and the link FSM.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: move all link_reset() calls to link aggregation level · 6144a996
      Committed by Jon Paul Maloy
      In line with our effort to let the node level have full control over
      its links, we want to move all link reset calls from link.c to node.c.
      Some of the calls can be moved by simply moving the calling function,
      when this is the right thing to do. For the remaining calls we use
      the now established technique of returning a TIPC_LINK_DOWN_EVT
      flag from tipc_link_rcv(), and then perform the reset once that
      call has returned.
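
The flag-returning technique can be sketched as follows. Only `TIPC_LINK_DOWN_EVT` and the division of roles between link and node come from the text; the bit value, the struct layouts, and every function with a `_sketch` suffix are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TIPC_LINK_DOWN_EVT 0x2	/* bit value is an assumption */

struct link_sketch { bool up; };
struct node_sketch { struct link_sketch link; int resets; };

/* Link level: reports that the link must go down, but never resets itself. */
static int link_rcv_sketch(struct link_sketch *l, bool peer_reset_seen)
{
	int rc = 0;

	assert(l);
	if (l->up && peer_reset_seen)
		rc |= TIPC_LINK_DOWN_EVT;
	return rc;
}

/* Node level: the only place where a link is actually taken down. */
static void node_link_down_sketch(struct node_sketch *n)
{
	n->link.up = false;
	n->resets++;
}

/* Node-level receive path: act on the flags after the link call returns. */
static void node_rcv_sketch(struct node_sketch *n, bool peer_reset_seen)
{
	int rc = link_rcv_sketch(&n->link, peer_reset_seen);

	if (rc & TIPC_LINK_DOWN_EVT)
		node_link_down_sketch(n);
}
```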
      
      This change serves as a preparation for the coming commits.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate function tipc_link_activate() · cbeb83ca
      Committed by Jon Paul Maloy
      The function tipc_link_activate() is redundant, since it mostly performs
      settings that have already been done in a preceding tipc_link_reset().
      
      There are three exceptions to this:
      - The actual state change to TIPC_LINK_WORKING. This should be done
        in the FSM anyway, and not in a separate function.
      - Registration of the link with the bearer. This should be done by the
        node, since we don't want the link to have any knowledge about its
        specific bearer.
      - Call to tipc_node_link_up() for user access registration. With the new
        role distribution between link aggregation and link level this becomes
        the wrong call order; tipc_node_link_up() should instead be called
        directly as a result of a TIPC_LINK_UP event, hence by the node itself.
      
      This commit implements those changes.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Jul, 2015 · 1 commit
    • tipc: fix bug in broadcast synch message create function · 5a4c3552
      Committed by Jon Maloy
      In commit d999297c
      ("tipc: reduce locking scope during packet reception") we introduced
      a new function tipc_build_bcast_sync_msg(), which carries initial
      synchronization data between two nodes at first contact and at
      re-contact. In this function, we failed to add the synchronization
      data, with the effect that the broadcast link endpoints fail to
      synchronize correctly at re-contact between a running and a restarted
      node. All other cases work as intended.
      
      With this commit, we fix this bug.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Jul, 2015 · 1 commit
    • tipc: fix compatibility bug · 16040894
      Committed by Jon Paul Maloy
      In commit d999297c
      ("tipc: reduce locking scope during packet reception") we introduced
      a new function tipc_link_proto_rcv(). This function contains a bug
      that sometimes causes it to erroneously send out a non-zero link
      priority value in created protocol messages.
      
      The bug may lead to an extra link reset at initial link establishment
      with older nodes. This will never happen more than once, after which
      the link will work as intended.
      
      We fix this bug in this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 21 Jul, 2015 · 11 commits
    • tipc: reduce locking scope during packet reception · d999297c
      Committed by Jon Paul Maloy
      We convert packet/message reception according to the same principle
      we have been using for message sending and timeout handling:
      
      We move the function tipc_rcv() to node.c, hence handling the initial
      packet reception at the link aggregation level. The function grabs
      the node lock, selects the receiving link, and accesses it via a new
      call tipc_link_rcv(). This function appends buffers to the input
      queue for delivery upwards, but it may also append outgoing packets
      to the xmit queue, just as we do during regular message sending. The
      latter will happen when buffers are forwarded from the link backlog,
      or when retransmission is requested.
      
      Upon return of this function, and after having released the node lock,
      tipc_rcv() delivers/transmits the contents of those queues, but it may
      also perform actions such as link activation or reset, as indicated by
      the return flags from the link.
      
      This reduces the number of CPU cycles spent inside the node spinlock,
      and reduces contention on that lock.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce node contact FSM · 1a20cc25
      Committed by Jon Paul Maloy
      The logic for determining when a node is permitted to establish
      and maintain contact with its peer node becomes non-trivial in the
      presence of multiple parallel links that may come and go independently.
      
      A known failure scenario is that one endpoint registers both its links
      to the peer as lost, cleans up its binding table, and prepares for a
      table update once contact is re-established, while the other endpoint
      may see its links reset and re-established one by one, hence seeing
      no need to re-synchronize the binding table. To avoid this, a node
      must not allow re-establishing contact until it has confirmation that
      the peer, too, has lost both links.
      
      Currently, the mechanism for handling this consists of setting and
      resetting two state flags from different locations in the code. This
      solution is hard to understand and maintain. A closer analysis even
      reveals that it is not completely safe.
      
      In this commit we instead introduce an FSM that keeps track of
      the conditions for when the node can establish and maintain links.
      It has six states and four events, and is strictly based on explicit
      knowledge about our own node's and the peer node's contact states.
      Only events leading to a state change are shown as edges in the
      figure below.
      
                                   +--------------+
                                   | SELF_UP/     |
                 +---------------->| PEER_COMING  |-----------------+
          SELF_  |                 +--------------+                 |PEER_
          ESTBL_ |                        |                         |ESTBL_
          CONTACT|      SELF_LOST_CONTACT |                         |CONTACT
                 |                        v                         |
                 |                 +--------------+                 |
                 |      PEER_      | SELF_DOWN/   |     SELF_       |
                 |      LOST_   +--| PEER_LEAVING |<--+ LOST_       v
      +-------------+   CONTACT |  +--------------+   | CONTACT  +-----------+
      | SELF_DOWN/  |<----------+                     +----------| SELF_UP/  |
      | PEER_DOWN   |<----------+                     +----------| PEER_UP   |
      +-------------+   SELF_   |  +--------------+   | PEER_    +-----------+
                 |      LOST_   +--| SELF_LEAVING/|<--+ LOST_       A
                 |      CONTACT    | PEER_DOWN    |     CONTACT     |
                 |                 +--------------+                 |
                 |                         A                        |
          PEER_  |       PEER_LOST_CONTACT |                        |SELF_
          ESTBL_ |                         |                        |ESTBL_
          CONTACT|                 +--------------+                 |CONTACT
                 +---------------->| PEER_UP/     |-----------------+
                                   | SELF_COMING  |
                                   +--------------+
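
The figure can also be written down as a transition table. This is a sketch transcribed from the edges as drawn, not the node.c implementation: the identifiers are hypothetical spellings of the labels in the figure, and events without an edge are assumed to leave the state unchanged.

```c
#include <assert.h>

enum node_state {
	SELF_DOWN_PEER_DOWN, SELF_UP_PEER_COMING, PEER_UP_SELF_COMING,
	SELF_UP_PEER_UP, SELF_DOWN_PEER_LEAVING, SELF_LEAVING_PEER_DOWN,
	NODE_STATE_CNT
};

enum node_evt {
	SELF_ESTBL_CONTACT, PEER_ESTBL_CONTACT,
	SELF_LOST_CONTACT, PEER_LOST_CONTACT, NODE_EVT_CNT
};

#define KEEP (-1)	/* no edge in the figure: state is unchanged */

/* Columns: SELF_ESTBL, PEER_ESTBL, SELF_LOST, PEER_LOST */
static const int node_fsm[NODE_STATE_CNT][NODE_EVT_CNT] = {
	[SELF_DOWN_PEER_DOWN]    = { SELF_UP_PEER_COMING, PEER_UP_SELF_COMING,
				     KEEP, KEEP },
	[SELF_UP_PEER_COMING]    = { KEEP, SELF_UP_PEER_UP,
				     SELF_DOWN_PEER_LEAVING, KEEP },
	[PEER_UP_SELF_COMING]    = { SELF_UP_PEER_UP, KEEP,
				     KEEP, SELF_LEAVING_PEER_DOWN },
	[SELF_UP_PEER_UP]        = { KEEP, KEEP,
				     SELF_DOWN_PEER_LEAVING, SELF_LEAVING_PEER_DOWN },
	[SELF_DOWN_PEER_LEAVING] = { KEEP, KEEP,
				     KEEP, SELF_DOWN_PEER_DOWN },
	[SELF_LEAVING_PEER_DOWN] = { KEEP, KEEP,
				     SELF_DOWN_PEER_DOWN, KEEP },
};

/* Look up the next state for (state, event). */
static enum node_state node_fsm_next(enum node_state s, enum node_evt e)
{
	int next;

	assert(s < NODE_STATE_CNT && e < NODE_EVT_CNT);
	next = node_fsm[s][e];
	return next == KEEP ? s : (enum node_state)next;
}
```

In this sketch, a node that has lost contact while the peer has not yet confirmed its own loss (SELF_DOWN/PEER_LEAVING) ignores SELF_ESTBL_CONTACT, which corresponds to the rule that contact must not be re-established before the peer has lost both links.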
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: move link supervision timer to node level · 8a1577c9
      Committed by Jon Paul Maloy
      In our effort to move control of the links to the link aggregation
      layer, we move the periodic link supervision timer to struct tipc_node.
      The new timer is shared between all links belonging to the node, thus
      saving resources, while still kicking the FSM on both of the node's
      links at each expiration.
      
      The current link timer and corresponding functions are removed.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: simplify link timer implementation · 333ef69e
      Committed by Jon Paul Maloy
      We create a second, simpler, link timer function, tipc_link_timeout().
      The new function makes use of the new FSM function introduced in the
      previous commit and, just like it, takes a buffer queue as a parameter.
      It returns an event bit field and potentially a link protocol packet
      to the caller.
      
      The existing timer function, link_timeout(), is still needed for a
      while, so we redesign it to become a wrapper around the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve link FSM implementation · 6ab30f9c
      Committed by Jon Paul Maloy
      The link FSM implementation is currently unnecessarily complex.
      It sometimes checks for conditional state outside the FSM data
      before deciding the next state, and often performs actions directly
      inside the FSM logic.
      
      In this commit, we create a second, simpler FSM implementation
      that, as far as possible, acts only on the states and events it is
      strictly defined for, and postpones any actions until it has finished
      making its decisions. It also returns an event flag field and a
      buffer queue which may potentially contain a protocol message to
      be sent by the caller.
      
      Unfortunately, we cannot yet make the FSM "clean", in the sense
      that its decisions are only based on FSM state and event, and that
      state changes happen only here. That will have to wait until the
      activate/reset logic has been cleaned up in a future commit.
      
      We also rename the link states as follows:
      
      WORKING_WORKING -> TIPC_LINK_WORKING
      WORKING_UNKNOWN -> TIPC_LINK_PROBING
      RESET_UNKNOWN   -> TIPC_LINK_RESETTING
      RESET_RESET     -> TIPC_LINK_ESTABLISHING
      
      The existing FSM function, link_state_event(), is still needed for
      a while, so we redesign it to make use of the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce new link protocol msg create function · 426cc2b8
      Committed by Jon Paul Maloy
      As a preparation for later changes, we introduce a new function
      tipc_link_build_proto_msg(). Instead of actually sending the created
      protocol message, it only creates it and adds it to the head of a
      skb queue provided by the caller.
      
      Since we still need the existing function tipc_link_protocol_xmit()
      for a while, we redesign it to make use of the new function.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: clean up definitions and usage of link flags · d3504c34
      Committed by Jon Paul Maloy
      The status flag LINK_STOPPED is not needed any more, since the
      mechanism for delayed deletion of links has been removed.
      Likewise, LINK_STARTED and LINK_START_EVT are unnecessary,
      because we can just as well start the link timer directly from
      inside tipc_link_create().
      
      We eliminate these flags in this commit.
      
      Instead of the above flags, we now introduce three new link modes,
      TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values
      indicate whether the link is allowed to receive messages in this
      state and, in the case of TIPC_LINK_TUNNEL, which ones.
      TIPC_LINK_BLOCKED also prevents timer-driven protocol messages from
      being sent out, and blocks any change to the link FSM. Since the modes
      are mutually exclusive, we convert them to state values, and rename
      the 'flags' field in struct tipc_link to 'exec_mode'.
      
      Finally, we move the #defines for link FSM states and events from link.h
      into enums inside the file link.c, which is the real usage scope of
      these definitions.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: make media xmit call outside node spinlock context · af9b028e
      Committed by Jon Paul Maloy
      Currently, message sending is performed through a deep call chain,
      where the node spinlock is grabbed and held during a significant
      part of the transmission time. This is clearly detrimental to
      overall throughput performance; it would be better if we could send
      the message after the spinlock has been released.
      
      In this commit, we instead let the call chain return up the stack
      once the buffer chain has been added to the transmission queue, after
      which clones of the buffers are transmitted to the device layer
      outside the spinlock scope.
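
The pattern of queueing under the lock and sending after it is released can be shown with a toy model. Everything here is an assumption made for the sketch: a boolean stands in for the node spinlock, packets are plain integers, and `node_xmit_sketch()` and `media_send()` are hypothetical names, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

#define QLEN 8

struct pktq { int pkt[QLEN]; int len; };

struct node_sketch {
	bool locked;		/* stands in for the node spinlock */
	struct pktq transmq;	/* link transmission queue */
	int sent;		/* packets handed to the device layer */
};

static void node_lock(struct node_sketch *n)   { assert(!n->locked); n->locked = true; }
static void node_unlock(struct node_sketch *n) { assert(n->locked); n->locked = false; }

/* Device-layer send: must never run with the node lock held. */
static void media_send(struct node_sketch *n, int pkt)
{
	(void)pkt;
	assert(!n->locked);
	n->sent++;
}

static int node_xmit_sketch(struct node_sketch *n, const int *chain, int cnt)
{
	struct pktq xmitq = { .len = 0 };	/* local queue of "clones" */
	int i;

	assert(cnt >= 0);
	node_lock(n);
	if (n->transmq.len + cnt > QLEN) {
		node_unlock(n);
		return -1;	/* no room; nothing was queued */
	}
	for (i = 0; i < cnt; i++) {
		n->transmq.pkt[n->transmq.len++] = chain[i];	/* kept for retransmission */
		xmitq.pkt[xmitq.len++] = chain[i];		/* copy for the device */
	}
	node_unlock(n);

	/* The lock is released; only now do we touch the device layer. */
	for (i = 0; i < xmitq.len; i++)
		media_send(n, xmitq.pkt[i]);
	return 0;
}
```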
      
      As a further step in our effort to separate the roles of the node
      and link entities we also move the function tipc_link_xmit() to
      node.c, and rename it to tipc_node_xmit().
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: change sk_buffer handling in tipc_link_xmit() · 22d85c79
      Committed by Jon Paul Maloy
      When the function tipc_link_xmit() is given a buffer list for
      transmission, it currently consumes the list both when transmission
      is successful and when it fails, except for the special case when
      it encounters link congestion.
      
      This behavior is inconsistent, and needs to be corrected if we want
      to avoid problems in later commits in this series.
      
      In this commit, we change this to let the function consume the list
      only when transmission is successful, and leave the list with the
      sender in all other cases. We also modify the socket code so that
      it adapts to this change, i.e., purges the list when a non-congestion
      error code is returned.
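
The ownership rule can be summarised in a few lines. This is a sketch under stated assumptions: the buffer list is reduced to a length counter, the error codes `ERR_CONGESTION` and `ERR_NO_ROUTE` are made up for the example, and the `_sketch` functions only mimic the roles of tipc_link_xmit() and the socket code.

```c
#include <assert.h>

#define ERR_CONGESTION (-1)	/* hypothetical error codes */
#define ERR_NO_ROUTE   (-2)

struct buf_list { int len; };

/* Link side: consume the list only when transmission succeeds. */
static int link_xmit_sketch(struct buf_list *list, int link_status)
{
	assert(list);
	if (link_status != 0)
		return link_status;	/* list stays with the sender */
	list->len = 0;			/* consumed */
	return 0;
}

/* Sender side: keep the list on congestion, purge it on other errors. */
static int sock_send_sketch(struct buf_list *list, int link_status)
{
	int rc = link_xmit_sketch(list, link_status);

	if (rc != 0 && rc != ERR_CONGESTION)
		list->len = 0;		/* purge */
	return rc;
}
```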
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: move link input queue to tipc_node · d39bbd44
      Committed by Jon Paul Maloy
      At present, the link input queue and the name distributor receive
      queues are fields aggregated in struct tipc_link. This is a hazard,
      because a link might be deleted while a receiving socket still holds
      a reference to one of the queues.
      
      This commit fixes this bug. However, rather than adding yet another
      reference counter to the critical data path, we move the two queues
      to safe ground inside struct tipc_node, which is already protected, and
      let the link code only handle references to the queues. This is also
      in line with planned later changes in this area.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce link entry structure to struct tipc_node · 9d13ec65
      Committed by Jon Paul Maloy
      struct 'tipc_node' currently contains two arrays for link attributes,
      one for the link pointers, and one for the usable link MTUs.
      
      We now group those into a new struct 'tipc_link_entry', and introduce
      one single array consisting of such entries. Apart from being a cosmetic
      improvement, this is a starting point for the strict master-slave
      relation between node and link that we will introduce in the following
      commits.
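
The regrouping might look roughly like this. Only the name `tipc_link_entry` comes from the text; the field names, the array size `MAX_BEARERS`, the `node_before`/`node_after` structs and the accessor are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BEARERS 3	/* assumed array size */

struct tipc_link;	/* opaque in this sketch */

/* Before: two parallel arrays indexed by bearer id. */
struct node_before {
	struct tipc_link *links[MAX_BEARERS];
	uint32_t link_mtus[MAX_BEARERS];
};

/* After: one entry per bearer, holding everything the node knows about it. */
struct tipc_link_entry {
	struct tipc_link *link;
	uint32_t mtu;
};

struct node_after {
	struct tipc_link_entry links[MAX_BEARERS];
};

/* Accessors now go through a single entry instead of two arrays. */
static uint32_t node_link_mtu(const struct node_after *n, unsigned int bearer_id)
{
	assert(bearer_id < MAX_BEARERS);
	return n->links[bearer_id].mtu;
}
```

Keeping per-link attributes in one entry gives later commits a single place to add further node-owned state for a link.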
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 29 Jun, 2015 · 1 commit
    • tipc: purge backlog queue counters when broadcast link is reset · 7d967b67
      Committed by Jon Paul Maloy
      In commit 1f66d161
      ("tipc: introduce starvation free send algorithm")
      we introduced a counter per priority level for buffers
      in the link backlog queue. We also introduced a new
      function tipc_link_purge_backlog(), to reset these
      counters to zero when the link is reset.
      
      Unfortunately, we failed to call this function when
      the broadcast link is reset, with the result that the
      values of these counters might be permanently skewed
      when new nodes are attached. This may in the worst case
      lead to permanent, but spurious, broadcast link
      congestion, where no broadcast packets can be sent at
      all.
      
      We fix this bug with this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 27 May, 2015 · 1 commit
    • tipc: fix bug in link protocol message create function · f3903bcc
      Committed by Jon Paul Maloy
      In commit dd3f9e70
      ("tipc: add packet sequence number at instant of transmission") we
      made a change with the consequence that packets in the link backlog
      queue don't contain valid sequence numbers.
      
      However, when we create a link protocol message, we still use the
      sequence number of the first packet in the backlog, if there is any,
      as "next_sent" indicator in the message. This may entail unnecessary
      retransmissions or stale packet transmission when there is very low
      traffic on the link.
      
      This commit fixes this issue by only using the current value of
      tipc_link::snd_nxt as indicator.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 15 May, 2015 · 7 commits
    • tipc: add packet sequence number at instant of transmission · dd3f9e70
      Committed by Jon Paul Maloy
      Currently, the packet sequence number is updated and added to each
      packet at the moment a packet is added to the link backlog queue.
      This is wasteful, since it forces the code to traverse the send
      packet list packet by packet when adding them to the backlog queue.
      It would be better to just splice the whole packet list into the
      backlog queue when that is the right action to do.
      
      In this commit, we make this change. Also, since the sequence numbers
      can no longer be assigned to the packets at the moment they are added
      to the backlog queue, we instead calculate and add them at the moment
      of transmission, when the backlog queue has to be traversed anyway.
      We do this in the function tipc_link_push_packet().
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: improve link congestion algorithm · f21e897e
      Committed by Jon Paul Maloy
      The link congestion algorithm used until now has two problems.
      
      - It is too generous towards lower-level messages in situations of high
        load by giving "absolute" bandwidth guarantees to the different
        priority levels. LOW traffic is guaranteed 10%, MEDIUM is guaranteed
        20%, HIGH is guaranteed 30%, and CRITICAL is guaranteed 40% of the
        available bandwidth. But, in the absence of higher level traffic, the
        ratio between two distinct levels becomes unreasonable. E.g. if there
        is only LOW and MEDIUM traffic on a system, the former is guaranteed
        1/3 of the bandwidth, and the latter 2/3. This in turn means that if
        there is e.g. one LOW user and 10 MEDIUM users, the former will have
        33.3% of the bandwidth, and the others will have to compete for the
        remainder, i.e. each will end up with 6.7% of the capacity.
      
      - Packets of type MSG_BUNDLER are created at SYSTEM importance level,
        but only after the packets bundled into it have passed the congestion
        test for their own respective levels. Since bundled packets don't
        result in incrementing the level counter for their own importance,
        only occasionally for the SYSTEM level counter, they do in practice
        obtain SYSTEM level importance. Hence, the current implementation
        provides a gap in the congestion algorithm that in the worst case
        may lead to a link reset.
      
      We now refine the congestion algorithm as follows:
      
      - A message is accepted to the link backlog only if its own level
        counter, and all superior level counters, permit it.
      
      - The importance of a created bundle packet is set according to its
        contents. A bundle packet created from messages at levels LOW to
        CRITICAL is given importance level CRITICAL, while a bundle created
        from a SYSTEM level message is given importance SYSTEM. In the latter
        case only subsequent SYSTEM level messages are allowed to be bundled
        into it.
      
      This solves the first problem described above, by making the bandwidth
      guarantee relative to the total number of users at all levels; only
      the upper limit for each level remains absolute. In the example
      described above, the single LOW user would use 1/11th of the bandwidth,
      the same as each of the ten MEDIUM users, but it still has the same
      guarantee against starvation as the latter ones.
      
      The fix also solves the second problem. If the CRITICAL level is filled
      up by bundle packets of that level, no lower level packets will be
      accepted any more.
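
The refined admission rule and the bundle importance rule can be sketched as below. The level names come from the text; the struct, the function names and the limits used in any example are hypothetical, and the sketch models only the counters, not the actual backlog queue.

```c
#include <assert.h>
#include <stdbool.h>

enum importance { LOW, MEDIUM, HIGH, CRITICAL, SYSTEM, IMP_CNT };

struct backlog_level {
	unsigned int len;	/* messages of this level in the backlog */
	unsigned int limit;	/* upper limit for this level */
};

/* Accept only if the message's own level and every superior level has room. */
static bool backlog_admits(const struct backlog_level bl[IMP_CNT],
			   enum importance imp)
{
	int lvl;

	assert(imp < IMP_CNT);
	for (lvl = imp; lvl < IMP_CNT; lvl++)
		if (bl[lvl].len >= bl[lvl].limit)
			return false;
	return true;
}

/* Count an accepted message against its own level. */
static bool backlog_add(struct backlog_level bl[IMP_CNT], enum importance imp)
{
	if (!backlog_admits(bl, imp))
		return false;
	bl[imp].len++;
	return true;
}

/* A bundle is CRITICAL unless it was created from a SYSTEM message. */
static enum importance bundle_importance(enum importance first_msg)
{
	return first_msg == SYSTEM ? SYSTEM : CRITICAL;
}
```

With this rule, a CRITICAL level that is full of bundle packets also blocks LOW, MEDIUM and HIGH messages, which is how the second problem is closed.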
      Suggested-by: Gergely Kiss <gergely.kiss@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: simplify link supervision checkpointing · cd4eee3c
      Committed by Jon Paul Maloy
      We change the sequence number checkpointing that is performed
      by the timer in order to discover if the peer is active. Currently,
      we store a checkpoint of the next expected sequence number "rcv_nxt"
      at each timer expiration, and compare it to the current expected
      number at next timeout expiration. Instead, we now use the already
      existing field "silent_intv_cnt" for this task. We step the counter
      at each timeout expiration, and zero it at each valid received packet.
      If no valid packet has been received from the peer after "abort_limit"
      number of silent timer intervals, the link is declared faulty and reset.
      
      We also remove the multiple instances of timer activation from inside
      the FSM function "link_state_event()", and now do it in only one place:
      at the end of the timer function itself.
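
The counter-based supervision can be sketched as follows. The field names `silent_intv_cnt` and `abort_limit` are taken from the text; the struct and function names are hypothetical, and the exact comparison used to declare the link faulty is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

struct link_sup {
	unsigned int silent_intv_cnt;	/* timer intervals without a valid packet */
	unsigned int abort_limit;	/* silent intervals tolerated before reset */
};

/* Called for every valid packet received from the peer. */
static void link_sup_rcv(struct link_sup *l)
{
	l->silent_intv_cnt = 0;
}

/* Called at each timer expiration; returns true if the link must be reset. */
static bool link_sup_timeout(struct link_sup *l)
{
	assert(l->abort_limit > 0);
	l->silent_intv_cnt++;
	return l->silent_intv_cnt >= l->abort_limit;
}
```

Compared with storing and comparing a "rcv_nxt" checkpoint, this needs no extra field and no comparison against a remembered sequence number.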
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: rename fields in struct tipc_link · a97b9d3f
      Committed by Jon Paul Maloy
      We rename some fields in struct tipc_link, in order to give them more
      descriptive names:
      
      next_in_no         -> rcv_nxt
      next_out_no        -> snd_nxt
      fsm_msg_cnt        -> silent_intv_cnt
      cont_intv          -> keepalive_intv
      last_retransmitted -> last_retransm
      
      There are no functional changes in this commit.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a97b9d3f
    • J
      tipc: simplify packet sequence number handling · e4bf4f76
      Jon Paul Maloy committed
      Although the sequence number in the TIPC protocol is 16 bits, we have
      until now stored it internally as an unsigned 32-bit integer.
      We got around this by always doing explicit modulo-65536 operations
      whenever we needed to access a sequence number.
      
      We now change the incoming and outgoing sequence numbers to unsigned
      16-bit integers, and remove the modulo operations where applicable.
      
      We also move the arithmetic inline functions for 16-bit integers
      to core.h, and the function buf_seqno() to msg.h, so they can easily
      be accessed from anywhere in the code.
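
To illustrate why a 16-bit type makes the explicit modulo operations unnecessary, here is a minimal sketch of wrap-around sequence arithmetic. The helper names `seq_next`, `seq_less` and `seq_less_eq` are hypothetical and are not claimed to match the inline functions moved to core.h.

```c
#include <stdint.h>

/* Next sequence number; the cast back to uint16_t wraps 65535 to 0. */
static inline uint16_t seq_next(uint16_t seqno)
{
    return (uint16_t)(seqno + 1);
}

/*
 * True if 'left' precedes 'right' in the circular 16-bit sequence space.
 * The difference is reduced to 16 bits; if its top bit is set, 'left'
 * is between 1 and 32768 steps behind 'right'.
 */
static inline int seq_less(uint16_t left, uint16_t right)
{
    return ((uint16_t)(left - right) & 0x8000u) != 0;
}

/* True if 'left' equals or precedes 'right'. */
static inline int seq_less_eq(uint16_t left, uint16_t right)
{
    return left == right || seq_less(left, right);
}
```

For example, `seq_less(65535, 0)` holds because 0 is the successor of 65535 after the wrap, while `seq_less(0, 65535)` does not.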
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4bf4f76
    • J
      tipc: simplify link timer handling · 75b44b01
      Jon Paul Maloy committed
      Prior to this commit, the link timer has been running at a "continuity
      interval" of configured link tolerance/4. When a timer wakes up and
      discovers that there has been no sign of life from the peer during the
      previous interval, it divides its own timer interval by another factor
      of four, and starts sending one probe per new interval. When the
      configured link tolerance time has passed without answer, i.e. after 16
      unacked probes, the link is declared faulty and reset.
      
      This is unnecessarily complex. It is sufficient to continue with the
      original continuity interval, and instead reset the link after four
      missed probe responses. This makes the timer handling in the link
      simpler, and opens the way for some planned later changes in this area.
      This commit implements this change.
      Reviewed-by: Richard Alpe <richard.alpe@ericsson.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75b44b01
    • J
      tipc: simplify resetting and disabling of bearers · b1c29f6b
      Jon Paul Maloy committed
      Since commit 4b475e3f2f8e4e241de101c8240f1d74d0470494
      ("tipc: eliminate delayed link deletion at link failover") the extra
      boolean parameter "shutting_down" is no longer needed for the
      functions bearer_disable() and tipc_link_delete_list().
      
      Furthermore, the function tipc_link_reset_links(), called from
      bearer_reset(), is now unnecessary. We can just as well delete
      all the links, as we do in bearer_disable(), and start over with
      creating new links.
      
      This commit introduces those changes.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1c29f6b
  8. 10 May, 2015 1 commit
  9. 30 Apr, 2015 2 commits
    • J
      tipc: fix problem with parallel link synchronization mechanism · 0d699f28
      Jon Paul Maloy committed
      Currently, we try to accumulate arrived packets in the link's
      'deferred' queue during the parallel link synchronization phase.
      
      This entails two problems:
      
      - With an unlucky combination of arriving packets the algorithm
        may go into a lockstep with the out-of-sequence handling function,
        where the synch mechanism is adding a packet to the deferred queue,
        while the out-of-sequence handling is retrieving it again, thus
        ending up in a loop inside the node_lock scope.
      
      - Even if this is avoided, the link will very often send out
        unnecessary protocol messages, in the worst case leading to
        redundant retransmissions.
      
      We fix this by just dropping arriving packets on the upcoming link
      during the synchronization phase, thus relying on the retransmission
      protocol to resolve the situation once the two links have arrived at
      a synchronized state.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d699f28
    • N
      tipc: remove wrong use of NLM_F_MULTI · f2f67390
      Nicolas Dichtel committed
      NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
      it is sent only at the end of a dump.
      
      Libraries like libnl will wait forever for NLMSG_DONE.
      
      Fixes: 35b9dd76 ("tipc: add bearer get/dump to new netlink api")
      Fixes: 7be57fc6 ("tipc: add link get/dump to new netlink api")
      Fixes: 46f15c67 ("tipc: add media get/dump to new netlink api")
      CC: Richard Alpe <richard.alpe@ericsson.com>
      CC: Jon Maloy <jon.maloy@ericsson.com>
      CC: Ying Xue <ying.xue@windriver.com>
      CC: tipc-discussion@lists.sourceforge.net
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f2f67390
  10. 23 Apr, 2015 1 commit
    • E
      tipc: fix node refcount issue · 73a31737
      Erik Hugne committed
      When link statistics are dumped over netlink, we iterate over
      the list of peer nodes and append each link's statistics to
      the netlink msg. In the case where the dump is resumed after
      filling up a nlmsg, the node refcnt is decremented without
      having been incremented previously, which may cause the node
      reference to be freed. When this happens, the following
      info/stacktrace will be generated, followed by a crash or
      undefined behavior.
      We fix this by removing the erroneous call to tipc_node_put
      inside the loop that iterates over nodes.
      
      [  384.312303] INFO: trying to register non-static key.
      [  384.313110] the code is fine but needs lockdep annotation.
      [  384.313290] turning off the locking correctness validator.
      [  384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
      [  384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  384.313290]  ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007
      [  384.313290]  ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200
      [  384.313290]  ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001
      [  384.313290] Call Trace:
      [  384.313290]  <IRQ>  [<ffffffff8170adf1>] dump_stack+0x4c/0x65
      [  384.313290]  [<ffffffff810a6a6d>] __lock_acquire+0xf3d/0xf50
      [  384.313290]  [<ffffffff810a7375>] lock_acquire+0xd5/0x290
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff81712890>] _raw_spin_lock_bh+0x40/0x80
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e8c>] link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffff810c4698>] call_timer_fn+0xb8/0x490
      [  384.313290]  [<ffffffff810c45e0>] ? process_timeout+0x10/0x10
      [  384.313290]  [<ffffffff810c5a2c>] run_timer_softirq+0x21c/0x420
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff8105a954>] __do_softirq+0xf4/0x630
      [  384.313290]  [<ffffffff8105afdd>] irq_exit+0x5d/0x60
      [  384.313290]  [<ffffffff8103ade1>] smp_apic_timer_interrupt+0x41/0x50
      [  384.313290]  [<ffffffff817144a0>] apic_timer_interrupt+0x70/0x80
      [  384.313290]  <EOI>  [<ffffffff8100db10>] ? default_idle+0x20/0x210
      [  384.313290]  [<ffffffff8100db0e>] ? default_idle+0x1e/0x210
      [  384.313290]  [<ffffffff8100e61a>] arch_cpu_idle+0xa/0x10
      [  384.313290]  [<ffffffff81099803>] cpu_startup_entry+0x2c3/0x530
      [  384.313290]  [<ffffffff810d2893>] ? clockevents_register_device+0x113/0x200
      [  384.313290]  [<ffffffff81038b0f>] start_secondary+0x13f/0x170
      
      Fixes: 8a0f6ebe ("tipc: involve reference counter for node structure")
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73a31737
  11. 03 Apr, 2015 3 commits
    • J
      tipc: simplify link mtu negotiation · ed193ece
      Jon Paul Maloy committed
      When a link is being established, the two endpoints advertise their
      respective interface MTU in the transmitted RESET and ACTIVATE messages.
      If there is any difference, the lower of the two MTUs will be selected
      for use by both endpoints.
      
      However, as a remnant of earlier attempts to introduce TIPC level
      routing, there also exists an MTU discovery mechanism. If an intermediate
      node has a lower MTU than the two endpoints, they will discover this
      through a bisectional approach, and finally adopt this MTU for common use.
      
      Since there is no TIPC level routing, and probably never will be,
      this mechanism doesn't make any sense, and only serves to make the
      link level protocol unnecessarily complex.
      
      In this commit, we eliminate the MTU discovery algorithm, and fall back
      to the simple MTU advertising approach. This change is fully backwards
      compatible.
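
The remaining advertising approach amounts to each endpoint picking the lower of the two MTUs. A minimal sketch, assuming a hypothetical helper `negotiated_mtu()` that receives the local interface MTU and the value advertised by the peer in its RESET or ACTIVATE message:

```c
#include <stdint.h>

/*
 * Hypothetical helper: both endpoints apply the same rule, so they
 * arrive at the same link MTU without any further discovery step.
 */
static inline uint32_t negotiated_mtu(uint32_t local_mtu, uint32_t peer_mtu)
{
    return local_mtu < peer_mtu ? local_mtu : peer_mtu;
}
```

Because the rule is symmetric, an endpoint with MTU 1500 and a peer with MTU 9000 both settle on 1500.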
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed193ece
    • J
      tipc: eliminate delayed link deletion at link failover · dff29b1a
      Jon Paul Maloy committed
      When a bearer is disabled manually, all its links have to be reset
      and deleted. However, if there is a remaining, parallel link ready
      to take over a deleted link's traffic, we currently delay the deletion
      of the removed link until the failover procedure is finished. This
      is because the remaining link needs to access state from the reset
      link, such as the last received packet number, and any partially
      reassembled buffer, in order to perform a successful failover.
      
      In this commit, we instead move the state data over to the new
      link, so that it can fulfill the procedure autonomously, without
      accessing any data on the old link. This means that we can now
      proceed and delete all pertaining links immediately when a bearer
      is disabled. This saves us from some unnecessary complexity in such
      situations.
      
      We also choose to change the confusing definitions CHANGEOVER_PROTOCOL,
      ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL,
      FAILOVER_MSG and SYNCH_MSG respectively.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dff29b1a
    • J
      tipc: drop tunneled packet duplicates at reception · 2da71425
      Jon Paul Maloy committed
      In commit 8b4ed863
      ("tipc: eliminate race condition at dual link establishment")
      we introduced a parallel link synchronization mechanism that
      guarantees sequential delivery even for users switching from
      an old to a newly established link. The new mechanism makes it
      unnecessary to deliver the tunneled duplicate packets back to
      the old link, as we are currently doing. It is now sufficient
      to use the last tunneled packet's inner sequence number as
      synchronization point between the two parallel links, whereafter
      it can be dropped.
      
      In this commit, we drop the duplicate packets arriving on the new
      link, after updating the synchronization point at each new arrival.
      
      Although it would now have been sufficient for the other endpoint
      to only tunnel the last packet in its send queue, and not the
      entire queue, we must still do this to maintain compatibility
      with older nodes.
      
      This commit makes it possible to get rid of some complex
      interaction between the two parallel links.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2da71425
  12. 30 Mar, 2015 2 commits
    • Y
      tipc: involve reference counter for node structure · 8a0f6ebe
      Ying Xue committed
      The TIPC node hash table is protected with the RCU lock on the read side.
      tipc_node_find() looks up a node object by node address by iterating
      over the hash table. As the entire traversal in tipc_node_find() is
      guarded by the RCU read lock, the lookup itself is safe. However, when
      callers use the node object returned by tipc_node_find(), no RCU read
      lock is held. Therefore, this is unsafe for callers of
      tipc_node_find().
      
      Now we introduce a reference counter for the node structure. Before
      tipc_node_find() returns a node object to its caller, it first increases
      the reference counter. Accordingly, after the caller is done with it,
      it decreases the counter again. This prevents a node being used by
      one thread from being freed by another thread.
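
The get/put pattern described above can be sketched in userspace C with C11 atomics. This is an illustration only: `struct node_sketch` and its functions are hypothetical, the RCU-protected hash table lookup is not modeled, and the kernel's own reference counting helpers are not used here.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for the node structure. */
struct node_sketch {
    atomic_int refcnt;
    unsigned int addr;
};

/* Create a node holding one initial reference (the table's own). */
static struct node_sketch *node_sketch_create(unsigned int addr)
{
    struct node_sketch *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    atomic_init(&n->refcnt, 1);
    n->addr = addr;
    return n;
}

/* Taken by the lookup before it hands the node to a caller. */
static void node_sketch_get(struct node_sketch *n)
{
    atomic_fetch_add(&n->refcnt, 1);
}

/*
 * Dropped by the caller when it is done with the node.
 * Returns 1 if this was the last reference and the node was freed.
 */
static int node_sketch_put(struct node_sketch *n)
{
    if (atomic_fetch_sub(&n->refcnt, 1) == 1) {
        free(n);
        return 1;
    }
    return 0;
}
```

Every `node_sketch_get()` must be paired with exactly one `node_sketch_put()`; an unpaired put is the kind of imbalance that can free a node still in use.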
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a0f6ebe
    • Y
      tipc: fix potential deadlock when all links are reset · b952b2be
      Ying Xue committed
      [   60.988363] ======================================================
      [   60.988754] [ INFO: possible circular locking dependency detected ]
      [   60.989152] 3.19.0+ #194 Not tainted
      [   60.989377] -------------------------------------------------------
      [   60.989781] swapper/3/0 is trying to acquire lock:
      [   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.990743]
      [   60.990743] but task is already holding lock:
      [   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.991738]
      [   60.991738] which lock already depends on the new lock.
      [   60.991738]
      [   60.992174]
      [   60.992174] the existing dependency chain (in reverse order) is:
      [   60.992174]
      -> #1 (&(&bclink->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
      [   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
      [   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
      [   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
      [   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      -> #0 (&(&n_ptr->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
      [   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      [   60.992174] other info that might help us debug this:
      [   60.992174]
      [   60.992174]  Possible unsafe locking scenario:
      [   60.992174]
      [   60.992174]        CPU0                    CPU1
      [   60.992174]        ----                    ----
      [   60.992174]   lock(&(&bclink->lock)->rlock);
      [   60.992174]                                lock(&(&n_ptr->lock)->rlock);
      [   60.992174]                                lock(&(&bclink->lock)->rlock);
      [   60.992174]   lock(&(&n_ptr->lock)->rlock);
      [   60.992174]
      [   60.992174]  *** DEADLOCK ***
      [   60.992174]
      [   60.992174] 3 locks held by swapper/3/0:
      [   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
      [   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
      [   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]
      
      The correct sequence for grabbing n_ptr->lock and bclink->lock
      is that the former is taken first and the latter second, which is
      exactly what happens on CPU1. But when the retransmission on the
      broadcast link fails, bclink->lock is first taken in
      tipc_bclink_rcv(), and n_ptr->lock is subsequently taken in
      link_retransmit_failure(), called by tipc_link_retransmit(), which is
      what happens on CPU0. As a result, a deadlock can occur.
      
      If the order of taking the two locks on CPU0 is reversed, the
      deadlock risk is removed. Therefore, the node lock originally taken in
      link_retransmit_failure() is moved to tipc_bclink_rcv()
      so that it is obtained before the bclink lock. A precondition for
      this adjustment is that the response to a bclink reset event
      must be moved from tipc_bclink_unlock() to tipc_node_unlock().
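
The lock ordering rule above can be sketched with POSIX mutexes. This is a userspace illustration under stated assumptions: the two mutexes stand in for n_ptr->lock and bclink->lock, and the function names and counters are hypothetical. Both paths take the node lock first, so neither can hold one lock while waiting for the other in the opposite order.

```c
#include <pthread.h>

/* Stand-ins for n_ptr->lock and bclink->lock. */
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bclink_lock = PTHREAD_MUTEX_INITIALIZER;

static int links_up;            /* illustrative state, guarded by both locks */
static int retransmit_failures; /* illustrative state, guarded by both locks */

/* Link-up path (CPU1 in the report): node lock, then bclink lock. */
static int node_link_up_sketch(void)
{
    int count;

    pthread_mutex_lock(&node_lock);
    pthread_mutex_lock(&bclink_lock);
    count = ++links_up;
    pthread_mutex_unlock(&bclink_lock);
    pthread_mutex_unlock(&node_lock);
    return count;
}

/*
 * Broadcast receive path (CPU0 in the report) after the fix: the node
 * lock is taken up front, before the bclink lock, so a retransmission
 * failure no longer needs to grab it while the bclink lock is held.
 */
static int bclink_rcv_sketch(int retransmit_failed)
{
    int count;

    pthread_mutex_lock(&node_lock);
    pthread_mutex_lock(&bclink_lock);
    if (retransmit_failed)
        retransmit_failures++;
    count = retransmit_failures;
    pthread_mutex_unlock(&bclink_lock);
    pthread_mutex_unlock(&node_lock);
    return count;
}
```

The cost of this design is that the receive path holds the node lock even when no retransmission failure occurs.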
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b952b2be
  13. 26 Mar, 2015 2 commits
    • J
      tipc: eliminate race condition at dual link establishment · 8b4ed863
      Jon Paul Maloy committed
      Despite recent improvements, the establishment of dual parallel
      links still has a small glitch where messages can bypass each
      other. When the second link in a dual-link configuration is
      established, part of the first link's traffic will be steered over
      to the new link. Although we do have a mechanism to ensure that
      packets sent before and after the establishment of the new link
      arrive in sequence to the destination node, this is not enough.
      The arriving messages will still be delivered upwards in different
      threads, something entailing a risk of message disordering during
      the transition phase.
      
      To fix this, we introduce a synchronization mechanism between the
      two parallel links, so that traffic arriving on the new link cannot
      be added to its input queue until we are guaranteed that all
      pre-establishment messages have been delivered on the old, parallel
      link.
      
      This problem seems to always have been around, but its occurrence is
      so rare that it has not been noticed until recent intensive testing.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b4ed863
    • J
      tipc: clean up handling of link congestion · 3127a020
      Jon Paul Maloy committed
      After the recent changes in message importance handling it becomes
      possible to simplify handling of messages and sockets when we
      encounter link congestion.
      
      We merge the function tipc_link_cong() into link_schedule_user(),
      and simplify the code of the latter. The code should now be
      easier to follow, especially regarding return codes and handling
      of the message that caused the situation.
      
      In case the scheduling function is unable to pre-allocate a wakeup
      message buffer, it now returns -ENOBUFS, which is a more correct
      code than the previously used -EHOSTUNREACH.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3127a020