1. 25 Feb 2017 (1 commit)
  2. 25 Jan 2017 (1 commit)
    • tipc: fix nametbl_lock soft lockup at node/link events · 93f955aa
      Committed by Parthasarathy Bhuvaragan
      We trigger a soft lockup, as we grab nametbl_lock twice if the node
      has a pending node up/down or link up/down event while:
      - we process an incoming named message in tipc_named_rcv() and
        perform a tipc_update_nametbl().
      - we have pending backlog items in the name distributor queue
        during a name table update using tipc_nametbl_publish() or
        tipc_nametbl_withdraw().
      
      The associated call chains are:
      tipc_named_rcv() Grabs nametbl_lock
         tipc_update_nametbl() (publish/withdraw)
           tipc_node_subscribe()/unsubscribe()
             tipc_node_write_unlock()
                << lockup occurs if an outstanding node/link event
                   exists, as we grab nametbl_lock again >>

      tipc_nametbl_withdraw() Grabs nametbl_lock
        tipc_named_process_backlog()
          tipc_update_nametbl()
            << rest as above >>
      
      The function tipc_node_write_unlock(), in addition to releasing the
      lock, processes the outstanding node/link up/down events. To do this,
      it needs to grab nametbl_lock again, leading to the lockup.

      In this commit, we fix the soft lockup by introducing a fast variant of
      node_unlock() that just releases the lock. We adapt
      node_subscribe()/node_unsubscribe() to use this fast variant.
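      A minimal sketch of the idea, assuming the node lock is the rwlock
      introduced further down this log (commit 5405ff6e) and that the fast
      variant is named tipc_node_write_unlock_fast(); the field names
      (n->lock, n->publ_list) are likewise assumptions for illustration.

          /* Sketch: take/release the node write lock without processing
           * pending node/link events, so nametbl_lock is never re-taken.
           */
          static void tipc_node_write_lock(struct tipc_node *n)
          {
                  write_lock_bh(&n->lock);
          }

          static void tipc_node_write_unlock_fast(struct tipc_node *n)
          {
                  write_unlock_bh(&n->lock);
          }

          /* Caller (the name table code) already holds nametbl_lock. */
          static void tipc_node_subscribe(struct net *net,
                                          struct list_head *subscr, u32 addr)
          {
                  struct tipc_node *n = tipc_node_find(net, addr);

                  if (!n)
                          return;
                  tipc_node_write_lock(n);
                  list_add_tail(subscr, &n->publ_list);
                  tipc_node_write_unlock_fast(n);  /* no event processing */
                  tipc_node_put(n);
          }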
      Reported-and-Tested-by: John Thompson <thompa.atl@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 21 Jan 2017 (1 commit)
  4. 04 Jan 2017 (1 commit)
    • tipc: reduce risk of user starvation during link congestion · 365ad353
      Committed by Jon Paul Maloy
      The socket code currently handles link congestion by either blocking
      and trying to send again when the congestion has abated, or by just
      returning -EAGAIN to the user and letting them retry later.
      
      This mechanism is prone to starvation, because the wakeup algorithm is
      non-atomic. Between the moment the link issues a wakeup signal and the
      moment the socket wakes up and re-attempts sending, other senders may
      have come in between and occupied the free buffer space in the link.
      This in turn may force a socket to make many send attempts before it
      succeeds. In extremely loaded systems we have observed latencies of
      several seconds before a low-priority socket is able to send out a
      message.
      
      In this commit, we simplify this mechanism and reduce the risk of the
      described scenario happening. When an attempt is made to send a message
      via a congested link, we now let it be added to the link's backlog
      queue anyway, thus permitting an oversubscription of one message per
      source socket. We still create a wakeup item and return an error code,
      hence instructing the sender to block or stop sending. Only when enough
      space has been freed up in the link's backlog queue do we issue a
      wakeup event that allows the sender to continue with the next message,
      if any.
      
      The fact that a socket can now consider a message sent even when the
      link returns a congestion code means that the sending socket code can
      be simplified. Also, since this is a good opportunity to get rid of the
      obsolete 'mtu change' condition in the three socket send functions, we
      now choose to refactor those functions completely.
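      A hedged sketch of the oversubscription described above; the field
      names (backlog_len, backlog_limit, backlogq) and the error code used
      as the congestion indication are assumptions, not the exact upstream
      code.

          /* Sketch: accept the message into the link backlog even when the
           * link is congested, but tell the sender to block/stop after it.
           */
          static int tipc_link_xmit_sketch(struct tipc_link *l,
                                           struct sk_buff_head *list)
          {
                  int pkts = skb_queue_len(list);
                  int rc = 0;

                  if (unlikely(l->backlog_len >= l->backlog_limit))
                          rc = -ENOBUFS;  /* placeholder congestion code */

                  /* One-message oversubscription per source socket. */
                  skb_queue_splice_tail_init(list, &l->backlogq);
                  l->backlog_len += pkts;
                  return rc;
          }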
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 30 Oct 2016 (1 commit)
    • tipc: fix broadcast link synchronization problem · 06bd2b1e
      Committed by Jon Paul Maloy
      In commit 2d18ac4b ("tipc: extend broadcast link initialization
      criteria") we tried to fix a problem with the initial synchronization
      of broadcast link acknowledge values. Unfortunately that solution is
      not sufficient to solve the issue.
      
      We have seen it happen that LINK_PROTOCOL/STATE packets with a valid
      non-zero unicast acknowledge number may bypass BCAST_PROTOCOL
      initialization, NAME_DISTRIBUTOR and other STATE packets carrying
      invalid broadcast acknowledge numbers, leading to premature opening of
      the broadcast link. When the bypassed packets finally arrive, they are
      inadvertently accepted, and the already correctly initialized
      acknowledge number in the broadcast receive link is overwritten by
      the invalid (zero) value of said packets. After this the broadcast
      link goes stale.
      
      We now fix this by marking the packets where we know the acknowledge
      value is or may be invalid, and then ignoring the acks they carry.
      
      To this purpose, we claim an unused bit in the header to indicate that
      the value is invalid. We set the bit to 1 in the initial BCAST_PROTOCOL
      synchronization packet and all initial ("bulk") NAME_DISTRIBUTOR
      packets, plus those LINK_PROTOCOL packets sent out before the broadcast
      links are fully synchronized.
      
      This minor protocol update is fully backwards compatible.
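      A hedged sketch of the header-bit idea; msg_set_bits()/msg_bits() are
      the generic TIPC header accessors, but the chosen word/bit position,
      the helper names and the receive-side shape are assumptions for
      illustration.

          /* Sketch: claim one spare header bit that says "the broadcast
           * acknowledge value in this packet is (or may be) invalid".
           */
          static inline void msg_set_bc_ack_invalid(struct tipc_msg *m,
                                                    bool invalid)
          {
                  msg_set_bits(m, 5, 14, 0x1, invalid);  /* assumed word/bit */
          }

          static inline bool msg_bc_ack_invalid(struct tipc_msg *m)
          {
                  return msg_bits(m, 5, 14, 0x1);
          }

          /* Receive side (sketch): only apply the ack if the flag is clear. */
          static void tipc_bcast_ack_rcv_sketch(struct tipc_link *bcl,
                                                struct tipc_msg *hdr, u16 ack)
          {
                  if (msg_bc_ack_invalid(hdr))
                          return;
                  bcast_link_ack_sketch(bcl, ack);  /* placeholder helper */
          }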
      Reported-by: John Thompson <thompa.atl@gmail.com>
      Tested-by: John Thompson <thompa.atl@gmail.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 03 Sep 2016 (1 commit)
    • tipc: transfer broadcast nacks in link state messages · 02d11ca2
      Committed by Jon Paul Maloy
      When we send broadcasts in clusters of more than 70-80 nodes, we
      sometimes see the broadcast link resetting because of an excessive
      number of retransmissions. This is caused by a combination of two
      factors:
      
      1) A "NACK crunch", where loss of broadcast packets is discovered
         and NACK'ed by several nodes simultaneously, leading to multiple
         redundant broadcast retransmissions.
      
      2) The fact that the NACKs themselves are also sent as broadcast,
         leading to excessive load and packet loss on the transmitting
         switch/bridge.
      
      This commit deals with the latter problem by moving the sending of
      broadcast NACKs from the dedicated BCAST_PROTOCOL/NACK message type
      to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
      bits in word 8 of said message for this purpose, and introduce a
      new capability bit, TIPC_BCAST_STATE_NACK, in order to keep the change
      backwards compatible.
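      A hedged sketch of the capability gating; the capability bit value,
      the peer_caps field and the gap helpers are assumptions used only to
      show the shape of the change.

          #define TIPC_BCAST_STATE_NACK   (1 << 2)   /* assumed value */

          /* Sketch: piggy-back broadcast gap (NACK) info on a unicast STATE
           * message if the peer supports it; otherwise keep the legacy
           * BCAST_PROTOCOL/NACK path.
           */
          static bool tipc_state_nack_sketch(struct tipc_link *l,
                                             struct tipc_msg *hdr,
                                             u16 gap_from, u16 gap_to)
          {
                  if (!(l->peer_caps & TIPC_BCAST_STATE_NACK))
                          return false;           /* old peer: legacy NACK */
                  msg_set_bcgap_after(hdr, gap_from);  /* assumed helpers  */
                  msg_set_bcgap_to(hdr, gap_to);       /* writing word 8   */
                  return true;
          }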
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 19 Aug 2016 (1 commit)
  8. 27 Jul 2016 (3 commits)
  9. 12 Jul 2016 (1 commit)
  10. 16 Jun 2016 (1 commit)
    • tipc: add neighbor monitoring framework · 35c55c98
      Committed by Jon Paul Maloy
      TIPC based clusters are by default set up with full-mesh link
      connectivity between all nodes. Those links are expected to provide
      a short failure detection time, by default set to 1500 ms. Because
      of this, the background load for neighbor monitoring in an N-node
      cluster increases by a factor of N on each node, while the overall
      monitoring traffic through the network infrastructure increases at
      a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
      scale well beyond ~100 nodes unless we significantly increase failure
      discovery tolerance.
      
      This commit introduces a framework and an algorithm that drastically
      reduces this background load, while essentially maintaining the
      original failure detection times across the whole cluster. Using this
      algorithm, background load will now grow at a rate of ~(2 * sqrt(N))
      per node, and at ~(2 * N * sqrt(N)) in traffic overhead. As an example,
      each node will now have to actively monitor only 38 neighbors in a
      400-node cluster, instead of 399 as before (see the numeric sketch
      after the algorithm description below).
      
      This "Overlapping Ring Supervision Algorithm" is completely distributed
      and employs no centralized or coordinated state. It goes as follows:
      
      - Each node makes up a linearly ascending, circular list of all its N
        known neighbors, based on their TIPC node identity. This algorithm
        must be the same on all nodes.
      
      - The node then selects the next M = sqrt(N) - 1 nodes downstream from
        itself in the list, and chooses to actively monitor those. This is
        called its "local monitoring domain".
      
      - It creates a domain record describing the monitoring domain, and
        piggy-backs this in the data area of all neighbor monitoring messages
        (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
        the cluster eventually (default within 400 ms) will learn about
        its monitoring domain.
      
      - Whenever a node discovers a change in its local domain, e.g., a node
        has been added or has gone down, it creates and sends out a new
        version of its node record to inform all neighbors about the change.
      
      - A node receiving a domain record from anybody outside its local domain
        matches this against its own list (which may not look the same), and
        chooses to not actively monitor those members of the received domain
        record that are also present in its own list. Instead, it relies on
        indications from the direct monitoring nodes if an indirectly
        monitored node has gone up or down. If a node is indicated lost, the
        receiving node temporarily activates its own direct monitoring towards
        that node in order to confirm, or not, that it is actually gone.
      
      - Since each node is actively monitoring sqrt(N) downstream neighbors,
        each node is also actively monitored by the same number of upstream
        neighbors. This means that all non-direct monitoring nodes normally
        will receive sqrt(N) indications that a node is gone.
      
      - A major drawback with ring monitoring is how it handles failures that
        cause massive network partitions. If both a lost node and all its
        direct monitoring neighbors are inside the lost partition, the nodes
        in the remaining partition will never receive indications about the
        loss. To overcome this, each node also chooses to actively monitor
        some nodes outside its local domain. Those nodes are called remote
        domain "heads", and are selected in such a way that no node in the
        cluster will be more than two direct monitoring hops away. Because of
        this, each node, apart from monitoring the members of its local
        domain, will also typically monitor sqrt(N) remote head nodes.
      
      - As an optimization, local list status, domain status and domain
        records are marked with a generation number. This saves senders from
        unnecessarily conveying unaltered domain records, and receivers from
        performing unneeded re-adaptations of their node monitoring list,
        such as re-assigning domain heads.
      
      - As a measure of caution we have added the possibility to disable the
        new algorithm through configuration. We do this by keeping a threshold
        value for the cluster size; a cluster that grows beyond this value
        will switch from full-mesh to ring monitoring, and vice versa when
        it shrinks below the value. This means that if the threshold is set to
        a value larger than any anticipated cluster size (default size is 32)
        the new algorithm is effectively disabled. A patch set for altering the
        threshold value and for listing the table contents will follow shortly.
      
      - This change is fully backwards compatible.
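      The numeric sketch referred to above: a small self-contained
      illustration of the arithmetic behind the 38-of-400 example, not the
      kernel implementation.

          #include <math.h>
          #include <stdio.h>

          /* Domain size: M = sqrt(N) - 1 downstream members per node, as
           * described in the algorithm above.
           */
          static int domain_members(int n)
          {
                  return (int)ceil(sqrt((double)n)) - 1;
          }

          int main(void)
          {
                  int n = 400;
                  int m = domain_members(n);

                  /* Roughly m domain members plus ~m remote domain heads
                   * are actively monitored: ~38 peers instead of 399 in a
                   * 400-node full mesh.
                   */
                  printf("cluster=%d members=%d heads~=%d monitored~=%d\n",
                         n, m, m, 2 * m);
                  return 0;
          }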
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 09 Jun 2016 (2 commits)
    • tipc: change node timer unit from jiffies to ms · 5ca509fc
      Committed by Jon Paul Maloy
      The node keepalive interval is recalculated at each timer expiration
      to catch any changes in the link tolerance, and is stored in a field in
      struct tipc_node. We use jiffies as the unit for the stored value.
      
      This is suboptimal, because it makes the calculation unnecessarily
      complex, including two unit conversions. The conversions also lead to
      a rounding error that causes the link "abort limit" to be 3 in the
      normal case, instead of 4, as intended. This again leads to unnecessary
      link resets when the network is pushed close to its limit, e.g., in an
      environment with hundreds of nodes or namespaces.
      
      In this commit, we instead let the keepalive value be calculated and
      stored in milliseconds, so that only one conversion is needed and the
      rounding error is eliminated.

      We also remove a redundant "keepalive" field from struct tipc_link; it
      is a remnant of the previous implementation.
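      A hedged sketch of the millisecond-based re-arming; the field name
      keepalive_intv and the tolerance/4 relation (matching the intended
      abort limit of 4) are assumptions for illustration.

          /* Sketch: keep the interval in ms and convert to jiffies only
           * once, when re-arming the node timer.
           */
          static void tipc_node_calculate_timer_sketch(struct tipc_node *n,
                                                       u32 tolerance_ms)
          {
                  n->keepalive_intv = tolerance_ms / 4;  /* ms, no rounding */
                  mod_timer(&n->timer,
                            jiffies + msecs_to_jiffies(n->keepalive_intv));
          }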
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: correct error in node fsm · c4282ca7
      Committed by Jon Paul Maloy
      Commit 88e8ac70 ("tipc: reduce transmission rate of reset messages
      when link is down") revealed a flaw in the node FSM, as defined in
      the log of commit 66996b6c ("tipc: extend node FSM").
      
      We see the following scenario:
      1: Node B receives a RESET message from node A before its link endpoint
         is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This
         event will not change the node FSM state, but the (distinct) link FSM
         will move to state RESETTING.
      2: As an effect of the previous event, the local endpoint on B will
         declare node A lost, and post the event SELF_DOWN to its node
         FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning
         that no messages will be accepted from A until it receives another
         RESET message confirming that A's endpoint has been reset. This
         is wasteful, since we already know this as a fact from the first
         received RESET, but worse is that the link instance's FSM has not
         wasted this information, and has instead moved on to state
         ESTABLISHING, meaning that it repeatedly sends out ACTIVATE
         messages to the reset peer A.
      3: Node A will receive one of the ACTIVATE messages, move its link FSM
         to state ESTABLISHED, and start repeatedly sending out STATE messages
         to node B.
      4: Node B will consistently drop these messages, since it can only
         accept a RESET according to its node FSM.
      5: After four lost STATE messages node A will reset its link and start
         repeatedly sending out RESET messages to B.
      6: Because of the reduced send rate for RESET messages, it is very
         likely that A will receive an ACTIVATE (which is sent out at a much
         higher frequency) before it gets the chance to send a RESET, and A
         may hence quickly move back to state ESTABLISHED and continue sending
         out STATE messages, which will again be dropped by B.
      7: GOTO 5.
      8: After having repeated the cycle 5-7 a number of times, node A will
         eventually manage to get a RESET in between, and the situation is
         resolved.
      
      Unfortunately, we have seen that it may take a substantial amount of
      time before this vicious loop is broken, sometimes in the order of
      minutes.
      
      We correct this by making a small correction to the node FSM: when a
      node in state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now
      moves directly back to state SELF_DOWN_PEER_DOWN, instead of to
      SELF_DOWN_PEER_LEAVING as before. This is logically consistent, since
      we don't need to wait for RESET confirmation from an endpoint that we
      already know has been reset. It also means that node B in the scenario
      above will no longer drop incoming STATE messages, and the link can
      come up immediately.

      Finally, a symmetry comparison reveals that the FSM has a similar
      error when receiving the event PEER_DOWN in state PEER_UP_SELF_COMING.
      Instead of moving to PEER_DOWN_SELF_LEAVING, it should move directly
      to SELF_DOWN_PEER_DOWN. Although we have never seen any negative
      effect of this logical error, we choose to fix this one, too.
      
      The node FSM looks as follows after those changes:
      
                                 +----------------------------------------+
                                 |                           PEER_DOWN_EVT|
                                 |                                        |
        +------------------------+----------------+                       |
        |SELF_DOWN_EVT           |                |                       |
        |                        |                |                       |
        |              +-----------+          +-----------+               |
        |              |NODE_      |          |NODE_      |               |
        |   +----------|FAILINGOVER|<---------|SYNCHING   |-----------+   |
        |   |SELF_     +-----------+ FAILOVER_+-----------+   PEER_   |   |
        |   |DOWN_EVT   |          A BEGIN_EVT  A         |   DOWN_EVT|   |
        |   |           |          |            |         |           |   |
        |   |           |          |            |         |           |   |
        |   |           |FAILOVER_ |FAILOVER_   |SYNCH_   |SYNCH_     |   |
        |   |           |END_EVT   |BEGIN_EVT   |BEGIN_EVT|END_EVT    |   |
        |   |           |          |            |         |           |   |
        |   |           |          |            |         |           |   |
        |   |           |         +--------------+        |           |   |
        |   |           +-------->|   SELF_UP_   |<-------+           |   |
        |   |   +-----------------|   PEER_UP    |----------------+   |   |
        |   |   |SELF_DOWN_EVT    +--------------+   PEER_DOWN_EVT|   |   |
        |   |   |                    A        A                   |   |   |
        |   |   |                    |        |                   |   |   |
        |   |   |         PEER_UP_EVT|        |SELF_UP_EVT        |   |   |
        |   |   |                    |        |                   |   |   |
        V   V   V                    |        |                   V   V   V
      +------------+       +-----------+    +-----------+       +------------+
      |SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
      |PEER_LEAVING|       |PEER_COMING|    |SELF_COMING|       |SELF_LEAVING|
      +------------+       +-----------+    +-----------+       +------------+
             |               |       A        A       |                |
             |               |       |        |       |                |
             |       SELF_   |       |SELF_   |PEER_  |PEER_           |
             |       DOWN_EVT|       |UP_EVT  |UP_EVT |DOWN_EVT        |
             |               |       |        |       |                |
             |               |       |        |       |                |
             |               |    +--------------+    |                |
             |PEER_DOWN_EVT  +--->|  SELF_DOWN_  |<---+   SELF_DOWN_EVT|
             +------------------->|  PEER_DOWN   |<--------------------+
                                  +--------------+
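      The two corrected transitions, expressed as a small stand-alone
      sketch; the enum values are illustrative stand-ins for the kernel's
      internal state and event constants.

          enum node_state { SELF_DOWN_PEER_DOWN, SELF_UP_PEER_COMING,
                            PEER_UP_SELF_COMING, SELF_DOWN_PEER_LEAVING,
                            PEER_DOWN_SELF_LEAVING };
          enum node_event { SELF_DOWN_EVT, PEER_DOWN_EVT };

          /* Sketch: the corrected next-state choice for the two cases
           * discussed above; all other transitions are unchanged.
           */
          static enum node_state next_state(enum node_state s,
                                            enum node_event e)
          {
                  if (s == SELF_UP_PEER_COMING && e == SELF_DOWN_EVT)
                          return SELF_DOWN_PEER_DOWN;  /* was ..._PEER_LEAVING */
                  if (s == PEER_UP_SELF_COMING && e == PEER_DOWN_EVT)
                          return SELF_DOWN_PEER_DOWN;  /* was ..._SELF_LEAVING */
                  return s;
          }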
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 13 May 2016 (1 commit)
    • tipc: eliminate risk of double link_up events · e7142c34
      Committed by Jon Paul Maloy
      When an ACTIVATE or data packet is received by a link in state
      ESTABLISHING, the link does not immediately change state to
      ESTABLISHED, but instead returns a LINK_UP event to the caller,
      which will execute the state change in a different lock context.
      
      This non-atomic approach incurs a low risk that we may have two
      LINK_UP events pending simultaneously for the same link, resulting
      in the final part of the setup procedure being executed twice. The
      only potential harm caused by this is that we may see two LINK_UP
      events issued to subscribers of the topology server, something that
      may cause confusion.
      
      This commit eliminates this risk by checking if the link is already
      up before proceeding with the second half of the setup.
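      A minimal sketch of the guard described above; tipc_link_is_up() is
      the existing state test, while the surrounding function shape and the
      link entry layout are assumptions.

          /* Sketch: the deferred "second half" of link setup bails out if a
           * previous LINK_UP event has already completed it.
           */
          static void tipc_node_link_up_sketch(struct tipc_node *n,
                                               int bearer_id)
          {
                  struct tipc_link *l = n->links[bearer_id].link; /* assumed */

                  if (tipc_link_is_up(l))
                          return;  /* setup already executed once */

                  /* ...activate link, update slots, notify subscribers... */
          }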
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 04 May 2016 (1 commit)
  14. 02 May 2016 (2 commits)
    • tipc: only process unicast on intended node · efe79050
      Committed by Hamish Martin
      We have observed complete lock up of broadcast-link transmission due to
      unacknowledged packets never being removed from the 'transmq' queue. This
      is traced to nodes having their ack field set beyond the sequence number
      of packets that have actually been transmitted to them.
      Consider an example where node 1 has sent 10 packets to node 2 on a
      link and node 3 has sent 20 packets to node 2 on another link. We
      see examples of an ack from node 2 destined for node 3 being treated as
      an ack from node 2 at node 1. This leads to the ack on the node 1 to node
      2 link being increased to 20 even though we have only sent 10 packets.
      When node 1 does get around to sending further packets, none of the
      packets with sequence numbers less than 21 are actually removed from the
      transmq.
      To resolve this we reinstate some code lost in commit d999297c ("tipc:
      reduce locking scope during packet reception") which ensures that only
      messages destined for the receiving node are processed by that node. This
      prevents the sequence numbers from getting out of sync and resolves the
      packet leakage, thereby resolving the broadcast-link transmission
      lock-ups we observed.
      
      While we are aware that this change only patches over a root problem
      that we still haven't identified, it is a sanity check that is always
      legitimate to perform. It will remain in the code even after we
      identify and fix the real problem.
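      A hedged sketch of the reinstated sanity check; msg_destnode() and
      tipc_own_addr() are existing TIPC helpers, but the surrounding shape
      and the handling of a zero destination are assumptions.

          /* Sketch: drop unicast packets that are not addressed to this
           * node, so a peer's ack can never be applied to the wrong link.
           */
          static bool tipc_skb_for_this_node(struct net *net,
                                             struct tipc_msg *hdr)
          {
                  u32 dnode = msg_destnode(hdr);

                  /* 0 = no explicit destination (e.g. broadcast): accept */
                  return !dnode || dnode == tipc_own_addr(net);
          }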
      Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
      Reviewed-by: John Thompson <john.thompson@alliedtelesis.co.nz>
      Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: set 'active' state correctly for first established link · def22c47
      Committed by Jon Paul Maloy
      When we display statistics for the first link established between
      two peers, it will always be presented as STANDBY, although in
      reality it is ACTIVE.
      
      This happens because we forget to set the 'active' flag in the link
      instance at the moment it is established. Although this is a bug, it
      only affects the presentation of the link, not its actual
      functionality.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 16 Apr 2016 (2 commits)
  16. 08 Mar 2016 (1 commit)
  17. 07 Mar 2016 (1 commit)
  18. 26 Feb 2016 (2 commits)
    • tipc: fix crash during node removal · d25a0125
      Committed by Jon Paul Maloy
      When the TIPC module is unloaded, we have identified a race condition
      that allows a node reference counter to go to zero, and the node
      instance to be freed, before the node timer has finished accessing it.
      This leads to occasional crashes, especially in multi-namespace
      environments.
      
      The scenario goes as follows:
      
      CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
      
      1:                                          if(!mod_timer())
      2: if (del_timer())
      3:   tipc_node_put()                                        // ref -> 1
      4: tipc_node_put()                                          // ref -> 0
      5:   kfree_rcu(node);
      6:                                               tipc_node_get(node)
      7:                                               // BOOM!
      
      We now clean up this functionality as follows:
      
      1) We remove the node pointer from the node lookup table before we
         attempt deactivating the timer. This way, we reduce the risk that
         tipc_node_find() may obtain a valid pointer to an instance marked
         for deletion; a harmless but undesirable situation.
      
      2) We use del_timer_sync() instead of del_timer() to safely deactivate
         the node timer without any risk that it might be reactivated by the
         timeout handler. There is no risk of deadlock here, since the two
         functions never touch the same spinlocks.
      
      3) We remove a pointless tipc_node_get() + tipc_node_put() from the
         timeout handler.
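      A hedged sketch of the resulting teardown order; the function shape
      and the assumption that a pending timer holds its own node reference
      are illustrative, based on the three points above.

          /* Sketch: 1) unpublish from the lookup table, 2) synchronously
           * stop the timer, 3) drop the remaining references; the last put
           * frees the node via kfree_rcu().
           */
          static void tipc_node_delete_sketch(struct tipc_node *n)
          {
                  list_del_rcu(&n->list);             /* 1) */
                  if (del_timer_sync(&n->timer))      /* 2) */
                          tipc_node_put(n);   /* drop the timer's reference */
                  tipc_node_put(n);           /* drop the table's reference */
          }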
      Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate risk of finding to-be-deleted node instance · b170997a
      Committed by Jon Paul Maloy
      Although we have never seen it happen, we have identified the
      following problematic scenario when nodes are stopped and deleted:
      
      CPU0:                            CPU1:
      
      tipc_node_xxx()                                   //ref == 1
         tipc_node_put()                                //ref -> 0
                                       tipc_node_find() // node still in table
             tipc_node_delete()
                list_del_rcu(&n->list)
                                       tipc_node_get()  //ref -> 1, bad
               kfree_rcu()
      
                                       tipc_node_put() //ref to 0 again.
                                       kfree_rcu()     // BOOM!
      
      We fix this by introducing use of the conditional kref_get_unless_zero()
      instead of kref_get() in the function tipc_node_find(). This eliminates
      any risk of post-mortem access.
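      A hedged sketch of the conditional lookup; kref_get_unless_zero() is
      the real kref API, while the hash-list layout and the helper
      node_hash_head() are assumptions for illustration.

          /* Sketch: hand out a node reference only if the refcount has not
           * already dropped to zero (i.e. the node is not being freed).
           */
          static struct tipc_node *tipc_node_find_sketch(struct net *net,
                                                         u32 addr)
          {
                  struct tipc_node *n;

                  rcu_read_lock();
                  hlist_for_each_entry_rcu(n, node_hash_head(net, addr), hash) {
                          if (n->addr == addr &&
                              kref_get_unless_zero(&n->kref)) {
                                  rcu_read_unlock();
                                  return n;
                          }
                  }
                  rcu_read_unlock();
                  return NULL;
          }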
      Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 17 Feb 2016 (2 commits)
  20. 06 Feb 2016 (1 commit)
    • tipc: fix link attribute propagation bug · d01332f1
      Committed by Richard Alpe
      Changing certain link attributes (link tolerance and link priority)
      from the TIPC management tool is supposed to automatically take
      effect at both endpoints of the affected link.
      
      Currently the media address is not instantiated for the link and is
      used uninstantiated when crafting protocol messages designated for the
      peer endpoint. This means that changing a link property currently
      results in the property being changed on the local machine, while the
      protocol message designated for the peer gets lost, resulting in a
      property discrepancy between the endpoints.
      
      In this patch we resolve this by using the media address from the
      link entry and using the bearer transmit function to send it. Hence,
      we can now eliminate the redundant function tipc_link_prot_xmit() and
      the redundant field tipc_link::media_addr.
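      A hedged sketch of the corrected send path; tipc_bearer_xmit_skb()
      exists in this codebase, but the link-entry field names used here are
      assumptions.

          /* Sketch: send the protocol message via the bearer, using the
           * media address stored in the node's link entry, instead of the
           * (uninstantiated) copy that used to live in struct tipc_link.
           */
          static void tipc_node_proto_xmit_sketch(struct net *net,
                                                  struct tipc_node *n,
                                                  int bearer_id,
                                                  struct sk_buff *skb)
          {
                  struct tipc_media_addr *maddr = &n->links[bearer_id].maddr;

                  tipc_bearer_xmit_skb(net, bearer_id, skb, maddr);
          }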
      
      Fixes: 2af5ae37 ("tipc: clean up unused code and structures")
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Reported-by: Jason Hu <huzhijiang@gmail.com>
      Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 04 Dec 2015 (1 commit)
  22. 21 Nov 2015 (7 commits)
    • tipc: eliminate remnants of hungarian notation · 1a90632d
      Committed by Jon Paul Maloy
      The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
      has been significantly reduced over the last couple of years.
      
      We now root out the last traces of this practice.
      There are no functional changes in this commit.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: narrow down interface towards struct tipc_link · 38206d59
      Committed by Jon Paul Maloy
      We move the definition of struct tipc_link from link.h to link.c in
      order to minimize its exposure to the rest of the code.
      
      When needed, we define new functions to make it possible for external
      entities to access and set data in the link.
      
      Apart from the above, there are no functional changes.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: narrow down exposure of struct tipc_node · 5be9c086
      Committed by Jon Paul Maloy
      In our effort to have less code and include dependencies between
      entities such as node, link and bearer, we try to narrow down
      the exposed interface towards the node as much as possible.
      
      In this commit, we move the definition of struct tipc_node, along
      with many of its associated function declarations, from node.h to
      node.c. We also move some function definitions from link.c and
      name_distr.c to node.c, since they access fields in struct tipc_node
      that should not be externally visible. The moved functions are renamed
      according to their new location, and made static whenever possible.
      
      There are no functional changes in this commit.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: convert node lock to rwlock · 5405ff6e
      Committed by Jon Paul Maloy
      According to the node FSM a node in state SELF_UP_PEER_UP cannot
      change state inside a lock context, except when a TUNNEL_PROTOCOL
      (SYNCH or FAILOVER) packet arrives. However, the node's individual
      links may still change state.
      
      Since each link is now protected by its own spinlock, we finally have
      the conditions in place to convert the node spinlock to an rwlock_t.
      If the node state and arriving packet type are right, we can let the
      link receive the packet directly, under protection of its own spinlock
      and the node lock in read mode. In all other cases we use the node
      lock in write mode. This enables full concurrent execution between
      parallel links during steady-state traffic situations, i.e., 99+ %
      of the time.
      
      This commit implements this change.
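      A hedged sketch of the resulting receive-path locking; the fast-path
      test and the helper names (node_and_pkt_state_ok(), link_rcv_sketch(),
      node_state_rcv_sketch()) are placeholders for illustration.

          /* Sketch: steady-state packets are received under the node rwlock
           * in read mode plus the per-link spinlock; anything that may
           * change node state falls back to the write lock.
           */
          static void tipc_rcv_sketch(struct tipc_node *n, int bearer_id,
                                      struct sk_buff *skb)
          {
                  struct tipc_link_entry *le = &n->links[bearer_id];

                  read_lock_bh(&n->lock);
                  if (likely(node_and_pkt_state_ok(n, skb))) {
                          spin_lock_bh(&le->lock);        /* per-link lock */
                          link_rcv_sketch(le->link, skb);
                          spin_unlock_bh(&le->lock);
                          read_unlock_bh(&n->lock);
                          return;
                  }
                  read_unlock_bh(&n->lock);

                  write_lock_bh(&n->lock);        /* node state may change */
                  node_state_rcv_sketch(n, skb);
                  write_unlock_bh(&n->lock);
          }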
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce per-link spinlock · 2312bf61
      Committed by Jon Paul Maloy
      As a preparation for allowing parallel links to work more independently
      of each other, we introduce a per-link spinlock, to be stored in the
      node's link entry area. Since the node lock is still a regular
      spinlock, there is no increase in parallelism at this stage.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: reduce code dependency between binding table and node layer · 1d7e1c25
      Committed by Jon Paul Maloy
      The file name_distr.c currently contains three functions,
      named_cluster_distribute(), tipc_publ_subscribe() and
      tipc_publ_unsubscribe(), that all directly access fields in
      struct tipc_node. We want to eliminate such dependencies, so
      we move those functions to the file node.c and rename them to
      tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
      respectively.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: small cleanup of function tipc_node_check_state() · 5c10e979
      Committed by Jon Paul Maloy
      The function tipc_node_check_state() contains the core logic
      for handling link synchronization and failover. For this reason,
      it is important to keep it as comprehensible as possible.
      
      In this commit, we make three small cleanups.
      
      1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
         packet confirms that the peer has lost contact, there will be no
         further action in this function. To make this clearer, we return
         from the function directly after the state change.
      
      2) Since commit 0f8b8e28 ("tipc: eliminate risk of stalled
         link synchronization") only the logically first TUNNEL_PROTO/SYNCH
         packet can alter the link state and set the synch point,
         independently of arrival order. Hence, there is no longer any
         need to adjust the synch value in case such packets arrive in
         disorder. We remove this adjustment.
      
      3) It is the intention that any message arriving on any of the links
         may trigger a check for, and possible termination of, a node SYNCH
         state. A redundant and unnoticed check for tipc_link_is_synching()
         obviously defeats this purpose, with the effect that only packets
         arriving on the synching link may currently end the synch state. We
         remove this check. This change will further shorten the
         synchronization period between parallel links.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 25 Oct 2015 (1 commit)
  24. 24 Oct 2015 (4 commits)
    • tipc: clean up unused code and structures · 2af5ae37
      Committed by Jon Paul Maloy
      After the previous changes in this series, we can now remove some
      unused code and structures, both in the broadcast, link aggregation
      and link code.
      
      There are no functional changes in this commit.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: ensure binding table initial distribution is sent via first link · c49a0a84
      Committed by Jon Paul Maloy
      Correct synchronization of the broadcast link at first contact between
      two nodes is dependent on the assumption that the binding table "bulk"
      update passes via the same link as the initial broadcast
      synchronization message, i.e., via the first link that is established.
      
      This is not guaranteed in the current implementation. If two links
      come up very close to each other in time, the "bulk" may quite well
      pass via the second link, and hence void the guarantee of a correct
      initial synchronization before the broadcast link is opened.
      
      This commit makes two small changes to strengthen this guarantee.
      
      1) We let the second established link occupy slot 1 of the
         "active_links" array, while the first link will retain slot 0.
         (This is in reality a cosmetic change, we could just as well keep
          the current, opposite order)
      
      2) We let the name distributor always use link selector/slot 0 when
         it sends its binding table updates.
      
      The extra traffic bias on the first link caused by this change should
      be negligible, since binding table updates constitute a very small
      fraction of the total traffic.
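      A minimal sketch of change 2); tipc_node_xmit() takes a link selector
      as its last argument in this codebase, but the surrounding function
      name and call site are assumptions.

          /* Sketch: always pin binding-table (name distributor) updates to
           * link selector 0, i.e. the first established link.
           */
          static void named_distribute_sketch(struct net *net, u32 dnode,
                                              struct sk_buff_head *pkts)
          {
                  tipc_node_xmit(net, pkts, dnode, 0);    /* selector 0 */
          }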
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate link's reference to owner node · c72fa872
      Committed by Jon Paul Maloy
      With the recent commit series, we have established a one-way dependency
      between the link aggregation (struct tipc_node) instances and their
      pertaining tipc_link instances. This has enabled quite significant code
      and structure simplifications.
      
      In this commit, we eliminate the field 'owner', which points to an
      instance of struct tipc_node, from struct tipc_link, and replace it with
      a pointer to struct net, which is the only external reference now needed
      by a link instance.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: simplify bearer level broadcast · b06b281e
      Committed by Jon Paul Maloy
      Until now, we have been keeping track of the exact set of broadcast
      destinations through the helper structure tipc_node_map. This forces
      us to maintain a whole infrastructure for supporting it, including a
      pseudo-bearer and a number of functions to manipulate both the bearers
      and the node map correctly. Apart from the complexity, this approach is
      also limiting, as struct tipc_node_map can only support cluster-local
      broadcast if we want to avoid it becoming excessively large. We want to
      eliminate this limitation, in order to enable introduction of scoped
      multicast in the future.
      
      A closer analysis reveals that maintaining this "full set" overview is
      unnecessary; it is sufficient to keep a counter per bearer, indicating
      how many nodes can be reached via this bearer at the moment. The
      protocol is now robust enough to handle transitional discrepancies
      between the nominal number of reachable destinations, as expected by
      the broadcast protocol itself, and the number actually reachable at the
      moment. The initial broadcast synchronization, in conjunction with the
      retransmission mechanism, ensures that all packets will eventually be
      acknowledged by the correct set of destinations.
      
      This commit introduces these changes.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>