1. 14 Feb, 2014 1 commit
    • J
      tipc: delay delete of link when failover is needed · 7d33939f
      Jon Paul Maloy committed
      When a bearer is disabled, all its attached links are deleted.
      Ideally, we should do link failover to redundant links on other bearers,
      if there are any, in such cases. This would be consistent with current
      behavior when a link is reset, but not deleted. However, due to the
      complexity involved, and the (wrongly) perceived low demand for this
      feature, it was never implemented until now.
      
      We mark the doomed link for deletion with a new flag, but wait until the
      failover process is finished before we actually delete it. With the
      improved link tunnelling/failover code introduced earlier in this commit
      series, it is now easy to identify a spot in the code where the failover
      is finished and it is safe to delete the marked link. Moreover, the test
      for the flag and the deletion can be done synchronously, and outside the
      most time critical data path.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d33939f
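The deferred deletion described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the kernel code: the `struct link` layout, the `LINK_STOPPED` flag name, and both function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag and struct; not the kernel's definitions. */
#define LINK_STOPPED 0x1u   /* link is doomed; delete once failover is done */

struct link {
    unsigned int flags;
    unsigned int failover_pkts;  /* tunnelled packets still expected */
    struct link *peer;           /* redundant link taking over, or NULL */
};

/* Called when the bearer is disabled. Returns true if the link was freed
 * at once, false if deletion was deferred until failover completes. */
bool link_disable(struct link *l)
{
    if (l->peer == NULL || l->failover_pkts == 0) {
        free(l);
        return true;
    }
    l->flags |= LINK_STOPPED;   /* mark only; failover still in progress */
    return false;
}

/* Called once per tunnelled packet, outside the fast data path.
 * Returns true if this was the last packet and the marked link was freed. */
bool link_failover_rcv(struct link *l)
{
    assert(l->failover_pkts > 0);
    if (--l->failover_pkts > 0)
        return false;
    if (l->flags & LINK_STOPPED) {
        free(l);                /* failover finished: safe to delete now */
        return true;
    }
    return false;
}
```

The point of the sketch is that the flag test sits at the single place where failover is known to be complete, so the per-packet data path never has to check it.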
  2. 08 Jan, 2014 1 commit
  3. 05 Jan, 2014 1 commit
  4. 11 Dec, 2013 1 commit
  5. 08 Nov, 2013 1 commit
    • E
      tipc: message reassembly using fragment chain · 40ba3cdf
      Erik Hugne committed
      When the first fragment of a long data message is received on a link, a
      reassembly buffer large enough to hold the data from this and all subsequent
      fragments of the message is allocated. The payload of each new fragment is
      copied into this buffer upon arrival. When the last fragment is received, the
      reassembled message is delivered upwards to the port/socket layer.
      
      Not only is this an inefficient approach, but it may also cause bursts of
      reassembly failures in low memory situations, since we may fail to allocate
      the necessary large buffer in the first place. Furthermore, after 100
      consecutive such failures the link will be reset, which in reality
      aggravates the situation.
      
      To remedy this problem, this patch introduces a different approach. Instead of
      allocating a big reassembly buffer, we now append the arriving fragments
      to a reassembly chain on the link, and deliver the whole chain up to the
      socket layer once the last fragment has been received. This is safe because
      the retransmission layer of a TIPC link always delivers packets in strict
      uninterrupted order, to the reassembly layer as to all other upper layers.
      Hence there can never be more than one fragment chain pending reassembly at
      any given time in a link, and we can trust (but still verify) that the
      fragments will be chained up in the correct order.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40ba3cdf
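A rough sketch of the fragment-chain idea from the commit above follows. It assumes a simplified fragment type rather than the kernel's `sk_buff`; `struct frag`, `struct reasm`, `reasm_append()` and `chain_len()` are hypothetical names introduced only for illustration. Fragments stay owned by the caller here, so dropping a chain does not free anything.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a packet buffer. */
struct frag {
    const char *data;
    size_t len;
    struct frag *next;
};

enum frag_type { FIRST_FRAGMENT, FRAGMENT, LAST_FRAGMENT };

/* Per-link reassembly state: at most one chain is pending at a time. */
struct reasm {
    struct frag *head;
    struct frag *tail;
};

/* Append one fragment to the pending chain.
 * Returns 1 when the chain is complete (handed over in *out),
 *         0 when more fragments are expected,
 *        -1 when the fragment arrives out of order (chain is dropped). */
int reasm_append(struct reasm *r, struct frag *f, enum frag_type t,
                 struct frag **out)
{
    f->next = NULL;
    if (t == FIRST_FRAGMENT) {
        if (r->head != NULL)
            goto err;           /* a chain is already pending */
        r->head = r->tail = f;
        return 0;
    }
    if (r->head == NULL)
        goto err;               /* continuation without a first fragment */
    r->tail->next = f;
    r->tail = f;
    if (t != LAST_FRAGMENT)
        return 0;
    *out = r->head;             /* deliver the whole chain upwards */
    r->head = r->tail = NULL;
    return 1;
err:
    r->head = r->tail = NULL;
    return -1;
}

/* Total payload length of a chain. */
size_t chain_len(const struct frag *f)
{
    size_t total = 0;
    for (; f != NULL; f = f->next)
        total += f->len;
    return total;
}
```

No large buffer is allocated at any point; the error path corresponds to the "trust but still verify" check on fragment order mentioned in the commit message.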
  6. 28 Feb, 2013 1 commit
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin committed
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones. The list iterators look like this:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
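To illustrate the iterator change in the commit above, here is a small userspace sketch comparing the four-argument and three-argument forms. The macros are simplified stand-ins written for this example (they rely on the GCC/Clang `__typeof__` extension and a helper `entry_or_null()`, both assumptions); they are not copies of the definitions in linux/list.h, and the `struct item` type is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Map a node pointer back to its containing object, or NULL at list end. */
static inline void *entry_or_null(struct hlist_node *n, size_t off)
{
    return n ? (void *)((char *)n - off) : NULL;
}

/* Old style: needs an extra 'pos' cursor of type struct hlist_node *. */
#define hlist_for_each_entry_old(tpos, pos, head, member)                  \
    for (pos = (head)->first;                                              \
         pos && (tpos = entry_or_null(pos,                                 \
                     offsetof(__typeof__(*(tpos)), member)), 1);           \
         pos = pos->next)

/* New style: same shape as list_for_each_entry(pos, head, member). */
#define hlist_for_each_entry(pos, head, member)                            \
    for (pos = entry_or_null((head)->first,                                \
                     offsetof(__typeof__(*(pos)), member));                \
         pos;                                                              \
         pos = entry_or_null((pos)->member.next,                           \
                     offsetof(__typeof__(*(pos)), member)))

struct item {
    int value;
    struct hlist_node node;
};

void item_add(struct hlist_head *h, struct item *it)
{
    it->node.next = h->first;
    h->first = &it->node;
}

int sum_old(struct hlist_head *h)
{
    struct item *it;
    struct hlist_node *n;   /* the cursor the commit removes */
    int sum = 0;

    hlist_for_each_entry_old(it, n, h, node)
        sum += it->value;
    return sum;
}

int sum_new(struct hlist_head *h)
{
    struct item *it;
    int sum = 0;

    hlist_for_each_entry(it, h, node)
        sum += it->value;
    return sum;
}
```

Both loops visit the same entries; the new form simply derives the next node from `pos->member.next` instead of carrying a separate cursor.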
  7. 23 Nov, 2012 1 commit
    • J
      tipc: introduce message to synchronize broadcast link · c64f7a6a
      Jon Maloy committed
      Upon establishing a first link between two nodes, there is
      currently a risk that the two endpoints will disagree on exactly
      which sequence number reception and acknowledgement of broadcast
      packets should start from.
      
      The following scenarios may happen:
      
      1: Node A sends an ACTIVATE message to B, telling it to start acking
         packets from sequence number N.
      2: Node A sends out broadcast N, but does not expect an acknowledgement
         from B, since B is not yet in its broadcast receivers list.
      3: Node A receives ACK for N from all nodes except B, and releases
         packet N.
      4: Node B receives the ACTIVATE, activates its link endpoint, and
         stores the value N as sequence number of first expected packet.
      5: Node B sends a NAME_DISTR message to A.
      6: Node A receives the NAME_DISTR message, and activates its endpoint.
         At this moment B is added to A's broadcast receiver's set.
         Node A also sets sequence number 0 as the first broadcast packet
         to be received from B.
      7: Node A sends broadcast N+1.
      8: B receives N+1, determines there is a gap in the sequence, since
         it is expecting N, and sends a NACK for N back to A.
      9: Node A has already released N, so no retransmission is possible.
         The broadcast link in direction A->B is stale.
      
      In addition to, or instead of, 7-9 above, the following may happen:
      
      10: Node B sends broadcast M > 0 to A.
      11: Node A receives M, falsely decides there must be a gap, since
          it is expecting packet 0, and asks for retransmission of packets
          [0,M-1].
      12: Node B has already released these packets, so the broadcast
          link is stale in direction B->A.
      
      We solve this problem by introducing a new unicast message type,
      BCAST_PROTOCOL/STATE, to convey the sequence number of the next
      sent broadcast packet to the other endpoint, at exactly the moment
      that endpoint is added to the own node's broadcast receivers list,
      and before any other unicast messages are permitted to be sent.
      
      Furthermore, we don't allow any node to start receiving and
      processing broadcast packets until this new synchronization
      message has been received.
      
      To maintain backwards compatibility, we still open up for
      broadcast reception if we receive a NAME_DISTR message without
      any preceding broadcast sync message. In this case, we must
      assume that the other end has an older code version, and will
      never send out the new synchronization message. Hence, for mixed
      old and new nodes, the issue arising in 7-12 of the above may
      happen with the same probability as before.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      c64f7a6a
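The receive-side gating described in the commit above might look roughly like this. It is only a sketch under assumptions: the `struct bcl_peer` type, the `msg_kind` enumeration and the function names are invented for illustration, and gap handling (deferral, NACK) is reduced to a simple rejection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message kinds; MSG_BCAST_SYNC stands for the new
 * BCAST_PROTOCOL/STATE unicast message. */
enum msg_kind { MSG_BCAST_SYNC, MSG_NAME_DISTR, MSG_OTHER };

struct bcl_peer {
    bool recv_permitted;   /* may we process broadcasts from this peer? */
    uint32_t next_in;      /* sequence number of next expected broadcast */
};

/* Handle a unicast message from the peer. */
void peer_unicast_rcv(struct bcl_peer *p, enum msg_kind kind,
                      uint32_t sync_seqno)
{
    if (p->recv_permitted)
        return;
    if (kind == MSG_BCAST_SYNC) {
        p->next_in = sync_seqno;     /* both ends now agree on the start */
        p->recv_permitted = true;
    } else if (kind == MSG_NAME_DISTR) {
        /* Legacy peer that never sends the sync message: open up anyway
         * and keep the sequence number chosen at link activation. */
        p->recv_permitted = true;
    }
}

/* Handle a broadcast packet. Returns true if it was accepted in order. */
bool peer_bcast_rcv(struct bcl_peer *p, uint32_t seqno)
{
    if (!p->recv_permitted)
        return false;                /* not synchronized yet: ignore */
    if (seqno != p->next_in)
        return false;                /* gap; real code would defer/NACK */
    p->next_in++;
    return true;
}
```

With a legacy peer the second branch leaves `next_in` at its activation-time value, which is why the mismatch in steps 7-12 can still occur for mixed old and new nodes.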
  8. 22 Nov, 2012 2 commits
  9. 14 Jul, 2012 1 commit
  10. 01 May, 2012 1 commit
    • P
      tipc: compress out gratuitous extra carriage returns · 617d3c7a
      Paul Gortmaker committed
      Some of the comment blocks are floating in limbo between two
      functions, or between blocks of code.  Delete the extra line
      feeds between any comment and its associated following block
      of code, to be consistent with the majority of the rest of
      the kernel.  Also delete trailing newlines at EOF and fix
      a couple trivial typos in existing comments.
      
      This is a 100% cosmetic change with no runtime impact.  We get
      rid of over 500 lines of non-code, and being blank line deletes,
      they won't even show up as noise in git blame.
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      617d3c7a
  11. 24 Apr, 2012 1 commit
  12. 20 Apr, 2012 1 commit
    • A
      tipc: Add routines for safe checking of node's network address · 336ebf5b
      Allan Stephens committed
      Introduces routines that test whether a given network address is
      equal to a node's own network address or if it lies within the node's
      own network cluster, and which work properly regardless of whether
      the node is using the default network address <0.0.0> or a non-zero
      network address that is assigned later on. In essence, these routines
      ensure that address <0.0.0> is treated as an alias for "this node",
      regardless of which network address the node is actually using.
      
      Existing users of the stricter in_own_cluster() match have accordingly
      been redirected to what is now called in_own_cluster_exact(), which
      does not extend matching to <0.0.0>.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      336ebf5b
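The address checks described in the commit above can be sketched as below. A TIPC network address <Z.C.N> packs an 8-bit zone, a 12-bit cluster and a 12-bit node number into 32 bits. Unlike the kernel routines, which consult the node's own address from global state, this sketch passes it in as an explicit `own` parameter; that parameter and the `make_addr()` helper are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a <zone.cluster.node> address: 8 + 12 + 12 bits. */
uint32_t make_addr(uint32_t zone, uint32_t cluster, uint32_t node)
{
    return (zone << 24) | (cluster << 12) | node;
}

/* Strict match: zone and cluster must equal our own. */
bool in_own_cluster_exact(uint32_t own, uint32_t addr)
{
    return ((addr ^ own) >> 12) == 0;
}

/* Relaxed match: <0.0.0> is treated as an alias for "this node". */
bool in_own_cluster(uint32_t own, uint32_t addr)
{
    return in_own_cluster_exact(own, addr) || addr == 0;
}

bool in_own_node(uint32_t own, uint32_t addr)
{
    return addr == own || addr == 0;
}
```

Because <0.0.0> is accepted as an alias, the relaxed checks give the same answer before and after the node is assigned a non-zero address.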
  13. 25 Feb, 2012 6 commits
  14. 07 Feb, 2012 3 commits
    • A
      tipc: Remove obsolete broadcast tag capability · 1ec2bb08
      Allan Stephens committed
      Eliminates support for the broadcast tag field, which is no longer
      used by broadcast link NACK messages.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      1ec2bb08
    • A
      tipc: Major redesign of broadcast link ACK/NACK algorithms · 7a54d4a9
      Allan Stephens committed
      Completely redesigns broadcast link ACK and NACK mechanisms to prevent
      spurious retransmit requests in dual LAN networks, and to prevent the
      broadcast link from stalling due to the failure of a receiving node to
      acknowledge receiving a broadcast message or request its retransmission.
      
      Note: These changes only impact the timing of when ACK and NACK messages
      are sent, and not the basic broadcast link protocol itself, so inter-
      operability with nodes using the "classic" algorithms is maintained.
      
      The revised algorithms are as follows:
      
      1) An explicit ACK message is still sent after receiving 16 in-sequence
      messages, and implicit ACK information continues to be carried in other
      unicast link message headers (including link state messages).  However,
      the timing of explicit ACKs is now based on the receiving node's absolute
      network address rather than its relative network address to ensure that
      the failure of another node does not delay the ACK beyond its 16 message
      target.
      
      2) A NACK message is now typically sent only when a message gap persists
      for two consecutive incoming link state messages; this ensures that a
      suspected gap is not confirmed until both LANs in a dual LAN network have
      had an opportunity to deliver the message, thereby preventing spurious NACKs.
      A NACK message can also be generated by the arrival of a single link state
      message, if the deferred queue is so big that the current message gap
      cannot be the result of "normal" mis-ordering due to the use of dual LANs
      (or one LAN using a bonded interface). Since link state messages typically
      arrive at different nodes at different times the problem of multiple nodes
      issuing identical NACKs simultaneously is inherently avoided.
      
      3) Nodes continue to "peek" at NACK messages sent by other nodes. If
      another node requests retransmission of a message gap suspected (but not
      yet confirmed) by the peeking node, the peeking node forgets about the
      gap and does not generate a duplicate retransmit request. (If the peeking
      node subsequently fails to receive the lost message, later link state
      messages will cause it to rediscover and confirm the gap and send another
      NACK.)
      
      4) Message gap "equality" is now determined by the start of the gap only.
      This is sufficient to deal with the most common cases of message loss,
      and eliminates the need for complex end of gap computations.
      
      5) A peeking node no longer tries to determine whether it should send a
      complementary NACK, since the most common cases of message loss don't
      require it to be sent. Consequently, the node no longer examines the
      "broadcast tag" field of a NACK message when peeking.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      7a54d4a9
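Points 1 to 3 of the revised algorithms in the commit above can be sketched as follows. This is an illustration under assumptions rather than the kernel implementation: the modulo-based ACK rule is one plausible way to tie ACK timing to the absolute network address, `DEFERRED_LIMIT` is an invented threshold, sequence-number wraparound is ignored, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACK_INTERVAL   16u  /* explicit ACK every 16 in-sequence messages */
#define DEFERRED_LIMIT 8u   /* hypothetical "queue too big" threshold */

/* Point 1: stagger explicit ACKs by the node's absolute address, so the
 * schedule does not depend on which other nodes are currently alive. */
bool should_send_ack(uint32_t seqno, uint32_t own_addr)
{
    return ((seqno - own_addr) % ACK_INTERVAL) == 0;
}

struct gap_tracker {
    bool suspected;      /* gap seen once, not yet confirmed */
    uint32_t gap_start;  /* gaps are compared by start only (point 4) */
};

/* Point 2: called for each incoming link state message.
 * last_in   = last in-sequence broadcast we have received
 * last_sent = last broadcast the peer reports having sent
 * Returns true if a NACK should be sent now. */
bool link_state_rcv(struct gap_tracker *g, uint32_t last_in,
                    uint32_t last_sent, unsigned int deferred)
{
    uint32_t start = last_in + 1;

    if (last_sent == last_in) {          /* nothing missing */
        g->suspected = false;
        return false;
    }
    if (deferred > DEFERRED_LIMIT) {     /* too big to be mere reordering */
        g->suspected = false;
        return true;
    }
    if (g->suspected && g->gap_start == start) {
        g->suspected = false;            /* persisted twice: confirmed */
        return true;
    }
    g->suspected = true;                 /* first sighting: wait and see */
    g->gap_start = start;
    return false;
}

/* Point 3: another node already NACKed the same gap, so forget ours. */
void peek_nack(struct gap_tracker *g, uint32_t nack_gap_start)
{
    if (g->suspected && g->gap_start == nack_gap_start)
        g->suspected = false;
}
```

Requiring the gap to persist across two link state messages gives the second LAN time to deliver the packet before a retransmission is requested.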
    • A
      tipc: Ensure broadcast link re-acquires node after link failure · 93499313
      Allan Stephens committed
      Fix a bug that can prevent TIPC from sending broadcast messages to a node
      if contact with the node is lost and then regained. The problem occurs if
      the broadcast link first clears the flag indicating the node is part of the
      link's distribution set (when it loses contact with the node), and later
      fails to restore the flag (when contact is regained); restoration fails
      if contact with the node is regained by implicit unicast link activation
      triggered by the arrival of a data message, rather than explicitly by the
      arrival of a link activation message.
      
      The broadcast link now uses separate fields to track whether a node is
      theoretically capable of receiving broadcast messages versus whether it is
      actually part of the link's distribution set. The former member is updated
      by the receipt of link protocol messages, which can occur at any time; the
      latter member is updated only when contact with the node is gained or lost.
      This change also permits the simplification of several conditional
      expressions since the broadcast link's "supported" field can now only be
      set if there are working links to the associated node.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      93499313
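The split into two fields described in the commit above can be illustrated with a small sketch. The field name `supported` appears in the commit message; `supportable`, the `working_links` counter and the three function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>

struct node {
    bool supportable;   /* peer is capable of receiving broadcasts */
    bool supported;     /* peer is in the broadcast distribution set */
    int working_links;
};

/* Any link protocol message may update the capability, at any time. */
void link_proto_rcv(struct node *n, bool peer_supports_bcast)
{
    n->supportable = peer_supports_bcast;
}

/* A link came up, whether through an explicit activation message or
 * implicitly through an arriving data message. */
void link_up(struct node *n)
{
    if (n->working_links++ == 0 && n->supportable)
        n->supported = true;    /* contact gained: join the set */
}

void link_down(struct node *n)
{
    assert(n->working_links > 0);
    if (--n->working_links == 0)
        n->supported = false;   /* contact lost: leave the set */
}
```

Since membership is recomputed from the retained capability whenever contact is regained, the activation path no longer matters, and `supported` can only be true while at least one link is working.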
  15. 30 Dec, 2011 1 commit
  16. 28 Dec, 2011 2 commits
  17. 18 Sep, 2011 1 commit
    • A
      tipc: Ensure both nodes recognize loss of contact between them · b4b56102
      Allan Stephens committed
      Enhances TIPC to ensure that a node that loses contact with a
      neighboring node does not allow contact to be re-established until
      it sees that its peer has also recognized the loss of contact.
      
      Previously, nodes that were connected by two or more links could
      encounter a situation in which node A would lose contact with node B
      on all of its links, purge its name table of names published by B,
      and then fail to repopulate those names once contact with B was restored.
      This would happen because B was able to re-establish one or more links
      so quickly that it never reached a point where it had no links to A --
      meaning that B never saw a loss of contact with A, and consequently
      didn't re-publish its names to A.
      
      This problem is now prevented by enhancing the cleanup done by TIPC
      following a loss of contact with a neighboring node to ensure that
      node A ignores all messages sent by B until it receives a LINK_PROTOCOL
      message that indicates B has lost contact with A, thereby preventing
      the (re)establishment of links between the nodes. The loss of contact
      is recognized when a RESET or ACTIVATE message is received that has
      a "redundant link exists" field of 0, indicating that B's sending link
      endpoint is in a reset state and that B has no other working links.
      
      Additionally, TIPC now suppresses the sending of (most) link protocol
      messages to a neighboring node while it is cleaning up after an earlier
      loss of contact with that node. This stops the peer node from prematurely
      activating its link endpoint, which would prevent TIPC from later
      activating its own end. TIPC still allows outgoing RESET messages to
      occur during cleanup, to avoid problems if its own node recognizes
      the loss of contact first and tries to notify the peer of the situation.
      
      Finally, TIPC now recognizes an impending loss of contact with a peer node
      as soon as it receives a RESET message on a working link that is the
      peer's only link to the node, and ensures that the link protocol
      suppression mentioned above goes into effect right away -- that is,
      even before its own link endpoints have failed. This is necessary to
      ensure correct operation when there are redundant links between the nodes,
      since otherwise TIPC would send an ACTIVATE message upon receiving a RESET
      on its first link and only begin suppressing when a RESET on its second
      link was received, instead of initiating suppression with the first RESET
      message as it needs to.
      
      Note: The reworked cleanup code also eliminates a check that prevented
      a link endpoint's discovery object from responding to incoming messages
      while stale name table entries are being purged. This check is now
      unnecessary and would have slowed down re-establishment of communication
      between the nodes in some situations.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      b4b56102
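The blocking rule described in the commit above can be sketched as a small state check. The `wait_peer` flag, the `struct peer` type and the function names are hypothetical; the sketch also assumes that the message which lifts the block is itself processed, which the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

enum proto_type { STATE_MSG, RESET_MSG, ACTIVATE_MSG };

struct peer {
    bool wait_peer;   /* we lost contact; peer has not confirmed it yet */
};

/* All our links to the peer are gone: start cleanup and block the peer. */
void contact_lost(struct peer *p)
{
    p->wait_peer = true;
}

/* Returns true if an incoming link protocol message may be processed.
 * redundant_link is the "redundant link exists" field of the message. */
bool proto_rcv(struct peer *p, enum proto_type t, bool redundant_link)
{
    if (!p->wait_peer)
        return true;
    if ((t == RESET_MSG || t == ACTIVATE_MSG) && !redundant_link) {
        p->wait_peer = false;   /* peer has also seen the loss of contact */
        return true;
    }
    return false;               /* ignore everything else during cleanup */
}

/* During cleanup only RESET messages may be sent to the peer. */
bool proto_may_send(const struct peer *p, enum proto_type t)
{
    return !p->wait_peer || t == RESET_MSG;
}
```

A RESET or ACTIVATE that still reports a redundant link is ignored, because it shows that the peer has not yet lost all of its links and therefore would not republish its names.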
  18. 01 Sep, 2011 2 commits
    • A
      tipc: Prevent broadcast link stalling when another node fails · 169073db
      Allan Stephens committed
      Ensure that broadcast link messages that have not been acknowledged
      by a newly failed node do not get an implied acknowledgement until the
      failed node is removed from the broadcast link's map of reachable nodes.
      
      Previously, a race condition allowed a new broadcast link message to be
      sent after the implicit acknowledgement processing was completed, but
      before the map of reachable nodes was updated, resulting in the message
      having an expected acknowledgement count that required the failed node
      to explicitly acknowledge the message. Since this would never occur
      the new message would remain in the broadcast link's transmit queue
      forever, eventually causing the link to become congested and "stall".
      Delaying the implicit acknowledgement processing until after the update
      of the map of reachable nodes eliminates this race condition and prevents
      stalling.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      169073db
    • A
      tipc: Enhance cleanup of broadcast link when contact with node is lost · c5bd4d85
      Allan Stephens committed
      Enhances cleanup of broadcast link-related information when contact
      with a node is lost.
      
      1) All broadcast link-related cleanup now occurs only if the lost node
         was capable of communicating over the broadcast link.
      
      2) Following cleanup, the lost node is marked as no longer supporting
         the broadcast link, ensuring that any remaining broadcast messages
         received from that node prior to the re-establishment of a normal
         communication link are ignored.
      
      Thanks to Surya [Suryanarayana.Garlapati@emerson.com] for contributing
      a prototype version of this patch.
      Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      c5bd4d85
  19. 14 Mar, 2011 10 commits
  20. 24 Feb, 2011 1 commit
    • A
      tipc: Combine bearer structure with tipc_bearer structure · 2d627b92
      Allan Stephens committed
      Combines two distinct structures containing information about a TIPC bearer
      into a single structure. The structures were previously kept separate so
      that public information about a bearer could be made available to plug-in
      media types using TIPC's native API, while the remaining information was
      kept private for use by TIPC itself. However, now that the native API has
      been removed there is no longer any need for this arrangement.
      
      Since one of the structures was already embedded within the other, the
      change largely involves replacing instances of "publ.foo" with "foo".
      The changes do not otherwise alter the operation of TIPC bearers.
      Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      2d627b92
  21. 02 Jan, 2011 1 commit