1. 26 Mar 2015: 3 commits
    • tipc: eliminate race condition at dual link establishment · 8b4ed863
      Committed by Jon Paul Maloy
      Despite recent improvements, the establishment of dual parallel
      links still has a small glitch where messages can bypass each
      other. When the second link in a dual-link configuration is
      established, part of the first link's traffic will be steered over
      to the new link. Although we do have a mechanism to ensure that
      packets sent before and after the establishment of the new link
      arrive in sequence to the destination node, this is not enough.
      The arriving messages will still be delivered upwards in different
      threads, which entails a risk of message reordering during the
      transition phase.
      
      To fix this, we introduce a synchronization mechanism between the
      two parallel links, so that traffic arriving on the new link cannot
      be added to its input queue until we are guaranteed that all
      pre-establishment messages have been delivered on the old, parallel
      link.
      
      This problem seems to always have been around, but its occurrence is
      so rare that it has not been noticed until recent intensive testing.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
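The synchronization idea can be modelled as a gate on the new link that stays closed until the old link has delivered everything sent before establishment. This is a minimal user-space sketch, assuming the sync point is expressed as a count of messages delivered on the old link; `struct model_link`, `struct sync_gate` and the `gate_*()` functions are hypothetical and do not reproduce the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link model; not the kernel's struct tipc_link. */
struct model_link {
	unsigned int delivered;    /* messages delivered upwards so far */
};

struct sync_gate {
	const struct model_link *old_link;
	unsigned int sync_point;   /* old-link count that must be reached first */
	size_t nheld;              /* packets parked on the new link */
};

/* Called when the second link comes up. 'pending' is the number of
 * pre-establishment messages still to be delivered on the old link. */
static void gate_init(struct sync_gate *g, const struct model_link *old_link,
		      unsigned int pending)
{
	assert(old_link != NULL);
	g->old_link = old_link;
	g->sync_point = old_link->delivered + pending;
	g->nheld = 0;
}

static bool gate_open(const struct sync_gate *g)
{
	return g->old_link->delivered >= g->sync_point;
}

/* Returns true if a packet arriving on the new link may be added to its
 * input queue now; otherwise the packet is parked (here only counted). */
static bool gate_accept(struct sync_gate *g)
{
	if (gate_open(g))
		return true;
	g->nheld++;
	return false;
}
```

Counting parked packets instead of storing them keeps the model short; a real link would keep them in a deferred queue and release them once the gate opens.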
    • tipc: clean up handling of link congestion · 3127a020
      Committed by Jon Paul Maloy
      After the recent changes in message importance handling it becomes
      possible to simplify handling of messages and sockets when we
      encounter link congestion.
      
      We merge the function tipc_link_cong() into link_schedule_user(),
      and simplify the code of the latter. The code should now be
      easier to follow, especially regarding return codes and handling
      of the message that caused the situation.
      
      In case the scheduling function is unable to pre-allocate a wakeup
      message buffer, it now returns -ENOBUFS, which is a more correct
      code than the previously used -EHOSTUNREACH.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce starvation free send algorithm · 1f66d161
      Committed by Jon Paul Maloy
      Currently, we use only a single counter, the length of the backlog
      queue, to determine whether a message should be accepted to the queue
      or not. Each time a message is being sent, the queue length is compared
      to a threshold value for the message's importance priority. If the queue
      length is beyond this threshold, the message is rejected. This algorithm
      implies a risk of starvation of low importance senders during very high
      load, because it may take a long time before the backlog queue has
      decreased enough to accept a lower level message.
      
      We now eliminate this risk by introducing a counter for each importance
      priority. When a message is sent, we check only the queue level for that
      particular message's priority. If that is ok, the message can be added
      to the backlog, irrespective of the queue level for other priorities.
      This way, each level is guaranteed a certain portion of the total
      bandwidth, and any risk of starvation is eliminated.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
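The difference between the old and the new admission rule can be sketched as a small user-space model. The five levels mirror the four user data levels plus one system level; the `struct backlog` layout, the `accept_*()` names and the limit values are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Four user-data levels plus one system level (indices for this model only). */
enum importance { LOW, MEDIUM, HIGH, CRITICAL, SYSTEM, NUM_LEVELS };

struct backlog {
	unsigned int len[NUM_LEVELS];    /* queued messages per importance */
	unsigned int limit[NUM_LEVELS];  /* per-importance threshold */
};

/* Old rule: the total backlog length is compared to the threshold of the
 * message's importance, so a long high-importance queue blocks low senders. */
static bool accept_shared(struct backlog *b, enum importance imp)
{
	unsigned int total = 0;

	assert(imp < NUM_LEVELS);
	for (int i = 0; i < NUM_LEVELS; i++)
		total += b->len[i];
	if (total >= b->limit[imp])
		return false;
	b->len[imp]++;
	return true;
}

/* New rule: only the counter for the message's own level is checked. */
static bool accept_per_level(struct backlog *b, enum importance imp)
{
	assert(imp < NUM_LEVELS);
	if (b->len[imp] >= b->limit[imp])
		return false;
	b->len[imp]++;
	return true;
}
```

With twelve HIGH messages queued and a LOW limit of 5, the shared rule rejects a LOW message, while the per-level rule still admits it because the LOW counter is zero.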
  2. 20 Mar 2015: 1 commit
  3. 15 Mar 2015: 6 commits
    • tipc: clean up handling of message priorities · e3eea1eb
      Committed by Jon Paul Maloy
      Messages transferred by TIPC are assigned an "importance priority", an
      integer value indicating how to treat the message when there is link or
      destination socket congestion.
      
      There is no separate header field for this value. Instead, the message
      user values have been chosen in ascending order according to perceived
      importance, so that the message user field can be used for this.
      
      This is not a good solution. First, we have many more users than the
      needed priority levels, so we end up handling more priority levels
      than necessary. Second, the user field cannot always
      accurately reflect the priority of the message. E.g., a message
      fragment packet should really have the priority of the enveloped
      user data message, and not the priority of the MSG_FRAGMENTER user.
      Until now, we have been working around this problem in different ways,
      but it is now time to implement a consistent way of handling such
      priorities, although still within the constraint that we cannot
      allocate any more bits in the regular data message header for this.
      
      In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
      that will be the only one used apart from the four (lower) user data
      levels. All non-data messages map down to this priority. Furthermore,
      we take some free bits from the MSG_FRAGMENTER header and allocate
      them to store the priority of the enveloped message. We then adjust
      the functions msg_importance()/msg_set_importance() so that they
      read/set the correct header fields depending on user type.
      
      This small protocol change is fully compatible, because the code at
      the receiving end of a link currently reads the importance level
      only from user data messages, where there is no change.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
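The read/set logic described above, where the importance lives in different header fields depending on the user, can be sketched as follows. The user numbers, the `frag_imp` field and the `model_msg_*()` helpers are assumptions made for this model; they are not TIPC's real header layout or the kernel's `msg_importance()`/`msg_set_importance()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values chosen for this model only. */
enum { MODEL_CRITICAL_IMPORTANCE = 3, MODEL_SYSTEM_IMPORTANCE = 4 };
enum { USER_DATA_LOW = 0, USER_DATA_CRITICAL = 3, USER_LINK_PROTOCOL = 7,
       USER_FRAGMENTER = 12 };

struct model_hdr {
	unsigned int user;     /* message user field */
	uint8_t frag_imp;      /* spare fragmenter-header bits holding importance */
};

/* Data users: the user value doubles as the importance.
 * Fragmenter: importance of the enveloped message, from the spare bits.
 * Every other user: the single system level. */
static int model_msg_importance(const struct model_hdr *h)
{
	if (h->user <= USER_DATA_CRITICAL)
		return (int)h->user;
	if (h->user == USER_FRAGMENTER)
		return h->frag_imp & 0x7;
	return MODEL_SYSTEM_IMPORTANCE;
}

static void model_msg_set_importance(struct model_hdr *h, int imp)
{
	if (h->user == USER_FRAGMENTER) {
		assert(imp >= 0 && imp <= MODEL_CRITICAL_IMPORTANCE);
		h->frag_imp = (uint8_t)imp;
	} else if (h->user <= USER_DATA_CRITICAL) {
		assert(imp >= 0 && imp <= MODEL_CRITICAL_IMPORTANCE);
		h->user = (unsigned int)imp;
	}
	/* Non-data users are fixed at the system level; nothing to store. */
}
```

Because the receiver in this model only consults the user field for data messages, storing the fragment's importance in otherwise unused bits does not change how existing data messages are interpreted.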
    • tipc: split link outqueue · 05dcc5aa
      Committed by Jon Paul Maloy
      struct tipc_link contains one single queue for outgoing packets,
      where both transmitted and waiting packets are queued.
      
      This infrastructure is hard to maintain, because we need a number
      of fields to keep track of which packets are sent or unsent, and
      of the number of packets in each category.
      
      A lot of code becomes simpler if we split this queue into a transmission
      queue, where sent/unacknowledged packets are kept, and a backlog queue,
      where we keep the not yet sent packets.
      
      In this commit we do this separation.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
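A minimal sketch of the split: packets wait in a backlog queue, move to the transmission queue when the send window has room, and leave it when acknowledged. The fixed-size FIFO, the `window` field and all names are hypothetical, and sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 64

/* Fixed-capacity FIFO of sequence numbers (hypothetical helper). */
struct fifo { unsigned int seq[QCAP]; size_t head, len; };

static void fifo_push(struct fifo *q, unsigned int s)
{
	assert(q->len < QCAP);
	q->seq[(q->head + q->len++) % QCAP] = s;
}

static unsigned int fifo_pop(struct fifo *q)
{
	assert(q->len > 0);
	unsigned int s = q->seq[q->head];
	q->head = (q->head + 1) % QCAP;
	q->len--;
	return s;
}

struct split_link {
	struct fifo transmq;    /* sent but not yet acknowledged */
	struct fifo backlogq;   /* not yet sent */
	size_t window;          /* max packets allowed in transmq */
	unsigned int next_seq;  /* wraparound ignored in this model */
};

/* Move packets from backlogq to transmq ("send" them) while the window allows. */
static void link_push_backlog(struct split_link *l)
{
	while (l->backlogq.len && l->transmq.len < l->window)
		fifo_push(&l->transmq, fifo_pop(&l->backlogq));
}

static void link_enqueue(struct split_link *l)
{
	fifo_push(&l->backlogq, l->next_seq++);
	link_push_backlog(l);
}

/* Peer acknowledged everything up to and including 'acked'. */
static void link_ack(struct split_link *l, unsigned int acked)
{
	while (l->transmq.len && l->transmq.seq[l->transmq.head] <= acked)
		fifo_pop(&l->transmq);
	link_push_backlog(l);
}
```

The counts of sent and unsent packets fall out of the two queue lengths, which is the bookkeeping the single-queue design had to maintain by hand.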
    • tipc: eliminate unnecessary call to broadcast ack function · 2cdf3918
      Committed by Jon Paul Maloy
      The unicast packet header contains a broadcast acknowledge sequence
      number that may need to be conveyed to the broadcast link for proper
      treatment. Currently, the function tipc_rcv(), which is on the most
      critical data path, calls the function tipc_bclink_acknowledge() to
      have this done. This call is made for each received packet, and results
      in the unconditional grabbing of the broadcast link spinlock.
      
      This is unnecessary, since we can see directly from tipc_rcv() if
      the acknowledged number differs from what has been previously acked
      from the node in question. In the vast majority of cases the numbers
      won't differ, and there is nothing to update.
      
      We now make the call to tipc_bclink_acknowledge() conditional
      on the two ack values differing.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
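The fast-path check can be sketched as follows: the receive path compares the header's broadcast ack with the value last recorded for the peer and only takes the broadcast link lock when they differ. The structs, the `lock_taken` counter and the `model_*()` functions are hypothetical; the sketch assumes the caller already holds a per-node lock protecting `bclink_acked`.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical broadcast-link model: a lock plus a count of acquisitions. */
struct model_bclink {
	pthread_mutex_t lock;
	unsigned int lock_taken;
};

/* Last broadcast ack seen from a peer; assumed protected by a node lock
 * that the caller already holds. */
struct model_node {
	unsigned int bclink_acked;
};

static void model_bclink_acknowledge(struct model_bclink *bc,
				     struct model_node *n, unsigned int acked)
{
	pthread_mutex_lock(&bc->lock);
	bc->lock_taken++;
	n->bclink_acked = acked;   /* a full version would also release acked packets */
	pthread_mutex_unlock(&bc->lock);
}

/* Receive path: only grab the broadcast lock when the ack actually moved. */
static void model_rcv(struct model_bclink *bc, struct model_node *n,
		      unsigned int hdr_bcast_ack)
{
	assert(bc != NULL && n != NULL);
	if (hdr_bcast_ack != n->bclink_acked)
		model_bclink_acknowledge(bc, n, hdr_bcast_ack);
}
```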
    • tipc: extract bundled buffers by cloning instead of copying · c1336ee4
      Committed by Jon Paul Maloy
      When we currently extract a bundled buffer from a message bundle in
      the function tipc_msg_extract(), we allocate a new buffer and explicitly
      copy the linear data area.
      
      This is unnecessary, since we can just clone the buffer and do
      skb_pull() on the clone to move the data pointer to the correct
      position.
      
      This is what we do in this commit.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
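The clone-and-pull approach can be illustrated with a user-space analogue of an sk_buff: a descriptor pointing into a reference-counted data area. `struct shared_data`, `struct model_buf` and the `model_*()` helpers are hypothetical stand-ins for `skb_clone()` and `skb_pull()`, which operate on real sk_buffs in the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted data area shared by a buffer and its clones. */
struct shared_data {
	int refs;
	unsigned char bytes[];
};

/* Buffer descriptor pointing into shared data; clones share 'd'. */
struct model_buf {
	struct shared_data *d;
	size_t off, len;
};

static const unsigned char *model_data(const struct model_buf *b)
{
	return b->d->bytes + b->off;
}

/* Clone: new descriptor, same data, one more reference. No bytes copied. */
static struct model_buf model_clone(const struct model_buf *b)
{
	b->d->refs++;
	return *b;
}

/* Analogue of skb_pull(): advance the data pointer by n bytes. */
static void model_pull(struct model_buf *b, size_t n)
{
	assert(n <= b->len);
	b->off += n;
	b->len -= n;
}

/* Extract the inner message at [pos, pos + len) of a bundle by cloning
 * and pulling, rather than allocating and copying. */
static struct model_buf model_extract(const struct model_buf *bundle,
				      size_t pos, size_t len)
{
	assert(pos + len <= bundle->len);
	struct model_buf c = model_clone(bundle);
	model_pull(&c, pos);
	c.len = len;
	return c;
}

static void model_free(struct model_buf *b)
{
	if (--b->d->refs == 0)
		free(b->d);
	b->d = NULL;
}
```

The extracted message points into the bundle's own bytes, so the data area is freed only when the last descriptor referencing it is released.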
    • tipc: eliminate unnecessary linearization of incoming buffers · 1149557d
      Committed by Jon Paul Maloy
      Currently, TIPC linearizes all incoming buffers directly at reception
      before passing them upwards in the stack. This is clearly a waste of
      CPU resources, and must be avoided.
      
      In this commit, we eliminate this unnecessary linearization. We still
      ensure that at least the message header is linear, and that the buffer
      is linearized where this is still needed, i.e. when unbundling and when
      reversing messages.
      
      In addition, we ensure that fragmented messages are validated after
      reassembly before delivering them upwards in the stack.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: move message validation function to msg.c · cf2157f8
      Committed by Jon Paul Maloy
      The function link_buf_validate() is in reality re-entrant and context
      independent, and will in later commits be called from several locations.
      Therefore, we move it to msg.c, make it out-of-line and rename it to
      tipc_msg_validate().
      
      We also redesign the function to make proper use of pskb_may_pull().
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
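The two commits above share one idea: linearize only the header bytes that are actually needed, using a `pskb_may_pull()`-style check, and then validate the declared sizes. The sketch below is a user-space model under stated assumptions: `struct nl_buf` tracks only byte counts, the 24-byte minimum header is an assumed value, the header and message sizes are passed in rather than parsed, and the names are hypothetical rather than the kernel's `tipc_msg_validate()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MIN_HDR 24   /* assumed minimum header size in bytes */

/* Hypothetical non-linear buffer: the first 'linear' bytes are directly
 * addressable, the rest of the 'total' bytes sit in fragments. */
struct nl_buf {
	size_t linear;
	size_t total;
};

/* Rough analogue of pskb_may_pull(): succeed only if the buffer holds at
 * least n bytes, and make at least the first n bytes linear without
 * linearizing the rest. */
static bool nl_may_pull(struct nl_buf *b, size_t n)
{
	assert(b->linear <= b->total);
	if (n > b->total)
		return false;
	if (n > b->linear)
		b->linear = n;   /* stands in for copying bytes into the head */
	return true;
}

/* Validate a message whose header declares hdr_size and msg_size. In a real
 * header these would be read from the bytes just pulled; here they are
 * passed in to keep the model small. */
static bool model_msg_validate(struct nl_buf *b, size_t hdr_size,
			       size_t msg_size)
{
	if (!nl_may_pull(b, MODEL_MIN_HDR))
		return false;
	if (hdr_size < MODEL_MIN_HDR || !nl_may_pull(b, hdr_size))
		return false;
	return msg_size >= hdr_size && msg_size <= b->total;
}
```

After validating a 100-byte message with a 40-byte header, only 40 bytes are linear; the payload stays in its fragments.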
  4. 11 Mar 2015: 1 commit
    • tipc: ensure that idle links are deleted when a bearer is disabled · 169bf912
      Committed by Jon Paul Maloy
      commit afaa3f65
      (tipc: purge links when bearer is disabled) was an attempt to resolve
      a problem that turned out to have a more profound reason.
      
      When we disable a bearer, we delete all its pertaining links if
      there is no other bearer to perform failover to, or if the module
      is shutting down. In case there are dual bearers, we postpone
      deleting links until the failover procedure is finished.
      
      However, this misses the case when a link on the removed bearer
      was already down, so that there will be no failover procedure to
      finish the link delete. This causes confusion if a new bearer is
      added to replace the removed one, and also entails a small memory
      leak.
      
      This commit takes the current state of the link into account when
      deciding when to delete it, and also reverses the above-mentioned
      commit.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 10 Mar 2015: 1 commit
    • tipc: fix bug in link failover handling · e6441bae
      Committed by Jon Paul Maloy
      In commit c637c103
      ("tipc: resolve race problem at unicast message reception") we
      introduced a new mechanism for delivering buffers upwards from link
      to socket layer.
      
      That code contains a bug in how we handle the new link input queue
      during failover. When a link is reset, some of its users may be blocked
      because of congestion, and in order to resolve this, we add any pending
      wakeup pseudo messages to the link's input queue, and deliver them to
      the socket. This misses the case where the other, remaining link also
      may have congested users. Currently, the owner node's reference to the
      remaining link's input queue is unconditionally overwritten by the
      reset link's input queue. This has the effect that wakeup events from
      the remaining link may be unduly delayed (but not lost) for a
      potentially long period.
      
      We fix this by adding the pending events from the reset link to the
      input queue that is currently referenced by the node, whichever one
      it is.
      
      This commit should be applied to both net and net-next.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
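The fix can be sketched as moving the reset link's pending wakeup events into whichever queue the node already references, rather than repointing the node at the reset link's queue. `struct evq`, `struct model_node` and `failover_move_events()` are hypothetical names; events are plain integers in this model.

```c
#include <assert.h>
#include <stddef.h>

#define MAXQ 32

/* Hypothetical queue of pending wakeup events (event ids only). */
struct evq { int ev[MAXQ]; size_t len; };

/* Node with a reference to the input queue it currently delivers from. */
struct model_node { struct evq *inputq; };

/* Append the reset link's pending events to the queue the node already
 * references, so events pending on the remaining link are not bypassed. */
static void failover_move_events(struct model_node *n, struct evq *reset_q)
{
	if (n->inputq == reset_q)
		return;   /* node already uses the reset link's queue */
	assert(n->inputq->len + reset_q->len <= MAXQ);
	for (size_t i = 0; i < reset_q->len; i++)
		n->inputq->ev[n->inputq->len++] = reset_q->ev[i];
	reset_q->len = 0;
}
```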
  6. 10 Feb 2015: 5 commits
  7. 06 Feb 2015: 2 commits
    • tipc: resolve race problem at unicast message reception · c637c103
      Committed by Jon Paul Maloy
      TIPC handles message cardinality and sequencing at the link layer,
      before passing messages upwards to the destination sockets. During the
      upcall from link to socket no locks are held. It is therefore possible,
      and we see it happen occasionally, that messages arriving in different
      threads and delivered in sequence still bypass each other before they
      reach the destination socket. This must not happen, since it violates
      the sequentiality guarantee.
      
      We solve this by adding a new input buffer queue to the link structure.
      Arriving messages are added safely to the tail of that queue by the
      link, while the head of the queue is consumed, also safely, by the
      receiving socket. Sequentiality is secured per socket by only allowing
      buffers to be dequeued inside the socket lock. Since there may be multiple
      simultaneous readers of the queue, we use a 'filter' parameter to reduce
      the risk that they peek the same buffer from the queue, hence also
      reducing the risk of contention on the receiving socket locks.
      
      This solves the sequentiality problem, and seems to cause no measurable
      performance degradation.
      
      A nice side effect of this change is that lock handling in the functions
      tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
      will enable future simplifications of those functions.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
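The queue discipline described above can be sketched with POSIX mutexes: the link appends to the tail under the queue lock, and a socket removes its own buffers only while holding its socket lock, which preserves per-socket order. This model uses the destination port as a simple filter; the kernel's actual filter parameter may have different semantics, and all names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QMAX 64

/* Hypothetical link input queue; each entry is a buffer's destination port. */
struct inputq {
	pthread_mutex_t lock;
	unsigned int dport[QMAX];
	size_t len;
};

struct model_sock {
	pthread_mutex_t lock;     /* socket lock */
	unsigned int port;
	unsigned int delivered;   /* buffers moved to this socket, in order */
};

/* Link side: append at the tail under the queue lock. */
static void inputq_add(struct inputq *q, unsigned int dport)
{
	pthread_mutex_lock(&q->lock);
	assert(q->len < QMAX);
	q->dport[q->len++] = dport;
	pthread_mutex_unlock(&q->lock);
}

/* Socket side: with the socket lock held, remove every buffer addressed to
 * this socket's port, in queue order; buffers for other ports stay queued. */
static void sock_consume(struct inputq *q, struct model_sock *sk)
{
	pthread_mutex_lock(&sk->lock);
	pthread_mutex_lock(&q->lock);
	size_t keep = 0;
	for (size_t i = 0; i < q->len; i++) {
		if (q->dport[i] == sk->port)
			sk->delivered++;
		else
			q->dport[keep++] = q->dport[i];
	}
	q->len = keep;
	pthread_mutex_unlock(&q->lock);
	pthread_mutex_unlock(&sk->lock);
}
```

Lock order is always socket lock then queue lock, and the link side takes only the queue lock, so the two sides cannot deadlock.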
    • tipc: reduce usage of context info in socket and link · c5898636
      Committed by Jon Paul Maloy
      The most common usage of namespace information is when we fetch the
      own node address from the net structure. This leads to a lot of
      passing around of a parameter of type 'struct net *' between
      functions just to make them able to obtain this address.
      
      However, in many cases this is unnecessary. The own node address
      is readily available as a member of both struct tipc_sock and
      tipc_link, and can be fetched from there instead.
      The fact that most functions in socket.c and link.c already
      maintain a pointer to their respective base structures
      makes this option even more compelling.
      
      In this commit, we introduce the inline functions tsk_own_node()
      and link_own_node() to make it easy for functions to fetch the node
      address from those structs instead of having to pass along and
      dereference the namespace struct.
      
      In particular, we make calls to the msg_xx() functions in msg.{h,c}
      context independent by directly passing them the own node address
      as parameter when needed. Those functions should be regarded as
      leaves in the code dependency tree, and it is hence desirable to
      keep them namespace unaware.
      
      Apart from a potential positive effect on cache behavior, these
      changes make it easier to introduce the changes that will follow
      later in this series.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 05 Feb 2015: 2 commits
    • tipc: separate link starting event from link timeout event · af9946fd
      Committed by Jon Paul Maloy
      When a new link instance is created, it is triggered to start by
      sending it a TIPC_STARTING_EVT, whereafter a regular link
      reset is applied to it.
      
      The starting event is codewise treated as a timeout event, and prompts
      a link RESET message to be sent to the peer node, carrying a link
      session identifier. The later link_reset() call nudges this session
      identifier, whereafter all subsequent RESET messages will be sent out
      with the new identifier. The latter session number overrides the former,
      causing the peer to unconditionally accept it irrespective of its
      current working state.
      
      We don't think that this causes any problem, but it is not in accordance
      with the protocol spec, and may cause confusion when debugging TIPC
      sessions.
      
      To avoid this, we make the starting event distinct from the
      subsequent timeout events, by not allowing the former to send
      out any RESET message. This eliminates the described problem.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
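The behavioural change can be sketched as a tiny event handler in which the starting event no longer emits a RESET, so the first RESET sent already carries the session identifier chosen by the subsequent reset. The event names, the session counter and the helper functions are hypothetical simplifications of the link state machine.

```c
#include <assert.h>

enum model_evt { MODEL_STARTING_EVT, MODEL_TIMEOUT_EVT };

/* Hypothetical link model tracking the session id and the RESETs sent. */
struct model_link {
	unsigned int session;            /* current link session identifier */
	unsigned int resets_sent;
	unsigned int last_session_sent;  /* session carried by the last RESET */
};

static void send_reset(struct model_link *l)
{
	l->resets_sent++;
	l->last_session_sent = l->session;
}

/* The regular reset nudges the session identifier. */
static void link_reset(struct model_link *l)
{
	l->session++;
}

static void link_event(struct model_link *l, enum model_evt evt)
{
	switch (evt) {
	case MODEL_STARTING_EVT:
		/* Start without emitting a RESET, so no message ever carries
		 * the pre-reset session identifier. */
		break;
	case MODEL_TIMEOUT_EVT:
		send_reset(l);
		break;
	default:
		assert(!"unknown link event");
	}
}
```

Running start, reset and one timeout in that order produces exactly one RESET, carrying session 1.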
    • tipc: add reference count to struct tipc_link · 2d72d495
      Committed by Jon Paul Maloy
      When a bearer is disabled, all pertaining links will be reset and
      deleted. However, if there is a second active link towards a killed
      link's destination, the delete has to be postponed until the failover
      is finished. During this interval, we currently put the link in zombie
      mode, i.e., we take it out of traffic, delete its timer, but leave it
      attached to the owner node structure until all missing packets have
      been received.  When this is done, we detach the link from its node
      and delete it, assuming that the synchronous timer deletion that was
      initiated earlier in a different thread has finished.
      
      This is unsafe, as the failover may finish before del_timer_sync()
      has returned in the other thread.
      
      We fix this by adding an atomic reference counter of type kref in
      struct tipc_link. The counter keeps track of the references kept
      to the link by the owner node and the timer. We then do a conditional
      delete, based on the reference counter, both after the failover has
      been finished and when the timer expires, if applicable. Whoever
      comes last will actually delete the link. This approach also implies
      that we can make the deletion of the timer asynchronous.
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
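The "whoever comes last deletes" rule can be sketched in user space with a C11 atomic counter standing in for the kernel's `kref` (where `kref_put()` with a release function plays this role). The link starts with two references, one for the owner node and one for the timer; `struct model_link` and the `model_link_*()` functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a link with a kref-like reference counter. */
struct model_link {
	atomic_int refcnt;
};

/* One reference for the owner node and one for the timer. */
static struct model_link *model_link_create(void)
{
	struct model_link *l = malloc(sizeof(*l));

	if (l)
		atomic_init(&l->refcnt, 2);
	return l;
}

/* Drop a reference; whoever drops the last one frees the link.
 * Returns true if this call freed it. */
static bool model_link_put(struct model_link *l)
{
	int old = atomic_fetch_sub(&l->refcnt, 1);

	assert(old > 0);
	if (old == 1) {
		free(l);
		return true;
	}
	return false;
}
```

Because the decrement is atomic, it does not matter whether failover completion or timer expiry runs first: exactly one of them observes the count dropping to zero and frees the link.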
  9. 27 Jan 2015: 1 commit
  10. 13 Jan 2015: 8 commits
  11. 01 Jan 2015: 1 commit
  12. 11 Dec 2014: 1 commit
  13. 27 Nov 2014: 8 commits