- 24 10月, 2015 2 次提交
-
-
由 Jon Paul Maloy 提交于
The broadcast lock will need to be acquired outside bcast.c in a later commit. For this reason, we move the lock to struct tipc_net. Consistent with the changes in the previous commit, we also introducee two new functions tipc_bcast_lock() and tipc_bcast_unlock(). The code that is currently using tipc_bclink_lock()/unlock() will be phased out during the coming commits in this series. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
Currently, a number of structure and function definitions related to the broadcast functionality are unnecessarily exposed in the file bcast.h. This obscures the fact that the external interface towards the broadcast link in fact is very narrow, and causes unnecessary recompilations of other files when anything changes in those definitions. In this commit, we move as many of those definitions as is currently possible to the file bcast.c. We also rename the structure 'tipc_bclink' to 'tipc_bc_base', both since the name does not correctly describe the contents of this struct, and will do so even less in the future, and because we want to use the term 'link' more appropriately in the functionality introduced later in this series. Finally, we rename a couple of functions, such as tipc_bclink_xmit() and others that will be kept in the future, to include the term 'bcast' instead. There are no functional changes in this commit. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 10月, 2015 3 次提交
-
-
由 Jon Paul Maloy 提交于
In commit d999297c ("tipc: reduce locking scope during packet reception") we altered the packet retransmission function. Since then, when restransmitting packets, we create a clone of the original buffer using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of the area we want to have copied, but also the smallest possible TIPC packet size. The value of MIN_H_SIZE is 24. Unfortunately, __pskb_copy() also has the effect that the headroom of the cloned buffer takes the size MIN_H_SIZE. This is too small for carrying the packet over the UDP tunnel bearer, which requires a minimum headroom of 28 bytes. A change to just use pskb_copy() lets the clone inherit the original headroom of 80 bytes, but also assumes that the copied data area is of at least that size, something that is not always the case. So that is not a viable solution. We now fix this by adding a check for sufficient headroom in the transmit function of udp_media.c, and expanding it when necessary. Fixes: commit d999297c ("tipc: reduce locking scope during packet reception") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The current code for message reassembly is erroneously assuming that the the first arriving fragment buffer always is linear, and then goes ahead resetting the fragment list of that buffer in anticipation of more arriving fragments. However, if the buffer already happens to be non-linear, we will inadvertently drop the already attached fragment list, and later on trig a BUG() in __pskb_pull_tail(). We see this happen when running fragmented TIPC multicast across UDP, something made possible since commit d0f91938 ("tipc: add ip/udp media type") We fix this by not resetting the fragment list when the buffer is non- linear, and by initiatlizing our private fragment list tail pointer to the tail of the existing fragment list. Fixes: commit d0f91938 ("tipc: add ip/udp media type") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The default fix broadcast window size is currently set to 20 packets. This is a very low value, set at a time when we were still testing on 10 Mb/s hubs, and a change to it is long overdue. Commit 7845989c ("net: tipc: fix stall during bclink wakeup procedure") revealed a problem with this low value. For messages of importance LOW, the backlog queue limit will be calculated to 30 packets, while a single, maximum sized message of 66000 bytes, carried across a 1500 MTU network consists of 46 packets. This leads to the following scenario (among others leading to the same situation): 1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26 packets to the backlog queue. 2: Msg 2 of 46 packets is attempted sent, but rejected because there is no more space in the backlog queue at this level. The sender is added to the wakeup queue with a "pending packets chain size" number of 46. 3: Some packets in the transmit queue are acked and released. We try to wake up the sender, but the pending size of 46 is bigger than the LOW wakeup limit of 30, so this doesn't happen. 5: Subsequent acks releases all the remaining buffers. Each time we test for the wakeup criteria and find that 46 still is larger than 30, even after both the transmit and the backlog queues are empty. 6: The sender is never woken up and given a chance to send its message. He is stuck. We could now loosen the wakeup criteria (used by link_prepare_wakeup()) to become equal to the send criteria (used by tipc_link_xmit()), i.e., by ignoring the "pending packets chain size" value altogether, or we can just increase the queue limits so that the criteria can be satisfied anyway. There are good reasons (potentially multiple waiting senders) to not opt for the former solution, so we choose the latter one. This commit fixes the problem by giving the broadcast link window a default value of 50 packets. We also introduce a new minimum link window size BCLINK_MIN_WIN of 32, which is enough to always avoid the described situation. Finally, in order to not break any existing users which may set the window explicitly, we enforce that the window is set to the new minimum value in case the user is trying to set it to anything lower. Fixes: 7845989c ("net: tipc: fix stall during bclink wakeup procedure") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 10月, 2015 7 次提交
-
-
由 Jon Paul Maloy 提交于
The change made in the previous commit revealed a small flaw in the way the node FSM is updated. When the function tipc_node_link_down() is called for the last link to a node, we should check whether this was caused by a local reset or by a received RESET message from the peer. In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to the node FSM, so that it is ready to re-establish contact. If this is not done, the peer node will sometimes have to go through a second establish cycle before the link becomes stable. We fix this in this commit by conditionally issuing the mentioned event in the function tipc_node_link_down(). We also move LINK_RESET FSM even away from the link_reset() function and into the caller function, partially because it is easier to follow the code when state changes are gathered at a limited number of locations, partially because there will be cases in future commits where we don't want the link to go RESET mode when link_reset() is called. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
When a link is taken down because of a node local event, such as disabling of a bearer or an interface, we currently leave it to the peer node to discover the broken communication. The default time for such failure discovery is 1.5-2 seconds. If we instead allow the terminating link endpoint to send out a RESET message at the moment it is reset, we can achieve the impression that both endpoints are going down instantly. Since this is a very common scenario, we find it worthwhile to make this small modification. Apart from letting the link produce the said message, we also have to ensure that the interface is able to transmit it before TIPC is detached. We do this by performing the disabling of a bearer in three steps: 1) Disable reception of TIPC packets from the interface in question. 2) Take down the links, while allowing them so send out a RESET message. 3) Disable transmission of TIPC packets on the interface. Apart from this, we now have to react on the NETDEV_GOING_DOWN event, instead of as currently the NEDEV_DOWN event, to ensure that such transmission is possible during the teardown phase. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
Link establishing, just like link teardown, is a non-atomic action, in the sense that discovering that conditions are right to establish a link, and the actual adding of the link to one of the node's send slots is done in two different lock contexts. The link FSM is designed to help bridging the gap between the two contexts in a safe manner. We have now discovered a weakness in the implementaton of this FSM. Because we directly let the link go from state LINK_ESTABLISHING to state LINK_ESTABLISHED already in the first lock context, we are unable to distinguish between a fully established link, i.e., a link that has been added to its slot, and a link that has not yet reached the second lock context. It may hence happen that a manual intervention, e.g., when disabling an interface, causes the function tipc_node_link_down() to try removing the link from the node slots, decrementing its active link counter etc, although the link was never added there in the first place. We solve this by delaying the actual state change until we reach the second lock context, inside the function tipc_node_link_up(). This makes it possible for potentail callers of __tipc_node_link_down() to know if they should proceed or not, and the problem is solved. Unforunately, the situation described above also has a second problem. Since there by necessity is a tipc_node_link_up() call pending once the node lock has been released, we must defuse that call by setting the link back from LINK_ESTABLISHING to LINK_RESET state. This forces us to make a slight modification to the link FSM, which will now look as follows. +------------------------------------+ |RESET_EVT | | | | +--------------+ | +-----------------| SYNCHING |-----------------+ | |FAILURE_EVT +--------------+ PEER_RESET_EVT| | | A | | | | | | | | | | | | | | |SYNCH_ |SYNCH_ | | | |BEGIN_EVT |END_EVT | | | | | | | V | V V | +-------------+ +--------------+ +------------+ | | RESETTING |<---------| ESTABLISHED |--------->| PEER_RESET | | +-------------+ FAILURE_ +--------------+ PEER_ +------------+ | | EVT | A RESET_EVT | | | | | | | | +----------------+ | | | RESET_EVT| |RESET_EVT | | | | | | | | | | |ESTABLISH_EVT | | | | +-------------+ | | | | | | RESET_EVT | | | | | | | | | | | V V V | | | | +-------------+ +--------------+ RESET_EVT| +--->| RESET |--------->| ESTABLISHING |<----------------+ +-------------+ PEER_ +--------------+ | A RESET_EVT | | | | | | | |FAILOVER_ |FAILOVER_ |FAILOVER_ |BEGIN_EVT |END_EVT |BEGIN_EVT | | | V | | +-------------+ | | FAILINGOVER |<----------------+ +-------------+ Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
After the previous commits, we are guaranteed that no packets of type LINK_PROTOCOL or with illegal sequence numbers will be attempted added to the link deferred queue. This makes it possible to make some simplifications to the sorting algorithm in the function tipc_skb_queue_sorted(). We also alter the function so that it will drop packets if one with the same seqeunce number is already present in the queue. This is necessary because we have identified weird packet sequences, involving duplicate packets, where a legitimate in-sequence packet may advance to the head of the queue without being detected and de-queued. Finally, we make this function outline, since it will now be called only in exceptional cases. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The sequence number of an incoming packet is currently only checked for less than, equality to, or bigger than the next expected number, meaning that the receive window in practice becomes one half sequence number cycle, or U16_MAX/2. This does not make sense, and may not even be safe if there are extreme delays in the network. Any packet sent by the peer during the ongoing cycle must belong inside his current send window, or should otherwise be dropped if possible. Since a link endpoint cannot know its peer's current send window, it has to base this sanity check on a worst-case assumption, i.e., that the peer is using a maximum sized window of 8191 packets. Using this assumption, we now add a check that the sequence number is not bigger than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks done, so that the receive window test is performed before the gap test. This way, we are guaranteed that no packet with illegal sequence numbers are ever added to the deferred queue. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
Currently, all packets received in tipc_link_rcv() are unconditionally added to the packet deferred queue, whereafter that queue is walked and all its buffers evaluated for delivery. This is both non-optimal and and makes the queue sorting function unnecessary complex. This commit changes the loop so that an arrived packet is evaluated first, and added to the deferred queue only when a sequence number gap is discovered. A non-empty deferred queue is walked until it is empty or until its head's sequence number doesn't fit. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
During packet reception, the function tipc_link_rcv() adds its accepted packets to a temporary buffer queue, before finally splicing this queue into the lock protected input queue that will be delivered up to the socket layer. The purpose is to reduce potential contention on the input queue lock. However, since the vast majority of packets arrive in sequence, they will anyway be added one by one to the input queue, and the use of the temporary queue becomes a sub-optimization. The only case where this queue makes sense is when unpacking buffers from a bundle packet; here we want to avoid dozens of small buffers to be added individually to the lock-protected input queue in a tight loop. In this commit, we remove the general usage of the temporary queue, and keep it only for the packet unbundling case. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 10月, 2015 1 次提交
-
-
由 Jon Paul Maloy 提交于
In commit e3eea1eb ("tipc: clean up handling of message priorities") we introduced a field in the packet header for keeping track of the priority of fragments, since this value is not present in the specified protocol header. Since the value so far only is used at the transmitting end of the link, we have not yet officially defined it as part of the protocol. Unfortunately, the field we use for keeping this value, bits 13-15 in in word 5, has turned out to be a poor choice; it is already used by the broadcast protocol for carrying the 'network id' field of the sending node. Since packet fragments also need to be transported across the broadcast protocol, the risk of conflict is obvious, and we see this happen when we use network identities larger than 2^13-1. This has escaped our testing because we have so far only been using small network id values. We now move this field to bits 0-2 in word 9, a field that is guaranteed to be unused by all involved protocols. Fixes: e3eea1eb ("tipc: clean up handling of message priorities") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 10月, 2015 1 次提交
-
-
由 Jon Paul Maloy 提交于
In commit 6e498158 ("tipc: move link synch and failover to link aggregation level") we introduced a new mechanism for performing link failover and synchronization. We have now detected a bug in this mechanism. During link synchronization we use the arrival of any packet on the tunnel link to trig a check for whether it has reached the synchronization point or not. This has turned out to be too permissive, since it may cause an arriving non-last SYNCH packet to end the synch state, just to see the next SYNCH packet initiate a new synch state with a new, higher synch point. This is not fatal, but should be avoided, because it may significantly extend the synchronization period, while at the same time we are not allowed to send NACKs if packets are lost. In the worst case, a low-traffic user may see its traffic stall until a LINK_PROTOCOL state message trigs the link to leave synchronization state. At the same time, LINK_PROTOCOL packets which happen to have a (non- valid) sequence number lower than the tunnel link's rcv_nxt value will be consistently dropped, and will never be able to resolve the situation described above. We fix this by exempting LINK_PROTOCOL packets from the sequence number check, as they should be. We also reduce (but don't completely eliminate) the risk of entering multiple synchronization states by only allowing the (logically) first SYNCH packet to initiate a synchronization state. This works independently of actual packet arrival order. Fixes: commit 6e498158 ("tipc: move link synch and failover to link aggregation level") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 9月, 2015 1 次提交
-
-
由 Erik Hugne 提交于
The msg pointer into header may change after skb linearization. We must reinitialize it after calling skb_linearize to prevent operating on a freed or invalid pointer. Signed-off-by: NErik Hugne <erik.hugne@ericsson.com> Reported-by: NTamás Végh <tamas.vegh@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 9月, 2015 1 次提交
-
-
由 Kolmakov Dmitriy 提交于
If an attempt to wake up users of broadcast link is made when there is no enough place in send queue than it may hang up inside the tipc_sk_rcv() function since the loop breaks only after the wake up queue becomes empty. This can lead to complete CPU stall with the following message generated by RCU: INFO: rcu_sched self-detected stall on CPU { 0} (t=2101 jiffies g=54225 c=54224 q=11465) Task dump for CPU 0: tpch R running task 0 39949 39948 0x0000000a ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000 ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800 0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680 Call Trace: <IRQ> [<ffffffff8106a4be>] sched_show_task+0xae/0x120 [<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40 [<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0 [<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0 [<ffffffff8106e53f>] ? account_system_time+0x7f/0x170 [<ffffffff81099e64>] update_process_times+0x34/0x60 [<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40 [<ffffffff810a851c>] tick_sched_timer+0x3c/0x70 [<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0 [<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0 [<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60 [<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60 [<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60 [<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70 [<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10 [<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60 [<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc] [<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc] [<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc] [<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30 [<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc] [<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc] [<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc] [<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40 [<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc] [<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc] [<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950 [<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60 [<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90 [<ffffffff815aa138>] napi_gro_receive+0x78/0xa0 [<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3] [<ffffffff81597c8c>] ? consume_skb+0x2c/0x40 [<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3] [<ffffffff815ab0f2>] net_rx_action+0xe2/0x290 [<ffffffff8104b92a>] __do_softirq+0xda/0x1f0 [<ffffffff8104bc26>] irq_exit+0x76/0xa0 [<ffffffff81004355>] do_IRQ+0x55/0xf0 [<ffffffff8165a12b>] common_interrupt+0x6b/0x6b <EOI> The issue occurs only when tipc_sk_rcv() is used to wake up postponed senders: tipc_bclink_wakeup_users() // wakeupq - is a queue which consists of special // messages with SOCK_WAKEUP type. tipc_sk_rcv(wakeupq) ... while (skb_queue_len(inputq)) { filter_rcv(skb) // Here the type of message is checked // and if it is SOCK_WAKEUP then // it tries to wake up a sender. tipc_write_space(sk) wake_up_interruptible_sync_poll() } After the sender thread is woke up it can gather control and perform an attempt to send a message. But if there is no enough place in send queue it will call link_schedule_user() function which puts a message of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep. Thus the size of the queue actually is not changed and the while() loop never exits. The approach I proposed is to wake up only senders for which there is enough place in send queue so the described issue can't occur. Moreover the same approach is already used to wake up senders on unicast links. I have got into the issue on our product code but to reproduce the issue I changed a benchmark test application (from tipcutils/demos/benchmark) to perform the following scenario: 1. Run 64 instances of test application (nodes). It can be done on the one physical machine. 2. Each application connects to all other using TIPC sockets in RDM mode. 3. When setup is done all nodes start simultaneously send broadcast messages. 4. Everything hangs up. The issue is reproducible only when a congestion on broadcast link occurs. For example, when there are only 8 nodes it works fine since congestion doesn't occur. Send queue limit is 40 in my case (I use a critical importance level) and when 64 nodes send a message at the same moment a congestion occurs every time. Signed-off-by: NDmitry S Kolmakov <kolmakov.dmitriy@huawei.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 8月, 2015 3 次提交
-
-
由 Jon Paul Maloy 提交于
Recent changes to the link synchronization means that we can now just drop packets arriving on the synchronizing link before the synch point is reached. This has lead to significant simplifications to the implementation, but also turns out to have a flip side that we need to consider. Under unlucky circumstances, the two endpoints may end up repeatedly dropping each other's packets, while immediately asking for retransmission of the same packets, just to drop them once more. This pattern will eventually be broken when the synch point is reached on the other link, but before that, the endpoints may have arrived at the retransmission limit (stale counter) that indicates that the link should be broken. We see this happen at rare occasions. The fix for this is to not ask for retransmissions when a link is in state LINK_SYNCHING. The fact that the link has reached this state means that it has already received the first SYNCH packet, and that it knows the synch point. Hence, it doesn't need any more packets until the other link has reached the synch point, whereafter it can go ahead and ask for the missing packets. However, because of the reduced traffic on the synching link that follows this change, it may now take longer to discover that the synch point has been reached. We compensate for this by letting all packets, on any of the links, trig a check for synchronization termination. This is possible because the packets themselves don't contain any information that is needed for discovering this condition. Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
When we introduced the new link failover/synch mechanism in commit 6e498158 ("tipc: move link synch and failover to link aggregation level"), we missed the case when the non-tunnel link goes down during the link synchronization period. In this case the tunnel link will remain in state LINK_SYNCHING, something leading to unpredictable behavior when the failover procedure is initiated. In this commit, we ensure that the node and remaining link goes back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED) when one of the parallel links goes down. We also ensure that we don't re-enter synch mode if subsequent SYNCH packets arrive on the remaining link. Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
When a link goes down, and there is still a working link towards its destination node, a failover is initiated, and the failed link is not allowed to re-establish until that procedure is finished. To ensure this, the concerned link endpoints are set to state LINK_FAILINGOVER, and the node endpoints to NODE_FAILINGOVER during the failover period. However, if the link reset is due to a disabled bearer, the corres- ponding link endpoint is deleted, and only the node endpoint knows about the ongoing failover. Now, if the disabled bearer is re-enabled during the failover period, the discovery mechanism may create a new link endpoint that is ready to be established, despite that this is not permitted. This situation may cause both the ongoing failover and any subsequent link synchronization to fail. In this commit, we ensure that a newly created link goes directly to state LINK_FAILINGOVER if the corresponding node state is NODE_FAILINGOVER. This eliminates the problem described above. Furthermore, we tighten the criteria for which packets are allowed to end a failover state in the function tipc_node_check_state(). By checking that the receiving link is up and running, instead of just checking that it is not in failover mode, we eliminate the risk that protocol packets from the re-created link may cause the failover to be prematurely terminated. Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 8月, 2015 1 次提交
-
-
由 Richard Alpe 提交于
A zero length payload means that no TLV (Type Length Value) data has been passed. Prior to this patch a non-existing TLV could be sanity checked with TLV_OK() resulting in random behavior where a user sending an empty message occasionally got a incorrect "operation not supported" message back. Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com> Reviewed-by: NErik Hugne <erik.hugne@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 8月, 2015 1 次提交
-
-
由 Roopa Prabhu 提交于
This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup for use cases where sk is not available (like mpls). sk appears to be needed to get the namespace 'net' and is optional otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup to take net argument. sk remains optional. All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified to pass net. I have modified them to use already available 'net' in the scope of the call. I can change them to sock_net(sk) to avoid any unintended change in behaviour if sock namespace is different. They dont seem to be from code inspection. Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 7月, 2015 12 次提交
-
-
由 Jon Paul Maloy 提交于
We simplify the link creation function tipc_link_create() and the way the link struct it is connected to the node struct. In particular, we remove the duplicate initialization of some fields which are anyway set in tipc_link_reset(). Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
Currently, when we extract small messages from a message bundle, or when many messages have accumulated in the link arrival queue, those messages are added one by one to the lock protected link input queue. This may increase contention with the reader of that queue, in the function tipc_sk_rcv(). This commit introduces a temporary, unprotected input queue in tipc_link_rcv() for such cases. Only when the arrival queue has been emptied, and the function is ready to return, does it splice the whole temporary queue into the real input queue. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are postpended by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
In order to facilitate future improvements to the locking structure, we want to make resetting and establishing of links non-atomic. I.e., the functions tipc_node_link_up() and tipc_node_link_down() should be called from outside the node lock context, and grab/release the node lock themselves. This requires that we can freeze the link state from the moment it is set to RESETTING or PEER_RESET in one lock context until it is set to RESET or ESTABLISHING in a later context. The recently introduced link FSM makes this possible, so we are now ready to introduce the above change. This commit implements this. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The node lock is currently grabbed and and released in the function tipc_disc_rcv() in the file discover.c. As a preparation for the next commits, we need to move this node lock handling, along with the code area it is covering, to node.c. This commit introduces this change. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
Until now, we have been handling link failover and synchronization by using an additional link state variable, "exec_mode". This variable is not independent of the link FSM state, something causing a risk of inconsistencies, apart from the fact that it clutters the code. The conditions are now in place to define a new link FSM that covers all existing use cases, including failover and synchronization, and eliminate the "exec_mode" field altogether. The FSM must also support non-atomic resetting of links, which will be introduced later. The new link FSM is shown below, with 7 states and 8 events. Only events leading to state change are shown as edges. +------------------------------------+ |RESET_EVT | | | | +--------------+ | +-----------------| SYNCHING |-----------------+ | |FAILURE_EVT +--------------+ PEER_RESET_EVT| | | A | | | | | | | | | | | | | | |SYNCH_ |SYNCH_ | | | |BEGIN_EVT |END_EVT | | | | | | | V | V V | +-------------+ +--------------+ +------------+ | | RESETTING |<---------| ESTABLISHED |--------->| PEER_RESET | | +-------------+ FAILURE_ +--------------+ PEER_ +------------+ | | EVT | A RESET_EVT | | | | | | | | | | | | | +--------------+ | | | RESET_EVT| |RESET_EVT |ESTABLISH_EVT | | | | | | | | | | | | V V | | | +-------------+ +--------------+ RESET_EVT| +--->| RESET |--------->| ESTABLISHING |<----------------+ +-------------+ PEER_ +--------------+ | A RESET_EVT | | | | | | | |FAILOVER_ |FAILOVER_ |FAILOVER_ |BEGIN_EVT |END_EVT |BEGIN_EVT | | | V | | +-------------+ | | FAILINGOVER |<----------------+ +-------------+ These changes are fully backwards compatible. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The implementation of the link FSM currently takes decisions about and sends out link protocol messages. This is unnecessary, since such actions are not the result of any link state change, and are even decided based on non-FSM state information ("silent_intv_cnt"). We now move the sending of unicast link protocol messages to the function tipc_link_timeout(), and the initial broadcast synchronization message to tipc_node_link_up(). The latter is done because a link instance should not need to know whether it is the first or second link to a destination. Such information is now restricted to and handled by the link aggregation layer in node.c Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
In the next commit, we will move link synch/failover orchestration to the link aggregation level. In order to do this, we first need to extend the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER, plus four new events to enter and leave those states. This commit introduces this change, without yet making use of it. The node FSM now looks as follows: +-----------------------------------------+ | PEER_DOWN_EVT| | | +------------------------+----------------+ | |SELF_DOWN_EVT | | | | | | | | +-----------+ +-----------+ | | |NODE_ | |NODE_ | | | +----------|FAILINGOVER|<---------|SYNCHING |------------+ | | |SELF_ +-----------+ FAILOVER_+-----------+ PEER_ | | | |DOWN_EVT | A BEGIN_EVT A | DOWN_EVT| | | | | | | | | | | | | | | | | | | | |FAILOVER_|FAILOVER_ |SYNCH_ |SYNCH_ | | | | |END_EVT |BEGIN_EVT |BEGIN_EVT|END_EVT | | | | | | | | | | | | | | | | | | | | | +--------------+ | | | | | +------->| SELF_UP_ |<-------+ | | | | +----------------| PEER_UP |------------------+ | | | | |SELF_DOWN_EVT +--------------+ PEER_DOWN_EVT| | | | | | A A | | | | | | | | | | | | | | PEER_UP_EVT| |SELF_UP_EVT | | | | | | | | | | | V V V | | V V V +------------+ +-----------+ +-----------+ +------------+ |SELF_DOWN_ | |SELF_UP_ | |PEER_UP_ | |PEER_DOWN | |PEER_LEAVING|<------|PEER_COMING| |SELF_COMING|------>|SELF_LEAVING| +------------+ SELF_ +-----------+ +-----------+ PEER_ +------------+ | DOWN_EVT A A DOWN_EVT | | | | | | | | | | SELF_UP_EVT| |PEER_UP_EVT | | | | | | | | | |PEER_DOWN_EVT +--------------+ SELF_DOWN_EVT| +------------------->| SELF_DOWN_ |<--------------------+ | PEER_DOWN | +--------------+ Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
In many cases the call order when a link is reset goes as follows: tipc_node_xx()->tipc_link_reset()->tipc_node_link_down() This is not the right order if we want the node to be in control, so in this commit we change the order to: tipc_node_xx()->tipc_node_link_down()->tipc_link_reset() The fact that tipc_link_reset() now is called from only one location with a well-defined state will also facilitate later simplifications of tipc_link_reset() and the link FSM. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
In line with our effort to let the node level have full control over its links, we want to move all link reset calls from link.c to node.c. Some of the calls can be moved by simply moving the calling function, when this is the right thing to do. For the remaining calls we use the now established technique of returning a TIPC_LINK_DOWN_EVT flag from tipc_link_rcv(), whereafter we perform the reset call when the call returns. This change serves as a preparation for the coming commits. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The function tipc_link_activate() is redundant, since it mostly performs settings that have already been done in a preceding tipc_link_reset(). There are three exceptions to this: - The actual state change to TIPC_LINK_WORKING. This should anyway be done in the FSM, and not in a separate function. - Registration of the link with the bearer. This should be done by the node, since we don't want the link to have any knowledge about its specific bearer. - Call to tipc_node_link_up() for user access registration. With the new role distribution between link aggregation and link level this becomes the wrong call order; tipc_node_link_up() should instead be called directly as a result of a TIPC_LINK_UP event, hence by the node itself. This commit implements those changes. Tested-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 7月, 2015 1 次提交
-
-
由 Jon Maloy 提交于
In commit d999297c ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_build_bcast_sync_msg(), which carries initial synchronization data between two nodes at first contact and at re-contact. In this function, we missed to add synchronization data, with the effect that the broadcast link endpoints will fail to synchronize correctly at re-contact between a running and a restarted node. All other cases work as intended. With this commit, we fix this bug. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 7月, 2015 3 次提交
-
-
由 Jon Paul Maloy 提交于
When a message is received in a socket, one of the call chains tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv()) or tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv()) are followed. At each of these levels we may encounter situations where the message may need to be rejected, or a new message produced for transfer back to the sender. Despite recent improvements, the current code for doing this is perceived as awkward and hard to follow. Leveraging the two previous commits in this series, we now introduce a more uniform handling of such situations. We let each of the functions in the chain itself produce/reverse the message to be returned to the sender, but also perform the actual forwarding. This simplifies the necessary logics within each function. Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
Currently, we use the code sequence if (msg_reverse()) tipc_link_xmit_skb() at numerous locations in socket.c. The preparation of arguments for these calls, as well as the sequence itself, makes the code unecessarily complex. In this commit, we introduce a new function, tipc_sk_respond(), that performs this call combination. We also replace some, but not yet all, of these explicit call sequences with calls to the new function. Notably, we let the function tipc_sk_proto_rcv() use the new function to directly send out PROBE_REPLY messages, instead of deferring this to the calling tipc_sk_rcv() function, as we do now. Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The shortest TIPC message header, for cluster local CONNECTED messages, is 24 bytes long. With this format, the fields "dest_node" and "orig_node" are optimized away, since they in reality are redundant in this particular case. However, the absence of these fields leads to code inconsistencies that are difficult to handle in some cases, especially when we need to reverse or reject messages at the socket layer. In this commit, we concentrate the handling of the absent fields to one place, by letting the function tipc_msg_reverse() reallocate the buffer and expand the header to 32 bytes when necessary. This means that the socket code now can assume that the two previously absent fields are present in the header when a message needs to be rejected. This opens up for some further simplifications of the socket code. Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 7月, 2015 1 次提交
-
-
由 Jon Paul Maloy 提交于
In commit d999297c ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_link_proto_rcv(). This function contains a bug, so that it sometimes by error sends out a non-zero link priority value in created protocol messages. The bug may lead to an extra link reset at initial link establising with older nodes. This will never happen more than once, whereafter the link will work as intended. We fix this bug in this commit. Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 7月, 2015 2 次提交
-
-
由 Jon Paul Maloy 提交于
We convert packet/message reception according to the same principle we have been using for message sending and timeout handling: We move the function tipc_rcv() to node.c, hence handling the initial packet reception at the link aggregation level. The function grabs the node lock, selects the receiving link, and accesses it via a new call tipc_link_rcv(). This function appends buffers to the input queue for delivery upwards, but it may also append outgoing packets to the xmit queue, just as we do during regular message sending. The latter will happen when buffers are forwarded from the link backlog, or when retransmission is requested. Upon return of this function, and after having released the node lock, tipc_rcv() delivers/tranmsits the contents of those queues, but it may also perform actions such as link activation or reset, as indicated by the return flags from the link. This reduces the number of cpu cycles spent inside the node spinlock, and reduces contention on that lock. Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
The logics for determining when a node is permitted to establish and maintain contact with its peer node becomes non-trivial in the presence of multiple parallel links that may come and go independently. A known failure scenario is that one endpoint registers both its links to the peer lost, cleans up it binding table, and prepares for a table update once contact is re-establihed, while the other endpoint may see its links reset and re-established one by one, hence seeing no need to re-synchronize the binding table. To avoid this, a node must not allow re-establishing contact until it has confirmation that even the peer has lost both links. Currently, the mechanism for handling this consists of setting and resetting two state flags from different locations in the code. This solution is hard to understand and maintain. A closer analysis even reveals that it is not completely safe. In this commit we do instead introduce an FSM that keeps track of the conditions for when the node can establish and maintain links. It has six states and four events, and is strictly based on explicit knowledge about the own node's and the peer node's contact states. Only events leading to state change are shown as edges in the figure below. +--------------+ | SELF_UP/ | +---------------->| PEER_COMING |-----------------+ SELF_ | +--------------+ |PEER_ ESTBL_ | | |ESTBL_ CONTACT| SELF_LOST_CONTACT | |CONTACT | v | | +--------------+ | | PEER_ | SELF_DOWN/ | SELF_ | | LOST_ +--| PEER_LEAVING |<--+ LOST_ v +-------------+ CONTACT | +--------------+ | CONTACT +-----------+ | SELF_DOWN/ |<----------+ +----------| SELF_UP/ | | PEER_DOWN |<----------+ +----------| PEER_UP | +-------------+ SELF_ | +--------------+ | PEER_ +-----------+ | LOST_ +--| SELF_LEAVING/|<--+ LOST_ A | CONTACT | PEER_DOWN | CONTACT | | +--------------+ | | A | PEER_ | PEER_LOST_CONTACT | |SELF_ ESTBL_ | | |ESTBL_ CONTACT| +--------------+ |CONTACT +---------------->| PEER_UP/ |-----------------+ | SELF_COMING | +--------------+ Reviewed-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-