1. 12 7月, 2016 2 次提交
    • J
      tipc: ensure correct broadcast send buffer release when peer is lost · a71eb720
      Jon Paul Maloy 提交于
      After a new receiver peer has been added to the broadcast transmission
      link, we allow immediate transmission of new broadcast packets, trusting
      that the new peer will not accept the packets until it has received the
      previously sent unicast broadcast initialiation message. In the same
      way, the sender must not accept any acknowledges until it has itself
      received the broadcast initialization from the peer, as well as
      confirmation of the reception of its own initialization message.
      
      Furthermore, when a receiver peer goes down, the sender has to produce
      the missing acknowledges from the lost peer locally, in order ensure
      correct release of the buffers that were expected to be acknowledged by
      the said peer.
      
      In a highly stressed system we have observed that contact with a peer
      may come up and be lost before the above mentioned broadcast initial-
      ization and confirmation have been received. This leads to the locally
      produced acknowledges being rejected, and the non-acknowledged buffers
      to linger in the broadcast link transmission queue until it fills up
      and the link goes into permanent congestion.
      
      In this commit, we remedy this by temporarily setting the corresponding
      broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up'
      state to true before we issue the local acknowledges. This ensures that
      those acknowledges will always be accepted. The mentioned state values
      are restored immediately afterwards when the link is reset.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a71eb720
    • J
      tipc: extend broadcast link initialization criteria · 2d18ac4b
      Jon Paul Maloy 提交于
      At first contact between two nodes, an endpoint might sometimes have
      time to send out a LINK_PROTOCOL/STATE packet before it has received
      the broadcast initialization packet from the peer, i.e., before it has
      received a valid broadcast packet number to add to the 'bc_ack' field
      of the protocol message.
      
      This means that the peer endpoint will receive a protocol packet with an
      invalid broadcast acknowledge value of 0. Under unlucky circumstances
      this may lead to the original, already received acknowledge value being
      overwritten, so that the whole broadcast link goes stale after a while.
      
      We fix this by delaying the setting of the link field 'bc_peer_is_up'
      until we know that the peer really has received our own broadcast
      initialization message. The latter is always sent out as the first
      unicast message on a link, and always with seqeunce number 1. Because
      of this, we only need to look for a non-zero unicast acknowledge value
      in the arriving STATE messages, and once that is confirmed we know we
      are safe and can set the mentioned field. Before this moment, we must
      ignore all broadcast acknowledges from the peer.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d18ac4b
  2. 02 7月, 2016 1 次提交
    • R
      tipc: fix nl compat regression for link statistics · 55e77a3e
      Richard Alpe 提交于
      Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes
      of the link name where left out.
      
      Making the output of tipc-config -ls look something like:
      Link statistics:
      dcast-link
      1:data0-1.1.2:data0
      1:data0-1.1.3:data0
      
      Also, for the record, the patch that introduce this regression
      claims "Sending the whole object out can cause a leak". Which isn't
      very likely as this is a compat layer, where the data we are parsing
      is generated by us and we know the string to be NULL terminated. But
      you can of course never be to secure.
      
      Fixes: 5d2be142 (tipc: fix an infoleak in tipc_nl_compat_link_dump)
      Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55e77a3e
  3. 23 6月, 2016 1 次提交
    • J
      tipc: unclone unbundled buffers before forwarding · 27777daa
      Jon Paul Maloy 提交于
      When extracting an individual message from a received "bundle" buffer,
      we just create a clone of the base buffer, and adjust it to point into
      the right position of the linearized data area of the latter. This works
      well for regular message reception, but during periods of extremely high
      load it may happen that an extracted buffer, e.g, a connection probe, is
      reversed and forwarded through an external interface while the preceding
      extracted message is still unhandled. When this happens, the header or
      data area of the preceding message will be partially overwritten by a
      MAC header, leading to unpredicatable consequences, such as a link
      reset.
      
      We now fix this by ensuring that the msg_reverse() function never
      returns a cloned buffer, and that the returned buffer always contains
      sufficient valid head and tail room to be forwarded.
      Reported-by: NErik Hugne <erik.hugne@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27777daa
  4. 18 6月, 2016 1 次提交
    • J
      tipc: fix socket timer deadlock · f1d048f2
      Jon Paul Maloy 提交于
      We sometimes observe a 'deadly embrace' type deadlock occurring
      between mutually connected sockets on the same node. This happens
      when the one-hour peer supervision timers happen to expire
      simultaneously in both sockets.
      
      The scenario is as follows:
      
      CPU 1:                          CPU 2:
      --------                        --------
      tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
        lock(sk1.slock)                 lock(sk2.slock)
        msg_create(probe)               msg_create(probe)
        unlock(sk1.slock)               unlock(sk2.slock)
        tipc_node_xmit_skb()            tipc_node_xmit_skb()
          tipc_node_xmit()                tipc_node_xmit()
            tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
              lock(sk2.slock)                 lock((sk1.slock)
              filter_rcv()                    filter_rcv()
                tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
                  msg_create(probe_rsp)           msg_create(probe_rsp)
                  tipc_sk_respond()               tipc_sk_respond()
                    tipc_node_xmit_skb()            tipc_node_xmit_skb()
                      tipc_node_xmit()                tipc_node_xmit()
                        tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                          lock((sk1.slock)                lock((sk2.slock)
                          ===> DEADLOCK                   ===> DEADLOCK
      
      Further analysis reveals that there are three different locations in the
      socket code where tipc_sk_respond() is called within the context of the
      socket lock, with ensuing risk of similar deadlocks.
      
      We now solve this by passing a buffer queue along with all upcalls where
      sk_lock.slock may potentially be held. Response or rejected message
      buffers are accumulated into this queue instead of being sent out
      directly, and only sent once we know we are safely outside the slock
      context.
      Reported-by: NGUNA <gbalasun@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1d048f2
  5. 16 6月, 2016 2 次提交
    • Y
      tipc: eliminate uninitialized variable warning · c91522f8
      Ying Xue 提交于
      net/tipc/link.c: In function ‘tipc_link_timeout’:
      net/tipc/link.c:744:28: warning: ‘mtyp’ may be used uninitialized in this function [-Wuninitialized]
      
      Fixes: 42b18f60 ("tipc: refactor function tipc_link_timeout()")
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c91522f8
    • Y
      tipc: fix suspicious RCU usage · 66d95b67
      Ying Xue 提交于
      When run tipcTS&tipcTC test suite, the following complaint appears:
      
      [   56.926168] ===============================
      [   56.926169] [ INFO: suspicious RCU usage. ]
      [   56.926171] 4.7.0-rc1+ #160 Not tainted
      [   56.926173] -------------------------------
      [   56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage!
      [   56.926175]
      [   56.926175] other info that might help us debug this:
      [   56.926175]
      [   56.926177]
      [   56.926177] rcu_scheduler_active = 1, debug_locks = 1
      [   56.926179] 3 locks held by swapper/4/0:
      [   56.926180]  #0:  (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340
      [   56.926203]  #1:  (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc]
      [   56.926212]  #2:  (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
      [   56.926218]
      [   56.926218] stack backtrace:
      [   56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ #160
      [   56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   56.926224]  0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0
      [   56.926227]  0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120
      [   56.926230]  ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88
      [   56.926234] Call Trace:
      [   56.926235]  <IRQ>  [<ffffffff813c4423>] dump_stack+0x67/0x94
      [   56.926250]  [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120
      [   56.926256]  [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc]
      [   56.926261]  [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc]
      [   56.926266]  [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
      [   56.926273]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926278]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926283]  [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc]
      [   56.926288]  [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340
      [   56.926291]  [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340
      [   56.926296]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926300]  [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390
      [   56.926306]  [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130
      [   56.926316]  [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2
      [   56.926323]  [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0
      [   56.926327]  [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60
      [   56.926331]  [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90
      [   56.926333]  <EOI>  [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0
      [   56.926340]  [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0
      [   56.926342]  [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20
      [   56.926345]  [<ffffffff810adf0f>] default_idle_call+0x2f/0x50
      [   56.926347]  [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0
      [   56.926353]  [<ffffffff81040ad9>] start_secondary+0xf9/0x100
      
      The warning appears as rtnl_dereference() is wrongly used in
      tipc_l2_send_msg() under RCU read lock protection. Instead the proper
      usage should be that rcu_dereference_rtnl() is called here.
      
      Fixes: 5b7066c3 ("tipc: stricter filtering of packets in bearer layer")
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66d95b67
  6. 03 6月, 2016 1 次提交
  7. 26 5月, 2016 1 次提交
  8. 20 5月, 2016 1 次提交
  9. 18 5月, 2016 1 次提交
  10. 17 5月, 2016 1 次提交
  11. 13 5月, 2016 1 次提交
    • J
      tipc: eliminate risk of double link_up events · e7142c34
      Jon Paul Maloy 提交于
      When an ACTIVATE or data packet is received in a link in state
      ESTABLISHING, the link does not immediately change state to
      ESTABLISHED, but does instead return a LINK_UP event to the caller,
      which will execute the state change in a different lock context.
      
      This non-atomic approach incurs a low risk that we may have two
      LINK_UP events pending simultaneously for the same link, resulting
      in the final part of the setup procedure being executed twice. The
      only potential harm caused by this it that we may see two LINK_UP
      events issued to subsribers of the topology server, something that
      may cause confusion.
      
      This commit eliminates this risk by checking if the link is already
      up before proceeding with the second half of the setup.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7142c34
  12. 04 5月, 2016 3 次提交
    • J
      tipc: redesign connection-level flow control · 10724cc7
      Jon Paul Maloy 提交于
      There are two flow control mechanisms in TIPC; one at link level that
      handles network congestion, burst control, and retransmission, and one
      at connection level which' only remaining task is to prevent overflow
      in the receiving socket buffer. In TIPC, the latter task has to be
      solved end-to-end because messages can not be thrown away once they
      have been accepted and delivered upwards from the link layer, i.e, we
      can never permit the receive buffer to overflow.
      
      Currently, this algorithm is message based. A counter in the receiving
      socket keeps track of number of consumed messages, and sends a dedicated
      acknowledge message back to the sender for each 256 consumed message.
      A counter at the sending end keeps track of the sent, not yet
      acknowledged messages, and blocks the sender if this number ever reaches
      512 unacknowledged messages. When the missing acknowledge arrives, the
      socket is then woken up for renewed transmission. This works well for
      keeping the message flow running, as it almost never happens that a
      sender socket is blocked this way.
      
      A problem with the current mechanism is that it potentially is very
      memory consuming. Since we don't distinguish between small and large
      messages, we have to dimension the socket receive buffer according
      to a worst-case of both. I.e., the window size must be chosen large
      enough to sustain a reasonable throughput even for the smallest
      messages, while we must still consider a scenario where all messages
      are of maximum size. Hence, the current fix window size of 512 messages
      and a maximum message size of 66k results in a receive buffer of 66 MB
      when truesize(66k) = 131k is taken into account. It is possible to do
      much better.
      
      This commit introduces an algorithm where we instead use 1024-byte
      blocks as base unit. This unit, always rounded upwards from the
      actual message size, is used when we advertise windows as well as when
      we count and acknowledge transmitted data. The advertised window is
      based on the configured receive buffer size in such a way that even
      the worst-case truesize/msgsize ratio always is covered. Since the
      smallest possible message size (from a flow control viewpoint) now is
      1024 bytes, we can safely assume this ratio to be less than four, which
      is the value we are now using.
      
      This way, we have been able to reduce the default receive buffer size
      from 66 MB to 2 MB with maintained performance.
      
      In order to keep this solution backwards compatible, we introduce a
      new capability bit in the discovery protocol, and use this throughout
      the message sending/reception path to always select the right unit.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10724cc7
    • J
      tipc: propagate peer node capabilities to socket layer · 60020e18
      Jon Paul Maloy 提交于
      During neighbor discovery, nodes advertise their capabilities as a bit
      map in a dedicated 16-bit field in the discovery message header. This
      bit map has so far only be stored in the node structure on the peer
      nodes, but we now see the need to keep a copy even in the socket
      structure.
      
      This commit adds this functionality.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60020e18
    • J
      tipc: re-enable compensation for socket receive buffer double counting · 7c8bcfb1
      Jon Paul Maloy 提交于
      In the refactoring commit d570d864 ("tipc: enqueue arrived buffers
      in socket in separate function") we did by accident replace the test
      
      if (sk->sk_backlog.len == 0)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      with
      
      if (sk->sk_backlog.len)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      This effectively disables the compensation we have for the double
      receive buffer accounting that occurs temporarily when buffers are
      moved from the backlog to the socket receive queue. Until now, this
      has gone unnoticed because of the large receive buffer limits we are
      applying, but becomes indispensable when we reduce this buffer limit
      later in this series.
      
      We now fix this by inverting the mentioned condition.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c8bcfb1
  13. 02 5月, 2016 2 次提交
    • H
      tipc: only process unicast on intended node · efe79050
      Hamish Martin 提交于
      We have observed complete lock up of broadcast-link transmission due to
      unacknowledged packets never being removed from the 'transmq' queue. This
      is traced to nodes having their ack field set beyond the sequence number
      of packets that have actually been transmitted to them.
      Consider an example where node 1 has sent 10 packets to node 2 on a
      link and node 3 has sent 20 packets to node 2 on another link. We
      see examples of an ack from node 2 destined for node 3 being treated as
      an ack from node 2 at node 1. This leads to the ack on the node 1 to node
      2 link being increased to 20 even though we have only sent 10 packets.
      When node 1 does get around to sending further packets, none of the
      packets with sequence numbers less than 21 are actually removed from the
      transmq.
      To resolve this we reinstate some code lost in commit d999297c ("tipc:
      reduce locking scope during packet reception") which ensures that only
      messages destined for the receiving node are processed by that node. This
      prevents the sequence numbers from getting out of sync and resolves the
      packet leakage, thereby resolving the broadcast-link transmission
      lock-ups we observed.
      
      While we are aware that this change only patches over a root problem that
      we still haven't identified, this is a sanity test that it is always
      legitimate to do. It will remain in the code even after we identify and
      fix the real problem.
      Reviewed-by: NChris Packham <chris.packham@alliedtelesis.co.nz>
      Reviewed-by: NJohn Thompson <john.thompson@alliedtelesis.co.nz>
      Signed-off-by: NHamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efe79050
    • J
      tipc: set 'active' state correctly for first established link · def22c47
      Jon Paul Maloy 提交于
      When we are displaying statistics for the first link established between
      two peers, it will always be presented as STANDBY although it in reality
      is ACTIVE.
      
      This happens because we forget to set the 'active' flag in the link
      instance at the moment it is established. Although this is a bug, it only
      has impact on the presentation view of the link, not on its actual
      functionality.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      def22c47
  14. 29 4月, 2016 1 次提交
  15. 25 4月, 2016 1 次提交
  16. 18 4月, 2016 1 次提交
  17. 16 4月, 2016 5 次提交
    • J
      tipc: let first message on link be a state message · 34b9cd64
      Jon Paul Maloy 提交于
      According to the link FSM, a received traffic packet can take a link
      from state ESTABLISHING to ESTABLISHED, but the link can still not be
      fully set up in one atomic operation. This means that even if the the
      very first packet on the link is a traffic packet with sequence number
      1 (one), it has to be dropped and retransmitted.
      
      This can be avoided if we let the mentioned packet be preceded by a
      LINK_PROTOCOL/STATE message, which takes up the endpoint before the
      arrival of the traffic.
      
      We add this small feature in this commit.
      
      This is a fully compatible change.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34b9cd64
    • J
      tipc: ensure that first packets on link are sent in order · de7e07f9
      Jon Paul Maloy 提交于
      In some link establishment scenarios we see that packet #2 may be sent
      out before packet #1, forcing the receiver to demand retransmission of
      the missing packet. This is harmless, but may cause confusion among
      people tracing the packet flow.
      
      Since this is extremely easy to fix, we do so by adding en extra send
      call to the bearer immediately after the link has come up.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de7e07f9
    • J
      tipc: refactor function tipc_link_timeout() · 42b18f60
      Jon Paul Maloy 提交于
      The function tipc_link_timeout() is unnecessary complex, and can
      easily be made more readable.
      
      We do that with this commit. The only functional change is that we
      remove a redundant test for whether the broadcast link is up or not.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42b18f60
    • J
      tipc: reduce transmission rate of reset messages when link is down · 88e8ac70
      Jon Paul Maloy 提交于
      When a link is down, it will continuously try to re-establish contact
      with the peer by sending out a RESET or an ACTIVATE message at each
      timeout interval. The default value for this interval is currently
      375 ms. This is wasteful, and may become a problem in very large
      clusters with dozens or hundreds of nodes being down simultaneously.
      
      We now introduce a simple backoff algorithm for these cases. The
      first five messages are sent at default rate; thereafter a message
      is sent only each 16th timer interval.
      
      This will cover the vast majority of link recycling cases, since the
      endpoint starting last will transmit at the higher speed, and the link
      should normally be established well be before the rate needs to be
      reduced.
      
      The only case where we will see a degradation of link re-establishment
      times is when the endpoints remain intact, and a glitch in the
      transmission media is causing the link reset. We will then experience
      a worst-case re-establishing time of 6 seconds, something we deem
      acceptable.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88e8ac70
    • J
      tipc: guarantee peer bearer id exchange after reboot · 634696b1
      Jon Paul Maloy 提交于
      When a link endpoint is going down locally, e.g., because its interface
      is being stopped, it will spontaneously send out a RESET message to
      its peer, informing it about this fact. This saves the peer from
      detecting the failure via probing, and hence gives both speedier and
      less resource consuming failure detection on the peer side.
      
      According to the link FSM, a receiver of a RESET message, ignoring the
      reason for it, must now consider the sender ready to come back up, and
      starts periodically sending out ACTIVATE messages to the peer in order
      to re-establish the link. Also, according to the FSM, the receiver of
      an ACTIVATE message can now go directly to state ESTABLISHED and start
      sending regular traffic packets. This is a well-proven and robust FSM.
      
      However, in the case of a reboot, there is a small possibilty that link
      endpoint on the rebooted node may have been re-created with a new bearer
      identity between the moment it sent its (pre-boot) RESET and the moment
      it receives the ACTIVATE from the peer. The new bearer identity cannot
      be known by the peer according to this scenario, since traffic headers
      don't convey such information. This is a problem, because both endpoints
      need to know the correct value of the peer's bearer id at any moment in
      time in order to be able to produce correct link events for their users.
      
      The only way to guarantee this is to enforce a full setup message
      exchange (RESET + ACTIVATE) even after the reboot, since those messages
      carry the bearer idientity in their header.
      
      In this commit we do this by introducing and setting a "stopping" bit in
      the header of the spontaneously generated RESET messages, informing the
      peer that the sender will not be immediately ready to re-establish the
      link. A receiver seeing this bit must act as if this were a locally
      detected connectivity failure, and hence has to go through a full two-
      way setup message exchange before any link can be re-established.
      
      Although never reported, this problem seems to have always been around.
      
      This protocol addition is fully backwards compatible.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      634696b1
  18. 15 4月, 2016 1 次提交
    • P
      tipc: fix a race condition leading to subscriber refcnt bug · 333f7962
      Parthasarathy Bhuvaragan 提交于
      Until now, the requests sent to topology server are queued
      to a workqueue by the generic server framework.
      These messages are processed by worker threads and trigger the
      registered callbacks.
      To reduce latency on uniprocessor systems, explicit rescheduling
      is performed using cond_resched() after MAX_RECV_MSG_COUNT(25)
      messages.
      
      This implementation on SMP systems leads to an subscriber refcnt
      error as described below:
      When a worker thread yields by calling cond_resched() in a SMP
      system, a new worker is created on another CPU to process the
      pending workitem. Sometimes the sleeping thread wakes up before
      the new thread finishes execution.
      This breaks the assumption on ordering and being single threaded.
      The fault is more frequent when MAX_RECV_MSG_COUNT is lowered.
      
      If the first thread was processing subscription create and the
      second thread processing close(), the close request will free
      the subscriber and the create request oops as follows:
      
      [31.224137] WARNING: CPU: 2 PID: 266 at include/linux/kref.h:46 tipc_subscrb_rcv_cb+0x317/0x380         [tipc]
      [31.228143] CPU: 2 PID: 266 Comm: kworker/u8:1 Not tainted 4.5.0+ #97
      [31.228377] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [...]
      [31.228377] Call Trace:
      [31.228377]  [<ffffffff812fbb6b>] dump_stack+0x4d/0x72
      [31.228377]  [<ffffffff8105a311>] __warn+0xd1/0xf0
      [31.228377]  [<ffffffff8105a3fd>] warn_slowpath_null+0x1d/0x20
      [31.228377]  [<ffffffffa0098067>] tipc_subscrb_rcv_cb+0x317/0x380 [tipc]
      [31.228377]  [<ffffffffa00a4984>] tipc_receive_from_sock+0xd4/0x130 [tipc]
      [31.228377]  [<ffffffffa00a439b>] tipc_recv_work+0x2b/0x50 [tipc]
      [31.228377]  [<ffffffff81071925>] process_one_work+0x145/0x3d0
      [31.246554] ---[ end trace c3882c9baa05a4fd ]---
      [31.248327] BUG: spinlock bad magic on CPU#2, kworker/u8:1/266
      [31.249119] BUG: unable to handle kernel NULL pointer dereference at 0000000000000428
      [31.249323] IP: [<ffffffff81099d0c>] spin_dump+0x5c/0xe0
      [31.249323] PGD 0
      [31.249323] Oops: 0000 [#1] SMP
      
      In this commit, we
      - rename tipc_conn_shutdown() to tipc_conn_release().
      - move connection release callback execution from tipc_close_conn()
        to a new function tipc_sock_release(), which is executed before
        we free the connection.
      Thus we release the subscriber during connection release procedure
      rather than connection shutdown procedure.
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      333f7962
  19. 14 4月, 2016 1 次提交
  20. 12 4月, 2016 2 次提交
  21. 08 4月, 2016 2 次提交
    • J
      tipc: stricter filtering of packets in bearer layer · 5b7066c3
      Jon Paul Maloy 提交于
      Resetting a bearer/interface, with the consequence of resetting all its
      pertaining links, is not an atomic action. This becomes particularly
      evident in very large clusters, where a lot of traffic may happen on the
      remaining links while we are busy shutting them down. In extreme cases,
      we may even see links being re-created and re-established before we are
      finished with the job.
      
      To solve this, we now introduce a solution where we temporarily detach
      the bearer from the interface when the bearer is reset. This inhibits
      all packet reception, while sending still is possible. For the latter,
      we use the fact that the device's user pointer now is zero to filter out
      which packets can be sent during this situation; i.e., outgoing RESET
      messages only.  This filtering serves to speed up the neighbors'
      detection of the loss event, and saves us from unnecessary probing.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b7066c3
    • J
      tipc: eliminate buffer leak in bearer layer · 4e801fa1
      Jon Paul Maloy 提交于
      When enabling a bearer we create a 'neigbor discoverer' instance by
      calling the function tipc_disc_create() before the bearer is actually
      registered in the list of enabled bearers. Because of this, the very
      first discovery broadcast message, created by the mentioned function,
      is lost, since it cannot find any valid bearer to use. Furthermore,
      the used send function, tipc_bearer_xmit_skb() does not free the given
      buffer when it cannot find a  bearer, resulting in the leak of exactly
      one send buffer each time a bearer is enabled.
      
      This commit fixes this problem by introducing two changes:
      
      1) Instead of attemting to send the discovery message directly, we let
         tipc_disc_create() return the discovery buffer to the calling
         function, tipc_enable_bearer(), so that the latter can send it
         when the enabling sequence is finished.
      
      2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
         functions at the bearer layer, we now free the indicated buffer or
         buffer chain when a valid bearer cannot be found.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e801fa1
  22. 15 3月, 2016 1 次提交
  23. 12 3月, 2016 1 次提交
  24. 08 3月, 2016 1 次提交
  25. 07 3月, 2016 5 次提交