1. 11 6月, 2015 1 次提交
  2. 30 4月, 2015 2 次提交
    • J
      tipc: fix problem with parallel link synchronization mechanism · 0d699f28
      Jon Paul Maloy 提交于
      Currently, we try to accumulate arrived packets in the links's
      'deferred' queue during the parallel link syncronization phase.
      
      This entails two problems:
      
      - With an unlucky combination of arriving packets the algorithm
        may go into a lockstep with the out-of-sequence handling function,
        where the synch mechanism is adding a packet to the deferred queue,
        while the out-of-sequence handling is retrieving it again, thus
        ending up in a loop inside the node_lock scope.
      
      - Even if this is avoided, the link will very often send out
        unnecessary protocol messages, in the worst case leading to
        redundant retransmissions.
      
      We fix this by just dropping arriving packets on the upcoming link
      during the synchronization phase, thus relying on the retransmission
      protocol to resolve the situation once the two links have arrived to
      a synchronized state.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d699f28
    • N
      tipc: remove wrong use of NLM_F_MULTI · f2f67390
      Nicolas Dichtel 提交于
      NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
      it is sent only at the end of a dump.
      
      Libraries like libnl will wait forever for NLMSG_DONE.
      
      Fixes: 35b9dd76 ("tipc: add bearer get/dump to new netlink api")
      Fixes: 7be57fc6 ("tipc: add link get/dump to new netlink api")
      Fixes: 46f15c67 ("tipc: add media get/dump to new netlink api")
      CC: Richard Alpe <richard.alpe@ericsson.com>
      CC: Jon Maloy <jon.maloy@ericsson.com>
      CC: Ying Xue <ying.xue@windriver.com>
      CC: tipc-discussion@lists.sourceforge.net
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2f67390
  3. 23 4月, 2015 3 次提交
    • E
      tipc: fix node refcount issue · 73a31737
      Erik Hugne 提交于
      When link statistics is dumped over netlink, we iterate over
      the list of peer nodes and append each links statistics to
      the netlink msg. In the case where the dump is resumed after
      filling up a nlmsg, the node refcnt is decremented without
      having been incremented previously which may cause the node
      reference to be freed. When this happens, the following
      info/stacktrace will be generated, followed by a crash or
      undefined behavior.
      We fix this by removing the erroneous call to tipc_node_put
      inside the loop that iterates over nodes.
      
      [  384.312303] INFO: trying to register non-static key.
      [  384.313110] the code is fine but needs lockdep annotation.
      [  384.313290] turning off the locking correctness validator.
      [  384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
      [  384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  384.313290]  ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007
      [  384.313290]  ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200
      [  384.313290]  ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001
      [  384.313290] Call Trace:
      [  384.313290]  <IRQ>  [<ffffffff8170adf1>] dump_stack+0x4c/0x65
      [  384.313290]  [<ffffffff810a6a6d>] __lock_acquire+0xf3d/0xf50
      [  384.313290]  [<ffffffff810a7375>] lock_acquire+0xd5/0x290
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff81712890>] _raw_spin_lock_bh+0x40/0x80
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e8c>] link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffff810c4698>] call_timer_fn+0xb8/0x490
      [  384.313290]  [<ffffffff810c45e0>] ? process_timeout+0x10/0x10
      [  384.313290]  [<ffffffff810c5a2c>] run_timer_softirq+0x21c/0x420
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff8105a954>] __do_softirq+0xf4/0x630
      [  384.313290]  [<ffffffff8105afdd>] irq_exit+0x5d/0x60
      [  384.313290]  [<ffffffff8103ade1>] smp_apic_timer_interrupt+0x41/0x50
      [  384.313290]  [<ffffffff817144a0>] apic_timer_interrupt+0x70/0x80
      [  384.313290]  <EOI>  [<ffffffff8100db10>] ? default_idle+0x20/0x210
      [  384.313290]  [<ffffffff8100db0e>] ? default_idle+0x1e/0x210
      [  384.313290]  [<ffffffff8100e61a>] arch_cpu_idle+0xa/0x10
      [  384.313290]  [<ffffffff81099803>] cpu_startup_entry+0x2c3/0x530
      [  384.313290]  [<ffffffff810d2893>] ? clockevents_register_device+0x113/0x200
      [  384.313290]  [<ffffffff81038b0f>] start_secondary+0x13f/0x170
      
      Fixes: 8a0f6ebe ("tipc: involve reference counter for node structure")
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73a31737
    • E
      tipc: fix random link reset problem · 9871b27f
      Erik Hugne 提交于
      In the function tipc_sk_rcv(), the stack variable 'err'
      is only initialized to TIPC_ERR_NO_PORT for the first
      iteration over the link input queue. If a chain of messages
      are received from a link, failure to lookup the socket for
      any but the first message will cause the message to bounce back
      out on a random link.
      We fix this by properly initializing err.
      Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9871b27f
    • Y
      tipc: fix topology server broken issue · def81f69
      Ying Xue 提交于
      When a new topology server is launched in a new namespace, its
      listening socket is inserted into the "init ns" namespace's socket
      hash table rather than the one owned by the new namespace. Although
      the socket's namespace is forcedly changed to the new namespace later,
      the socket is still stored in the socket hash table of "init ns"
      namespace. When a client created in the new namespace connects
      its own topology server, the connection is failed as its server's
      socket could not be found from its own namespace's socket table.
      
      If __sock_create() instead of original sock_create_kern() is used
      to create the server's socket through specifying an expected namesapce,
      the socket will be inserted into the specified namespace's socket
      table, thereby avoiding to the topology server broken issue.
      
      Fixes: 76100a8a ("tipc: fix netns refcnt leak")
      Reported-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      def81f69
  4. 08 4月, 2015 1 次提交
  5. 03 4月, 2015 3 次提交
    • J
      tipc: simplify link mtu negotiation · ed193ece
      Jon Paul Maloy 提交于
      When a link is being established, the two endpoints advertise their
      respective interface MTU in the transmitted RESET and ACTIVATE messages.
      If there is any difference, the lower of the two MTUs will be selected
      for use by both endpoints.
      
      However, as a remnant of earlier attempts to introduce TIPC level
      routing. there also exists an MTU discovery mechanism. If an intermediate
      node has a lower MTU than the two endpoints, they will discover this
      through a bisectional approach, and finally adopt this MTU for common use.
      
      Since there is no TIPC level routing, and probably never will be,
      this mechanism doesn't make any sense, and only serves to make the
      link level protocol unecessarily complex.
      
      In this commit, we eliminate the MTU discovery algorithm,and fall back
      to the simple MTU advertising approach. This change is fully backwards
      compatible.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed193ece
    • J
      tipc: eliminate delayed link deletion at link failover · dff29b1a
      Jon Paul Maloy 提交于
      When a bearer is disabled manually, all its links have to be reset
      and deleted. However, if there is a remaining, parallel link ready
      to take over a deleted link's traffic, we currently delay the delete
      of the removed link until the failover procedure is finished. This
      is because the remaining link needs to access state from the reset
      link, such as the last received packet number, and any partially
      reassembled buffer, in order to perform a successful failover.
      
      In this commit, we do instead move the state data over to the new
      link, so that it can fulfill the procedure autonomously, without
      accessing any data on the old link. This means that we can now
      proceed and delete all pertaining links immediately when a bearer
      is disabled. This saves us from some unnecessary complexity in such
      situations.
      
      We also choose to change the confusing definitions CHANGEOVER_PROTOCOL,
      ORIGINAL_MSG and DUPLICATE_MSG to the more descriptive TUNNEL_PROTOCOL,
      FAILOVER_MSG and SYNCH_MSG respectively.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dff29b1a
    • J
      tipc: drop tunneled packet duplicates at reception · 2da71425
      Jon Paul Maloy 提交于
      In commit 8b4ed863
      ("tipc: eliminate race condition at dual link establishment")
      we introduced a parallel link synchronization mechanism that
      guarentees sequential delivery even for users switching from
      an old to a newly established link. The new mechanism makes it
      unnecessary to deliver the tunneled duplicate packets back to
      the old link, as we are currently doing. It is now sufficient
      to use the last tunneled packet's inner sequence number as
      synchronization point between the two parallel links, whereafter
      it can be dropped.
      
      In this commit, we drop the duplicate packets arriving on the new
      link, after updating the synchronization point at each new arrival.
      
      Although it would now have been sufficient for the other endpoint
      to only tunnel the last packet in its send queue, and not the
      entire queue, we must still do this to maintain compatibility
      with older nodes.
      
      This commit makes it possible to get rid if some complex
      interaction between the two parallel links.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2da71425
  6. 01 4月, 2015 1 次提交
    • Y
      tipc: fix a slab object leak · 7e436905
      Ying Xue 提交于
      When remove TIPC module, there is a warning to remind us that a slab
      object is leaked like:
      
      root@localhost:~# rmmod tipc
      [   19.056226] =============================================================================
      [   19.057549] BUG TIPC (Not tainted): Objects remaining in TIPC on kmem_cache_close()
      [   19.058736] -----------------------------------------------------------------------------
      [   19.058736]
      [   19.060287] INFO: Slab 0xffffea0000519a00 objects=23 used=1 fp=0xffff880014668b00 flags=0x100000000004080
      [   19.061915] INFO: Object 0xffff880014668000 @offset=0
      [   19.062717] kmem_cache_destroy TIPC: Slab cache still has objects
      
      This is because the listening socket of TIPC topology server is not
      closed before TIPC proto handler is unregistered with proto_unregister().
      However, as the socket is closed in tipc_exit_net() which is called by
      unregister_pernet_subsys() during unregistering TIPC namespace operation,
      the warning can be eliminated if calling unregister_pernet_subsys() is
      moved before calling proto_unregister().
      
      Fixes: e05b31f4 ("tipc: make tipc socket support net namespace")
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e436905
  7. 30 3月, 2015 3 次提交
    • J
      tipc: fix two bugs in secondary destination lookup · d482994f
      Jon Paul Maloy 提交于
      A message sent to a node after a successful name table lookup may still
      find that the destination socket has disappeared, because distribution
      of name table updates is non-atomic. If so, the message will be rejected
      back to the sender with error code TIPC_ERR_NO_PORT. If the source
      socket of the message has disappeared in the meantime, the message
      should be dropped.
      
      However, in the currrent code, the message will instead be subject to an
      unwanted tertiary lookup, because the function tipc_msg_lookup_dest()
      doesn't check if there is an error code present in the message before
      performing the lookup. In the worst case, the message may now find the
      old destination again, and be redirected once more, instead of being
      dropped directly as it should be.
      
      A second bug in this function is that the "prev_node" field in the message
      is not updated after successful lookup, something that may have
      unpredictable consequences.
      
      The problems arising from those bugs occur very infrequently.
      
      The third change in this function; the test on msg_reroute_msg_cnt() is
      purely cosmetic, reflecting that the returned value never can be negative.
      
      This commit corrects the two bugs described above.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d482994f
    • Y
      tipc: involve reference counter for node structure · 8a0f6ebe
      Ying Xue 提交于
      TIPC node hash node table is protected with rcu lock on read side.
      tipc_node_find() is used to look for a node object with node address
      through iterating the hash node table. As the entire process of what
      tipc_node_find() traverses the table is guarded with rcu read lock,
      it's safe for us. However, when callers use the node object returned
      by tipc_node_find(), there is no rcu read lock applied. Therefore,
      this is absolutely unsafe for callers of tipc_node_find().
      
      Now we introduce a reference counter for node structure. Before
      tipc_node_find() returns node object to its caller, it first increases
      the reference counter. Accordingly, after its caller used it up,
      it decreases the counter again. This can prevent a node being used by
      one thread from being freed by another thread.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a0f6ebe
    • Y
      tipc: fix potential deadlock when all links are reset · b952b2be
      Ying Xue 提交于
      [   60.988363] ======================================================
      [   60.988754] [ INFO: possible circular locking dependency detected ]
      [   60.989152] 3.19.0+ #194 Not tainted
      [   60.989377] -------------------------------------------------------
      [   60.989781] swapper/3/0 is trying to acquire lock:
      [   60.990079]  (&(&n_ptr->lock)->rlock){+.-...}, at: [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.990743]
      [   60.990743] but task is already holding lock:
      [   60.991106]  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.991738]
      [   60.991738] which lock already depends on the new lock.
      [   60.991738]
      [   60.992174]
      [   60.992174] the existing dependency chain (in reverse order) is:
      [   60.992174]
      -> #1 (&(&bclink->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]        [<ffffffffa0000f57>] tipc_bclink_add_node+0x97/0xf0 [tipc]
      [   60.992174]        [<ffffffffa0011815>] tipc_node_link_up+0xf5/0x110 [tipc]
      [   60.992174]        [<ffffffffa0007783>] link_state_event+0x2b3/0x4f0 [tipc]
      [   60.992174]        [<ffffffffa00193c0>] tipc_link_proto_rcv+0x24c/0x418 [tipc]
      [   60.992174]        [<ffffffffa0008857>] tipc_rcv+0x827/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      -> #0 (&(&n_ptr->lock)->rlock){+.-...}:
      [   60.992174]        [<ffffffff810a8f7d>] __lock_acquire+0x163d/0x1ca0
      [   60.992174]        [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   60.992174]        [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   60.992174]        [<ffffffffa0006dca>] tipc_link_retransmit+0x1aa/0x240 [tipc]
      [   60.992174]        [<ffffffffa0001e11>] tipc_bclink_rcv+0x611/0x640 [tipc]
      [   60.992174]        [<ffffffffa0008646>] tipc_rcv+0x616/0xac0 [tipc]
      [   60.992174]        [<ffffffffa0002ca3>] tipc_l2_rcv_msg+0x73/0xd0 [tipc]
      [   60.992174]        [<ffffffff81646e66>] __netif_receive_skb_core+0x746/0x980
      [   60.992174]        [<ffffffff816470c1>] __netif_receive_skb+0x21/0x70
      [   60.992174]        [<ffffffff81647295>] netif_receive_skb_internal+0x35/0x130
      [   60.992174]        [<ffffffff81648218>] napi_gro_receive+0x158/0x1d0
      [   60.992174]        [<ffffffff81559e05>] e1000_clean_rx_irq+0x155/0x490
      [   60.992174]        [<ffffffff8155c1b7>] e1000_clean+0x267/0x990
      [   60.992174]        [<ffffffff81647b60>] net_rx_action+0x150/0x360
      [   60.992174]        [<ffffffff8105ec43>] __do_softirq+0x123/0x360
      [   60.992174]        [<ffffffff8105f12e>] irq_exit+0x8e/0xb0
      [   60.992174]        [<ffffffff8179f9f5>] do_IRQ+0x65/0x110
      [   60.992174]        [<ffffffff8179da6f>] ret_from_intr+0x0/0x13
      [   60.992174]        [<ffffffff8100de9f>] arch_cpu_idle+0xf/0x20
      [   60.992174]        [<ffffffff8109dfa6>] cpu_startup_entry+0x2f6/0x3f0
      [   60.992174]        [<ffffffff81033cda>] start_secondary+0x13a/0x150
      [   60.992174]
      [   60.992174] other info that might help us debug this:
      [   60.992174]
      [   60.992174]  Possible unsafe locking scenario:
      [   60.992174]
      [   60.992174]        CPU0                    CPU1
      [   60.992174]        ----                    ----
      [   60.992174]   lock(&(&bclink->lock)->rlock);
      [   60.992174]                                lock(&(&n_ptr->lock)->rlock);
      [   60.992174]                                lock(&(&bclink->lock)->rlock);
      [   60.992174]   lock(&(&n_ptr->lock)->rlock);
      [   60.992174]
      [   60.992174]  *** DEADLOCK ***
      [   60.992174]
      [   60.992174] 3 locks held by swapper/3/0:
      [   60.992174]  #0:  (rcu_read_lock){......}, at: [<ffffffff81646791>] __netif_receive_skb_core+0x71/0x980
      [   60.992174]  #1:  (rcu_read_lock){......}, at: [<ffffffffa0002c35>] tipc_l2_rcv_msg+0x5/0xd0 [tipc]
      [   60.992174]  #2:  (&(&bclink->lock)->rlock){+.-...}, at: [<ffffffffa00004be>] tipc_bclink_lock+0x8e/0xa0 [tipc]
      [   60.992174]
      
      The correct the sequence of grabbing n_ptr->lock and bclink->lock
      should be that the former is first held and the latter is then taken,
      which exactly happened on CPU1. But especially when the retransmission
      of broadcast link is failed, bclink->lock is first held in
      tipc_bclink_rcv(), and n_ptr->lock is taken in link_retransmit_failure()
      called by tipc_link_retransmit() subsequently, which is demonstrated on
      CPU0. As a result, deadlock occurs.
      
      If the order of holding the two locks happening on CPU0 is reversed, the
      deadlock risk will be relieved. Therefore, the node lock taken in
      link_retransmit_failure() originally is moved to tipc_bclink_rcv()
      so that it's obtained before bclink lock. But the precondition of
      the adjustment of node lock is that responding to bclink reset event
      must be moved from tipc_bclink_unlock() to tipc_node_unlock().
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b952b2be
  8. 26 3月, 2015 3 次提交
    • J
      tipc: eliminate race condition at dual link establishment · 8b4ed863
      Jon Paul Maloy 提交于
      Despite recent improvements, the establishment of dual parallel
      links still has a small glitch where messages can bypass each
      other. When the second link in a dual-link configuration is
      established, part of the first link's traffic will be steered over
      to the new link. Although we do have a mechanism to ensure that
      packets sent before and after the establishment of the new link
      arrive in sequence to the destination node, this is not enough.
      The arriving messages will still be delivered upwards in different
      threads, something entailing a risk of message disordering during
      the transition phase.
      
      To fix this, we introduce a synchronization mechanism between the
      two parallel links, so that traffic arriving on the new link cannot
      be added to its input queue until we are guaranteed that all
      pre-establishment messages have been delivered on the old, parallel
      link.
      
      This problem seems to always have been around, but its occurrence is
      so rare that it has not been noticed until recent intensive testing.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b4ed863
    • J
      tipc: clean up handling of link congestion · 3127a020
      Jon Paul Maloy 提交于
      After the recent changes in message importance handling it becomes
      possible to simplify handling of messages and sockets when we
      encounter link congestion.
      
      We merge the function tipc_link_cong() into link_schedule_user(),
      and simplify the code of the latter. The code should now be
      easier to follow, especially regarding return codes and handling
      of the message that caused the situation.
      
      In case the scheduling function is unable to pre-allocate a wakeup
      message buffer, it now returns -ENOBUFS, which is a more correct
      code than the previously used -EHOSTUNREACH.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3127a020
    • J
      tipc: introduce starvation free send algorithm · 1f66d161
      Jon Paul Maloy 提交于
      Currently, we only use a single counter; the length of the backlog
      queue, to determine whether a message should be accepted to the queue
      or not. Each time a message is being sent, the queue length is compared
      to a threshold value for the message's importance priority. If the queue
      length is beyond this threshold, the message is rejected. This algorithm
      implies a risk of starvation of low importance senders during very high
      load, because it may take a long time before the backlog queue has
      decreased enough to accept a lower level message.
      
      We now eliminate this risk by introducing a counter for each importance
      priority. When a message is sent, we check only the queue level for that
      particular message's priority. If that is ok, the message can be added
      to the backlog, irrespective of the queue level for other priorities.
      This way, each level is guaranteed a certain portion of the total
      bandwidth, and any risk of starvation is eliminated.
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f66d161
  9. 25 3月, 2015 4 次提交
  10. 24 3月, 2015 1 次提交
  11. 21 3月, 2015 1 次提交
  12. 20 3月, 2015 4 次提交
  13. 19 3月, 2015 2 次提交
  14. 18 3月, 2015 3 次提交
    • Y
      tipc: withdraw tipc topology server name when namespace is deleted · 2b9bb7f3
      Ying Xue 提交于
      The TIPC topology server is a per namespace service associated with the
      tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn
      before we call sk_release_kernel because the kernel socket release is
      done in init_net and trying to withdraw a TIPC name published in another
      namespace will fail with an error as:
      
      [  170.093264] Unable to remove local publication
      [  170.093264] (type=1, lower=1, ref=2184244004, key=2184244005)
      
      We fix this by breaking the association between the topology server name
      and socket before calling sk_release_kernel.
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b9bb7f3
    • Y
      tipc: fix a potential deadlock when nametable is purged · 8460504b
      Ying Xue 提交于
      [   28.531768] =============================================
      [   28.532322] [ INFO: possible recursive locking detected ]
      [   28.532322] 3.19.0+ #194 Not tainted
      [   28.532322] ---------------------------------------------
      [   28.532322] insmod/583 is trying to acquire lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]
      [   28.532322] but task is already holding lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] other info that might help us debug this:
      [   28.532322]  Possible unsafe locking scenario:
      [   28.532322]
      [   28.532322]        CPU0
      [   28.532322]        ----
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]
      [   28.532322]  *** DEADLOCK ***
      [   28.532322]
      [   28.532322]  May be due to missing lock nesting notation
      [   28.532322]
      [   28.532322] 3 locks held by insmod/583:
      [   28.532322]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50
      [   28.532322]  #1:  (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc]
      [   28.532322]  #2:  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] stack backtrace:
      [   28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194
      [   28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   28.532322]  ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007
      [   28.532322]  ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998
      [   28.532322]  ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38
      [   28.532322] Call Trace:
      [   28.532322]  [<ffffffff81792f3e>] dump_stack+0x4c/0x65
      [   28.532322]  [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0
      [   28.532322]  [<ffffffff810a4df3>] ? __bfs+0x23/0x270
      [   28.532322]  [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0
      [   28.532322]  [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc]
      [   28.532322]  [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc]
      [   28.532322]  [<ffffffff8163dece>] ops_init+0x4e/0x150
      [   28.532322]  [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10
      [   28.532322]  [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190
      [   28.532322]  [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50
      [   28.532322]  [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc]
      [   28.532322]  [<ffffffffa0024000>] ? 0xffffffffa0024000
      [   28.532322]  [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0
      [   28.532322]  [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0
      [   28.532322]  [<ffffffff810e725b>] ? do_init_module+0x2b/0x200
      [   28.532322]  [<ffffffff810e7294>] do_init_module+0x64/0x200
      [   28.532322]  [<ffffffff810e9353>] load_module+0x12f3/0x18e0
      [   28.532322]  [<ffffffff810e5890>] ? show_initstate+0x50/0x50
      [   28.532322]  [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110
      [   28.532322]  [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f
      
      Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to
      remove a publication with a name sequence, the name sequence's lock
      is held. However, when tipc_nametbl_remove_publ() calling
      tipc_nameseq_remove_publ() to remove the publication, it first tries
      to query name sequence instance with the publication, and then holds
      the lock of the found name sequence. But as the lock may be already
      taken in tipc_purge_publications(), deadlock happens like above
      scenario demonstrated. As tipc_nameseq_remove_publ() doesn't grab name
      sequence's lock, the deadlock can be avoided if it's directly invoked
      by tipc_purge_publications().
      
      Fixes: 97ede29e ("tipc: convert name table read-write lock to RCU")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8460504b
    • Y
      tipc: fix netns refcnt leak · 76100a8a
      Ying Xue 提交于
      When the TIPC module is loaded, we launch a topology server in kernel
      space, which in its turn is creating TIPC sockets for communication
      with topology server users. Because both the socket's creator and
      provider reside in the same module, it is necessary that the TIPC
      module's reference count remains zero after the server is started and
      the socket created; otherwise it becomes impossible to perform "rmmod"
      even on an idle module.
      
      Currently, we achieve this by defining a separate "tipc_proto_kern"
      protocol struct, that is used only for kernel space socket allocations.
      This structure has the "owner" field set to NULL, which restricts the
      module reference count from being be bumped when sk_alloc() for local
      sockets is called. Furthermore, we have defined three kernel-specific
      functions, tipc_sock_create_local(), tipc_sock_release_local() and
      tipc_sock_accept_local(), to avoid the module counter being modified
      when module local sockets are created or deleted. This has worked well
      until we introduced name space support.
      
      However, after name space support was introduced, we have observed that
      a reference count leak occurs, because the netns counter is not
      decremented in tipc_sock_delete_local().
      
      This commit remedies this problem. But instead of just modifying
      tipc_sock_delete_local(), we eliminate the whole parallel socket
      handling infrastructure, and start using the regular sk_create_kern(),
      kernel_accept() and sk_release_kernel() calls. Since those functions
      manipulate the module counter, we must now compensate for that by
      explicitly decrementing the counter after module local sockets are
      created, and increment it just before calling sk_release_kernel().
      
      Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Reviewed-by: NJon Maloy <jon.maloy@ericson.com>
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reported-by: NCong Wang <cwang@twopensource.com>
      Tested-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76100a8a
  15. 15 3月, 2015 7 次提交
    • J
      tipc: clean up handling of message priorities · e3eea1eb
      Jon Paul Maloy 提交于
      Messages transferred by TIPC are assigned an "importance priority", -an
      integer value indicating how to treat the message when there is link or
      destination socket congestion.
      
      There is no separate header field for this value. Instead, the message
      user values have been chosen in ascending order according to perceived
      importance, so that the message user field can be used for this.
      
      This is not a good solution. First, we have many more users than the
      needed priority levels, so we end up with treating more priority
      levels than necessary. Second, the user field cannot always
      accurately reflect the priority of the message. E.g., a message
      fragment packet should really have the priority of the enveloped
      user data message, and not the priority of the MSG_FRAGMENTER user.
      Until now, we have been working around this problem in different ways,
      but it is now time to implement a consistent way of handling such
      priorities, although still within the constraint that we cannot
      allocate any more bits in the regular data message header for this.
      
      In this commit, we define a new priority level, TIPC_SYSTEM_IMPORTANCE,
      that will be the only one used apart from the four (lower) user data
      levels. All non-data messages map down to this priority. Furthermore,
      we take some free bits from the MSG_FRAGMENTER header and allocate
      them to store the priority of the enveloped message. We then adjust
      the functions msg_importance()/msg_set_importance() so that they
      read/set the correct header fields depending on user type.
      
      This small protocol change is fully compatible, because the code at
      the receiving end of a link currently reads the importance level
      only from user data messages, where there is no change.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3eea1eb
    • J
      tipc: split link outqueue · 05dcc5aa
      Jon Paul Maloy 提交于
      struct tipc_link contains one single queue for outgoing packets,
      where both transmitted and waiting packets are queued.
      
      This infrastructure is hard to maintain, because we need
      to keep a number of fields to keep track of which packets are
      sent or unsent, and the number of packets in each category.
      
      A lot of code becomes simpler if we split this queue into a transmission
      queue, where sent/unacknowledged packets are kept, and a backlog queue,
      where we keep the not yet sent packets.
      
      In this commit we do this separation.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05dcc5aa
    • J
      tipc: eliminate unnecessary call to broadcast ack function · 2cdf3918
      Jon Paul Maloy 提交于
      The unicast packet header contains a broadcast acknowledge sequence
      number, that may need to be conveyed to the broadcast link for proper
      treatment. Currently, the function tipc_rcv(), which is on the most
      critical data path, calls the function tipc_bclink_acknowledge() to
      have this done. This call is made for each received packet, and results
      in the unconditional grabbing of the broadcast link spinlock.
      
      This is unnecessary, since we can see directly from tipc_rcv() if
      the acknowledged number differs from what has been previously acked
      from the node in question. In the vast majority of cases the numbers
      won't differ, and there is nothing to update.
      
      We now make the call to tipc_bclink_acknowledge() conditional
      to that the two ack values differ.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cdf3918
    • J
      tipc: extract bundled buffers by cloning instead of copying · c1336ee4
      Jon Paul Maloy 提交于
      When we currently extract a bundled buffer from a message bundle in
      the function tipc_msg_extract(), we allocate a new buffer and explicitly
      copy the linear data area.
      
      This is unnecessary, since we can just clone the buffer and do
      skb_pull() on the clone to move the data pointer to the correct
      position.
      
      This is what we do in this commit.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1336ee4
    • J
      tipc: eliminate unnecessary linearization of incoming buffers · 1149557d
      Jon Paul Maloy 提交于
      Currently, TIPC linearizes all incoming buffers directly at reception
      before passing them upwards in the stack. This is clearly a waste of
      CPU resources, and must be avoided.
      
      In this commit, we eliminate this unnecessary linearization. We still
      ensure that at least the message header is linear, and that the buffer
      is linearized where this is still needed, i.e. when unbundling and when
      reversing messages.
      
      In addition, we ensure that fragmented messages are validated after
      reassembly before delivering them upwards in the stack.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1149557d
    • J
      tipc: move message validation function to msg.c · cf2157f8
      Jon Paul Maloy 提交于
      The function link_buf_validate() is in reality re-entrant and context
      independent, and will in later commits be called from several locations.
      Therefore, we move it to msg.c, make it outline and rename the it to
      tipc_msg_validate().
      
      We also redesign the function to make proper use of pskb_may_pull()
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf2157f8
    • J
      tipc: add framework for node capabilities exchange · 7764d6e8
      Jon Paul Maloy 提交于
      The TIPC protocol spec has defined a 13 bit capability bitmap in
      the neighbor discovery header, as a means to maintain compatibility
      between different code and protocol generations. Until now this field
      has been unused.
      
      We now introduce the basic framework for exchanging capabilities
      between nodes at first contact. After exchange, a peer node's
      capabilities are stored as a 16 bit bitmap in struct tipc_node.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7764d6e8
  16. 11 3月, 2015 1 次提交
    • J
      tipc: ensure that idle links are deleted when a bearer is disabled · 169bf912
      Jon Paul Maloy 提交于
      commit afaa3f65
      (tipc: purge links when bearer is disabled) was an attempt to resolve
      a problem that turned out to have a more profound reason.
      
      When we disable a bearer, we delete all its pertaining links if
      there is no other bearer to perform failover to, or if the module
      is shutting down. In case there are dual bearers, we wait with
      deleting links until the failover procedure is finished.
      
      However, this misses the case when a link on the removed bearer
      was already down, so that there will be no failover procedure to
      finish the link delete. This causes confusion if a new bearer is
      added to replace the removed one, and also entails a small memory
      leak.
      
      This commit takes the current state of the link into account when
      deciding when to delete it, and also reverses the above-mentioned
      commit.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      169bf912