1. 27 7月, 2016 2 次提交
  2. 12 7月, 2016 1 次提交
  3. 16 6月, 2016 2 次提交
    • Y
      tipc: fix suspicious RCU usage · 66d95b67
      Ying Xue 提交于
      When run tipcTS&tipcTC test suite, the following complaint appears:
      
      [   56.926168] ===============================
      [   56.926169] [ INFO: suspicious RCU usage. ]
      [   56.926171] 4.7.0-rc1+ #160 Not tainted
      [   56.926173] -------------------------------
      [   56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage!
      [   56.926175]
      [   56.926175] other info that might help us debug this:
      [   56.926175]
      [   56.926177]
      [   56.926177] rcu_scheduler_active = 1, debug_locks = 1
      [   56.926179] 3 locks held by swapper/4/0:
      [   56.926180]  #0:  (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340
      [   56.926203]  #1:  (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc]
      [   56.926212]  #2:  (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
      [   56.926218]
      [   56.926218] stack backtrace:
      [   56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ #160
      [   56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   56.926224]  0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0
      [   56.926227]  0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120
      [   56.926230]  ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88
      [   56.926234] Call Trace:
      [   56.926235]  <IRQ>  [<ffffffff813c4423>] dump_stack+0x67/0x94
      [   56.926250]  [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120
      [   56.926256]  [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc]
      [   56.926261]  [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc]
      [   56.926266]  [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
      [   56.926273]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926278]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926283]  [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc]
      [   56.926288]  [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340
      [   56.926291]  [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340
      [   56.926296]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
      [   56.926300]  [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390
      [   56.926306]  [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130
      [   56.926316]  [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2
      [   56.926323]  [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0
      [   56.926327]  [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60
      [   56.926331]  [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90
      [   56.926333]  <EOI>  [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0
      [   56.926340]  [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0
      [   56.926342]  [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20
      [   56.926345]  [<ffffffff810adf0f>] default_idle_call+0x2f/0x50
      [   56.926347]  [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0
      [   56.926353]  [<ffffffff81040ad9>] start_secondary+0xf9/0x100
      
      The warning appears as rtnl_dereference() is wrongly used in
      tipc_l2_send_msg() under RCU read lock protection. Instead the proper
      usage should be that rcu_dereference_rtnl() is called here.
      
      Fixes: 5b7066c3 ("tipc: stricter filtering of packets in bearer layer")
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66d95b67
    • J
      tipc: add neighbor monitoring framework · 35c55c98
      Jon Paul Maloy 提交于
      TIPC based clusters are by default set up with full-mesh link
      connectivity between all nodes. Those links are expected to provide
      a short failure detection time, by default set to 1500 ms. Because
      of this, the background load for neighbor monitoring in an N-node
      cluster increases with a factor N on each node, while the overall
      monitoring traffic through the network infrastructure increases at
      a ~(N * (N - 1)) rate. Experience has shown that such clusters don't
      scale well beyond ~100 nodes unless we significantly increase failure
      discovery tolerance.
      
      This commit introduces a framework and an algorithm that drastically
      reduces this background load, while basically maintaining the original
      failure detection times across the whole cluster. Using this algorithm,
      background load will now grow at a rate of ~(2 * sqrt(N)) per node, and
      at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will
      now have to actively monitor 38 neighbors in a 400-node cluster, instead
      of as before 399.
      
      This "Overlapping Ring Supervision Algorithm" is completely distributed
      and employs no centralized or coordinated state. It goes as follows:
      
      - Each node makes up a linearly ascending, circular list of all its N
        known neighbors, based on their TIPC node identity. This algorithm
        must be the same on all nodes.
      
      - The node then selects the next M = sqrt(N) - 1 nodes downstream from
        itself in the list, and chooses to actively monitor those. This is
        called its "local monitoring domain".
      
      - It creates a domain record describing the monitoring domain, and
        piggy-backs this in the data area of all neighbor monitoring messages
        (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in
        the cluster eventually (default within 400 ms) will learn about
        its monitoring domain.
      
      - Whenever a node discovers a change in its local domain, e.g., a node
        has been added or has gone down, it creates and sends out a new
        version of its node record to inform all neighbors about the change.
      
      - A node receiving a domain record from anybody outside its local domain
        matches this against its own list (which may not look the same), and
        chooses to not actively monitor those members of the received domain
        record that are also present in its own list. Instead, it relies on
        indications from the direct monitoring nodes if an indirectly
        monitored node has gone up or down. If a node is indicated lost, the
        receiving node temporarily activates its own direct monitoring towards
        that node in order to confirm, or not, that it is actually gone.
      
      - Since each node is actively monitoring sqrt(N) downstream neighbors,
        each node is also actively monitored by the same number of upstream
        neighbors. This means that all non-direct monitoring nodes normally
        will receive sqrt(N) indications that a node is gone.
      
      - A major drawback with ring monitoring is how it handles failures that
        cause massive network partitionings. If both a lost node and all its
        direct monitoring neighbors are inside the lost partition, the nodes in
        the remaining partition will never receive indications about the loss.
        To overcome this, each node also chooses to actively monitor some
        nodes outside its local domain. Those nodes are called remote domain
        "heads", and are selected in such a way that no node in the cluster
        will be more than two direct monitoring hops away. Because of this,
        each node, apart from monitoring the member of its local domain, will
        also typically monitor sqrt(N) remote head nodes.
      
      - As an optimization, local list status, domain status and domain
        records are marked with a generation number. This saves senders from
        unnecessarily conveying  unaltered domain records, and receivers from
        performing unneeded re-adaptations of their node monitoring list, such
        as re-assigning domain heads.
      
      - As a measure of caution we have added the possibility to disable the
        new algorithm through configuration. We do this by keeping a threshold
        value for the cluster size; a cluster that grows beyond this value
        will switch from full-mesh to ring monitoring, and vice versa when
        it shrinks below the value. This means that if the threshold is set to
        a value larger than any anticipated cluster size (default size is 32)
        the new algorithm is effectively disabled. A patch set for altering the
        threshold value and for listing the table contents will follow shortly.
      
      - This change is fully backwards compatible.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35c55c98
  4. 08 4月, 2016 2 次提交
    • J
      tipc: stricter filtering of packets in bearer layer · 5b7066c3
      Jon Paul Maloy 提交于
      Resetting a bearer/interface, with the consequence of resetting all its
      pertaining links, is not an atomic action. This becomes particularly
      evident in very large clusters, where a lot of traffic may happen on the
      remaining links while we are busy shutting them down. In extreme cases,
      we may even see links being re-created and re-established before we are
      finished with the job.
      
      To solve this, we now introduce a solution where we temporarily detach
      the bearer from the interface when the bearer is reset. This inhibits
      all packet reception, while sending still is possible. For the latter,
      we use the fact that the device's user pointer now is zero to filter out
      which packets can be sent during this situation; i.e., outgoing RESET
      messages only.  This filtering serves to speed up the neighbors'
      detection of the loss event, and saves us from unnecessary probing.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b7066c3
    • J
      tipc: eliminate buffer leak in bearer layer · 4e801fa1
      Jon Paul Maloy 提交于
      When enabling a bearer we create a 'neigbor discoverer' instance by
      calling the function tipc_disc_create() before the bearer is actually
      registered in the list of enabled bearers. Because of this, the very
      first discovery broadcast message, created by the mentioned function,
      is lost, since it cannot find any valid bearer to use. Furthermore,
      the used send function, tipc_bearer_xmit_skb() does not free the given
      buffer when it cannot find a  bearer, resulting in the leak of exactly
      one send buffer each time a bearer is enabled.
      
      This commit fixes this problem by introducing two changes:
      
      1) Instead of attemting to send the discovery message directly, we let
         tipc_disc_create() return the discovery buffer to the calling
         function, tipc_enable_bearer(), so that the latter can send it
         when the enabling sequence is finished.
      
      2) In tipc_bearer_xmit_skb(), as well as in the two other transmit
         functions at the bearer layer, we now free the indicated buffer or
         buffer chain when a valid bearer cannot be found.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e801fa1
  5. 08 3月, 2016 1 次提交
  6. 21 11月, 2015 1 次提交
  7. 24 10月, 2015 5 次提交
  8. 16 10月, 2015 1 次提交
    • J
      tipc: send out RESET immediately when link goes down · 282b3a05
      Jon Paul Maloy 提交于
      When a link is taken down because of a node local event, such as
      disabling of a bearer or an interface, we currently leave it to the
      peer node to discover the broken communication. The default time for
      such failure discovery is 1.5-2 seconds.
      
      If we instead allow the terminating link endpoint to send out a RESET
      message at the moment it is reset, we can achieve the impression that
      both endpoints are going down instantly. Since this is a very common
      scenario, we find it worthwhile to make this small modification.
      
      Apart from letting the link produce the said message, we also have to
      ensure that the interface is able to transmit it before TIPC is
      detached. We do this by performing the disabling of a bearer in three
      steps:
      
      1) Disable reception of TIPC packets from the interface in question.
      2) Take down the links, while allowing them so send out a RESET message.
      3) Disable transmission of TIPC packets on the interface.
      
      Apart from this, we now have to react on the NETDEV_GOING_DOWN event,
      instead of as currently the NEDEV_DOWN event, to ensure that such
      transmission is possible during the teardown phase.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      282b3a05
  9. 31 7月, 2015 1 次提交
  10. 21 7月, 2015 1 次提交
    • J
      tipc: make media xmit call outside node spinlock context · af9b028e
      Jon Paul Maloy 提交于
      Currently, message sending is performed through a deep call chain,
      where the node spinlock is grabbed and held during a significant
      part of the transmission time. This is clearly detrimental to
      overall throughput performance; it would be better if we could send
      the message after the spinlock has been released.
      
      In this commit, we do instead let the call revert on the stack after
      the buffer chain has been added to the transmission queue, whereafter
      clones of the buffers are transmitted to the device layer outside the
      spinlock scope.
      
      As a further step in our effort to separate the roles of the node
      and link entities we also move the function tipc_link_xmit() to
      node.c, and rename it to tipc_node_xmit().
      Reviewed-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af9b028e
  11. 15 5月, 2015 1 次提交
  12. 10 5月, 2015 1 次提交
  13. 30 4月, 2015 1 次提交
  14. 11 3月, 2015 1 次提交
    • J
      tipc: ensure that idle links are deleted when a bearer is disabled · 169bf912
      Jon Paul Maloy 提交于
      commit afaa3f65
      (tipc: purge links when bearer is disabled) was an attempt to resolve
      a problem that turned out to have a more profound reason.
      
      When we disable a bearer, we delete all its pertaining links if
      there is no other bearer to perform failover to, or if the module
      is shutting down. In case there are dual bearers, we wait with
      deleting links until the failover procedure is finished.
      
      However, this misses the case when a link on the removed bearer
      was already down, so that there will be no failover procedure to
      finish the link delete. This causes confusion if a new bearer is
      added to replace the removed one, and also entails a small memory
      leak.
      
      This commit takes the current state of the link into account when
      deciding when to delete it, and also reverses the above-mentioned
      commit.
      Reviewed-by: NErik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      169bf912
  15. 06 3月, 2015 1 次提交
  16. 28 2月, 2015 1 次提交
  17. 10 2月, 2015 6 次提交
  18. 13 1月, 2015 5 次提交
  19. 25 11月, 2014 1 次提交
  20. 22 11月, 2014 5 次提交