1. 01 Apr, 2018: 3 commits
    • tipc: permit overlapping service ranges in name table · 37922ea4
      Committed by Jon Maloy
      With the new RB tree structure for service ranges it becomes possible to
      solve an old problem: we can now allow overlapping service ranges in
      the table.
      
      When inserting a new service range to the tree, we use 'lower' as primary
      key, and when necessary 'upper' as secondary key.
      
      Since there may now be multiple service ranges matching an indicated
      'lower' value, we must also add the 'upper' value to the functions
      used for removing publications, so that the correct, corresponding
      range item can be found.
      
      These changes guarantee that a well-formed publication/withdrawal item
      from a peer node will never be rejected, and make it possible to
      eliminate the problematic backlog functionality we currently have for
      handling such cases.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
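      A minimal sketch of the ordering rule described above, assuming a
      simplified range record: items compare by 'lower' first and by 'upper'
      only when the 'lower' values tie, and a removal must look up the exact
      (lower, upper) pair. For brevity, a sorted array with binary search
      stands in for the kernel RB tree; struct svc_range, svc_range_cmp() and
      svc_range_find() are hypothetical names, not the TIPC code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified service range (not the kernel struct). */
struct svc_range {
	uint32_t lower;
	uint32_t upper;
};

/* 'lower' is the primary key, 'upper' the secondary key.
 * Returns <0, 0 or >0 in the style of strcmp(). */
int svc_range_cmp(const struct svc_range *a, const struct svc_range *b)
{
	if (a->lower != b->lower)
		return a->lower < b->lower ? -1 : 1;
	if (a->upper != b->upper)
		return a->upper < b->upper ? -1 : 1;
	return 0;
}

/* Binary search over a sorted array standing in for the RB tree.
 * Both bounds are needed, because several ranges may share 'lower'. */
const struct svc_range *svc_range_find(const struct svc_range *tbl, size_t n,
				       uint32_t lower, uint32_t upper)
{
	struct svc_range key = { lower, upper };
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = svc_range_cmp(&tbl[mid], &key);

		if (c == 0)
			return &tbl[mid];
		if (c < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

      With the sorted ranges {10,20}, {10,30} and {15,25}, looking up
      (10, 30) finds the second item, while (10, 25) finds nothing even
      though ranges with lower == 10 exist.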
    • tipc: refactor name table translate function · f20889f7
      Committed by Jon Maloy
      The function tipc_nametbl_translate() is ugly and hard to follow. This
      can be improved somewhat by introducing a stack variable for holding
      the publication list to be used, and by re-ordering the if-clauses for
      selection of the algorithm.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: replace name table service range array with rb tree · 218527fe
      Committed by Jon Maloy
      The current design of the binding table has an unnecessarily complex
      and memory-consuming data structure. It aggregates the service range
      items into an array, which is expanded by a factor of two every time it
      becomes too small to hold a new item. Furthermore, the arrays never
      shrink when the number of ranges diminishes.
      
      We now replace this array with an RB tree that is holding the range
      items as tree nodes, each range directly holding a list of bindings.
      
      This, along with a few name changes, improves readability and reduces
      the volume of the code, as well as reducing memory consumption and
      hopefully improving the cache hit rate.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Mar, 2018: 2 commits
    • tipc: remove direct accesses to own_addr field in struct tipc_net · 23fd3eac
      Committed by Jon Maloy
      As a preparation to changing the addressing structure of TIPC we replace
      all direct accesses to the tipc_net::own_addr field with the function
      dedicated for this, tipc_own_addr().
      
      There are no changes to program logic in this commit.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: allow closest-first lookup algorithm when legacy address is configured · b89afb11
      Committed by Jon Maloy
      The removal of an internal structure of the node address has an unwanted
      side effect.
      - Currently, if a user is sending an anycast message with destination
        domain 0, the tipc_nametbl_translate() function will use the
        'closest-first' algorithm to first look for a node-local destination,
        and only when no such destination is found will it resort to the
        cluster-global 'round-robin' lookup algorithm.
      - Current users can get around this, and enforce unconditional use of
        global round-robin by indicating a destination as Z.0.0 or Z.C.0.
      - This option disappears when we make the node address flat, since the
        lookup algorithm has no way of recognizing this case. So, as long as
        there are node local destinations, the algorithm will always select
        one of those, and there is nothing the sender can do to change this.
      
      We solve this by eliminating the 'closest-first' option, which was never
      a good idea anyway, but only for non-legacy users. To distinguish
      between legacy and non-legacy users we introduce a new flag
      'legacy_addr_format' in struct tipc_core, to be set when the user
      configures a legacy-style Z.C.N node address. Hence, when a legacy user
      indicates a zero lookup domain, 'closest-first' is selected, and in all
      other cases we use 'round-robin'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
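      The selection rule can be sketched as follows. This is an illustration
      under stated assumptions, not tipc_nametbl_translate() itself: struct
      publ, use_closest_first() and select_publ() are hypothetical, a simple
      round-robin cursor stands in for however the kernel rotates among
      candidates, and closest-first here just takes the first node-local
      match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical publication record: where a matching service is bound. */
struct publ {
	uint32_t node;
	uint32_t port;
};

/* Rule from the commit message: 'closest-first' only for users with a
 * legacy Z.C.N address who pass a zero lookup domain. */
bool use_closest_first(bool legacy_addr_format, uint32_t domain)
{
	return legacy_addr_format && domain == 0;
}

/* Returns the index of the chosen publication, or -1 if there is none.
 * '*rr' is a round-robin cursor owned by the caller. */
int select_publ(const struct publ *p, size_t n, uint32_t own_node,
		bool legacy_addr_format, uint32_t domain, size_t *rr)
{
	if (n == 0)
		return -1;

	if (use_closest_first(legacy_addr_format, domain)) {
		/* Prefer a node-local destination if one exists. */
		for (size_t i = 0; i < n; i++)
			if (p[i].node == own_node)
				return (int)i;
	}

	/* Cluster-global round-robin; also the closest-first fallback. */
	size_t i = *rr % n;
	*rr = i + 1;
	return (int)i;
}
```

      A non-legacy sender always gets round-robin, so node-local
      destinations can no longer capture all traffic.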
  3. 18 Mar, 2018: 4 commits
  4. 17 Feb, 2018: 4 commits
  5. 16 Jan, 2018: 1 commit
    • tipc: fix bug during lookup of multicast destination nodes · e9a03445
      Committed by Jon Maloy
      In commit 232d07b7 ("tipc: improve groupcast scope handling") we
      inadvertently broke non-group multicast transmission when changing the
      parameter 'domain' to 'scope' in the function
      tipc_nametbl_lookup_dst_nodes(). We failed to make the corresponding
      change in the calling function, with the result that the lookup always
      fails.
      
      A closer analysis reveals that this parameter is not needed at all.
      Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the
      current implementation this will be delivered to all matching
      destinations except those which are published with NODE_SCOPE on other
      nodes. Since such publications will never be visible on the sending node
      anyway, it makes no sense to discriminate by scope at all.
      
      We now remove this parameter altogether.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 10 Jan, 2018: 3 commits
    • tipc: improve groupcast scope handling · 232d07b7
      Committed by Jon Maloy
      When a member joins a group, it also indicates a binding scope. This
      makes it possible to create both node local groups, invisible to other
      nodes, as well as cluster global groups, visible everywhere.
      
      To avoid different members ending up with permanently differing views
      of group size and membership, we must stop locally and globally bound
      members from joining the same group.
      
      We do this by using the binding scope as an additional separator between
      groups. I.e., a member must ignore all membership events from sockets
      using a different scope than itself, and all lookups for message
      destinations must require an exact match between the message's lookup
      scope and the potential target's binding scope.
      
      Apart from making it possible to create local groups using the same
      identity on different nodes, a side effect of this is that it now also
      becomes possible to create a cluster global group with the same identity
      across the same nodes, without interfering with the local groups.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
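      A sketch of the exact-match rule for destination lookups, assuming a
      flat array of members; struct member, group_lookup() and the SCOPE_*
      values are hypothetical. A member bound with node scope is never
      returned for a cluster-scope lookup of the same group, and vice versa,
      which is what keeps the two groups apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scope values, for illustration only. */
enum scope { SCOPE_CLUSTER = 2, SCOPE_NODE = 3 };

struct member {
	uint32_t group;    /* group identity ('type') */
	uint32_t instance; /* member identity */
	enum scope scope;  /* binding scope chosen at join time */
};

/* Collect indices of members of 'group' whose binding scope exactly
 * equals the lookup scope. Returns the number of indices written. */
size_t group_lookup(const struct member *m, size_t n, uint32_t group,
		    enum scope lookup_scope, size_t *out, size_t max)
{
	size_t cnt = 0;

	for (size_t i = 0; i < n && cnt < max; i++) {
		if (m[i].group != group)
			continue;
		if (m[i].scope != lookup_scope) /* scope acts as a separator */
			continue;
		out[cnt++] = i;
	}
	return cnt;
}
```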
    • tipc: add option to suppress PUBLISH events for pre-existing publications · 8348500f
      Committed by Jon Maloy
      Currently, when a user subscribes to binding table publications, it
      receives a PUBLISH event for every already existing matching item in
      the binding table.
      
      However, a group socket making a subscription doesn't need this initial
      status update from the binding table, because it has already scanned it
      during the join operation. Worse, the multiplicative effect of issuing
      mutual events for dozens or hundreds of group members within a short
      time frame puts a heavy load on the topology server, with the end result
      that scale-out operations on a big group tend to take much longer than
      needed.
      
      We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
      subscriptions, so that this initial avalanche of events is suppressed.
      This change, along with the previous commit, significantly improves the
      range and speed of group scale out operations.
      
      We keep the new option internal for the tipc driver, at least for now.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
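      A sketch of how such a filter bit could short-circuit the initial
      status report. SUB_NO_STATUS, struct binding and subscribe() are
      hypothetical stand-ins (the real option is TIPC_SUB_NO_STATUS, and its
      value is not assumed here); a binding is taken to match when its range
      overlaps the subscribed range.

```c
#include <stddef.h>
#include <stdint.h>

#define SUB_NO_STATUS 0x8u /* hypothetical bit, not the driver's value */

struct binding {
	uint32_t type;
	uint32_t lower;
	uint32_t upper;
};

typedef void (*publish_cb)(const struct binding *b);

/* Initial status step of a subscription for [lower, upper] of 'type':
 * unless SUB_NO_STATUS is set, every pre-existing overlapping binding is
 * reported as a PUBLISH event. Returns the number of events issued. */
size_t subscribe(const struct binding *tbl, size_t n, uint32_t type,
		 uint32_t lower, uint32_t upper, uint32_t filter,
		 publish_cb cb)
{
	size_t events = 0;

	if (filter & SUB_NO_STATUS)
		return 0; /* e.g. a group socket that already scanned the table */

	for (size_t i = 0; i < n; i++) {
		const struct binding *b = &tbl[i];

		if (b->type != type || b->upper < lower || b->lower > upper)
			continue; /* different type or no overlap */
		if (cb)
			cb(b);
		events++;
	}
	return events;
}
```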
    • tipc: send out join messages as soon as new member is discovered · d12d2e12
      Committed by Jon Maloy
      When a socket is joining a group, we look up the binding table to find
      out whether other members of the group are already present. This makes
      it possible to return EAGAIN instead of EHOSTUNREACH if the user
      proceeds directly to a send attempt.
      
      However, the information in the binding table can be used to directly
      set the created member in state MBR_PUBLISHED and send a JOIN message
      to the peer, instead of waiting for a topology PUBLISH event to do this.
      When there are many members in a group, the propagation time for such
      events can be significant, and we can save time during the join
      operation if we use the initial lookup result fully.
      
      In this commit, we eliminate the member state MBR_DISCOVERED, which has
      been the result of the initial lookup, and instead go directly to
      MBR_PUBLISHED, which initiates the setup.
      
      After this change, the tipc_member FSM looks as follows:
      
           +-----------+
      ---->| PUBLISHED |-----------------------------------------------+
      PUB- +-----------+                                LEAVE/WITHDRAW |
      LISH       |JOIN                                                 |
                 |     +-------------------------------------------+   |
                 |     |                            LEAVE/WITHDRAW |   |
                 |     |                +------------+             |   |
                 |     |   +----------->|  PENDING   |---------+   |   |
                 |     |   |msg/maxactv +-+---+------+  LEAVE/ |   |   |
                 |     |   |              |   |       WITHDRAW |   |   |
                 |     |   |   +----------+   |                |   |   |
                 |     |   |   |revert/maxactv|                |   |   |
                 |     |   |   V              V                V   V   V
                 |   +----------+  msg  +------------+       +-----------+
                 +-->|  JOINED  |------>|   ACTIVE   |------>|  LEAVING  |--->
                 |   +----------+       +-----+------+ LEAVE/+-----------+DOWN
                 |        A   A               |      WITHDRAW A   A    A   EVT
                 |        |   |               |RECLAIM        |   |    |
                 |        |   |REMIT          V               |   |    |
                 |        |   |== adv   +------------+        |   |    |
                 |        |   +---------| RECLAIMING |--------+   |    |
                 |        |             +-----+------+  LEAVE/    |    |
                 |        |                   |REMIT   WITHDRAW   |    |
                 |        |                   |< adv              |    |
                 |        |msg/               V            LEAVE/ |    |
                 |        |adv==ADV_IDLE+------------+   WITHDRAW |    |
                 |        +-------------|  REMITTED  |------------+    |
                 |                      +------------+                 |
                 |PUBLISH                                              |
      JOIN +-----------+                                LEAVE/WITHDRAW |
      ---->|  JOINING  |-----------------------------------------------+
           +-----------+
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
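      A partial sketch of the FSM above, covering only the entry states and
      the LEAVE/WITHDRAW exit; the MBR_* state names follow the text, while
      the event enum and the function names are hypothetical. The point of
      the commit shows up in mbr_create_from_lookup(): a member found by the
      initial lookup starts in PUBLISHED with a JOIN sent at once, instead of
      waiting for a topology PUBLISH event.

```c
#include <stdbool.h>

enum mbr_state { MBR_JOINING, MBR_PUBLISHED, MBR_JOINED, MBR_LEAVING };
enum mbr_event { EV_PUBLISH, EV_JOIN, EV_LEAVE_WITHDRAW };

/* Member created from the initial binding-table lookup: no intermediate
 * DISCOVERED state, and the caller should send a JOIN immediately. */
enum mbr_state mbr_create_from_lookup(bool *send_join)
{
	*send_join = true;
	return MBR_PUBLISHED;
}

/* Subset of the transitions drawn above. */
enum mbr_state mbr_next(enum mbr_state s, enum mbr_event ev)
{
	if (ev == EV_LEAVE_WITHDRAW)
		return MBR_LEAVING; /* exit path from every modelled state */

	switch (s) {
	case MBR_PUBLISHED: /* we know of the peer; wait for its JOIN */
		return ev == EV_JOIN ? MBR_JOINED : s;
	case MBR_JOINING:   /* peer joined first; wait for its PUBLISH */
		return ev == EV_PUBLISH ? MBR_JOINED : s;
	default:
		return s;
	}
}
```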
  7. 26 Oct, 2017: 1 commit
  8. 13 Oct, 2017: 3 commits
    • tipc: introduce group anycast messaging · ee106d7f
      Committed by Jon Maloy
      In this commit, we make it possible to send connectionless unicast
      messages to any member corresponding to the given member identity,
      when there is more than one such member. The sender must use a
      TIPC_ADDR_NAME address to achieve this effect.
      
      We also perform load balancing between the destinations, i.e., we
      primarily select one which has advertised a sufficient send window
      to not cause a block/EAGAIN delay, if any. This mechanism is
      overlaid on the always present round-robin selection.
      
      Anycast messages are subject to the same start synchronization
      and flow control mechanism as group broadcast messages.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
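      One way to overlay window-based preference on round-robin, as a sketch
      only: struct dest, anycast_select() and the integer window field are
      hypothetical, and the driver's actual bookkeeping of advertised windows
      is not modelled. The scan starts at the round-robin cursor, so members
      with enough window still take turns.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one eligible destination member. */
struct dest {
	uint32_t node;
	uint32_t port;
	int window; /* send window the member has advertised */
};

/* Take the first member, starting at the cursor, whose window covers
 * 'len'; if none does, fall back to plain round-robin (the sender may
 * then block or get EAGAIN). Returns the index, or -1 if n == 0. */
int anycast_select(const struct dest *d, size_t n, int len, size_t *rr)
{
	if (n == 0)
		return -1;

	size_t start = *rr % n;
	size_t pick = start;

	for (size_t k = 0; k < n; k++) {
		size_t i = (start + k) % n;

		if (d[i].window >= len) {
			pick = i;
			break;
		}
	}
	*rr = pick + 1; /* next lookup starts after the chosen member */
	return (int)pick;
}
```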
    • tipc: introduce communication groups · 75da2163
      Committed by Jon Maloy
      As a preparation for introducing flow control for multicast and datagram
      messaging we need a more strictly defined framework than we have now. A
      socket must be able to keep track of exactly how many and which other
      sockets it is allowed to communicate with at any moment, and keep the
      necessary state for those.
      
      We therefore introduce a new concept we have named Communication Group.
      Sockets can join a group via a new setsockopt() call TIPC_GROUP_JOIN.
      The call takes four parameters: 'type' serves as group identifier,
      'instance' serves as a logical member identifier, and 'scope' indicates
      the visibility of the group (node/cluster/zone). Finally, 'flags' makes
      it possible to set certain properties for the member. For now, there is
      only one flag, indicating whether the creator of the socket wants to
      receive a copy of broadcast or multicast messages it is sending via the
      socket, and whether it wants to be eligible as a destination for its
      own anycasts.
      
      A group is closed, i.e., sockets which have not joined a group will
      not be able to send messages to or receive messages from members of
      the group, and vice versa.
      
      Any member of a group can send multicast ('group broadcast') messages
      to all group members, optionally including itself, using the primitive
      send(). The messages are received via the recvmsg() primitive. A socket
      can only be a member of one group at a time.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
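      From user space, joining a group as described above might look like
      the following sketch. It assumes the uapi header <linux/tipc.h> from a
      kernel that provides struct tipc_group_req, TIPC_GROUP_JOIN and the
      TIPC_GROUP_LOOPBACK flag; the helper name tipc_join_group() is
      hypothetical, and cluster scope is hard-coded for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Join group 'type' as member 'instance' with cluster-wide visibility.
 * 'loopback' asks to receive copies of our own broadcasts/multicasts.
 * Returns 0 on success, -1 with errno set on failure. */
int tipc_join_group(int sd, uint32_t type, uint32_t instance, bool loopback)
{
	struct tipc_group_req req;

	memset(&req, 0, sizeof(req));
	req.type = type;         /* group identity */
	req.instance = instance; /* member identity */
	req.scope = TIPC_CLUSTER_SCOPE;
	req.flags = loopback ? TIPC_GROUP_LOOPBACK : 0;

	return setsockopt(sd, SOL_TIPC, TIPC_GROUP_JOIN, &req, sizeof(req));
}
```

      A caller would first create the socket with
      socket(AF_TIPC, SOCK_RDM, 0) and then call, for example,
      tipc_join_group(sd, 4711, 1, true); a return value of -1 means the
      join failed and errno says why.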
    • tipc: improve destination linked list · a80ae530
      Committed by Jon Maloy
      We often see a need for a linked list of destination identities,
      sometimes containing a port number, sometimes a node identity, and
      sometimes both. The currently defined struct u32_list is not generic
      enough to cover all cases, so we extend it to contain two u32 integers
      and rename it to struct tipc_dest_list.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
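      A sketch of a destination list whose entries carry both a node and a
      port, so that one type serves port-only, node-only and combined lists.
      struct dest_item and the dest_* helpers are hypothetical, a plain
      singly linked list stands in for the kernel's list_head, and using 0
      for an unused field is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical generic destination entry: a node, a port, or both. */
struct dest_item {
	struct dest_item *next;
	uint32_t node;
	uint32_t port;
};

/* Match on both fields, so (node, 0) and (node, port) are distinct. */
struct dest_item *dest_find(struct dest_item *head, uint32_t node,
			    uint32_t port)
{
	for (; head; head = head->next)
		if (head->node == node && head->port == port)
			return head;
	return NULL;
}

/* Add (node, port) unless already present; returns true if added. */
bool dest_push(struct dest_item **head, uint32_t node, uint32_t port)
{
	struct dest_item *d;

	if (dest_find(*head, node, port))
		return false;
	d = malloc(sizeof(*d));
	if (!d)
		return false;
	d->node = node;
	d->port = port;
	d->next = *head;
	*head = d;
	return true;
}

void dest_free(struct dest_item *head)
{
	while (head) {
		struct dest_item *next = head->next;

		free(head);
		head = next;
	}
}
```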
  9. 29 Mar, 2017: 1 commit
    • tipc: adjust the policy of holding subscription kref · 7efea60d
      Committed by Ying Xue
      When a new subscription object is inserted into name_seq->subscriptions
      list, it's under name_seq->lock protection; when a subscription is
      deleted from the list, it's also under the same lock protection;
      similarly, when accessing a subscription by going through subscriptions
      list, the entire process is also protected by the name_seq->lock.
      
      Therefore, if subscription refcount is increased before it's inserted
      into subscriptions list, and its refcount is decreased after it's
      deleted from the list, it will be unnecessary to hold refcount at all
      before accessing subscription object which is obtained by going through
      subscriptions list under name_seq->lock protection.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
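      The reference-counting policy can be sketched in user space as
      follows, assuming a pthread mutex in place of name_seq->lock and a C11
      atomic counter in place of the kernel kref; struct sub, struct seq and
      the seq_* functions are hypothetical. The list owns one reference from
      before insertion until after removal, so a walk under the lock needs
      no get/put per entry.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subscription with a reference count. */
struct sub {
	struct sub *next;
	atomic_int refcnt;
};

struct seq {
	pthread_mutex_t lock; /* stands in for name_seq->lock */
	struct sub *subs;
};

/* Take the list's reference before the entry becomes reachable. */
void seq_add_sub(struct seq *s, struct sub *sub)
{
	atomic_fetch_add(&sub->refcnt, 1);
	pthread_mutex_lock(&s->lock);
	sub->next = s->subs;
	s->subs = sub;
	pthread_mutex_unlock(&s->lock);
}

/* Unlink under the lock, then drop the list's reference. */
bool seq_del_sub(struct seq *s, struct sub *sub)
{
	bool found = false;

	pthread_mutex_lock(&s->lock);
	for (struct sub **pp = &s->subs; *pp; pp = &(*pp)->next) {
		if (*pp == sub) {
			*pp = sub->next;
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&s->lock);
	if (found)
		atomic_fetch_sub(&sub->refcnt, 1);
	return found;
}

/* Walk under the lock with no per-entry get/put: every linked entry is
 * kept alive by the list's own reference. */
int seq_count_subs(struct seq *s)
{
	int n = 0;

	pthread_mutex_lock(&s->lock);
	for (struct sub *it = s->subs; it; it = it->next)
		n++;
	pthread_mutex_unlock(&s->lock);
	return n;
}
```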
  10. 21 Jan, 2017: 1 commit
  11. 04 Jan, 2017: 1 commit
  12. 08 Mar, 2016: 1 commit
  13. 06 Feb, 2016: 1 commit
  14. 21 Nov, 2015: 1 commit
  15. 05 May, 2015: 1 commit
    • tipc: rename functions defined in subscr.c · 57f1d186
      Committed by Ying Xue
      When a topology server accepts a connection request from its client,
      it allocates a connection instance and a tipc_subscriber structure
      object. The former is used to communicate with the client, and the
      latter is treated as a subscriber which manages all subscription events
      requested by the same client. When the topology server receives a
      request from a client to subscribe to name services through the
      connection, it creates a tipc_subscription structure instance which
      records which name services are subscribed to. To manage all
      subscriptions from the same client, the topology server links them into
      the subscrp_list of the subscriber. So subscriber and subscription have
      completely different meanings, but the function names associated with
      them are so confusing that it is hard to tell which functions operate on
      a subscriber and which on a subscription. We eliminate this confusion by
      renaming them.
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 18 Mar, 2015: 1 commit
    • tipc: fix a potential deadlock when nametable is purged · 8460504b
      Committed by Ying Xue
      [   28.531768] =============================================
      [   28.532322] [ INFO: possible recursive locking detected ]
      [   28.532322] 3.19.0+ #194 Not tainted
      [   28.532322] ---------------------------------------------
      [   28.532322] insmod/583 is trying to acquire lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]
      [   28.532322] but task is already holding lock:
      [   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] other info that might help us debug this:
      [   28.532322]  Possible unsafe locking scenario:
      [   28.532322]
      [   28.532322]        CPU0
      [   28.532322]        ----
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]   lock(&(&nseq->lock)->rlock);
      [   28.532322]
      [   28.532322]  *** DEADLOCK ***
      [   28.532322]
      [   28.532322]  May be due to missing lock nesting notation
      [   28.532322]
      [   28.532322] 3 locks held by insmod/583:
      [   28.532322]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50
      [   28.532322]  #1:  (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc]
      [   28.532322]  #2:  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
      [   28.532322]
      [   28.532322] stack backtrace:
      [   28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194
      [   28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   28.532322]  ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007
      [   28.532322]  ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998
      [   28.532322]  ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38
      [   28.532322] Call Trace:
      [   28.532322]  [<ffffffff81792f3e>] dump_stack+0x4c/0x65
      [   28.532322]  [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0
      [   28.532322]  [<ffffffff810a4df3>] ? __bfs+0x23/0x270
      [   28.532322]  [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0
      [   28.532322]  [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
      [   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
      [   28.532322]  [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc]
      [   28.532322]  [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc]
      [   28.532322]  [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc]
      [   28.532322]  [<ffffffff8163dece>] ops_init+0x4e/0x150
      [   28.532322]  [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10
      [   28.532322]  [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190
      [   28.532322]  [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50
      [   28.532322]  [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc]
      [   28.532322]  [<ffffffffa0024000>] ? 0xffffffffa0024000
      [   28.532322]  [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0
      [   28.532322]  [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0
      [   28.532322]  [<ffffffff810e725b>] ? do_init_module+0x2b/0x200
      [   28.532322]  [<ffffffff810e7294>] do_init_module+0x64/0x200
      [   28.532322]  [<ffffffff810e9353>] load_module+0x12f3/0x18e0
      [   28.532322]  [<ffffffff810e5890>] ? show_initstate+0x50/0x50
      [   28.532322]  [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110
      [   28.532322]  [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f
      
      tipc_purge_publications() holds the name sequence's lock when it calls
      tipc_nametbl_remove_publ() to remove a publication from that name
      sequence. However, before tipc_nametbl_remove_publ() calls
      tipc_nameseq_remove_publ() to remove the publication, it first looks up
      the name sequence instance for the publication and then takes the lock
      of the name sequence it found. Since that lock may already be held by
      tipc_purge_publications(), a deadlock occurs, as the scenario above
      demonstrates. Because tipc_nameseq_remove_publ() does not take the name
      sequence's lock itself, the deadlock can be avoided by having
      tipc_purge_publications() invoke it directly.
      
      Fixes: 97ede29e ("tipc: convert name table read-write lock to RCU")
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
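      The locking pattern behind the fix can be sketched with a pthread
      mutex standing in for the spinlock; struct nameseq and the three
      functions are hypothetical, and a counter stands in for the
      publication list. The outer entry point takes the lock, the inner
      helper assumes it is already held, and a caller that already holds the
      lock calls the helper directly.

```c
#include <pthread.h>

/* Hypothetical name sequence with a count of publications. */
struct nameseq {
	pthread_mutex_t lock;
	int npubl;
};

/* Inner helper: the caller must already hold seq->lock. */
static void nameseq_remove_publ_locked(struct nameseq *seq)
{
	if (seq->npubl > 0)
		seq->npubl--;
}

/* Outer entry point: takes the lock itself. Calling it while already
 * holding seq->lock is the recursive-locking bug reported above. */
void nametbl_remove_publ(struct nameseq *seq)
{
	pthread_mutex_lock(&seq->lock);
	nameseq_remove_publ_locked(seq);
	pthread_mutex_unlock(&seq->lock);
}

/* Purge path: the lock is held for the whole loop, so only the
 * lock-free inner helper is used inside it. */
void purge_publications(struct nameseq *seq)
{
	pthread_mutex_lock(&seq->lock);
	while (seq->npubl > 0)
		nameseq_remove_publ_locked(seq);
	pthread_mutex_unlock(&seq->lock);
}
```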
  17. 10 Feb, 2015: 3 commits
  18. 06 Feb, 2015: 1 commit
    • tipc: simplify socket multicast reception · 3c724acd
      Committed by Jon Paul Maloy
      The structure 'tipc_port_list' is used to collect port numbers
      representing multicast destination sockets on a receiving node.
      The list is not based on a standard linked list, and is in reality
      optimized for the uncommon case that there is more than one
      multicast destination per node. This makes the list handling
      unnecessarily complex, and as a consequence, even the socket
      multicast reception becomes more complex.
      
      In this commit, we replace 'tipc_port_list' with a new 'struct
      tipc_plist', which is based on a standard list. We give the new
      list stack (push/pop) semantics, something that simplifies
      the implementation of the function tipc_sk_mcast_rcv().
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
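      A sketch of a port list with stack (push/pop) semantics, assuming a
      plain singly linked list; struct port_node, plist_push() and
      plist_pop() are hypothetical and do not reproduce struct tipc_plist.
      The receive side only ever needs "give me the next destination port",
      which a pop provides directly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical destination-port list with LIFO semantics. */
struct port_node {
	struct port_node *next;
	uint32_t port;
};

/* Push a port on top of the stack; returns false on allocation failure. */
bool plist_push(struct port_node **top, uint32_t port)
{
	struct port_node *n = malloc(sizeof(*n));

	if (!n)
		return false;
	n->port = port;
	n->next = *top;
	*top = n;
	return true;
}

/* Pop the most recently pushed port; returns false when empty. */
bool plist_pop(struct port_node **top, uint32_t *port)
{
	struct port_node *n = *top;

	if (!n)
		return false;
	*port = n->port;
	*top = n->next;
	free(n);
	return true;
}
```

      A receive path could then drain the list with
      while (plist_pop(&top, &port)), delivering the message to each popped
      port in turn.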
  19. 13 Jan, 2015: 4 commits
  20. 10 Dec, 2014: 1 commit
  21. 09 Dec, 2014: 2 commits