1. 26 7月, 2019 2 次提交
    • T
      tipc: fix changeover issues due to large packet · 2320bcda
      Tuong Lien 提交于
      In conjunction with changing the interfaces' MTU (e.g. especially in
      the case of a bonding) where the TIPC links are brought up and down
      in a short time, a couple of issues were detected with the current link
      changeover mechanism:
      
      1) When one link is up but immediately forced down again, the failover
      procedure will be carried out in order to failover all the messages in
      the link's transmq queue onto the other working link. The link and node
      state is also set to FAILINGOVER as part of the process. The message
      will be transmited in form of a FAILOVER_MSG, so its size is plus of 40
      bytes (= the message header size). There is no problem if the original
      message size is not larger than the link's MTU - 40, and indeed this is
      the max size of a normal payload messages. However, in the situation
      above, because the link has just been up, the messages in the link's
      transmq are almost SYNCH_MSGs which had been generated by the link
      synching procedure, then their size might reach the max value already!
      When the FAILOVER_MSG is built on the top of such a SYNCH_MSG, its size
      will exceed the link's MTU. As a result, the messages are dropped
      silently and the failover procedure will never end up, the link will
      not be able to exit the FAILINGOVER state, so cannot be re-established.
      
      2) The same scenario above can happen more easily in case the MTU of
      the links is set differently or when changing. In that case, as long as
      a large message in the failure link's transmq queue was built and
      fragmented with its link's MTU > the other link's one, the issue will
      happen (there is no need of a link synching in advance).
      
      3) The link synching procedure also faces with the same issue but since
      the link synching is only started upon receipt of a SYNCH_MSG, dropping
      the message will not result in a state deadlock, but it is not expected
      as design.
      
      The 1) & 3) issues are resolved by the last commit that only a dummy
      SYNCH_MSG (i.e. without data) is generated at the link synching, so the
      size of a FAILOVER_MSG if any then will never exceed the link's MTU.
      
      For the 2) issue, the only solution is trying to fragment the messages
      in the failure link's transmq queue according to the working link's MTU
      so they can be failovered then. A new function is made to accomplish
      this, it will still be a TUNNEL PROTOCOL/FAILOVER MSG but if the
      original message size is too large, it will be fragmented & reassembled
      at the receiving side.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2320bcda
    • T
      tipc: optimize link synching mechanism · 4929a932
      Tuong Lien 提交于
      This commit along with the next one are to resolve the issues with the
      link changeover mechanism. See that commit for details.
      
      Basically, for the link synching, from now on, we will send only one
      single ("dummy") SYNCH message to peer. The SYNCH message does not
      contain any data, just a header conveying the synch point to the peer.
      
      A new node capability flag ("TIPC_TUNNEL_ENHANCED") is introduced for
      backward compatible!
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Suggested-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4929a932
  2. 02 7月, 2019 1 次提交
  3. 26 6月, 2019 3 次提交
  4. 19 6月, 2019 1 次提交
    • T
      tipc: fix issues with early FAILOVER_MSG from peer · d0f84d08
      Tuong Lien 提交于
      It appears that a FAILOVER_MSG can come from peer even when the failure
      link is resetting (i.e. just after the 'node_write_unlock()'...). This
      means the failover procedure on the node has not been started yet.
      The situation is as follows:
      
               node1                                node2
        linkb          linka                  linka        linkb
          |              |                      |            |
          |              |                      x failure    |
          |              |                  RESETTING        |
          |              |                      |            |
          |              x failure            RESET          |
          |          RESETTING             FAILINGOVER       |
          |              |   (FAILOVER_MSG)     |            |
          |<-------------------------------------------------|
          | *FAILINGOVER |                      |            |
          |              | (dummy FAILOVER_MSG) |            |
          |------------------------------------------------->|
          |            RESET                    |            | FAILOVER_END
          |         FAILINGOVER               RESET          |
          .              .                      .            .
          .              .                      .            .
          .              .                      .            .
      
      Once this happens, the link failover procedure will be triggered
      wrongly on the receiving node since the node isn't in FAILINGOVER state
      but then another link failover will be carried out.
      The consequences are:
      
      1) A peer might get stuck in FAILINGOVER state because the 'sync_point'
      was set, reset and set incorrectly, the criteria to end the failover
      would not be met, it could keep waiting for a message that has already
      received.
      
      2) The early FAILOVER_MSG(s) could be queued in the link failover
      deferdq but would be purged or not pulled out because the 'drop_point'
      was not set correctly.
      
      3) The early FAILOVER_MSG(s) could be dropped too.
      
      4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state
      shortly, but later on it would be restarted.
      
      The same situation can also happen when the link is in PEER_RESET state
      and a FAILOVER_MSG arrives.
      
      The commit resolves the issues by forcing the link down immediately, so
      the failover procedure will be started normally (which is the same as
      when receiving a FAILOVER_MSG and the link is in up state).
      
      Also, the function "tipc_node_link_failover()" is toughen to avoid such
      a situation from happening.
      Acked-by: NJon Maloy <jon.maloy@ericsson.se>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0f84d08
  5. 18 6月, 2019 1 次提交
  6. 04 5月, 2019 1 次提交
    • T
      tipc: fix missing Name entries due to half-failover · c0b14a08
      Tuong Lien 提交于
      TIPC link can temporarily fall into "half-establish" that only one of
      the link endpoints is ESTABLISHED and starts to send traffic, PROTOCOL
      messages, whereas the other link endpoint is not up (e.g. immediately
      when the endpoint receives ACTIVATE_MSG, the network interface goes
      down...).
      
      This is a normal situation and will be settled because the link
      endpoint will be eventually brought down after the link tolerance time.
      
      However, the situation will become worse when the second link is
      established before the first link endpoint goes down,
      For example:
      
         1. Both links <1A-2A>, <1B-2B> down
         2. Link endpoint 2A up, but 1A still down (e.g. due to network
            disturbance, wrong session, etc.)
         3. Link <1B-2B> up
         4. Link endpoint 2A down (e.g. due to link tolerance timeout)
         5. Node B starts failover onto link <1B-2B>
      
         ==> Node A does never start link failover.
      
      When the "half-failover" situation happens, two consequences have been
      observed:
      
      a) Peer link/node gets stuck in FAILINGOVER state;
      b) Traffic or user messages that peer node is trying to failover onto
      the second link can be partially or completely dropped by this node.
      
      The consequence a) was actually solved by commit c140eb16 ("tipc:
      fix failover problem"), but that commit didn't cover the b). It's due
      to the fact that the tunnel link endpoint has never been prepared for a
      failover, so the 'l->drop_point' (and the other data...) is not set
      correctly. When a TUNNEL_MSG from peer node arrives on the link,
      depending on the inner message's seqno and the current 'l->drop_point'
      value, the message can be dropped (- treated as a duplicate message) or
      processed.
      At this early stage, the traffic messages from peer are likely to be
      NAME_DISTRIBUTORs, this means some name table entries will be missed on
      the node forever!
      
      The commit resolves the issue by starting the FAILOVER process on this
      node as well. Another benefit from this solution is that we ensure the
      link will not be re-established until the failover ends.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0b14a08
  7. 28 4月, 2019 2 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  8. 17 4月, 2019 1 次提交
    • T
      tipc: fix link established but not in session · f7a93780
      Tuong Lien 提交于
      According to the link FSM, when a link endpoint got RESET_MSG (- a
      traditional one without the stopping bit) from its peer, it moves to
      PEER_RESET state and raises a LINK_DOWN event which then resets the
      link itself. Its state will become ESTABLISHING after the reset event
      and the link will be re-established soon after this endpoint starts to
      send ACTIVATE_MSG to the peer.
      
      There is no problem with this mechanism, however the link resetting has
      cleared the link 'in_session' flag (along with the other important link
      data such as: the link 'mtu') that was correctly set up at the 1st step
      (i.e. when this endpoint received the peer RESET_MSG). As a result, the
      link will become ESTABLISHED, but the 'in_session' flag is not set, and
      all STATE_MSG from its peer will be dropped at the link_validate_msg().
      It means the link not synced and will sooner or later face a failure.
      
      Since the link reset action is obviously needed for a new link session
      (this is also true in the other situations), the problem here is that
      the link is re-established a bit too early when the link endpoints are
      not really in-sync yet. The commit forces a resync as already done in
      the previous commit 91986ee1 ("tipc: fix link session and
      re-establish issues") by simply varying the link 'peer_session' value
      at the link_reset().
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7a93780
  9. 05 4月, 2019 3 次提交
    • T
      tipc: adapt link failover for new Gap-ACK algorithm · 58ee86b8
      Tuong Lien 提交于
      In commit 0ae955e2656d ("tipc: improve TIPC throughput by Gap ACK
      blocks"), we enhance the link transmq by releasing as many packets as
      possible with the multi-ACKs from peer node. This also means the queue
      is now non-linear and the peer link deferdq becomes vital.
      
      Whereas, in the case of link failover, all messages in the link transmq
      need to be transmitted as tunnel messages in such a way that message
      sequentiality and cardinality per sender is preserved. This requires us
      to maintain the link deferdq somehow, so that when the tunnel messages
      arrive, the inner user messages along with the ones in the deferdq will
      be delivered to upper layer correctly.
      
      The commit accomplishes this by defining a new queue in the TIPC link
      structure to hold the old link deferdq when link failover happens and
      process it upon receipt of tunnel messages.
      
      Also, in the case of link syncing, the link deferdq will not be purged
      to avoid unnecessary retransmissions that in the worst case will fail
      because the packets might have been freed on the sending side.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58ee86b8
    • T
      tipc: reduce duplicate packets for unicast traffic · 382f598f
      Tuong Lien 提交于
      For unicast transmission, the current NACK sending althorithm is over-
      active that forces the sending side to retransmit a packet that is not
      really lost but just arrived at the receiving side with some delay, or
      even retransmit same packets that have already been retransmitted
      before. As a result, many duplicates are observed also under normal
      condition, ie. without packet loss.
      
      One example case is: node1 transmits 1 2 3 4 10 5 6 7 8 9, when node2
      receives packet #10, it puts into the deferdq. When the packet #5 comes
      it sends NACK with gap [6 - 9]. However, shortly after that, when
      packet #6 arrives, it pulls out packet #10 from the deferfq, but it is
      still out of order, so it makes another NACK with gap [7 - 9] and so on
      ... Finally, node1 has to retransmit the packets 5 6 7 8 9 a number of
      times, but in fact all the packets are not lost at all, so duplicates!
      
      This commit reduces duplicates by changing the condition to send NACK,
      also restricting the retransmissions on individual packets via a timer
      of about 1ms. However, it also needs to say that too tricky condition
      for NACKs or too long timeout value for retransmissions will result in
      performance reducing! The criterias in this commit are found to be
      effective for both the requirements to reduce duplicates but not affect
      performance.
      
      The tipc_link_rcv() is also improved to only dequeue skb from the link
      deferdq if it is expected (ie. its seqno <= rcv_nxt).
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      382f598f
    • T
      tipc: improve TIPC throughput by Gap ACK blocks · 9195948f
      Tuong Lien 提交于
      During unicast link transmission, it's observed very often that because
      of one or a few lost/dis-ordered packets, the sending side will fastly
      reach the send window limit and must wait for the packets to be arrived
      at the receiving side or in the worst case, a retransmission must be
      done first. The sending side cannot release a lot of subsequent packets
      in its transmq even though all of them might have already been received
      by the receiving side.
      That is, one or two packets dis-ordered/lost and dozens of packets have
      to wait, this obviously reduces the overall throughput!
      
      This commit introduces an algorithm to overcome this by using "Gap ACK
      blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers
      that describes the link deferdq where packets have been got by the
      receiving side but with gaps, for example:
      
            link deferdq: [1 2 3 4      10 11      13 14 15       20]
      --> Gap ACK blocks:       <4, 5>,   <11, 1>,      <15, 4>, <20, 0>
      
      The Gap ACK blocks will be sent to the sending side along with the
      traditional ACK or NACK message. Immediately when receiving the message
      the sending side will now not only release from its transmq the packets
      ack-ed by the ACK but also by the Gap ACK blocks! So, more packets can
      be enqueued and transmitted.
      In addition, the sending side can now do "multi-retransmissions"
      according to the Gaps reported in the Gap ACK blocks.
      
      The new algorithm as verified helps greatly improve the TIPC throughput
      especially under packet loss condition.
      
      So far, a maximum of 32 blocks is quite enough without any "Too few Gap
      ACK blocks" reports with a 5.0% packet loss rate, however this number
      can be increased in the furture if needed.
      
      Also, the patch is backward compatible.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9195948f
  10. 20 3月, 2019 1 次提交
    • H
      tipc: support broadcast/replicast configurable for bc-link · 02ec6caf
      Hoang Le 提交于
      Currently, a multicast stream uses either broadcast or replicast as
      transmission method, based on the ratio between number of actual
      destinations nodes and cluster size.
      
      However, when an L2 interface (e.g., VXLAN) provides pseudo
      broadcast support, this becomes very inefficient, as it blindly
      replicates multicast packets to all cluster/subnet nodes,
      irrespective of whether they host actual target sockets or not.
      
      The TIPC multicast algorithm is able to distinguish real destination
      nodes from other nodes, and hence provides a smarter and more
      efficient method for transferring multicast messages than
      pseudo broadcast can do.
      
      Because of this, we now make it possible for users to force
      the broadcast link to permanently switch to using replicast,
      irrespective of which capabilities the bearer provides,
      or pretend to provide.
      Conversely, we also make it possible to force the broadcast link
      to always use true broadcast. While maybe less useful in
      deployed systems, this may at least be useful for testing the
      broadcast algorithm in small clusters.
      
      We retain the current AUTOSELECT ability, i.e., to let the broadcast link
      automatically select which algorithm to use, and to switch back and forth
      between broadcast and replicast as the ratio between destination
      node number and cluster size changes. This remains the default method.
      
      Furthermore, we make it possible to configure the threshold ratio for
      such switches. The default ratio is now set to 10%, down from 25% in the
      earlier implementation.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02ec6caf
  11. 12 2月, 2019 2 次提交
    • T
      tipc: fix link session and re-establish issues · 91986ee1
      Tuong Lien 提交于
      When a link endpoint is re-created (e.g. after a node reboot or
      interface reset), the link session number is varied by random, the peer
      endpoint will be synced with this new session number before the link is
      re-established.
      
      However, there is a shortcoming in this mechanism that can lead to the
      link never re-established or faced with a failure then. It happens when
      the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
      well as the 'in_session' flag have been set, but suddenly this link
      endpoint leaves. When it comes back with a random session number, there
      are two situations possible:
      
      1/ If the random session number is larger than (or equal to) the
      previous one, the peer endpoint will be updated with this new session
      upon receipt of a RESET_MSG from this endpoint, and the link can be re-
      established as normal. Otherwise, all the RESET_MSGs from this endpoint
      will be rejected by the peer. In turn, when this link endpoint receives
      one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
      to send STATE_MSGs, but again these messages will be dropped by the
      peer due to wrong session.
      The peer link endpoint can still become ESTABLISHED after receiving a
      traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
      NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
      will be forced down sooner or later!
      
      Even in case the random session number is larger than the previous one,
      it can be that the ACTIVATE_MSG from the peer arrives first, and this
      link endpoint moves quickly to ESTABLISHED without sending out any
      RESET_MSG yet. Consequently, the peer link will not be updated with the
      new session number, and the same link failure scenario as above will
      happen.
      
      2/ Another situation can be that, the peer link endpoint was reset due
      to any reasons in the meantime, its link state was set to RESET from
      ESTABLISHING but still in session, i.e. the 'in_session' flag is not
      reset...
      Now, if the random session number from this endpoint is less than the
      previous one, all the RESET_MSGs from this endpoint will be rejected by
      the peer. In the other direction, when this link endpoint receives a
      RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
      ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
      As a result, the link cannot be re-established but gets stuck with this
      link endpoint in state ESTABLISHING and the peer in RESET!
      
      Solution:
      
      ===========
      
      This link endpoint should not go directly to ESTABLISHED when getting
      ACTIVATE_MSG from the peer which may belong to the old session if the
      link was re-created. To ensure the session to be correct before the
      link is re-established, the peer endpoint in ESTABLISHING state will
      send back the last session number in ACTIVATE_MSG for a verification at
      this endpoint. Then, if needed, a new and more appropriate session
      number will be regenerated to force a re-synch first.
      
      In addition, when a link in ESTABLISHING state is reset, its state will
      move to RESET according to the link FSM, along with resetting the
      'in_session' flag (and the other data) as a normal link reset, it will
      also be deleted if requested.
      
      The solution is backward compatible.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91986ee1
    • H
      tipc: fix skb may be leaky in tipc_link_input · 7384b538
      Hoang Le 提交于
      When we free skb at tipc_data_input, we return a 'false' boolean.
      Then, skb passed to subcalling tipc_link_input in tipc_link_rcv,
      
      <snip>
      1303 int tipc_link_rcv:
      ...
      1354    if (!tipc_data_input(l, skb, l->inputq))
      1355        rc |= tipc_link_input(l, skb, l->inputq);
      </snip>
      
      Fix it by simple changing to a 'true' boolean when skb is being free-ed.
      Then, tipc_link_rcv will bypassed to subcalling tipc_link_input as above
      condition.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <maloy@donjonn.com>
      Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7384b538
  12. 24 1月, 2019 1 次提交
    • G
      tipc: mark expected switch fall-throughs · f79e3365
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      net/tipc/link.c:1125:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      net/tipc/socket.c:736:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      net/tipc/socket.c:2418:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enabling
      -Wimplicit-fallthrough.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f79e3365
  13. 20 12月, 2018 3 次提交
    • H
      tipc: fix uninitialized value for broadcast retransmission · 05572271
      Hoang Le 提交于
      When sending broadcast message on high load system, there are a lot of
      unnecessary packets restranmission. That issue was caused by missing in
      initial criteria for retransmission.
      
      To prevent this happen, just initialize this criteria for retransmission
      in next 10 milliseconds.
      
      Fixes: 31c4f4cc ("tipc: improve broadcast retransmission algorithm")
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05572271
    • T
      tipc: add trace_events for tipc link · 26574db0
      Tuong Lien 提交于
      The commit adds the new trace_events for TIPC link object:
      
      trace_tipc_link_timeout()
      trace_tipc_link_fsm()
      trace_tipc_link_reset()
      trace_tipc_link_too_silent()
      trace_tipc_link_retrans()
      trace_tipc_link_bc_ack()
      trace_tipc_link_conges()
      
      And the traces for PROTOCOL messages at building and receiving:
      
      trace_tipc_proto_build()
      trace_tipc_proto_rcv()
      
      Note:
      a) The 'tipc_link_too_silent' event will only happen when the
      'silent_intv_cnt' is about to reach the 'abort_limit' value (and the
      event is enabled). The benefit for this kind of event is that we can
      get an early indication about TIPC link loss issue due to timeout, then
      can do some necessary actions for troubleshooting.
      
      For example: To trigger the 'tipc_proto_rcv' when the 'too_silent'
      event occurs:
      
      echo 'enable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_too_silent/trigger
      
      And disable it when TIPC link is reset:
      
      echo 'disable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_reset/trigger
      
      b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to
      trace TIPC retransmission issues.
      
      In addition, the commit adds the 'trace_tipc_list/link_dump()' at the
      'retransmission failure' case. Then, if the issue occurs, the link
      'transmq' along with the link data can be dumped for post-analysis.
      These dump events should be enabled by default since it will only take
      effect when the failure happens.
      
      The same approach is also applied for the faulty case that the
      validation of protocol message is failed.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26574db0
    • T
      tipc: enable tracepoints in tipc · b4b9771b
      Tuong Lien 提交于
      As for the sake of debugging/tracing, the commit enables tracepoints in
      TIPC along with some general trace_events as shown below. It also
      defines some 'tipc_*_dump()' functions that allow to dump TIPC object
      data whenever needed, that is, for general debug purposes, ie. not just
      for the trace_events.
      
      The following trace_events are now available:
      
      - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data,
        e.g. message type, user, droppable, skb truesize, cloned skb, etc.
      
      - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or
        queues, e.g. TIPC link transmq, socket receive queue, etc.
      
      - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g.
        sk state, sk type, connection type, rmem_alloc, socket queues, etc.
      
      - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g.
        link state, silent_intv_cnt, gap, bc_gap, link queues, etc.
      
      - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g.
        node state, active links, capabilities, link entries, etc.
      
      How to use:
      Put the trace functions at any places where we want to dump TIPC data
      or events.
      
      Note:
      a) The dump functions will generate raw data only, that is, to offload
      the trace event's processing, it can require a tool or script to parse
      the data but this should be simple.
      
      b) The trace_tipc_*_dump() should be reserved for a failure cases only
      (e.g. the retransmission failure case) or where we do not expect to
      happen too often, then we can consider enabling these events by default
      since they will almost not take any effects under normal conditions,
      but once the rare condition or failure occurs, we get the dumped data
      fully for post-analysis.
      
      For other trace purposes, we can reuse these trace classes as template
      but different events.
      
      c) A trace_event is only effective when we enable it. To enable the
      TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
      directory in the 'debugfs' file system. Normally, they are located at:
      
      /sys/kernel/debug/tracing/events/tipc/
      
      For example:
      
      To enable the tipc_link_dump event:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
      
      To enable all the TIPC trace_events:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To collect the trace data:
      
      cat trace
      
      or
      
      cat trace_pipe > /trace.out &
      
      To disable all the TIPC trace_events:
      
      echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To clear the trace buffer:
      
      echo > trace
      
      d) Like the other trace_events, the feature like 'filter' or 'trigger'
      is also usable for the tipc trace_events.
      For more details, have a look at:
      
      Documentation/trace/ftrace.txt
      
      MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4b9771b
  14. 12 11月, 2018 2 次提交
    • J
      tipc: fix link re-establish failure · 7ab412d3
      Jon Maloy 提交于
      When a link failure is detected locally, the link is reset, the flag
      link->in_session is set to false, and a RESET_MSG with the 'stopping'
      bit set is sent to the peer.
      
      The purpose of this bit is to inform the peer that this endpoint just
      is going down, and that the peer should handle the reception of this
      particular RESET message as a local failure. This forces the peer to
      accept another RESET or ACTIVATE message from this endpoint before it
      can re-establish the link. This again is necessary to ensure that
      link session numbers are properly exchanged before the link comes up
      again.
      
      If a failure is detected locally at the same time at the peer endpoint
      this will do the same, which is also a correct behavior.
      
      However, when receiving such messages, the endpoints will not
      distinguish between 'stopping' RESETs and ordinary ones when it comes
      to updating session numbers. Both endpoints will copy the received
      session number and set their 'in_session' flags to true at the
      reception, while they are still expecting another RESET from the
      peer before they can go ahead and re-establish. This is contradictory,
      since, after applying the validation check referred to below, the
      'in_session' flag will cause rejection of all such messages, and the
      link will never come up again.
      
      We now fix this by not only handling received RESET/STOPPING messages
      as a local failure, but also by omitting to set a new session number
      and the 'in_session' flag in such cases.
      
      Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages")
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ab412d3
    • L
      tipc: improve broadcast retransmission algorithm · 31c4f4cc
      LUU Duc Canh 提交于
      Currently, the broadcast retransmission algorithm is using the
      'prev_retr' field in struct tipc_link to time stamp the latest broadcast
      retransmission occasion. This helps to restrict retransmission of
      individual broadcast packets to max once per 10 milliseconds, even
      though all other criteria for retransmission are met.
      
      We now move this time stamp to the control block of each individual
      packet, and remove other limiting criteria. This simplifies the
      retransmission algorithm, and eliminates any risk of logical errors
      in selecting which packets can be retransmitted.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NLUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31c4f4cc
  15. 16 10月, 2018 1 次提交
    • J
      tipc: initialize broadcast link stale counter correctly · 4af00f4c
      Jon Maloy 提交于
      In the commit referred to below we added link tolerance as an additional
      criteria for declaring broadcast transmission "stale" and resetting the
      unicast links to the affected node.
      
      Unfortunately, this 'improvement' introduced two bugs, which each and
      one alone cause only limited problems, but combined lead to seemingly
      stochastic unicast link resets, depending on the amount of broadcast
      traffic transmitted.
      
      The first issue, a missing initialization of the 'tolerance' field of
      the receiver broadcast link, was recently fixed by commit 047491ea
      ("tipc: set link tolerance correctly in broadcast link").
      
      Ths second issue, where we omit to reset the 'stale_cnt' field of
      the same link after a 'stale' period is over, leads to this counter
      accumulating over time, and in the absence of the 'tolerance' criteria
      leads to the above described symptoms. This commit adds the missing
      initialization.
      
      Fixes: a4dc70d4 ("tipc: extend link reset criteria for stale packet retransmission")
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4af00f4c
  16. 12 10月, 2018 1 次提交
    • Y
      tipc: eliminate possible recursive locking detected by LOCKDEP · a1f8dd34
      Ying Xue 提交于
      When booting kernel with LOCKDEP option, below warning info was found:
      
      WARNING: possible recursive locking detected
      4.19.0-rc7+ #14 Not tainted
      --------------------------------------------
      swapper/0/1 is trying to acquire lock:
      00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
      include/linux/spinlock.h:334 [inline]
      00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
      
      but task is already holding lock:
      00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh
      include/linux/spinlock.h:334 [inline]
      00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&list->lock)->rlock#4);
        lock(&(&list->lock)->rlock#4);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by swapper/0/1:
       #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at:
      register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051
       #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      spin_lock_bh include/linux/spinlock.h:334 [inline]
       #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at:
      tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849
      
      stack backtrace:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1af/0x295 lib/dump_stack.c:113
       print_deadlock_bug kernel/locking/lockdep.c:1759 [inline]
       check_deadlock kernel/locking/lockdep.c:1803 [inline]
       validate_chain kernel/locking/lockdep.c:2399 [inline]
       __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411
       lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
       _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
       spin_lock_bh include/linux/spinlock.h:334 [inline]
       tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
       tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526
       tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521
       tipc_init_net+0x472/0x610 net/tipc/core.c:82
       ops_init+0xf7/0x520 net/core/net_namespace.c:129
       __register_pernet_operations net/core/net_namespace.c:940 [inline]
       register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011
       register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052
       tipc_init+0x83/0x104 net/tipc/core.c:140
       do_one_initcall+0x109/0x70a init/main.c:885
       do_initcall_level init/main.c:953 [inline]
       do_initcalls init/main.c:961 [inline]
       do_basic_setup init/main.c:979 [inline]
       kernel_init_freeable+0x4bd/0x57f init/main.c:1144
       kernel_init+0x13/0x180 init/main.c:1063
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
      
      The reason why the noise above was complained by LOCKDEP is because we
      nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset
      function. In fact it's unnecessary to move skb buffer from l->wakeupq
      queue to l->inputq queue while holding the two locks at the same time.
      Instead, we can move skb buffers in l->wakeupq queue to a temporary
      list first and then move the buffers of the temporary list to l->inputq
      queue, which is also safe for us.
      
      Fixes: 3f32d0be ("tipc: lock wakeup & inputq at tipc_link_reset()")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1f8dd34
  17. 11 10月, 2018 1 次提交
    • J
      tipc: set link tolerance correctly in broadcast link · 047491ea
      Jon Maloy 提交于
      In the patch referred to below we added link tolerance as an additional
      criteria for declaring broadcast transmission "stale" and resetting the
      affected links.
      
      However, the 'tolerance' field of the broadcast link is never set, and
      remains at zero. This renders the whole commit without the intended
      improving effect, but luckily also with no negative effect.
      
      In this commit we add the missing initialization.
      
      Fixes: a4dc70d4 ("tipc: extend link reset criteria for stale packet retransmission")
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      047491ea
  18. 02 10月, 2018 1 次提交
    • L
      tipc: ignore STATE_MSG on wrong link session · d949cfed
      LUU Duc Canh 提交于
      The initial session number when a link is created is based on a random
      value, taken from struct tipc_net->random. It is then incremented for
      each link reset to avoid mixing protocol messages from different link
      sessions.
      
      However, when a bearer is reset all its links are deleted, and will
      later be re-created using the same random value as the first time.
      This means that if the link never went down between creation and
      deletion we will still sometimes have two subsequent sessions with
      the same session number. In virtual environments with potentially
      long transmission times this has turned out to be a real problem.
      
      We now fix this by randomizing the session number each time a link
      is created.
      
      With a session number size of 16 bits this gives a risk of session
      collision of 1/64k. To reduce this further, we also introduce a sanity
      check on the very first STATE message arriving at a link. If this has
      an acknowledge value differing from 0, which is logically impossible,
      we ignore the message. The final risk for session collision is hence
      reduced to 1/4G, which should be sufficient.
      Signed-off-by: NLUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d949cfed
  19. 30 9月, 2018 1 次提交
    • L
      tipc: fix failover problem · c140eb16
      LUU Duc Canh 提交于
      We see the following scenario:
      1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
         Since there is a second working link, failover procedure is started.
      2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
         A on node 2. The node item 1->2 goes to state FAILINGOVER.
      3) Linke endpoint A/2 receives the failover, and is supposed to take
         down its parallell link endpoint B/2, while producing a FAILOVER
         message to send back to A/1.
      4) However, B/2 has already been deleted, so no FAILOVER message can
         created.
      5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
         any messages that can bring B/1 up again. We are left with a non-
         redundant link between node 1 and 2.
      
      We fix this with letting endpoint A/2 build a dummy FAILOVER message
      to send to back to A/1, so that the situation can be resolved.
      Signed-off-by: NLUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c140eb16
  20. 26 9月, 2018 1 次提交
  21. 22 7月, 2018 1 次提交
    • Y
      tipc: make some functions static · e064cce1
      YueHaibing 提交于
      Fixes the following sparse warnings:
      
      net/tipc/link.c:376:5: warning: symbol 'link_bc_rcv_gap' was not declared. Should it be static?
      net/tipc/link.c:823:6: warning: symbol 'link_prepare_wakeup' was not declared. Should it be static?
      net/tipc/link.c:959:6: warning: symbol 'tipc_link_advance_backlog' was not declared. Should it be static?
      net/tipc/link.c:1009:5: warning: symbol 'tipc_link_retrans' was not declared. Should it be static?
      net/tipc/monitor.c:687:5: warning: symbol '__tipc_nl_add_monitor_peer' was not declared. Should it be static?
      net/tipc/group.c:230:20: warning: symbol 'tipc_group_find_member' was not declared. Should it be static?
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e064cce1
  22. 19 7月, 2018 1 次提交
  23. 12 7月, 2018 2 次提交
    • J
      tipc: check session number before accepting link protocol messages · 7ea817f4
      Jon Maloy 提交于
      In some virtual environments we observe a significant higher number of
      packet reordering and delays than we have been used to traditionally.
      
      This makes it necessary with stricter checks on incoming link protocol
      messages' session number, which until now only has been validated for
      RESET messages.
      
      Since the other two message types, ACTIVATE and STATE messages also
      carry this number, it is easy to extend the validation check to those
      messages.
      
      We also introduce a flag indicating if a link has a valid peer session
      number or not. This eliminates the mixing of 32- and 16-bit arithmethics
      we are currently using to achieve this.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ea817f4
    • J
      tipc: add sequence number check for link STATE messages · 9012de50
      Jon Maloy 提交于
      Some switch infrastructures produce huge amounts of packet duplicates.
      This becomes a problem if those messages are STATE/NACK protocol
      messages, causing unnecessary retransmissions of already accepted
      packets.
      
      We now introduce a unique sequence number per STATE protocol message
      so that duplicates can be identified and ignored. This will also be
      useful when tracing such cases, and to avert replay attacks when TIPC
      is encrypted.
      
      For compatibility reasons we have to introduce a new capability flag
      TIPC_LINK_PROTO_SEQNO to handle this new feature.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9012de50
  24. 07 7月, 2018 1 次提交
    • J
      tipc: extend link reset criteria for stale packet retransmission · a4dc70d4
      Jon Maloy 提交于
      Currently a link is declared stale and reset if there has been 100
      repeated attempts to retransmit the same packet. However, in certain
      infrastructures we see that packet (NACK) duplicates and delays may
      cause such retransmit attempts to occur at a high rate, so that the
      peer doesn't have a reasonable chance to acknowledge the reception
      before the 100-limit is hit. This may take much less than the
      stipulated link tolerance time, and despite that probe/probe replies
      otherwise go through as normal.
      
      We now extend the criteria for link reset to also being time based.
      I.e., we don't reset the link until the link tolerance time is passed
      AND we have made 100 retransmissions attempts.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4dc70d4
  25. 05 7月, 2018 1 次提交
  26. 01 4月, 2018 2 次提交
    • J
      tipc: avoid possible string overflow · 7494cfa6
      Jon Maloy 提交于
      gcc points out that the combined length of the fixed-length inputs to
      l->name is larger than the destination buffer size:
      
      net/tipc/link.c: In function 'tipc_link_create':
      net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
      into a region of size between 26 and 58 [-Werror=format-overflow=]
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
      (assuming 75) into a destination of size 60
      sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
      
      A detailed analysis reveals that the theoretical maximum length of
      a link name is:
      max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
      16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
      Since we also need space for a trailing zero we now set MAX_LINK_NAME
      to 68.
      
      Just to be on the safe side we also replace the sprintf() call with
      snprintf().
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address
      hash values")
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7494cfa6
    • J
      tipc: replace name table service range array with rb tree · 218527fe
      Jon Maloy 提交于
      The current design of the binding table has an unnecessary memory
      consuming and complex data structure. It aggregates the service range
      items into an array, which is expanded by a factor two every time it
      becomes too small to hold a new item. Furthermore, the arrays never
      shrink when the number of ranges diminishes.
      
      We now replace this array with an RB tree that is holding the range
      items as tree nodes, each range directly holding a list of bindings.
      
      This, along with a few name changes, improves both readability and
      volume of the code, as well as reducing memory consumption and hopefully
      improving cache hit rate.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      218527fe
  27. 24 3月, 2018 2 次提交
    • J
      tipc: handle collisions of 32-bit node address hash values · 25b0b9c4
      Jon Maloy 提交于
      When a 32-bit node address is generated from a 128-bit identifier,
      there is a risk of collisions which must be discovered and handled.
      
      We do this as follows:
      - We don't apply the generated address immediately to the node, but do
        instead initiate a 1 sec trial period to allow other cluster members
        to discover and handle such collisions.
      
      - During the trial period the node periodically sends out a new type
        of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
        to all the other nodes in the cluster.
      
      - When a node is receiving such a message, it must check that the
        presented 32-bit identifier either is unused, or was used by the very
        same peer in a previous session. In both cases it accepts the request
        by not responding to it.
      
      - If it finds that the same node has been up before using a different
        address, it responds with a DSC_TRIAL_FAIL_MSG containing that
        address.
      
      - If it finds that the address has already been taken by some other
        node, it generates a new, unused address and returns it to the
        requester.
      
      - During the trial period the requesting node must always be prepared
        to accept a failure message, i.e., a message where a peer suggests a
        different (or equal)  address to the one tried. In those cases it
        must apply the suggested value as trial address and restart the trial
        period.
      
      This algorithm ensures that in the vast majority of cases a node will
      have the same address before and after a reboot. If a legacy user
      configures the address explicitly, there will be no trial period and
      messages, so this protocol addition is completely backwards compatible.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25b0b9c4
    • J
      tipc: add 128-bit node identifier · d50ccc2d
      Jon Maloy 提交于
      We add a 128-bit node identity, as an alternative to the currently used
      32-bit node address.
      
      For the sake of compatibility and to minimize message header changes
      we retain the existing 32-bit address field. When not set explicitly by
      the user, this field will be filled with a hash value generated from the
      much longer node identity, and be used as a shorthand value for the
      latter.
      
      We permit either the address or the identity to be set by configuration,
      but not both, so when the address value is set by a legacy user the
      corresponding 128-bit node identity is generated based on the that value.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d50ccc2d