1. 27 May 2020: 3 commits
    • tipc: add support for broadcast rcv stats dumping · 03b6fefd
      Committed by Tuong Lien
      This commit enables dumping the statistics of a broadcast-receiver link,
      like the traditional 'broadcast-link' one (which is for the broadcast
      sender). The dump can be triggered via netlink (e.g. with the
      iproute2/tipc tool), using the link flag 'TIPC_NLA_LINK_BROADCAST' as
      the indicator.
      
      The name of a broadcast-receiver link of a specific peer will be in the
      format: 'broadcast-link:<peer-id>'.
      
      For example:
      
      Link <broadcast-link:1001002>
        Window:50 packets
        RX packets:7841 fragments:2408/440 bundles:0/0
        TX packets:0 fragments:0/0 bundles:0/0
        RX naks:0 defs:124 dups:0
        TX naks:21 acks:0 retrans:0
        Congestion link:0  Send queue max:0 avg:0
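      As an illustration only, the name construction can be sketched in C.
      The sketch assumes the peer id is the 32-bit node address printed in
      lowercase hexadecimal, which matches the example above.
      'bc_rcv_link_name()' and 'LINK_NAME_MAX' are hypothetical names, not
      the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size for this sketch; not the kernel's limit. */
#define LINK_NAME_MAX 60

/* Build "broadcast-link:<peer-id>", assuming the peer id is the 32-bit
 * node address printed in lowercase hex (as in the example output). */
static void bc_rcv_link_name(char *buf, size_t len, uint32_t peer_addr)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "broadcast-link:%x", (unsigned int)peer_addr);
}
```

      For example, a peer address of 0x1001002 yields
      'broadcast-link:1001002'.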
      
      In addition, the broadcast-receiver link statistics can be reset in the
      usual way via netlink by specifying the link name in the command.
      
      Note: 'tipc_link_name_ext()' is removed because the link name can now
      be retrieved simply via 'l->name'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: enable broadcast retrans via unicast · a91d55d1
      Committed by Tuong Lien
      In some environments, broadcast traffic is suppressed at high rates
      (i.e. a kind of bandwidth limit). TIPC broadcast can still run
      successfully when such a limit is applied. Under high load, however,
      some packets are dropped first and TIPC tries to retransmit them, but
      the retransmissions are intentionally broadcast too, which only makes
      things worse.
      
      This commit enables broadcast retransmission via unicast, which
      retransmits packets only to the specific peer that has actually
      reported a gap, i.e. without broadcasting to all nodes in the cluster.
      This prevents the retransmissions from being suppressed, reduces the
      overhead of duplicates on the other peers, and thereby improves overall
      TIPC broadcast performance.
      
      Note: the functionality can be turned on/off via the sysctl file:
      
      echo 1 > /proc/sys/net/tipc/bc_retruni
      echo 0 > /proc/sys/net/tipc/bc_retruni
      
      The default is '0', i.e. broadcast retransmission works as before.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce Gap ACK blocks for broadcast link · d7626b5a
      Committed by Tuong Lien
      Following commit 9195948f ("tipc: improve TIPC throughput by Gap ACK
      blocks"), we apply the same mechanism to the broadcast link as well.
      The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
      consist of two parts, built for the broadcast and unicast types:
      
       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      
      which is "automatically" backward-compatible.
      
      We also increase the maximum number of Gap ACK blocks to 128, allowing
      up to 64 blocks per type (total buffer size = 516 bytes).
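      The layout above can be illustrated with a small C sketch. It packs the
      header and block words and computes the buffer length: a 4-byte header
      plus 128 blocks of 4 bytes gives 516 bytes. All names are hypothetical.
      The words are built in host byte order; a real encoder would store them
      in network byte order (e.g. with htonl()).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_GAP_ACK_PER_TYPE 64u  /* up to 64 broadcast + 64 unicast blocks */
#define GACK_HDR_SIZE 4u          /* bgack_cnt | ugack_cnt | len */
#define GACK_BLK_SIZE 4u          /* gap | ack */

/* Header word: bgack_cnt in bits 31-24, ugack_cnt in 23-16, len in 15-0. */
static uint32_t gack_hdr_pack(uint8_t bgack_cnt, uint8_t ugack_cnt, uint16_t len)
{
    return ((uint32_t)bgack_cnt << 24) | ((uint32_t)ugack_cnt << 16) | len;
}

/* Block word: gap in bits 31-16, ack in bits 15-0. */
static uint32_t gack_blk_pack(uint16_t gap, uint16_t ack)
{
    return ((uint32_t)gap << 16) | ack;
}

/* Total buffer length: one header word plus one word per block. */
static uint16_t gack_buf_len(unsigned int bgack_cnt, unsigned int ugack_cnt)
{
    assert(bgack_cnt <= MAX_GAP_ACK_PER_TYPE);
    assert(ugack_cnt <= MAX_GAP_ACK_PER_TYPE);
    return (uint16_t)(GACK_HDR_SIZE + (bgack_cnt + ugack_cnt) * GACK_BLK_SIZE);
}

/* The unicast blocks start right after the broadcast ones, so the index
 * of the first unicast block equals bgack_cnt. */
static unsigned int uc_first_index(uint32_t hdr)
{
    return hdr >> 24;
}
```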
      
      In addition, 'tipc_link_advance_transmq()' is refactored so that it now
      applies to both the unicast and broadcast cases, which allows some old
      functions to be removed and optimizes the code.
      
      With this patch, TIPC broadcast is more robust against packet loss or
      reordering, latency, etc. in the underlying network, and its
      performance is boosted significantly.
      For example, an experiment with a 5% packet loss rate gives:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    0m 42.46s
      user    0m 1.16s
      sys     0m 17.67s
      
      Without the patch:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    8m 27.94s
      user    0m 0.55s
      sys     0m 2.38s
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 11 December 2019: 1 commit
    • tipc: introduce variable window congestion control · 16ad3f40
      Committed by Jon Maloy
      We introduce a simple variable window congestion control for links.
      The algorithm is inspired by the Reno algorithm, covering the 'slow
      start', 'congestion avoidance', and 'fast recovery' modes.
      
      - We introduce hard lower and upper window limits per link, still
        different and configurable per bearer type.
      
      - We introduce a 'slow start threshold' variable, initially set to
        the maximum window size.

      - We let a link start at the minimum congestion window, i.e. in slow
        start mode, and then let it grow rapidly (+1 per received ACK) until
        it reaches the slow start threshold and enters congestion avoidance
        mode.

      - In congestion avoidance mode we increment the congestion window for
        each window-size number of acked packets, up to a possible maximum
        equal to the configured maximum window.

      - For each non-duplicate NACK received, we drop back to fast recovery
        mode by setting both the slow start threshold and the congestion
        window to (current_congestion_window / 2).

      - If the timeout handler finds that the transmit queue has not moved
        since the previous timeout, it drops the link back to slow start
        and forces a probe containing the last sent sequence number to be
        sent to the peer, so that the peer can discover the stale situation.
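      The window rules above can be modelled as a minimal C sketch. This is
      not the kernel's code: the structure and function names are
      hypothetical, and the window is clamped to the hard lower limit when
      halved. On a stale timeout the sketch also assumes the threshold is
      halved as for a NACK; the text above does not specify this. Sending
      the probe is not modelled.

```c
#include <assert.h>

struct cwin {
    unsigned int min_win;    /* hard lower limit */
    unsigned int max_win;    /* hard upper limit (configured maximum) */
    unsigned int window;     /* current congestion window */
    unsigned int ssthresh;   /* slow start threshold */
    unsigned int cong_acks;  /* packets acked in congestion avoidance */
};

static void cwin_init(struct cwin *c, unsigned int min_win, unsigned int max_win)
{
    assert(min_win > 0 && min_win <= max_win);
    c->min_win = min_win;
    c->max_win = max_win;
    c->window = min_win;     /* start in slow start */
    c->ssthresh = max_win;   /* threshold initially at the maximum window */
    c->cong_acks = 0;
}

/* One received ACK that acknowledges 'acked' packets. */
static void cwin_on_ack(struct cwin *c, unsigned int acked)
{
    if (c->window < c->ssthresh) {
        c->window++;                     /* slow start: +1 per ACK */
        return;
    }
    c->cong_acks += acked;               /* congestion avoidance */
    if (c->cong_acks >= c->window) {     /* a full window has been acked */
        c->cong_acks -= c->window;
        if (c->window < c->max_win)
            c->window++;
    }
}

/* Half the current window, clamped to the lower limit (assumption). */
static unsigned int half_window(const struct cwin *c)
{
    unsigned int half = c->window / 2;
    return half < c->min_win ? c->min_win : half;
}

/* Non-duplicate NACK: fast recovery. */
static void cwin_on_nack(struct cwin *c)
{
    c->ssthresh = half_window(c);
    c->window = c->ssthresh;
    c->cong_acks = 0;
}

/* Timeout with no transmit queue progress: back to slow start. */
static void cwin_on_stale_timeout(struct cwin *c)
{
    c->ssthresh = half_window(c);   /* assumption: halved as for a NACK */
    c->window = c->min_win;
    c->cong_acks = 0;
}
```

      With limits 50 and 500, for instance, 450 single-packet ACKs take the
      window from 50 to 500, and a NACK then halves it to 250.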
      
      In practice, this change only has an effect on unicast Ethernet
      transport, as we have seen that there is no room whatsoever for
      increasing the maximum window size for the UDP bearer.
      For now, we also choose to keep the limits for the broadcast link
      unchanged and equal.
      
      This algorithm seems to give a 50-100% throughput improvement for
      messages larger than the MTU.
      Suggested-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 09 November 2019: 1 commit
    • tipc: introduce TIPC encryption & authentication · fc1b6d6d
      Committed by Tuong Lien
      This commit offers an option to encrypt and authenticate all messaging,
      including the neighbor discovery messages. The most advanced algorithm
      currently supported is AEAD AES-GCM (as in IPSec or TLS). All
      encryption/decryption is done at the bearer layer, just before leaving
      or after entering TIPC.
      
      Supported features:
      - Encryption & authentication of all TIPC messages (header + data);
      - Two symmetric-key modes: Cluster and Per-node;
      - Automatic key switching;
      - Key-expired revoking (sequence number wrapped);
      - Lock-free encryption/decryption (RCU);
      - Asynchronous crypto, Intel AES-NI supported;
      - Multiple cipher transforms;
      - Logs & statistics;
      
      Two key modes:
      - Cluster key mode: One single key is used for both TX & RX in all
      nodes in the cluster.
      - Per-node key mode: Each node in the cluster has one specific TX key.
      For RX, a node requires its peers' TX key to be able to decrypt the
      messages from those peers.
      
      Key setting from user-space is performed via netlink by a user program
      (e.g. the iproute2 'tipc' tool).
      
      Internal key state machine:
      
                                       Attach    Align(RX)
                                           +-+   +-+
                                           | V   | V
              +---------+      Attach     +---------+
              |  IDLE   |---------------->| PENDING |(user = 0)
              +---------+                 +---------+
                 A   A                   Switch|  A
                 |   |                         |  |
                 |   | Free(switch/revoked)    |  |
           (Free)|   +----------------------+  |  |Timeout
                 |              (TX)        |  |  |(RX)
                 |                          |  |  |
                 |                          |  v  |
              +---------+      Switch     +---------+
              | PASSIVE |<----------------| ACTIVE  |
              +---------+       (RX)      +---------+
              (user = 1)                  (user >= 1)
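      One reading of the diagram above can be written as a C transition
      function, sketched below. The event names and the 'rx' flag (RX key
      versus TX key) are hypothetical. The reader counts ('user') are
      ignored, and the edges are taken from the drawing rather than from the
      kernel's crypto code.

```c
#include <assert.h>
#include <stdbool.h>

enum key_state { KEY_IDLE, KEY_PENDING, KEY_ACTIVE, KEY_PASSIVE };
enum key_event { EV_ATTACH, EV_ALIGN, EV_SWITCH, EV_TIMEOUT, EV_FREE };

/* Apply 'ev' to a key in state *st. Returns true and updates *st if the
 * diagram has such an edge; otherwise returns false and leaves *st as is. */
static bool key_fsm(enum key_state *st, enum key_event ev, bool rx)
{
    enum key_state next;

    switch (*st) {
    case KEY_IDLE:
        if (ev != EV_ATTACH)
            return false;
        next = KEY_PENDING;
        break;
    case KEY_PENDING:
        if (ev == EV_ATTACH || (rx && ev == EV_ALIGN))
            next = KEY_PENDING;            /* self-loops */
        else if (ev == EV_SWITCH)
            next = KEY_ACTIVE;
        else
            return false;
        break;
    case KEY_ACTIVE:
        if (rx && ev == EV_TIMEOUT)
            next = KEY_PENDING;            /* Timeout (RX) */
        else if (rx && ev == EV_SWITCH)
            next = KEY_PASSIVE;            /* Switch (RX) */
        else if (!rx && ev == EV_FREE)
            next = KEY_IDLE;               /* Free on switch/revoke (TX) */
        else
            return false;
        break;
    case KEY_PASSIVE:
        if (ev != EV_FREE)
            return false;
        next = KEY_IDLE;
        break;
    default:
        return false;
    }
    *st = next;
    return true;
}
```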
      
      The number of TFMs is 10 by default and can be changed via the procfs
      file 'net/tipc/max_tfms'. For simplicity, this file is currently also
      used to print the crypto statistics at runtime:
      
      echo 0xfff1 > /proc/sys/net/tipc/max_tfms
      
      The patch defines a new TIPC version (v7) for the encryption message
      (while remaining backward compatible). The message is basically
      encapsulated as follows:
      
         +----------------------------------------------------------+
         | TIPCv7 encryption  | Original TIPCv2    | Authentication |
         | header             | packet (encrypted) | Tag            |
         +----------------------------------------------------------+
      
      Compared with no encryption, the throughput is ~40% for small messages
      and ~9% for large messages. With support from hardware crypto, i.e.
      the Intel AES-NI CPU instructions, the throughput increases to ~85%
      for small messages and ~55% for large messages.
      
      By default, the new feature is inactive (i.e. no encryption) until the
      user sets a key for TIPC. There is also a new kernel configuration
      option, "TIPC_CRYPTO", to enable or disable the new code when needed.
      
      MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc
      Acked-by: Ying Xue <ying.xue@windreiver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 May 2019: 1 commit
    • tipc: fix missing Name entries due to half-failover · c0b14a08
      Committed by Tuong Lien
      A TIPC link can temporarily fall into a "half-established" state, in
      which only one of the link endpoints is ESTABLISHED and starts to send
      traffic and PROTOCOL messages, whereas the other link endpoint is not
      up (e.g. the network interface goes down immediately after the
      endpoint receives an ACTIVATE_MSG...).
      
      This is a normal situation and will be resolved, because the link
      endpoint will eventually be brought down after the link tolerance time.
      
      However, the situation becomes worse when the second link is
      established before the first link endpoint goes down.
      For example:
      
         1. Both links <1A-2A>, <1B-2B> down
         2. Link endpoint 2A up, but 1A still down (e.g. due to network
            disturbance, wrong session, etc.)
         3. Link <1B-2B> up
         4. Link endpoint 2A down (e.g. due to link tolerance timeout)
         5. Node B starts failover onto link <1B-2B>
      
         ==> Node A never starts link failover.
      
      When the "half-failover" situation happens, two consequences have been
      observed:
      
      a) Peer link/node gets stuck in FAILINGOVER state;
      b) Traffic or user messages that the peer node is trying to fail over
      onto the second link can be partially or completely dropped by this node.
      
      Consequence a) was actually solved by commit c140eb16 ("tipc:
      fix failover problem"), but that commit did not cover b). This is
      because the tunnel link endpoint has never been prepared for a
      failover, so 'l->drop_point' (and the other data...) is not set
      correctly. When a TUNNEL_MSG from the peer node arrives on the link,
      depending on the inner message's seqno and the current 'l->drop_point'
      value, the message can be dropped (i.e. treated as a duplicate) or
      processed.
      At this early stage, the traffic messages from the peer are likely to
      be NAME_DISTRIBUTOR messages, which means some name table entries will
      be missing on the node forever!
      
      This commit resolves the issue by starting the FAILOVER process on this
      node as well. Another benefit of this solution is that it ensures the
      link will not be re-established until the failover ends.
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 20 December 2018: 2 commits
    • tipc: add trace_events for tipc link · 26574db0
      Committed by Tuong Lien
      This commit adds new trace_events for the TIPC link object:
      
      trace_tipc_link_timeout()
      trace_tipc_link_fsm()
      trace_tipc_link_reset()
      trace_tipc_link_too_silent()
      trace_tipc_link_retrans()
      trace_tipc_link_bc_ack()
      trace_tipc_link_conges()
      
      And the traces for PROTOCOL messages at building and receiving:
      
      trace_tipc_proto_build()
      trace_tipc_proto_rcv()
      
      Note:
      a) The 'tipc_link_too_silent' event only happens when 'silent_intv_cnt'
      is about to reach the 'abort_limit' value (and the event is enabled).
      The benefit of this kind of event is that it gives an early indication
      of a TIPC link loss due to timeout, so that the necessary
      troubleshooting actions can be taken.
      
      For example: To trigger the 'tipc_proto_rcv' when the 'too_silent'
      event occurs:
      
      echo 'enable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_too_silent/trigger
      
      And disable it when TIPC link is reset:
      
      echo 'disable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_reset/trigger
      
      b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to
      trace TIPC retransmission issues.
      
      In addition, the commit adds 'trace_tipc_list/link_dump()' calls in the
      'retransmission failure' case. If the issue occurs, the link 'transmq'
      along with the link data can then be dumped for post-analysis.
      These dump events should be enabled by default, since they only take
      effect when the failure happens.
      
      The same approach is also applied to the fault case where the
      validation of a protocol message fails.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: enable tracepoints in tipc · b4b9771b
      Committed by Tuong Lien
      For debugging and tracing purposes, this commit enables tracepoints in
      TIPC along with some general trace_events, as shown below. It also
      defines some 'tipc_*_dump()' functions that can dump TIPC object data
      whenever needed, i.e. for general debugging purposes, not just for the
      trace_events.
      
      The following trace_events are now available:
      
      - trace_tipc_skb_dump(): traces and dumps TIPC msg & skb data,
        e.g. message type, user, droppable, skb truesize, cloned skb, etc.

      - trace_tipc_list_dump(): traces and dumps any TIPC buffers or
        queues, e.g. TIPC link transmq, socket receive queue, etc.

      - trace_tipc_sk_dump(): traces and dumps TIPC socket data, e.g.
        sk state, sk type, connection type, rmem_alloc, socket queues, etc.

      - trace_tipc_link_dump(): traces and dumps TIPC link data, e.g.
        link state, silent_intv_cnt, gap, bc_gap, link queues, etc.

      - trace_tipc_node_dump(): traces and dumps TIPC node data, e.g.
        node state, active links, capabilities, link entries, etc.
      
      How to use:
      Put the trace functions at any place where TIPC data or events should
      be dumped.
      
      Note:
      a) The dump functions generate raw data only, to offload the trace
      event's processing. Parsing the data may require a tool or script, but
      this should be simple.
      
      b) The trace_tipc_*_dump() events should be reserved for failure cases
      only (e.g. the retransmission failure case) or for places that are not
      expected to be hit often. We can then consider enabling these events by
      default, since they have almost no effect under normal conditions, but
      once the rare condition or failure occurs, we get the full dumped data
      for post-analysis.
      
      For other tracing purposes, these trace classes can be reused as
      templates for different events.
      
      c) A trace_event is only effective when it is enabled. To enable the
      TIPC trace_events, echo 1 to the 'enable' files in the events/tipc/
      directory of the 'debugfs' file system. Normally, they are located at:
      
      /sys/kernel/debug/tracing/events/tipc/
      
      For example:
      
      To enable the tipc_link_dump event:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
      
      To enable all the TIPC trace_events:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To collect the trace data:
      
      cat trace
      
      or
      
      cat trace_pipe > /trace.out &
      
      To disable all the TIPC trace_events:
      
      echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To clear the trace buffer:
      
      echo > trace
      
      d) As with other trace_events, features such as 'filter' or 'trigger'
      can also be used with the TIPC trace_events.
      For more details, have a look at:
      
      Documentation/trace/ftrace.txt
      
      MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 30 September 2018: 1 commit
    • tipc: fix failover problem · c140eb16
      Committed by LUU Duc Canh
      We see the following scenario:
      1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
         Since there is a second working link, failover procedure is started.
      2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
         A on node 2. The node item 1->2 goes to state FAILINGOVER.
      3) Link endpoint A/2 receives the failover, and is supposed to take
         down its parallel link endpoint B/2, while producing a FAILOVER
         message to send back to A/1.
      4) However, B/2 has already been deleted, so no FAILOVER message can
         be created.
      5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
         any messages that can bring B/1 up again. We are left with a non-
         redundant link between node 1 and 2.
      
      We fix this by letting endpoint A/2 build a dummy FAILOVER message
      to send back to A/1, so that the situation can be resolved.
      Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 12 July 2018: 2 commits
    • tipc: check session number before accepting link protocol messages · 7ea817f4
      Committed by Jon Maloy
      In some virtual environments we observe significantly more packet
      reordering and delay than we have traditionally been used to.
      
      This makes stricter checks necessary on the session number of incoming
      link protocol messages, which until now has only been validated for
      RESET messages.
      
      Since the other two message types, ACTIVATE and STATE, also carry this
      number, it is easy to extend the validation check to those messages.
      
      We also introduce a flag indicating whether a link has a valid peer
      session number or not. This eliminates the mixing of 32- and 16-bit
      arithmetic we are currently using to achieve this.
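      Such a check can be sketched in C, assuming 16-bit session numbers
      compared with wrap-around (serial-number) arithmetic and a boolean
      'in_session' flag as the valid-session indicator. The accept rules are
      an assumption for illustration: a newer session for RESET, the current
      or a newer one for ACTIVATE, and exactly the current one for STATE.
      The text above only says that the check is extended to all three
      types. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum proto_msg { RESET_MSG, ACTIVATE_MSG, STATE_MSG };

struct link_sess {
    bool in_session;        /* true once a valid peer session is known */
    uint16_t peer_session;  /* session number of the current peer session */
};

/* Serial-number comparison: true if a precedes b modulo 2^16. */
static bool less16(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(b - a);
    return diff != 0 && diff < 0x8000u;
}

/* Decide whether a link protocol message's session number is acceptable. */
static bool session_ok(const struct link_sess *l, enum proto_msg type,
                       uint16_t session)
{
    assert(l != NULL);
    switch (type) {
    case RESET_MSG:     /* only a newer session may reset the link */
        return !l->in_session || less16(l->peer_session, session);
    case ACTIVATE_MSG:  /* the current or a newer session */
        return !l->in_session || !less16(session, l->peer_session);
    case STATE_MSG:     /* exactly the current session */
        return l->in_session && session == l->peer_session;
    }
    return false;       /* unknown message type */
}
```

      With the explicit flag, "no valid session" no longer has to be encoded
      as a special value outside the 16-bit range.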
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add sequence number check for link STATE messages · 9012de50
      Committed by Jon Maloy
      Some switch infrastructures produce huge amounts of packet duplicates.
      This becomes a problem if those messages are STATE/NACK protocol
      messages, causing unnecessary retransmissions of already accepted
      packets.
      
      We now introduce a unique sequence number per STATE protocol message
      so that duplicates can be identified and ignored. This will also be
      useful when tracing such cases, and to avert replay attacks when TIPC
      is encrypted.
      
      For compatibility reasons we have to introduce a new capability flag
      TIPC_LINK_PROTO_SEQNO to handle this new feature.
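      Duplicate filtering of this kind can be sketched in C, assuming 16-bit
      sequence numbers with wrap-around comparison. The 'peer_has_seqno_cap'
      argument stands in for the TIPC_LINK_PROTO_SEQNO capability; all other
      names are hypothetical. A gap in STATE sequence numbers (a lost STATE
      message) is accepted, while repeated or older numbers are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct state_rx {
    uint16_t rcv_nxt_state;   /* next expected STATE sequence number */
};

/* Serial-number comparison: true if a precedes b modulo 2^16. */
static bool less16(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(b - a);
    return diff != 0 && diff < 0x8000u;
}

/* Returns true if the STATE message should be processed. */
static bool state_msg_accept(struct state_rx *rx, uint16_t seqno,
                             bool peer_has_seqno_cap)
{
    assert(rx != NULL);
    if (!peer_has_seqno_cap)          /* legacy peer: nothing to check */
        return true;
    if (less16(seqno, rx->rcv_nxt_state))
        return false;                 /* duplicate or older copy */
    rx->rcv_nxt_state = (uint16_t)(seqno + 1);
    return true;
}
```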
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 24 March 2018: 1 commit
    • tipc: handle collisions of 32-bit node address hash values · 25b0b9c4
      Committed by Jon Maloy
      When a 32-bit node address is generated from a 128-bit identifier,
      there is a risk of collisions which must be discovered and handled.
      
      We do this as follows:
      - We don't apply the generated address to the node immediately, but
        instead initiate a 1-second trial period to allow other cluster
        members to discover and handle such collisions.
      
      - During the trial period the node periodically sends out a new type
        of message, DSC_TRIAL_MSG, using broadcast or emulated broadcast,
        to all the other nodes in the cluster.
      
      - When a node receives such a message, it must check that the
        presented 32-bit identifier is either unused, or was used by the very
        same peer in a previous session. In both cases it accepts the request
        by not responding to it.
      
      - If it finds that the same node has been up before using a different
        address, it responds with a DSC_TRIAL_FAIL_MSG containing that
        address.
      
      - If it finds that the address has already been taken by some other
        node, it generates a new, unused address and returns it to the
        requester.
      
      - During the trial period the requesting node must always be prepared
        to accept a failure message, i.e., a message where a peer suggests a
        different (or equal) address to the one tried. In those cases it
        must apply the suggested value as trial address and restart the trial
        period.
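      The peer-side decision described in the list can be sketched in C. The
      node table, the reply enum and the naive linear search for a free
      address are hypothetical simplifications. The text above does not say
      how a new unused address is chosen, and address 0 is assumed to be
      invalid here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ID_LEN 16                /* 128-bit node identity */

struct known_node {              /* hypothetical entry in a peer's node table */
    uint8_t id[ID_LEN];
    uint32_t addr;               /* 32-bit address this node has used */
};

enum trial_reply { TRIAL_ACCEPT_SILENTLY, TRIAL_FAIL };

static bool addr_in_use(const struct known_node *tbl, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return true;
    return false;
}

/* Handle a DSC_TRIAL_MSG carrying (id, trial_addr). On TRIAL_FAIL the
 * address to suggest in the DSC_TRIAL_FAIL_MSG is stored in *suggested. */
static enum trial_reply handle_trial(const struct known_node *tbl, size_t n,
                                     const uint8_t id[ID_LEN],
                                     uint32_t trial_addr, uint32_t *suggested)
{
    assert(suggested != NULL);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].id, id, ID_LEN) != 0)
            continue;
        if (tbl[i].addr == trial_addr)
            return TRIAL_ACCEPT_SILENTLY;   /* same peer, same address */
        *suggested = tbl[i].addr;           /* same peer, earlier address */
        return TRIAL_FAIL;
    }
    if (!addr_in_use(tbl, n, trial_addr))
        return TRIAL_ACCEPT_SILENTLY;       /* unused: accept */

    /* Taken by another node: propose an unused address. */
    uint32_t cand = trial_addr;
    do {
        cand++;
    } while (cand == 0 || addr_in_use(tbl, n, cand));
    *suggested = cand;
    return TRIAL_FAIL;
}
```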
      
      This algorithm ensures that in the vast majority of cases a node will
      have the same address before and after a reboot. If a legacy user
      configures the address explicitly, there will be no trial period and no
      trial messages, so this protocol addition is completely backwards
      compatible.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 03 September 2016: 1 commit
    • tipc: transfer broadcast nacks in link state messages · 02d11ca2
      Committed by Jon Paul Maloy
      When we send broadcasts in clusters of more than 70-80 nodes, we
      sometimes see the broadcast link resetting because of an excessive
      number of retransmissions. This is caused by a combination of two
      factors:
      
      1) A 'NACK crunch', where loss of broadcast packets is discovered
         and NACK'ed by several nodes simultaneously, leading to multiple
         redundant broadcast retransmissions.
      
      2) The fact that the NACKs themselves are also sent as broadcast,
         leading to excessive load and packet loss on the transmitting
         switch/bridge.
      
      This commit deals with the latter problem by moving the sending of
      broadcast NACKs from the dedicated BCAST_PROTOCOL/NACK message type
      to regular unicast LINK_PROTOCOL/STATE messages. We allocate 10 unused
      bits in word 8 of that message for this purpose, and introduce a
      new capability bit, TIPC_BCAST_STATE_NACK, in order to keep the change
      backwards compatible.
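      The compatibility rule can be sketched in C. A node places broadcast
      NACK information in a unicast STATE message only if the peer
      advertises the capability; otherwise it falls back to the old NACK
      message. The capability bit value (1 << 2) and the position of the
      10-bit field inside word 8 are assumptions for illustration, and the
      field's content is treated as opaque.

```c
#include <assert.h>
#include <stdint.h>

#define TIPC_BCAST_STATE_NACK (1u << 2)  /* capability bit; value assumed */
#define BC_NACK_SHIFT 0u                 /* hypothetical position in word 8 */
#define BC_NACK_MASK  0x3ffu             /* 10 bits */

enum nack_path { NACK_VIA_BCAST_PROTOCOL, NACK_VIA_UNICAST_STATE };

/* Legacy peers still get the dedicated BCAST_PROTOCOL/NACK message. */
static enum nack_path choose_nack_path(uint32_t peer_caps)
{
    return (peer_caps & TIPC_BCAST_STATE_NACK) ? NACK_VIA_UNICAST_STATE
                                               : NACK_VIA_BCAST_PROTOCOL;
}

/* Write a 10-bit value into word 8 without disturbing the other bits. */
static uint32_t set_bc_nack_field(uint32_t word8, uint32_t val)
{
    assert(val <= BC_NACK_MASK);
    word8 &= ~(BC_NACK_MASK << BC_NACK_SHIFT);
    return word8 | (val << BC_NACK_SHIFT);
}

static uint32_t get_bc_nack_field(uint32_t word8)
{
    return (word8 >> BC_NACK_SHIFT) & BC_NACK_MASK;
}
```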
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 16 April 2016: 1 commit
    • tipc: let first message on link be a state message · 34b9cd64
      Committed by Jon Paul Maloy
      According to the link FSM, a received traffic packet can take a link
      from state ESTABLISHING to ESTABLISHED, but the link still cannot be
      fully set up in one atomic operation. This means that even if the
      very first packet on the link is a traffic packet with sequence number
      1 (one), it has to be dropped and retransmitted.
      
      This can be avoided if we let the mentioned packet be preceded by a
      LINK_PROTOCOL/STATE message, which brings up the endpoint before the
      traffic arrives.
      
      We add this small feature in this commit.
      
      This is a fully compatible change.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 07 March 2016: 1 commit
  12. 06 February 2016: 1 commit
    • tipc: fix link attribute propagation bug · d01332f1
      Committed by Richard Alpe
      Changing certain link attributes (link tolerance and link priority)
      from the TIPC management tool is supposed to automatically take
      effect at both endpoints of the affected link.
      
      Currently the media address is not instantiated for the link and is
      used uninstantiated when crafting protocol messages destined for the
      peer endpoint. This means that changing a link property currently
      results in the property being changed on the local machine while the
      protocol message destined for the peer gets lost, resulting in a
      property discrepancy between the endpoints.
      
      In this patch we resolve this by using the media address from the
      link entry and using the bearer transmit function to send it. Hence,
      we can now eliminate the redundant function tipc_link_prot_xmit() and
      the redundant field tipc_link::media_addr.
      
      Fixes: 2af5ae37 (tipc: clean up unused code and structures)
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Reported-by: Jason Hu <huzhijiang@gmail.com>
      Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 21 November 2015: 3 commits
  14. 24 October 2015: 9 commits
  15. 16 October 2015: 4 commits
    • tipc: update node FSM when peer RESET message is received · c8199300
      Committed by Jon Paul Maloy
      The change made in the previous commit revealed a small flaw in the way
      the node FSM is updated. When the function tipc_node_link_down() is
      called for the last link to a node, we should check whether this was
      caused by a local reset or by a received RESET message from the peer.
      In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to
      the node FSM, so that it is ready to re-establish contact. If this is
      not done, the peer node will sometimes have to go through a second
      establish cycle before the link becomes stable.
      
      We fix this in this commit by conditionally issuing the mentioned
      event in the function tipc_node_link_down(). We also move the
      LINK_RESET FSM event away from the link_reset() function and into the
      caller function, partly because it is easier to follow the code when
      state changes are gathered at a limited number of locations, and partly
      because there will be cases in future commits where we don't want the
      link to go into RESET mode when link_reset() is called.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: send out RESET immediately when link goes down · 282b3a05
      Committed by Jon Paul Maloy
      When a link is taken down because of a node local event, such as
      disabling of a bearer or an interface, we currently leave it to the
      peer node to discover the broken communication. The default time for
      such failure discovery is 1.5-2 seconds.
      
      If we instead allow the terminating link endpoint to send out a RESET
      message at the moment it is reset, we can give the impression that
      both endpoints are going down instantly. Since this is a very common
      scenario, we find it worthwhile to make this small modification.
      
      Apart from letting the link produce the said message, we also have to
      ensure that the interface is able to transmit it before TIPC is
      detached. We do this by performing the disabling of a bearer in three
      steps:
      
      1) Disable reception of TIPC packets from the interface in question.
      2) Take down the links, while allowing them to send out a RESET message.
      3) Disable transmission of TIPC packets on the interface.
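      A toy C model can show why this ordering matters: a RESET can only be
      transmitted while transmission on the bearer is still enabled, so
      step 2 must come before step 3. 'struct bearer' and both functions are
      hypothetical simplifications, not the kernel's bearer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct bearer {              /* hypothetical, heavily simplified */
    bool rx_enabled;
    bool tx_enabled;
    int  links_up;
    char log[128];           /* records what was actually transmitted */
};

static void link_reset_and_notify(struct bearer *b)
{
    /* A RESET can only go out while transmission is still enabled. */
    if (b->tx_enabled)
        strncat(b->log, "RESET sent;", sizeof(b->log) - strlen(b->log) - 1);
    b->links_up--;
}

static void bearer_disable(struct bearer *b)
{
    assert(b != NULL);
    b->rx_enabled = false;             /* 1) stop receiving TIPC packets */
    while (b->links_up > 0)
        link_reset_and_notify(b);      /* 2) take links down, sending RESET */
    b->tx_enabled = false;             /* 3) stop transmitting */
}
```

      Swapping steps 2 and 3 in this model would leave the log empty, i.e.
      the peer would again have to discover the failure by timeout.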
      
      Apart from this, we now have to react to the NETDEV_GOING_DOWN event,
      instead of the NETDEV_DOWN event as currently, to ensure that such
      transmission is possible during the teardown phase.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: delay ESTABLISH state event when link is established · 73f646ce
      Committed by Jon Paul Maloy
      Link establishing, just like link teardown, is a non-atomic action, in
      the sense that discovering that conditions are right to establish a link,
      and the actual adding of the link to one of the node's send slots is done
      in two different lock contexts. The link FSM is designed to help bridge
      the gap between the two contexts in a safe manner.
      
      We have now discovered a weakness in the implementation of this FSM.
      Because we let the link go directly from state LINK_ESTABLISHING to
      state LINK_ESTABLISHED already in the first lock context, we are unable
      to distinguish between a fully established link, i.e., a link that has
      been added to its slot, and a link that has not yet reached the second
      lock context. It may hence happen that a manual intervention, e.g.,
      disabling an interface, causes the function tipc_node_link_down() to try
      removing the link from the node slots, decrementing its active link
      counter, etc., although the link was never added there in the first place.
      
      We solve this by delaying the actual state change until we reach the
      second lock context, inside the function tipc_node_link_up(). This
      makes it possible for potential callers of __tipc_node_link_down() to
      know if they should proceed or not, and the problem is solved.
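A simplified sketch of the delayed state change, assuming a node with two send slots and only the three states involved. `link_detect_up()`, `node_link_up()` and `node_link_down()` are hypothetical stand-ins for the first lock context, the second lock context (tipc_node_link_up()) and __tipc_node_link_down(); the locking itself is left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum link_state { LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED };

struct link {
    enum link_state state;
};

#define N_SLOTS 2

struct node {
    struct link *slots[N_SLOTS];   /* hypothetical "send slots" */
    int active_cnt;
};

/* First lock context: the link sees that it can be established, but only
 * reports this; its state stays LINK_ESTABLISHING. */
static bool link_detect_up(const struct link *l)
{
    return l->state == LINK_ESTABLISHING;
}

/* Second lock context: only here does the link become LINK_ESTABLISHED. */
static void node_link_up(struct node *n, struct link *l)
{
    if (l->state != LINK_ESTABLISHING || n->active_cnt == N_SLOTS)
        return;                    /* reset in between, or no free slot */
    l->state = LINK_ESTABLISHED;
    n->slots[n->active_cnt++] = l;
}

/* Teardown only undoes what node_link_up() has actually done. */
static void node_link_down(struct node *n, struct link *l)
{
    if (l->state == LINK_ESTABLISHED) {
        for (int i = 0; i < n->active_cnt; i++) {
            if (n->slots[i] == l) {
                n->slots[i] = n->slots[--n->active_cnt];
                n->slots[n->active_cnt] = NULL;
                break;
            }
        }
    }
    l->state = LINK_RESET;         /* also defuses a pending node_link_up() */
}
```

If an interface is disabled between the two contexts, `node_link_down()` finds the link still in LINK_ESTABLISHING, leaves the slots and counter alone, and sets LINK_RESET, so the pending `node_link_up()` call becomes a no-op.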
      
      Unfortunately, the situation described above also has a second problem.
      Since a tipc_node_link_up() call is necessarily pending once the node
      lock has been released, we must defuse that call by setting the link
      back from LINK_ESTABLISHING to LINK_RESET state. This forces us to make
      a slight modification to the link FSM, which will now look as follows.
      
       +------------------------------------+
       |RESET_EVT                           |
       |                                    |
       |                             +--------------+
       |           +-----------------|   SYNCHING   |-----------------+
       |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
       |           |                  A            |                  |
       |           |                  |            |                  |
       |           |                  |            |                  |
       |           |                  |SYNCH_      |SYNCH_            |
       |           |                  |BEGIN_EVT   |END_EVT           |
       |           |                  |            |                  |
       |           V                  |            V                  V
       |    +-------------+          +--------------+          +------------+
       |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
       |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
       |           |        EVT        |    A         RESET_EVT       |
       |           |                   |    |                         |
       |           |  +----------------+    |                         |
       |  RESET_EVT|  |RESET_EVT            |                         |
       |           |  |                     |                         |
       |           |  |                     |ESTABLISH_EVT            |
       |           |  |  +-------------+    |                         |
       |           |  |  | RESET_EVT   |    |                         |
       |           |  |  |             |    |                         |
       |           V  V  V             |    |                         |
       |    +-------------+          +--------------+        RESET_EVT|
       +--->|    RESET    |--------->| ESTABLISHING |<----------------+
            +-------------+ PEER_    +--------------+
             |           A  RESET_EVT       |
             |           |                  |
             |           |                  |
             |FAILOVER_  |FAILOVER_         |FAILOVER_
             |BEGIN_EVT  |END_EVT           |BEGIN_EVT
             |           |                  |
             V           |                  |
            +-------------+                 |
            | FAILINGOVER |<----------------+
            +-------------+
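The diagram can be transcribed into a plain transition function, which makes it easy to check that every drawn edge is accounted for. This is a sketch only: the enum values and `link_fsm_next()` are hypothetical, and treating undrawn events as no-ops is a simplification, since the diagram shows only state-changing edges. The ESTABLISHING + RESET_EVT edge is the one this commit adds.

```c
/* State and event names follow the diagram labels. */
enum link_state {
    LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED, LINK_SYNCHING,
    LINK_RESETTING, LINK_PEER_RESET, LINK_FAILINGOVER
};

enum link_evt {
    RESET_EVT, FAILURE_EVT, PEER_RESET_EVT, ESTABLISH_EVT,
    SYNCH_BEGIN_EVT, SYNCH_END_EVT, FAILOVER_BEGIN_EVT, FAILOVER_END_EVT
};

/* Return the next state; events without a drawn edge leave it unchanged. */
static enum link_state link_fsm_next(enum link_state s, enum link_evt e)
{
    switch (s) {
    case LINK_RESET:
        if (e == PEER_RESET_EVT)     return LINK_ESTABLISHING;
        if (e == FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
        break;
    case LINK_ESTABLISHING:
        if (e == ESTABLISH_EVT)      return LINK_ESTABLISHED;
        if (e == FAILOVER_BEGIN_EVT) return LINK_FAILINGOVER;
        if (e == RESET_EVT)          return LINK_RESET;   /* new edge */
        break;
    case LINK_ESTABLISHED:
        if (e == FAILURE_EVT)        return LINK_RESETTING;
        if (e == PEER_RESET_EVT)     return LINK_PEER_RESET;
        if (e == SYNCH_BEGIN_EVT)    return LINK_SYNCHING;
        if (e == RESET_EVT)          return LINK_RESET;
        break;
    case LINK_SYNCHING:
        if (e == SYNCH_END_EVT)      return LINK_ESTABLISHED;
        if (e == FAILURE_EVT)        return LINK_RESETTING;
        if (e == PEER_RESET_EVT)     return LINK_PEER_RESET;
        if (e == RESET_EVT)          return LINK_RESET;
        break;
    case LINK_RESETTING:
        if (e == RESET_EVT)          return LINK_RESET;
        break;
    case LINK_PEER_RESET:
        if (e == RESET_EVT)          return LINK_ESTABLISHING;
        break;
    case LINK_FAILINGOVER:
        if (e == FAILOVER_END_EVT)   return LINK_RESET;
        break;
    }
    return s;
}
```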
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      tipc: improve sequence number checking · 81204c49
      Jon Paul Maloy committed
      The sequence number of an incoming packet is currently only checked
      for being less than, equal to, or bigger than the next expected number,
      meaning that the receive window in practice becomes one half sequence
      number cycle, or U16_MAX/2. This does not make sense, and may not even
      be safe if there are extreme delays in the network. Any packet sent by
      the peer during the ongoing cycle must belong inside its current send
      window, or should otherwise be dropped if possible.
      
      Since a link endpoint cannot know its peer's current send window, it
      has to base this sanity check on a worst-case assumption, i.e., that
      the peer is using a maximum sized window of 8191 packets. Using this
      assumption, we now add a check that the sequence number is not bigger
      than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks
      done, so that the receive window test is performed before the gap test.
      This way, we are guaranteed that no packet with an illegal sequence
      number is ever added to the deferred queue.
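The reordered checks can be sketched with 16-bit serial-number arithmetic. `seq_less()`, `classify_seqno()` and the verdict names are hypothetical; `MAX_LINK_WIN` uses the worst-case 8191-packet window from the text. Because the window test runs first, a sequence number beyond rcv_nxt + 8191 is rejected before the gap test could defer it.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LINK_WIN 8191u   /* worst-case peer send window, per the text */

/* True if a precedes b in 16-bit serial-number arithmetic: the forward
 * distance from a to b is non-zero and less than half the sequence space. */
static bool seq_less(uint16_t a, uint16_t b)
{
    uint16_t dist = (uint16_t)(b - a);
    return dist != 0 && dist < 0x8000u;
}

enum rcv_verdict { RCV_OUT_OF_WINDOW, RCV_DUPLICATE, RCV_IN_ORDER, RCV_DEFER };

/* Classify an incoming sequence number against the next expected one. */
static enum rcv_verdict classify_seqno(uint16_t seqno, uint16_t rcv_nxt)
{
    uint16_t win_lim = (uint16_t)(rcv_nxt + MAX_LINK_WIN);

    if (seq_less(win_lim, seqno))
        return RCV_OUT_OF_WINDOW;   /* drop: outside any legal send window */
    if (seq_less(seqno, rcv_nxt))
        return RCV_DUPLICATE;       /* already received */
    if (seqno == rcv_nxt)
        return RCV_IN_ORDER;        /* deliver */
    return RCV_DEFER;               /* gap: goes to the deferred queue */
}
```

With rcv_nxt = 100, sequence number 8291 is still deferred while 8292 is dropped; under the old half-cycle test, 8292 would also have been deferred.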
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 31 Jul 2015, 6 commits
    •
      tipc: clean up link creation · 440d8963
      Jon Paul Maloy committed
      We simplify the link creation function tipc_link_create() and the way
      the link struct is connected to the node struct. In particular, we
      remove the duplicate initialization of some fields which are set
      anyway in tipc_link_reset().
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      tipc: merge link->exec_mode and link->state into one FSM · 662921cd
      Jon Paul Maloy committed
      Until now, we have been handling link failover and synchronization
      by using an additional link state variable, "exec_mode". This variable
      is not independent of the link FSM state, which creates a risk of
      inconsistencies, apart from the fact that it clutters the code.
      
      The conditions are now in place to define a new link FSM that covers
      all existing use cases, including failover and synchronization, and
      eliminate the "exec_mode" field altogether. The FSM must also support
      non-atomic resetting of links, which will be introduced later.
      
      The new link FSM is shown below, with 7 states and 8 events.
      Only events leading to state change are shown as edges.
      
      +------------------------------------+
      |RESET_EVT                           |
      |                                    |
      |                             +--------------+
      |           +-----------------|   SYNCHING   |-----------------+
      |           |FAILURE_EVT      +--------------+   PEER_RESET_EVT|
      |           |                  A            |                  |
      |           |                  |            |                  |
      |           |                  |            |                  |
      |           |                  |SYNCH_      |SYNCH_            |
      |           |                  |BEGIN_EVT   |END_EVT           |
      |           |                  |            |                  |
      |           V                  |            V                  V
      |    +-------------+          +--------------+          +------------+
      |    |  RESETTING  |<---------|  ESTABLISHED |--------->| PEER_RESET |
      |    +-------------+ FAILURE_ +--------------+ PEER_    +------------+
      |           |        EVT        |    A         RESET_EVT       |
      |           |                   |    |                         |
      |           |                   |    |                         |
      |           |    +--------------+    |                         |
      |  RESET_EVT|    |RESET_EVT          |ESTABLISH_EVT            |
      |           |    |                   |                         |
      |           |    |                   |                         |
      |           V    V                   |                         |
      |    +-------------+          +--------------+        RESET_EVT|
      +--->|    RESET    |--------->| ESTABLISHING |<----------------+
           +-------------+ PEER_    +--------------+
            |           A  RESET_EVT       |
            |           |                  |
            |           |                  |
            |FAILOVER_  |FAILOVER_         |FAILOVER_
            |BEGIN_EVT  |END_EVT           |BEGIN_EVT
            |           |                  |
            V           |                  |
           +-------------+                 |
           | FAILINGOVER |<----------------+
           +-------------+
      
      These changes are fully backwards compatible.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      tipc: move protocol message sending away from link FSM · 5045f7b9
      Jon Paul Maloy committed
      The implementation of the link FSM currently takes decisions about and
      sends out link protocol messages. This is unnecessary, since such
      actions are not the result of any link state change, and are even
      decided based on non-FSM state information ("silent_intv_cnt").
      
      We now move the sending of unicast link protocol messages to the
      function tipc_link_timeout(), and the initial broadcast synchronization
      message to tipc_node_link_up(). The latter is done because a link
      instance should not need to know whether it is the first or second
      link to a destination. Such information is now restricted to, and
      handled by, the link aggregation layer in node.c.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      tipc: move link synch and failover to link aggregation level · 6e498158
      Jon Paul Maloy committed
      Link failover and synchronization have until now been handled by the
      links themselves, forcing them to have knowledge about and to access
      parallel links in order to make the two algorithms work correctly.
      
      In this commit, we move the control part of this functionality to the
      link aggregation level in node.c, which is the right location for this.
      As a result, the two algorithms become easier to follow, and the link
      implementation becomes simpler.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      tipc: move all link_reset() calls to link aggregation level · 6144a996
      Jon Paul Maloy committed
      In line with our effort to let the node level have full control over
      its links, we want to move all link reset calls from link.c to node.c.
      Some of the calls can be moved by simply moving the calling function,
      when this is the right thing to do. For the remaining calls we use
      the now established technique of returning a TIPC_LINK_DOWN_EVT
      flag from tipc_link_rcv(), and then perform the reset once that
      call has returned.
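The flag-returning technique can be sketched as follows: the link only reports that it must go down, and the node, which owns the decision, performs the reset after the call returns. `LINK_DOWN_EVT` stands in for TIPC_LINK_DOWN_EVT with an arbitrary value, and `link_rcv()`, `node_rcv()` and the `peer_sent_reset` trigger are hypothetical simplifications.

```c
#include <stdbool.h>

/* Stand-in for TIPC_LINK_DOWN_EVT; the value is arbitrary in this sketch. */
#define LINK_DOWN_EVT 0x1

struct link {
    bool up;
};

struct node {
    struct link link;
    int resets;          /* resets performed at node level */
};

/* Link level: detects the condition and reports it, never resets itself. */
static int link_rcv(struct link *l, bool peer_sent_reset)
{
    if (l->up && peer_sent_reset)
        return LINK_DOWN_EVT;
    return 0;
}

/* Node level: owns the decision and performs the reset. */
static void node_link_reset(struct node *n)
{
    n->link.up = false;
    n->resets++;
}

static void node_rcv(struct node *n, bool peer_sent_reset)
{
    int flags = link_rcv(&n->link, peer_sent_reset);

    if (flags & LINK_DOWN_EVT)      /* act once link_rcv() has returned */
        node_link_reset(n);
}
```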
      
      This change serves as a preparation for the coming commits.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      tipc: eliminate function tipc_link_activate() · cbeb83ca
      Jon Paul Maloy committed
      The function tipc_link_activate() is redundant, since it mostly performs
      settings that have already been done in a preceding tipc_link_reset().
      
      There are three exceptions to this:
      - The actual state change to TIPC_LINK_WORKING. This should anyway be done
        in the FSM, and not in a separate function.
      - Registration of the link with the bearer. This should be done by the
        node, since we don't want the link to have any knowledge about its
        specific bearer.
      - Call to tipc_node_link_up() for user access registration. With the new
        role distribution between link aggregation and link level this becomes
        the wrong call order; tipc_node_link_up() should instead be called
        directly as a result of a TIPC_LINK_UP event, hence by the node itself.
      
      This commit implements those changes.
      Tested-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 21 Jul 2015, 2 commits
    •
      tipc: reduce locking scope during packet reception · d999297c
      Jon Paul Maloy committed
      We convert packet/message reception according to the same principle
      we have been using for message sending and timeout handling:
      
      We move the function tipc_rcv() to node.c, hence handling the initial
      packet reception at the link aggregation level. The function grabs
      the node lock, selects the receiving link, and accesses it via a new
      call tipc_link_rcv(). This function appends buffers to the input
      queue for delivery upwards, but it may also append outgoing packets
      to the xmit queue, just as we do during regular message sending. The
      latter will happen when buffers are forwarded from the link backlog,
      or when retransmission is requested.
      
      Upon return of this function, and after having released the node lock,
      tipc_rcv() delivers/transmits the contents of those queues, but it may
      also perform actions such as link activation or reset, as indicated by
      the return flags from the link.
      
      This reduces the number of cpu cycles spent inside the node spinlock,
      and reduces contention on that lock.
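The reduced locking scope can be illustrated with a pthread mutex standing in for the node spinlock. Everything here is hypothetical: packets are plain sequence numbers, the queues are fixed-size arrays on the caller's stack, and the atomic counters stand in for the actual delivery and transmission that tipc_rcv() performs after releasing the lock.

```c
#include <pthread.h>
#include <stdatomic.h>

#define QLEN 8

/* Hypothetical fixed-size queue; a "packet" is just its sequence number. */
struct queue {
    int pkts[QLEN];
    int n;
};

static void q_push(struct queue *q, int pkt)
{
    if (q->n < QLEN)
        q->pkts[q->n++] = pkt;
}

struct node {
    pthread_mutex_t lock;
    int rcv_nxt;                 /* link state, protected by lock */
    atomic_int delivered;        /* stand-ins for work done outside the lock */
    atomic_int transmitted;
};

/* Runs under the node lock: update link state and fill the queues only. */
static void link_rcv(struct node *n, int seqno,
                     struct queue *inputq, struct queue *xmitq)
{
    if (seqno == n->rcv_nxt) {
        n->rcv_nxt++;
        q_push(inputq, seqno);          /* for delivery upwards */
    } else if (seqno > n->rcv_nxt) {
        q_push(xmitq, n->rcv_nxt);      /* hypothetical retransmit request */
    }
}

static void node_rcv(struct node *n, int seqno)
{
    struct queue inputq = { .n = 0 }, xmitq = { .n = 0 };  /* caller-local */

    pthread_mutex_lock(&n->lock);
    link_rcv(n, seqno, &inputq, &xmitq);
    pthread_mutex_unlock(&n->lock);

    /* Stand-ins for delivery and transmission, done after unlocking. */
    atomic_fetch_add(&n->delivered, inputq.n);
    atomic_fetch_add(&n->transmitted, xmitq.n);
}
```

Because the queues live on the caller's stack, nothing shared is touched after the unlock except the thread-safe counters, which is what keeps the critical section short.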
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    •
      tipc: move link supervision timer to node level · 8a1577c9
      Jon Paul Maloy committed
      In our effort to move control of the links to the link aggregation
      layer, we move the periodic link supervision timer to struct tipc_node.
      The new timer is shared between all links belonging to the node, thus
      saving resources, while still kicking the FSM on both of its links at
      each expiration.
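A minimal sketch of one node-level timer serving all of the node's links. In the kernel this would be a timer attached to struct tipc_node; here an expiry is simulated by calling `node_timeout()` directly, and `MAX_BEARERS`, `link_timeout()` and the use of `silent_intv_cnt` as a plain tick counter are assumptions of this sketch.

```c
#include <stdbool.h>

#define MAX_BEARERS 2      /* hypothetical: at most two parallel links */

struct link {
    bool present;
    int  silent_intv_cnt;  /* intervals without traffic from the peer */
};

struct node {
    struct link links[MAX_BEARERS];
};

/* Per-link supervision step; stands in for kicking the link FSM. */
static void link_timeout(struct link *l)
{
    l->silent_intv_cnt++;
}

/* One node-level timer callback serves every link of the node. */
static void node_timeout(struct node *n)
{
    for (int i = 0; i < MAX_BEARERS; i++)
        if (n->links[i].present)
            link_timeout(&n->links[i]);
}
```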
      
      The current link timer and corresponding functions are removed.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>