1. 10 10月, 2020 1 次提交
    • H
      tipc: fix NULL pointer dereference in tipc_named_rcv · 7b50ee3d
      Hoang Huu Le 提交于
      In the function node_lost_contact(), we call __skb_queue_purge() without
      grabbing the list->lock. This can cause to a race-condition why processing
      the list 'namedq' in calling path tipc_named_rcv()->tipc_named_dequeue().
      
          [] BUG: kernel NULL pointer dereference, address: 0000000000000000
          [] #PF: supervisor read access in kernel mode
          [] #PF: error_code(0x0000) - not-present page
          [] PGD 7ca63067 P4D 7ca63067 PUD 6c553067 PMD 0
          [] Oops: 0000 [#1] SMP NOPTI
          [] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G  O  5.9.0-rc6+ #2
          [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS [...]
          [] RIP: 0010:tipc_named_rcv+0x103/0x320 [tipc]
          [] Code: 41 89 44 24 10 49 8b 16 49 8b 46 08 49 c7 06 00 00 00 [...]
          [] RSP: 0018:ffffc900000a7c58 EFLAGS: 00000282
          [] RAX: 00000000000012ec RBX: 0000000000000000 RCX: ffff88807bde1270
          [] RDX: 0000000000002c7c RSI: 0000000000002c7c RDI: ffff88807b38f1a8
          [] RBP: ffff88807b006288 R08: ffff88806a367800 R09: ffff88806a367900
          [] R10: ffff88806a367a00 R11: ffff88806a367b00 R12: ffff88807b006258
          [] R13: ffff88807b00628a R14: ffff888069334d00 R15: ffff88806a434600
          [] FS:  0000000000000000(0000) GS:ffff888079480000(0000) knlGS:0[...]
          [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [] CR2: 0000000000000000 CR3: 0000000077320000 CR4: 00000000000006e0
          [] Call Trace:
          []  ? tipc_bcast_rcv+0x9a/0x1a0 [tipc]
          []  tipc_rcv+0x40d/0x670 [tipc]
          []  ? _raw_spin_unlock+0xa/0x20
          []  tipc_l2_rcv_msg+0x55/0x80 [tipc]
          []  __netif_receive_skb_one_core+0x8c/0xa0
          []  process_backlog+0x98/0x140
          []  net_rx_action+0x13a/0x420
          []  __do_softirq+0xdb/0x316
          []  ? smpboot_thread_fn+0x2f/0x1e0
          []  ? smpboot_thread_fn+0x74/0x1e0
          []  ? smpboot_thread_fn+0x14e/0x1e0
          []  run_ksoftirqd+0x1a/0x40
          []  smpboot_thread_fn+0x149/0x1e0
          []  ? sort_range+0x20/0x20
          []  kthread+0x131/0x150
          []  ? kthread_unuse_mm+0xa0/0xa0
          []  ret_from_fork+0x22/0x30
          [] Modules linked in: veth tipc(O) ip6_udp_tunnel udp_tunnel [...]
          [] CR2: 0000000000000000
          [] ---[ end trace 65c276a8e2e2f310 ]---
      
      To fix this, we need to grab the lock of the 'namedq' list on both
      path calling.
      
      Fixes: cad2929d ("tipc: update a binding service via broadcast")
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NHoang Huu Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      7b50ee3d
  2. 19 9月, 2020 4 次提交
    • T
      tipc: add automatic rekeying for encryption key · 23700da2
      Tuong Lien 提交于
      Rekeying is required for security since a key is less secure when using
      for a long time. Also, key will be detached when its nonce value (or
      seqno ...) is exhausted. We now make the rekeying process automatic and
      configurable by user.
      
      Basically, TIPC will at a specific interval generate a new key by using
      the kernel 'Random Number Generator' cipher, then attach it as the node
      TX key and securely distribute to others in the cluster as RX keys (-
      the key exchange). The automatic key switching will then take over, and
      make the new key active shortly. Afterwards, the traffic from this node
      will be encrypted with the new session key. The same can happen in peer
      nodes but not necessarily at the same time.
      
      For simplicity, the automatically generated key will be initiated as a
      per node key. It is not too hard to also support a cluster key rekeying
      (e.g. a given node will generate a unique cluster key and update to the
      others in the cluster...), but that doesn't bring much benefit, while a
      per-node key is even more secure.
      
      We also enable user to force a rekeying or change the rekeying interval
      via netlink, the new 'set key' command option: 'TIPC_NLA_NODE_REKEYING'
      is added for these purposes as follows:
      - A value >= 1 will be set as the rekeying interval (in minutes);
      - A value of 0 will disable the rekeying;
      - A value of 'TIPC_REKEYING_NOW' (~0) will force an immediate rekeying;
      
      The default rekeying interval is (60 * 24) minutes i.e. done every day.
      There isn't any restriction for the value but user shouldn't set it too
      small or too large which results in an "ineffective" rekeying (thats ok
      for testing though).
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23700da2
    • T
      tipc: add automatic session key exchange · 1ef6f7c9
      Tuong Lien 提交于
      With support from the master key option in the previous commit, it
      becomes easy to make frequent updates/exchanges of session keys between
      authenticated cluster nodes.
      Basically, there are two situations where the key exchange will take in
      place:
      
      - When a new node joins the cluster (with the master key), it will need
        to get its peer's TX key, so that be able to decrypt further messages
        from that peer.
      
      - When a new session key is generated (by either user manual setting or
        later automatic rekeying feature), the key will be distributed to all
        peer nodes in the cluster.
      
      A key to be exchanged is encapsulated in the data part of a 'MSG_CRYPTO
      /KEY_DISTR_MSG' TIPC v2 message, then xmit-ed as usual and encrypted by
      using the master key before sending out. Upon receipt of the message it
      will be decrypted in the same way as regular messages, then attached as
      the sender's RX key in the receiver node.
      
      In this way, the key exchange is reliable by the link layer, as well as
      security, integrity and authenticity by the crypto layer.
      
      Also, the forward security will be easily achieved by user changing the
      master key actively but this should not be required very frequently.
      
      The key exchange feature is independent on the presence of a master key
      Note however that the master key still is needed for new nodes to be
      able to join the cluster. It is also optional, and can be turned off/on
      via the sysfs: 'net/tipc/key_exchange_enabled' [default 1: enabled].
      
      Backward compatibility is guaranteed because for nodes that do not have
      master key support, key exchange using master key ie. tx_key = 0 if any
      will be shortly discarded at the message validation step. In other
      words, the key exchange feature will be automatically disabled to those
      nodes.
      
      v2: fix the "implicit declaration of function 'tipc_crypto_key_flush'"
      error in node.c. The function only exists when built with the TIPC
      "CONFIG_TIPC_CRYPTO" option.
      
      v3: use 'info->extack' for a message emitted due to netlink operations
      instead (- David's comment).
      Reported-by: Nkernel test robot <lkp@intel.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ef6f7c9
    • T
      tipc: introduce encryption master key · daef1ee3
      Tuong Lien 提交于
      In addition to the supported cluster & per-node encryption keys for the
      en/decryption of TIPC messages, we now introduce one option for user to
      set a cluster key as 'master key', which is simply a symmetric key like
      the former but has a longer life cycle. It has two purposes:
      
      - Authentication of new member nodes in the cluster. New nodes, having
        no knowledge of current session keys in the cluster will still be
        able to join the cluster as long as they know the master key. This is
        because all neighbor discovery (LINK_CONFIG) messages must be
        encrypted with this key.
      
      - Encryption of session encryption keys during automatic exchange and
        update of those.This is a feature we will introduce in a later commit
        in this series.
      
      We insert the new key into the currently unused slot 0 in the key array
      and start using it immediately once the user has set it.
      After joining, a node only knowing the master key should be fully
      communicable to existing nodes in the cluster, although those nodes may
      have their own session keys activated (i.e. not the master one). To
      support this, we define a 'grace period', starting from the time a node
      itself reports having no RX keys, so the existing nodes will use the
      master key for encryption instead. The grace period can be extended but
      will automatically stop after e.g. 5 seconds without a new report. This
      is also the basis for later key exchanging feature as the new node will
      be impossible to decrypt anything without the support from master key.
      
      For user to set a master key, we define a new netlink flag -
      'TIPC_NLA_NODE_KEY_MASTER', so it can be added to the current 'set key'
      netlink command to specify the setting key to be a master key.
      
      Above all, the traditional cluster/per-node key mechanism is guaranteed
      to work when user comes not to use this master key option. This is also
      compatible to legacy nodes without the feature supported.
      
      Even this master key can be updated without any interruption of cluster
      connectivity but is so is needed, this has to be coordinated and set by
      the user.
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      daef1ee3
    • T
      tipc: optimize key switching time and logic · f779bf79
      Tuong Lien 提交于
      We reduce the lasting time for a pending TX key to be active as well as
      for a passive RX key to be freed which generally helps speed up the key
      switching. It is not expected to be too fast but should not be too slow
      either. Also the key handling logic is simplified that a pending RX key
      will be removed automatically if it is found not working after a number
      of times; the probing for a pending TX key is now carried on a specific
      message user ('LINK_PROTOCOL' or 'LINK_CONFIG') which is more efficient
      than using a timer on broadcast messages, the timer is reserved for use
      later as needed.
      
      The kernel logs or 'pr***()' are now made as clear as possible to user.
      Some prints are added, removed or changed to the debug-level. The
      'TIPC_CRYPTO_DEBUG' definition is removed, and the 'pr_debug()' is used
      instead which will be much helpful in runtime.
      
      Besides we also optimize the code in some other places as a preparation
      for later commits.
      
      v2: silent more kernel logs, also use 'info->extack' for a message
      emitted due to netlink operations instead (- David's comments).
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f779bf79
  3. 14 7月, 2020 1 次提交
  4. 17 6月, 2020 1 次提交
    • H
      tipc: update a binding service via broadcast · cad2929d
      Hoang Huu Le 提交于
      Currently, updating binding table (add service binding to
      name table/withdraw a service binding) is being sent over replicast.
      However, if we are scaling up clusters to > 100 nodes/containers this
      method is less affection because of looping through nodes in a cluster one
      by one.
      
      It is worth to use broadcast to update a binding service. This way, the
      binding table can be updated on all peer nodes in one shot.
      
      Broadcast is used when all peer nodes, as indicated by a new capability
      flag TIPC_NAMED_BCAST, support reception of this message type.
      
      Four problems need to be considered when introducing this feature.
      1) When establishing a link to a new peer node we still update this by a
      unicast 'bulk' update. This may lead to race conditions, where a later
      broadcast publication/withdrawal bypass the 'bulk', resulting in
      disordered publications, or even that a withdrawal may arrive before the
      corresponding publication. We solve this by adding an 'is_last_bulk' bit
      in the last bulk messages so that it can be distinguished from all other
      messages. Only when this message has arrived do we open up for reception
      of broadcast publications/withdrawals.
      
      2) When a first legacy node is added to the cluster all distribution
      will switch over to use the legacy 'replicast' method, while the
      opposite happens when the last legacy node leaves the cluster. This
      entails another risk of message disordering that has to be handled. We
      solve this by adding a sequence number to the broadcast/replicast
      messages, so that disordering can be discovered and corrected. Note
      however that we don't need to consider potential message loss or
      duplication at this protocol level.
      
      3) Bulk messages don't contain any sequence numbers, and will always
      arrive in order. Hence we must exempt those from the sequence number
      control and deliver them unconditionally. We solve this by adding a new
      'is_bulk' bit in those messages so that they can be recognized.
      
      4) Legacy messages, which don't contain any new bits or sequence
      numbers, but neither can arrive out of order, also need to be exempt
      from the initial synchronization and sequence number check, and
      delivered unconditionally. Therefore, we add another 'is_not_legacy' bit
      to all new messages so that those can be distinguished from legacy
      messages and the latter delivered directly.
      
      v1->v2:
       - fix warning issue reported by kbuild test robot <lkp@intel.com>
       - add santiy check to drop the publication message with a sequence
      number that is lower than the agreed synch point
      Signed-off-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NHoang Huu Le <hoang.h.le@dektech.com.au>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cad2929d
  5. 03 6月, 2020 1 次提交
    • T
      Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" · 049fa17f
      Tuong Lien 提交于
      This reverts commit de058420.
      
      There is no actual tipc_node refcnt leak as stated in the above commit.
      The refcnt is hold carefully for the case of an asynchronous decryption
      (i.e. -EINPROGRESS/-EBUSY and skb = NULL is returned), so that the node
      object cannot be freed in the meantime. The counter will be re-balanced
      when the operation's callback arrives with the decrypted buffer if any.
      In other cases, e.g. a synchronous crypto the counter will be decreased
      immediately when it is done.
      
      Now with that commit, a kernel panic will occur when there is no node
      found (i.e. n = NULL) in the 'tipc_rcv()' or a premature release of the
      node object.
      
      This commit solves the issues by reverting the said commit, but keeping
      one valid case that the 'skb_linearize()' is failed.
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Tested-by: NHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      049fa17f
  6. 27 5月, 2020 3 次提交
    • T
      tipc: add support for broadcast rcv stats dumping · 03b6fefd
      Tuong Lien 提交于
      This commit enables dumping the statistics of a broadcast-receiver link
      like the traditional 'broadcast-link' one (which is for broadcast-
      sender). The link dumping can be triggered via netlink (e.g. the
      iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
      indicator.
      
      The name of a broadcast-receiver link of a specific peer will be in the
      format: 'broadcast-link:<peer-id>'.
      
      For example:
      
      Link <broadcast-link:1001002>
        Window:50 packets
        RX packets:7841 fragments:2408/440 bundles:0/0
        TX packets:0 fragments:0/0 bundles:0/0
        RX naks:0 defs:124 dups:0
        TX naks:21 acks:0 retrans:0
        Congestion link:0  Send queue max:0 avg:0
      
      In addition, the broadcast-receiver link statistics can be reset in the
      usual way via netlink by specifying that link name in command.
      
      Note: the 'tipc_link_name_ext()' is removed because the link name can
      now be retrieved simply via the 'l->name'.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03b6fefd
    • T
      tipc: enable broadcast retrans via unicast · a91d55d1
      Tuong Lien 提交于
      In some environment, broadcast traffic is suppressed at high rate (i.e.
      a kind of bandwidth limit setting). When it is applied, TIPC broadcast
      can still run successfully. However, when it comes to a high load, some
      packets will be dropped first and TIPC tries to retransmit them but the
      packet retransmission is intentionally broadcast too, so making things
      worse and not helpful at all.
      
      This commit enables the broadcast retransmission via unicast which only
      retransmits packets to the specific peer that has really reported a gap
      i.e. not broadcasting to all nodes in the cluster, so will prevent from
      being suppressed, and also reduce some overheads on the other peers due
      to duplicates, finally improve the overall TIPC broadcast performance.
      
      Note: the functionality can be turned on/off via the sysctl file:
      
      echo 1 > /proc/sys/net/tipc/bc_retruni
      echo 0 > /proc/sys/net/tipc/bc_retruni
      
      Default is '0', i.e. the broadcast retransmission still works as usual.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a91d55d1
    • T
      tipc: introduce Gap ACK blocks for broadcast link · d7626b5a
      Tuong Lien 提交于
      As achieved through commit 9195948f ("tipc: improve TIPC throughput
      by Gap ACK blocks"), we apply the same mechanism for the broadcast link
      as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
      consist of two parts built for both the broadcast and unicast types:
      
       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      
      which is "automatically" backward-compatible.
      
      We also increase the max number of Gap ACK blocks to 128, allowing upto
      64 blocks per type (total buffer size = 516 bytes).
      
      Besides, the 'tipc_link_advance_transmq()' function is refactored which
      is applicable for both the unicast and broadcast cases now, so some old
      functions can be removed and the code is optimized.
      
      With the patch, TIPC broadcast is more robust regardless of packet loss
      or disorder, latency, ... in the underlying network. Its performance is
      boost up significantly.
      For example, experiment with a 5% packet loss rate results:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    0m 42.46s
      user    0m 1.16s
      sys     0m 17.67s
      
      Without the patch:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    8m 27.94s
      user    0m 0.55s
      sys     0m 2.38s
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7626b5a
  7. 19 4月, 2020 1 次提交
    • X
      tipc: Fix potential tipc_node refcnt leak in tipc_rcv · de058420
      Xiyu Yang 提交于
      tipc_rcv() invokes tipc_node_find() twice, which returns a reference of
      the specified tipc_node object to "n" with increased refcnt.
      
      When tipc_rcv() returns or a new object is assigned to "n", the original
      local reference of "n" becomes invalid, so the refcount should be
      decreased to keep refcount balanced.
      
      The issue happens in some paths of tipc_rcv(), which forget to decrease
      the refcnt increased by tipc_node_find() and will cause a refcnt leak.
      
      Fix this issue by calling tipc_node_put() before the original object
      pointed by "n" becomes invalid.
      Signed-off-by: NXiyu Yang <xiyuyang19@fudan.edu.cn>
      Signed-off-by: NXin Tan <tanxin.ctf@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de058420
  8. 27 3月, 2020 1 次提交
  9. 10 2月, 2020 1 次提交
    • C
      tipc: make three functions static · 2437fd7b
      Chen Wandun 提交于
      Fix the following sparse warning:
      
      net/tipc/node.c:281:6: warning: symbol 'tipc_node_free' was not declared. Should it be static?
      net/tipc/node.c:2801:5: warning: symbol '__tipc_nl_node_set_key' was not declared. Should it be static?
      net/tipc/node.c:2878:5: warning: symbol '__tipc_nl_node_flush_key' was not declared. Should it be static?
      
      Fixes: fc1b6d6d ("tipc: introduce TIPC encryption & authentication")
      Fixes: e1f32190 ("tipc: add support for AEAD key setting via netlink")
      Signed-off-by: NChen Wandun <chenwandun@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2437fd7b
  10. 11 12月, 2019 1 次提交
    • J
      tipc: introduce variable window congestion control · 16ad3f40
      Jon Maloy 提交于
      We introduce a simple variable window congestion control for links.
      The algorithm is inspired by the Reno algorithm, covering both 'slow
      start', 'congestion avoidance', and 'fast recovery' modes.
      
      - We introduce hard lower and upper window limits per link, still
        different and configurable per bearer type.
      
      - We introduce a 'slow start theshold' variable, initially set to
        the maximum window size.
      
      - We let a link start at the minimum congestion window, i.e. in slow
        start mode, and then let is grow rapidly (+1 per rceived ACK) until
        it reaches the slow start threshold and enters congestion avoidance
        mode.
      
      - In congestion avoidance mode we increment the congestion window for
        each window-size number of acked packets, up to a possible maximum
        equal to the configured maximum window.
      
      - For each non-duplicate NACK received, we drop back to fast recovery
        mode, by setting the both the slow start threshold to and the
        congestion window to (current_congestion_window / 2).
      
      - If the timeout handler finds that the transmit queue has not moved
        since the previous timeout, it drops the link back to slow start
        and forces a probe containing the last sent sequence number to the
        sent to the peer, so that this can discover the stale situation.
      
      This change does in reality have effect only on unicast ethernet
      transport, as we have seen that there is no room whatsoever for
      increasing the window max size for the UDP bearer.
      For now, we also choose to keep the limits for the broadcast link
      unchanged and equal.
      
      This algorithm seems to give a 50-100% throughput improvement for
      messages larger than MTU.
      Suggested-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16ad3f40
  11. 23 11月, 2019 1 次提交
  12. 09 11月, 2019 3 次提交
    • T
      tipc: add support for AEAD key setting via netlink · e1f32190
      Tuong Lien 提交于
      This commit adds two netlink commands to TIPC in order for user to be
      able to set or remove AEAD keys:
      - TIPC_NL_KEY_SET
      - TIPC_NL_KEY_FLUSH
      
      When the 'KEY_SET' is given along with the key data, the key will be
      initiated and attached to TIPC crypto. On the other hand, the
      'KEY_FLUSH' command will remove all existing keys if any.
      Acked-by: NYing Xue <ying.xue@windreiver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1f32190
    • T
      tipc: introduce TIPC encryption & authentication · fc1b6d6d
      Tuong Lien 提交于
      This commit offers an option to encrypt and authenticate all messaging,
      including the neighbor discovery messages. The currently most advanced
      algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All
      encryption/decryption is done at the bearer layer, just before leaving
      or after entering TIPC.
      
      Supported features:
      - Encryption & authentication of all TIPC messages (header + data);
      - Two symmetric-key modes: Cluster and Per-node;
      - Automatic key switching;
      - Key-expired revoking (sequence number wrapped);
      - Lock-free encryption/decryption (RCU);
      - Asynchronous crypto, Intel AES-NI supported;
      - Multiple cipher transforms;
      - Logs & statistics;
      
      Two key modes:
      - Cluster key mode: One single key is used for both TX & RX in all
      nodes in the cluster.
      - Per-node key mode: Each nodes in the cluster has one specific TX key.
      For RX, a node requires its peers' TX key to be able to decrypt the
      messages from those peers.
      
      Key setting from user-space is performed via netlink by a user program
      (e.g. the iproute2 'tipc' tool).
      
      Internal key state machine:
      
                                       Attach    Align(RX)
                                           +-+   +-+
                                           | V   | V
              +---------+      Attach     +---------+
              |  IDLE   |---------------->| PENDING |(user = 0)
              +---------+                 +---------+
                 A   A                   Switch|  A
                 |   |                         |  |
                 |   | Free(switch/revoked)    |  |
           (Free)|   +----------------------+  |  |Timeout
                 |              (TX)        |  |  |(RX)
                 |                          |  |  |
                 |                          |  v  |
              +---------+      Switch     +---------+
              | PASSIVE |<----------------| ACTIVE  |
              +---------+       (RX)      +---------+
              (user = 1)                  (user >= 1)
      
      The number of TFMs is 10 by default and can be changed via the procfs
      'net/tipc/max_tfms'. At this moment, as for simplicity, this file is
      also used to print the crypto statistics at runtime:
      
      echo 0xfff1 > /proc/sys/net/tipc/max_tfms
      
      The patch defines a new TIPC version (v7) for the encryption message (-
      backward compatibility as well). The message is basically encapsulated
      as follows:
      
         +----------------------------------------------------------+
         | TIPCv7 encryption  | Original TIPCv2    | Authentication |
         | header             | packet (encrypted) | Tag            |
         +----------------------------------------------------------+
      
      The throughput is about ~40% for small messages (compared with non-
      encryption) and ~9% for large messages. With the support from hardware
      crypto i.e. the Intel AES-NI CPU instructions, the throughput increases
      upto ~85% for small messages and ~55% for large messages.
      
      By default, the new feature is inactive (i.e. no encryption) until user
      sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO"
      in the kernel configuration to enable/disable the new code when needed.
      
      MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc
      Acked-by: NYing Xue <ying.xue@windreiver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc1b6d6d
    • T
      tipc: enable creating a "preliminary" node · 4cbf8ac2
      Tuong Lien 提交于
      When user sets RX key for a peer not existing on the own node, a new
      node entry is needed to which the RX key will be attached. However,
      since the peer node address (& capabilities) is unknown at that moment,
      only the node-ID is provided, this commit allows the creation of a node
      with only the data that we call as “preliminary”.
      
      A preliminary node is not the object of the “tipc_node_find()” but the
      “tipc_node_find_by_id()”. Once the first message i.e. LINK_CONFIG comes
      from that peer, and is successfully decrypted by the own node, the
      actual peer node data will be properly updated and the node will
      function as usual.
      
      In addition, the node timer always starts when a node object is created
      so if a preliminary node is not used, it will be cleaned up.
      
      The later encryption functions will also use the node timer and be able
      to create a preliminary node automatically when needed.
      Acked-by: NYing Xue <ying.xue@windreiver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cbf8ac2
  13. 08 11月, 2019 1 次提交
  14. 07 11月, 2019 1 次提交
  15. 30 10月, 2019 1 次提交
    • H
      tipc: improve throughput between nodes in netns · f73b1281
      Hoang Le 提交于
      Currently, TIPC transports intra-node user data messages directly
      socket to socket, hence shortcutting all the lower layers of the
      communication stack. This gives TIPC very good intra node performance,
      both regarding throughput and latency.
      
      We now introduce a similar mechanism for TIPC data traffic across
      network namespaces located in the same kernel. On the send path, the
      call chain is as always accompanied by the sending node's network name
      space pointer. However, once we have reliably established that the
      receiving node is represented by a namespace on the same host, we just
      replace the namespace pointer with the receiving node/namespace's
      ditto, and follow the regular socket receive patch though the receiving
      node. This technique gives us a throughput similar to the node internal
      throughput, several times larger than if we let the traffic go though
      the full network stacks. As a comparison, max throughput for 64k
      messages is four times larger than TCP throughput for the same type of
      traffic.
      
      To meet any security concerns, the following should be noted.
      
      - All nodes joining a cluster are supposed to have been be certified
      and authenticated by mechanisms outside TIPC. This is no different for
      nodes/namespaces on the same host; they have to auto discover each
      other using the attached interfaces, and establish links which are
      supervised via the regular link monitoring mechanism. Hence, a kernel
      local node has no other way to join a cluster than any other node, and
      have to obey to policies set in the IP or device layers of the stack.
      
      - Only when a sender has established with 100% certainty that the peer
      node is located in a kernel local namespace does it choose to let user
      data messages, and only those, take the crossover path to the receiving
      node/namespace.
      
      - If the receiving node/namespace is removed, its namespace pointer
      is invalidated at all peer nodes, and their neighbor link monitoring
      will eventually note that this node is gone.
      
      - To ensure the "100% certainty" criteria, and prevent any possible
      spoofing, received discovery messages must contain a proof that the
      sender knows a common secret. We use the hash mix of the sending
      node/namespace for this purpose, since it can be accessed directly by
      all other namespaces in the kernel. Upon reception of a discovery
      message, the receiver checks this proof against all the local
      namespaces'hash_mix:es. If it finds a match, that, along with a
      matching node id and cluster id, this is deemed sufficient proof that
      the peer node in question is in a local namespace, and a wormhole can
      be opened.
      
      - We should also consider that TIPC is intended to be a cluster local
      IPC mechanism (just like e.g. UNIX sockets) rather than a network
      protocol, and hence we think it can justified to allow it to shortcut the
      lower protocol layers.
      
      Regarding traceability, we should notice that since commit 6c9081a3
      ("tipc: add loopback device tracking") it is possible to follow the node
      internal packet flow by just activating tcpdump on the loopback
      interface. This will be true even for this mechanism; by activating
      tcpdump on the involved nodes' loopback interfaces their inter-name
      space messaging can easily be tracked.
      
      v2:
      - update 'net' pointer when node left/rejoined
      v3:
      - grab read/write lock when using node ref obj
      v4:
      - clone traffics between netns to loopback
      Suggested-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f73b1281
  16. 06 10月, 2019 1 次提交
  17. 19 8月, 2019 1 次提交
    • J
      tipc: clean up skb list lock handling on send path · e654f9f5
      Jon Maloy 提交于
      The policy for handling the skb list locks on the send and receive paths
      is simple.
      
      - On the send path we never need to grab the lock on the 'xmitq' list
        when the destination is an exernal node.
      
      - On the receive path we always need to grab the lock on the 'inputq'
        list, irrespective of source node.
      
      However, when transmitting node local messages those will eventually
      end up on the receive path of a local socket, meaning that the argument
      'xmitq' in tipc_node_xmit() will become the 'ínputq' argument in  the
      function tipc_sk_rcv(). This has been handled by always initializing
      the spinlock of the 'xmitq' list at message creation, just in case it
      may end up on the receive path later, and despite knowing that the lock
      in most cases never will be used.
      
      This approach is inaccurate and confusing, and has also concealed the
      fact that the stated 'no lock grabbing' policy for the send path is
      violated in some cases.
      
      We now clean up this by never initializing the lock at message creation,
      instead doing this at the moment we find that the message actually will
      enter the receive path. At the same time we fix the four locations
      where we incorrectly access the spinlock on the send/error path.
      
      This patch also reverts commit d12cffe9 ("tipc: ensure head->lock
      is initialised") which has now become redundant.
      
      CC: Eric Dumazet <edumazet@google.com>
      Reported-by: NChris Packham <chris.packham@alliedtelesis.co.nz>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e654f9f5
  18. 09 8月, 2019 1 次提交
    • J
      tipc: add loopback device tracking · 6c9081a3
      John Rutherford 提交于
      Since node internal messages are passed directly to the socket, it is not
      possible to observe those messages via tcpdump or wireshark.
      
      We now remedy this by making it possible to clone such messages and send
      the clones to the loopback interface.  The clones are dropped at reception
      and have no functional role except making the traffic visible.
      
      The feature is enabled if network taps are active for the loopback device.
      pcap filtering restrictions require the messages to be presented to the
      receiving side of the loopback device.
      
      v3 - Function dev_nit_active used to check for network taps.
         - Procedure netif_rx_ni used to send cloned messages to loopback device.
      Signed-off-by: NJohn Rutherford <john.rutherford@dektech.com.au>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c9081a3
  19. 26 7月, 2019 1 次提交
  20. 18 7月, 2019 1 次提交
    • J
      tipc: initialize 'validated' field of received packets · 866e5fd8
      Jon Maloy 提交于
      The tipc_msg_validate() function leaves a boolean flag 'validated' in
      the validated buffer's control block, to avoid performing this action
      more than once. However, at reception of new packets, the position of
      this field may already have been set by lower layer protocols, so
      that the packet is erroneously perceived as already validated by TIPC.
      
      We fix this by initializing the said field to 'false' before performing
      the initial validation.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      866e5fd8
  21. 26 6月, 2019 1 次提交
  22. 19 6月, 2019 1 次提交
    • T
      tipc: fix issues with early FAILOVER_MSG from peer · d0f84d08
      Tuong Lien 提交于
      It appears that a FAILOVER_MSG can come from peer even when the failure
      link is resetting (i.e. just after the 'node_write_unlock()'...). This
      means the failover procedure on the node has not been started yet.
      The situation is as follows:
      
               node1                                node2
        linkb          linka                  linka        linkb
          |              |                      |            |
          |              |                      x failure    |
          |              |                  RESETTING        |
          |              |                      |            |
          |              x failure            RESET          |
          |          RESETTING             FAILINGOVER       |
          |              |   (FAILOVER_MSG)     |            |
          |<-------------------------------------------------|
          | *FAILINGOVER |                      |            |
          |              | (dummy FAILOVER_MSG) |            |
          |------------------------------------------------->|
          |            RESET                    |            | FAILOVER_END
          |         FAILINGOVER               RESET          |
          .              .                      .            .
          .              .                      .            .
          .              .                      .            .
      
      Once this happens, the link failover procedure will be triggered
      wrongly on the receiving node since the node isn't in FAILINGOVER state
      but then another link failover will be carried out.
      The consequences are:
      
      1) A peer might get stuck in FAILINGOVER state because the 'sync_point'
      was set, reset and set incorrectly, the criteria to end the failover
      would not be met, it could keep waiting for a message that has already
      received.
      
      2) The early FAILOVER_MSG(s) could be queued in the link failover
      deferdq but would be purged or not pulled out because the 'drop_point'
      was not set correctly.
      
      3) The early FAILOVER_MSG(s) could be dropped too.
      
      4) The dummy FAILOVER_MSG could make the peer leaving FAILINGOVER state
      shortly, but later on it would be restarted.
      
      The same situation can also happen when the link is in PEER_RESET state
      and a FAILOVER_MSG arrives.
      
      The commit resolves the issues by forcing the link down immediately, so
      the failover procedure will be started normally (which is the same as
      when receiving a FAILOVER_MSG and the link is in up state).
      
      Also, the function "tipc_node_link_failover()" is toughen to avoid such
      a situation from happening.
      Acked-by: NJon Maloy <jon.maloy@ericsson.se>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0f84d08
  23. 04 5月, 2019 1 次提交
    • T
      tipc: fix missing Name entries due to half-failover · c0b14a08
      Tuong Lien 提交于
      TIPC link can temporarily fall into "half-establish" that only one of
      the link endpoints is ESTABLISHED and starts to send traffic, PROTOCOL
      messages, whereas the other link endpoint is not up (e.g. immediately
      when the endpoint receives ACTIVATE_MSG, the network interface goes
      down...).
      
      This is a normal situation and will be settled because the link
      endpoint will be eventually brought down after the link tolerance time.
      
      However, the situation will become worse when the second link is
      established before the first link endpoint goes down,
      For example:
      
         1. Both links <1A-2A>, <1B-2B> down
         2. Link endpoint 2A up, but 1A still down (e.g. due to network
            disturbance, wrong session, etc.)
         3. Link <1B-2B> up
         4. Link endpoint 2A down (e.g. due to link tolerance timeout)
         5. Node B starts failover onto link <1B-2B>
      
         ==> Node A does never start link failover.
      
      When the "half-failover" situation happens, two consequences have been
      observed:
      
      a) Peer link/node gets stuck in FAILINGOVER state;
      b) Traffic or user messages that peer node is trying to failover onto
      the second link can be partially or completely dropped by this node.
      
      The consequence a) was actually solved by commit c140eb16 ("tipc:
      fix failover problem"), but that commit didn't cover the b). It's due
      to the fact that the tunnel link endpoint has never been prepared for a
      failover, so the 'l->drop_point' (and the other data...) is not set
      correctly. When a TUNNEL_MSG from peer node arrives on the link,
      depending on the inner message's seqno and the current 'l->drop_point'
      value, the message can be dropped (- treated as a duplicate message) or
      processed.
      At this early stage, the traffic messages from peer are likely to be
      NAME_DISTRIBUTORs, this means some name table entries will be missed on
      the node forever!
      
      The commit resolves the issue by starting the FAILOVER process on this
      node as well. Another benefit from this solution is that we ensure the
      link will not be re-established until the failover ends.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0b14a08
  24. 28 4月, 2019 2 次提交
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  25. 12 4月, 2019 1 次提交
  26. 24 3月, 2019 1 次提交
    • J
      tipc: tipc clang warning · 737889ef
      Jon Maloy 提交于
      When checking the code with clang -Wsometimes-uninitialized we get the
      following warning:
      
      if (!tipc_link_is_establishing(l)) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      net/tipc/node.c:847:46: note: uninitialized use occurs here
            tipc_bearer_xmit(n->net, bearer_id, &xmitq, maddr);
      
      net/tipc/node.c:831:2: note: remove the 'if' if its condition is always
      true
      if (!tipc_link_is_establishing(l)) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      net/tipc/node.c:821:31: note: initialize the variable 'maddr' to silence
      this warning
      struct tipc_media_addr *maddr;
      
      We fix this by initializing 'maddr' to NULL. For the matter of clarity,
      we also test if 'xmitq' is non-empty before we use it and 'maddr'
      further down in the  function. It will never happen that 'xmitq' is non-
      empty at the same time as 'maddr' is NULL, so this is a sufficient test.
      
      Fixes: 598411d7 ("tipc: make resetting of links non-atomic")
      Reported-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      737889ef
  27. 20 3月, 2019 1 次提交
  28. 12 2月, 2019 1 次提交
    • T
      tipc: fix link session and re-establish issues · 91986ee1
      Tuong Lien 提交于
      When a link endpoint is re-created (e.g. after a node reboot or
      interface reset), the link session number is varied by random, the peer
      endpoint will be synced with this new session number before the link is
      re-established.
      
      However, there is a shortcoming in this mechanism that can lead to the
      link never re-established or faced with a failure then. It happens when
      the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
      well as the 'in_session' flag have been set, but suddenly this link
      endpoint leaves. When it comes back with a random session number, there
      are two situations possible:
      
      1/ If the random session number is larger than (or equal to) the
      previous one, the peer endpoint will be updated with this new session
      upon receipt of a RESET_MSG from this endpoint, and the link can be re-
      established as normal. Otherwise, all the RESET_MSGs from this endpoint
      will be rejected by the peer. In turn, when this link endpoint receives
      one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
      to send STATE_MSGs, but again these messages will be dropped by the
      peer due to wrong session.
      The peer link endpoint can still become ESTABLISHED after receiving a
      traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
      NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
      will be forced down sooner or later!
      
      Even in case the random session number is larger than the previous one,
      it can be that the ACTIVATE_MSG from the peer arrives first, and this
      link endpoint moves quickly to ESTABLISHED without sending out any
      RESET_MSG yet. Consequently, the peer link will not be updated with the
      new session number, and the same link failure scenario as above will
      happen.
      
      2/ Another situation can be that, the peer link endpoint was reset due
      to any reasons in the meantime, its link state was set to RESET from
      ESTABLISHING but still in session, i.e. the 'in_session' flag is not
      reset...
      Now, if the random session number from this endpoint is less than the
      previous one, all the RESET_MSGs from this endpoint will be rejected by
      the peer. In the other direction, when this link endpoint receives a
      RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
      ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
      As a result, the link cannot be re-established but gets stuck with this
      link endpoint in state ESTABLISHING and the peer in RESET!
      
      Solution:
      
      ===========
      
      This link endpoint should not go directly to ESTABLISHED when getting
      ACTIVATE_MSG from the peer which may belong to the old session if the
      link was re-created. To ensure the session to be correct before the
      link is re-established, the peer endpoint in ESTABLISHING state will
      send back the last session number in ACTIVATE_MSG for a verification at
      this endpoint. Then, if needed, a new and more appropriate session
      number will be regenerated to force a re-synch first.
      
      In addition, when a link in ESTABLISHING state is reset, its state will
      move to RESET according to the link FSM, along with resetting the
      'in_session' flag (and the other data) as a normal link reset, it will
      also be deleted if requested.
      
      The solution is backward compatible.
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91986ee1
  29. 20 12月, 2018 3 次提交
    • T
      tipc: add trace_events for tipc node · eb18a510
      Tuong Lien 提交于
      The commit adds the new trace_events for TIPC node object:
      
      trace_tipc_node_create()
      trace_tipc_node_delete()
      trace_tipc_node_lost_contact()
      trace_tipc_node_timeout()
      trace_tipc_node_link_up()
      trace_tipc_node_link_down()
      trace_tipc_node_reset_links()
      trace_tipc_node_fsm_evt()
      trace_tipc_node_check_state()
      
      Also, enables the traces for the following cases:
      - When a node is created/deleted;
      - When a node contact is lost;
      - When a node timer is timed out;
      - When a node link is up/down;
      - When all node links are reset;
      - When node state is changed;
      - When a skb comes and node state needs to be checked/updated.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb18a510
    • T
      tipc: add trace_events for tipc link · 26574db0
      Tuong Lien 提交于
      The commit adds the new trace_events for TIPC link object:
      
      trace_tipc_link_timeout()
      trace_tipc_link_fsm()
      trace_tipc_link_reset()
      trace_tipc_link_too_silent()
      trace_tipc_link_retrans()
      trace_tipc_link_bc_ack()
      trace_tipc_link_conges()
      
      And the traces for PROTOCOL messages at building and receiving:
      
      trace_tipc_proto_build()
      trace_tipc_proto_rcv()
      
      Note:
      a) The 'tipc_link_too_silent' event will only happen when the
      'silent_intv_cnt' is about to reach the 'abort_limit' value (and the
      event is enabled). The benefit for this kind of event is that we can
      get an early indication about TIPC link loss issue due to timeout, then
      can do some necessary actions for troubleshooting.
      
      For example: To trigger the 'tipc_proto_rcv' when the 'too_silent'
      event occurs:
      
      echo 'enable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_too_silent/trigger
      
      And disable it when TIPC link is reset:
      
      echo 'disable_event:tipc:tipc_proto_rcv' > \
            events/tipc/tipc_link_reset/trigger
      
      b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to
      trace TIPC retransmission issues.
      
      In addition, the commit adds the 'trace_tipc_list/link_dump()' at the
      'retransmission failure' case. Then, if the issue occurs, the link
      'transmq' along with the link data can be dumped for post-analysis.
      These dump events should be enabled by default since it will only take
      effect when the failure happens.
      
      The same approach is also applied for the faulty case that the
      validation of protocol message is failed.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26574db0
    • T
      tipc: enable tracepoints in tipc · b4b9771b
      Tuong Lien 提交于
      As for the sake of debugging/tracing, the commit enables tracepoints in
      TIPC along with some general trace_events as shown below. It also
      defines some 'tipc_*_dump()' functions that allow to dump TIPC object
      data whenever needed, that is, for general debug purposes, ie. not just
      for the trace_events.
      
      The following trace_events are now available:
      
      - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data,
        e.g. message type, user, droppable, skb truesize, cloned skb, etc.
      
      - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or
        queues, e.g. TIPC link transmq, socket receive queue, etc.
      
      - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g.
        sk state, sk type, connection type, rmem_alloc, socket queues, etc.
      
      - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g.
        link state, silent_intv_cnt, gap, bc_gap, link queues, etc.
      
      - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g.
        node state, active links, capabilities, link entries, etc.
      
      How to use:
      Put the trace functions at any places where we want to dump TIPC data
      or events.
      
      Note:
      a) The dump functions will generate raw data only, that is, to offload
      the trace event's processing, it can require a tool or script to parse
      the data but this should be simple.
      
      b) The trace_tipc_*_dump() should be reserved for a failure cases only
      (e.g. the retransmission failure case) or where we do not expect to
      happen too often, then we can consider enabling these events by default
      since they will almost not take any effects under normal conditions,
      but once the rare condition or failure occurs, we get the dumped data
      fully for post-analysis.
      
      For other trace purposes, we can reuse these trace classes as template
      but different events.
      
      c) A trace_event is only effective when we enable it. To enable the
      TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
      directory in the 'debugfs' file system. Normally, they are located at:
      
      /sys/kernel/debug/tracing/events/tipc/
      
      For example:
      
      To enable the tipc_link_dump event:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable
      
      To enable all the TIPC trace_events:
      
      echo 1 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To collect the trace data:
      
      cat trace
      
      or
      
      cat trace_pipe > /trace.out &
      
      To disable all the TIPC trace_events:
      
      echo 0 > /sys/kernel/debug/tracing/events/tipc/enable
      
      To clear the trace buffer:
      
      echo > trace
      
      d) Like the other trace_events, the feature like 'filter' or 'trigger'
      is also usable for the tipc trace_events.
      For more details, have a look at:
      
      Documentation/trace/ftrace.txt
      
      MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Tested-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4b9771b
  30. 19 12月, 2018 1 次提交