1. 25 Jul, 2020 1 commit
  2. 14 Jul, 2020 1 commit
  3. 09 Jul, 2020 1 commit
    • tipc: fix retransmission on unicast links · a34f8291
      Hamish Martin authored
      A scenario has been observed where a 'bc_init' message for a link is not
      retransmitted if it fails to be received by the peer. This leads to the
      peer never establishing the link fully and it discarding all other data
      received on the link. In this scenario the message is lost in transit to
      the peer.
      
      The issue is traced to the 'nxt_retr' field of the skb not being
      initialised for links that aren't a bc_sndlink. This leads to the
      comparison in tipc_link_advance_transmq() that gates whether to attempt
      retransmission of a message performing in an undesirable way.
      Depending on the relative value of 'jiffies', this comparison:
          time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)
      may return true or false given that 'nxt_retr' remains at the
      uninitialised value of 0 for non bc_sndlinks.
      
      This is most noticeable shortly after boot when jiffies is initialised
      to a high value (to flush out rollover bugs) and we compare a jiffies of,
      say, 4294940189 to zero. In that case time_before returns 'true' leading
      to the skb not being retransmitted.
      
      The fix is to ensure that all skbs have a valid 'nxt_retr' time set for
      them and this is achieved by refactoring the setting of this value into
      a central function.
      With this fix, transmission losses of 'bc_init' messages do not stall
      the link establishment forever because the 'bc_init' message is
      retransmitted and the link eventually establishes correctly.
      
      Fixes: 382f598f ("tipc: reduce duplicate packets for unicast traffic")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
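The wraparound behaviour described in this commit message can be reproduced in userspace. The following is a minimal sketch, assuming a 32-bit jiffies counter; `time_before32()` models the kernel's `time_before()` and `retransmit_allowed()` is a hypothetical stand-in for the gate in `tipc_link_advance_transmq()`, not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of time_before() for a 32-bit tick counter:
 * true when 'a' is earlier than 'b', tolerating counter wraparound. */
static bool time_before32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical model of the retransmission gate: a message is skipped
 * while "now" is still before its next-retransmit time. */
static bool retransmit_allowed(uint32_t now, uint32_t nxt_retr)
{
	return !time_before32(now, nxt_retr);
}
```

With `now = 4294940189` and an uninitialised `nxt_retr` of 0, `time_before32()` is true, so the message is never retransmitted; with a small `now` such as 1000 the same comparison is false.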
  4. 20 Jun, 2020 1 commit
  5. 17 Jun, 2020 1 commit
    • tipc: update a binding service via broadcast · cad2929d
      Hoang Huu Le authored
      Currently, updates to the binding table (adding a service binding to
      the name table or withdrawing one) are sent over replicast. However,
      when clusters scale up to more than 100 nodes/containers, this method
      is less efficient because it loops through the nodes in the cluster
      one by one.
      
      It is worthwhile to use broadcast to update a binding service. This
      way, the binding table can be updated on all peer nodes in one shot.
      
      Broadcast is used when all peer nodes, as indicated by a new capability
      flag TIPC_NAMED_BCAST, support reception of this message type.
      
      Four problems need to be considered when introducing this feature.
      1) When establishing a link to a new peer node we still update this by a
      unicast 'bulk' update. This may lead to race conditions, where a later
      broadcast publication/withdrawal bypasses the 'bulk', resulting in
      disordered publications, or even a withdrawal arriving before the
      corresponding publication. We solve this by adding an 'is_last_bulk' bit
      in the last bulk message so that it can be distinguished from all other
      messages. Only when this message has arrived do we open up for reception
      of broadcast publications/withdrawals.
      
      2) When a first legacy node is added to the cluster all distribution
      will switch over to use the legacy 'replicast' method, while the
      opposite happens when the last legacy node leaves the cluster. This
      entails another risk of message disordering that has to be handled. We
      solve this by adding a sequence number to the broadcast/replicast
      messages, so that disordering can be discovered and corrected. Note
      however that we don't need to consider potential message loss or
      duplication at this protocol level.
      
      3) Bulk messages don't contain any sequence numbers, and will always
      arrive in order. Hence we must exempt those from the sequence number
      control and deliver them unconditionally. We solve this by adding a new
      'is_bulk' bit in those messages so that they can be recognized.
      
      4) Legacy messages, which don't contain any new bits or sequence
      numbers, but neither can arrive out of order, also need to be exempt
      from the initial synchronization and sequence number check, and
      delivered unconditionally. Therefore, we add another 'is_not_legacy' bit
      to all new messages so that those can be distinguished from legacy
      messages and the latter delivered directly.
      
      v1->v2:
       - fix warning issue reported by kbuild test robot <lkp@intel.com>
       - add sanity check to drop the publication message with a sequence
      number that is lower than the agreed synch point
      Signed-off-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Hoang Huu Le <hoang.h.le@dektech.com.au>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
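The four rules in this commit message can be summarised as a receive-side decision function. This is only a sketch under stated assumptions: the struct layouts, the 16-bit sequence width, the `classify()` name, and the choice to defer (rather than drop) early broadcasts are all hypothetical and not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the bits named in the commit message. */
struct named_msg {
	bool is_not_legacy;  /* set in all new-style messages */
	bool is_bulk;        /* unicast bulk update */
	bool is_last_bulk;   /* marks the final bulk message */
	uint16_t seqno;      /* sequence number; width is an assumption */
};

struct peer_state {
	bool bulk_done;      /* last bulk message has arrived */
	uint16_t rcv_nxt;    /* next expected sequence number */
};

enum verdict { DELIVER, DEFER, DROP };

/* Wrap-safe "a is older than b" for 16-bit sequence numbers. */
static bool seq_less(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) < 0;
}

static enum verdict classify(struct peer_state *p, const struct named_msg *m)
{
	if (!m->is_not_legacy)
		return DELIVER;              /* rule 4: legacy, deliver directly */
	if (m->is_bulk) {
		if (m->is_last_bulk)
			p->bulk_done = true; /* rule 1: open up for broadcast */
		return DELIVER;              /* rule 3: bulk is exempt from seqno */
	}
	if (!p->bulk_done)
		return DEFER;                /* rule 1: wait for the last bulk */
	if (seq_less(m->seqno, p->rcv_nxt))
		return DROP;                 /* below the agreed synch point */
	if (m->seqno != p->rcv_nxt)
		return DEFER;                /* rule 2: out of order, hold back */
	p->rcv_nxt++;
	return DELIVER;
}
```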
  6. 14 Jun, 2020 1 commit
    • treewide: replace '---help---' in Kconfig files with 'help' · a7f7f624
      Masahiro Yamada authored
      Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over
      '---help---'"), the number of '---help---' has been gradually
      decreasing, but there are still more than 2400 instances.
      
      This commit finishes the conversion. While I touched the lines,
      I also fixed the indentation.
      
      There are a variety of indentation styles found.
      
        a) 4 spaces + '---help---'
        b) 7 spaces + '---help---'
        c) 8 spaces + '---help---'
        d) 1 space + 1 tab + '---help---'
        e) 1 tab + '---help---'    (correct indentation)
        f) 1 tab + 1 space + '---help---'
        g) 1 tab + 2 spaces + '---help---'
      
      In order to convert all of them to 1 tab + 'help', I ran the
      following command:
      
        $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  7. 12 Jun, 2020 2 commits
    • tipc: fix NULL pointer dereference in tipc_disc_rcv() · 97982782
      Tuong Lien authored
      When a bearer is enabled, we create a 'tipc_discoverer' object to store
      the bearer-related data along with a timer and a preformatted discovery
      message buffer for later probing. However, this is only carried out
      after the bearer has been set 'up', which leaves a race condition
      resulting in a kernel panic.
      
      It occurs when a discovery message from a peer node is received and
      processed in the bottom half (since the bearer is already 'up') just
      before the discoverer object is created. The object is then accessed in
      order to update the preformatted buffer (with a new trial address, ...),
      which leads to the NULL pointer dereference.
      
      We solve the problem by simply moving the bearer 'up' setting to later,
      so that everything is ready before any message is received.
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix kernel WARNING in tipc_msg_append() · c9aa81fa
      Tuong Lien authored
      syzbot found the following issue:
      
      WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 check_copy_size include/linux/thread_info.h:150 [inline]
      WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 copy_from_iter include/linux/uio.h:144 [inline]
      WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 tipc_msg_append+0x49a/0x5e0 net/tipc/msg.c:242
      Kernel panic - not syncing: panic_on_warn set ...
      
      This happens after commit 5e9eeccc ("tipc: fix NULL pointer
      dereference in streaming"), which tried to build at least one buffer
      even when the message data length is zero. However, it now exposes
      another bug: the 'mss' can be zero and the 'cpy' will then be negative,
      so the above kernel WARNING appears.
      A zero 'mss' is never expected because it means Nagle is not enabled
      for the socket (the socket type was actually 'SOCK_SEQPACKET'), so the
      function 'tipc_msg_append()' must not be called at all. It was called
      in this particular case because the message data length was zero, so
      the 'send <= maxnagle' check became true.
      
      We resolve the issue by explicitly checking that Nagle is enabled for
      the socket, i.e. 'maxnagle != 0', before calling 'tipc_msg_append()'.
      We also harden the function against such negative values, if any.
      
      Reported-by: syzbot+75139a7d2605236b0b7f@syzkaller.appspotmail.com
      Fixes: c0bceb97 ("tipc: add smart nagle feature")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
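The two safeguards described in this commit message can be sketched as follows. The function names and the `mlen` parameter (bytes already in the buffer) are hypothetical; only 'mss', 'cpy', 'maxnagle', and the 'send <= maxnagle' check come from the message itself.

```c
#include <stdbool.h>

/* Number of bytes to copy into the current buffer.
 * With mss == 0 the available room is negative, so clamp to zero
 * instead of passing a negative length on to the copy routine. */
static int append_len(int mss, int mlen, int dlen)
{
	int room = mss - mlen;
	int cpy = dlen < room ? dlen : room;

	return cpy > 0 ? cpy : 0;
}

/* Hypothetical caller-side gate: only take the append path when
 * Nagle is actually enabled for the socket (maxnagle != 0). */
static bool may_append(int maxnagle, int send)
{
	return maxnagle != 0 && send <= maxnagle;
}
```

With `maxnagle == 0` and a zero-length send, the old 'send <= maxnagle' test alone would pass; the extra `maxnagle != 0` condition keeps that case off the append path.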
  8. 05 Jun, 2020 1 commit
    • tipc: fix NULL pointer dereference in streaming · 5e9eeccc
      Tuong Lien authored
      syzbot found the following crash:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000019: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf]
      CPU: 1 PID: 7060 Comm: syz-executor394 Not tainted 5.7.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__tipc_sendstream+0xbde/0x11f0 net/tipc/socket.c:1591
      Code: 00 00 00 00 48 39 5c 24 28 48 0f 44 d8 e8 fa 3e db f9 48 b8 00 00 00 00 00 fc ff df 48 8d bb c8 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 e2 04 00 00 48 8b 9b c8 00 00 00 48 b8 00 00 00
      RSP: 0018:ffffc90003ef7818 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8797fd9d
      RDX: 0000000000000019 RSI: ffffffff8797fde6 RDI: 00000000000000c8
      RBP: ffff888099848040 R08: ffff88809a5f6440 R09: fffffbfff1860b4c
      R10: ffffffff8c305a5f R11: fffffbfff1860b4b R12: ffff88809984857e
      R13: 0000000000000000 R14: ffff888086aa4000 R15: 0000000000000000
      FS:  00000000009b4880(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000140 CR3: 00000000a7fdf000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tipc_sendstream+0x4c/0x70 net/tipc/socket.c:1533
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x32f/0x810 net/socket.c:2352
       ___sys_sendmsg+0x100/0x170 net/socket.c:2406
       __sys_sendmmsg+0x195/0x480 net/socket.c:2496
       __do_sys_sendmmsg net/socket.c:2525 [inline]
       __se_sys_sendmmsg net/socket.c:2522 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2522
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x440199
      ...
      
      This bug was bisected to commit 0a3e060f ("tipc: add test for Nagle
      algorithm effectiveness"). However, that commit is not the cause. The
      trouble comes from the base code: when sending a message with zero data
      length, we would unexpectedly end up with an empty 'txq' queue after
      'tipc_msg_append()' in Nagle mode.
      
      A similar crash can be generated even without the bisected patch, but
      at the link layer when it accesses the empty queue.
      
      We solve the issues by building at least one buffer to go with the
      socket's header and an optional data section that may be empty, as we
      did with 'tipc_msg_build()'.
      
      Note: the previous commit 4c21daae ("tipc: Fix NULL pointer
      dereference in __tipc_sendstream()") is made obsolete by this one, since
      the 'txq' will never be empty and the 'skb != NULL' check is
      unnecessary, but it is safe anyway.
      
      Reported-by: syzbot+8eac6d030e7807c21d32@syzkaller.appspotmail.com
      Fixes: c0bceb97 ("tipc: add smart nagle feature")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 03 Jun, 2020 2 commits
  10. 02 Jun, 2020 1 commit
  11. 30 May, 2020 1 commit
  12. 29 May, 2020 1 commit
  13. 27 May, 2020 5 commits
    • tipc: add test for Nagle algorithm effectiveness · 0a3e060f
      Tuong Lien authored
      When streaming in Nagle mode, we try to bundle as many small user
      messages as possible while there is one outstanding buffer, i.e. one not
      yet ACK-ed by the receiving side, which helps boost the overall
      throughput. So, the algorithm's effectiveness really depends on when the
      Nagle ACK comes, or what the specific network latency (RTT) is, compared
      to the user's message sending rate.
      
      In a bad case, where the user's sending rate is low or the network
      latency is small, there will not be many bundles, so making a Nagle ACK
      or waiting for it is not meaningful.
      For example: a user sends its messages every 100ms and the RTT is 50ms.
      Then for each message we require one Nagle ACK, yet only one user
      message is sent without any bundling.
      
      In a better case, we may have a few bundles (e.g. with RTT = 300ms), but
      if the user sends medium-sized messages there will not be any difference
      at all. For instance, 3 x 1000-byte data messages, if bundled, will
      still result in 3 bundles with MTU = 1500.
      
      When Nagle is ineffective, the delay in user message sending is clearly
      wasted compared to sending directly.
      
      Besides, adding Nagle ACKs will consume some processor load on both the
      sending and receiving sides.
      
      This commit adds a test on the effectiveness of the Nagle algorithm for
      an individual connection in the network on which it actually runs.
      Particularly, upon receipt of a Nagle ACK we will compare the number of
      bundles in the backlog queue to the number of user messages which would
      be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle
      mode will be kept for further message sending. Otherwise, we will leave
      Nagle and put a 'penalty' on the connection, so it will have to spend
      more 'one-way' messages before being able to re-enter Nagle.
      
      In addition, the 'ack-required' bit is only set when really needed, so
      that the number of Nagle ACKs is reduced during Nagle mode.
      
      Testing with a benchmark showed that with the patch there was not much
      difference in throughput for small messages, since the tool continuously
      sends messages without a break, so Nagle would still take effect.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
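The bundling arithmetic and the ratio test from this commit message can be illustrated with a small model. It is a sketch under simplifying assumptions: messages are packed greedily in order, header overhead is ignored, and both function names are hypothetical rather than kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count how many bundles are needed when messages are packed in order
 * into buffers of at most 'mtu' bytes (headers ignored for simplicity). */
static size_t count_bundles(const size_t *sizes, size_t n, size_t mtu)
{
	size_t bundles = 0, used = 0;

	for (size_t i = 0; i < n; i++) {
		assert(sizes[i] <= mtu); /* a single message must fit */
		if (bundles == 0 || used + sizes[i] > mtu) {
			bundles++;       /* start a new bundle */
			used = 0;
		}
		used += sizes[i];
	}
	return bundles;
}

/* Nagle is considered worthwhile when each bundle carries at least
 * two user messages on average (the "ratio >= 2" example). */
static bool nagle_effective(size_t msgs, size_t bundles)
{
	return bundles > 0 && msgs >= 2 * bundles;
}
```

Three 1000-byte messages with an MTU of 1500 still need three bundles, so the ratio is 1 and Nagle would be left; ten 100-byte messages fit in one bundle, so Nagle would be kept.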
    • tipc: add support for broadcast rcv stats dumping · 03b6fefd
      Tuong Lien authored
      This commit enables dumping the statistics of a broadcast-receiver link
      like the traditional 'broadcast-link' one (which is for broadcast-
      sender). The link dumping can be triggered via netlink (e.g. the
      iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
      indicator.
      
      The name of a broadcast-receiver link of a specific peer will be in the
      format: 'broadcast-link:<peer-id>'.
      
      For example:
      
      Link <broadcast-link:1001002>
        Window:50 packets
        RX packets:7841 fragments:2408/440 bundles:0/0
        TX packets:0 fragments:0/0 bundles:0/0
        RX naks:0 defs:124 dups:0
        TX naks:21 acks:0 retrans:0
        Congestion link:0  Send queue max:0 avg:0
      
      In addition, the broadcast-receiver link statistics can be reset in the
      usual way via netlink by specifying that link name in the command.
      
      Note: the 'tipc_link_name_ext()' is removed because the link name can
      now be retrieved simply via the 'l->name'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: enable broadcast retrans via unicast · a91d55d1
      Tuong Lien authored
      In some environments, broadcast traffic is suppressed at high rates
      (i.e. a kind of bandwidth limit setting). When this is applied, TIPC
      broadcast can still run successfully. However, under high load some
      packets will be dropped first and TIPC tries to retransmit them, but the
      packet retransmission is intentionally broadcast too, which makes things
      worse and does not help at all.
      
      This commit enables broadcast retransmission via unicast, which only
      retransmits packets to the specific peer that has actually reported a
      gap, i.e. without broadcasting to all nodes in the cluster. This
      prevents the retransmissions from being suppressed, reduces the overhead
      on the other peers due to duplicates, and ultimately improves overall
      TIPC broadcast performance.
      
      Note: the functionality can be turned on/off via the sysctl file:
      
      echo 1 > /proc/sys/net/tipc/bc_retruni
      echo 0 > /proc/sys/net/tipc/bc_retruni
      
      Default is '0', i.e. the broadcast retransmission still works as usual.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add back link trace events · c6ed7a5c
      Tuong Lien authored
      In the previous commit ("tipc: add Gap ACK blocks support for broadcast
      link"), we have removed the following link trace events due to the code
      changes:
      
      - tipc_link_bc_ack
      - tipc_link_retrans
      
      This commit adds them back along with some minor changes to adapt to
      the new code.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce Gap ACK blocks for broadcast link · d7626b5a
      Tuong Lien authored
      As achieved through commit 9195948f ("tipc: improve TIPC throughput
      by Gap ACK blocks"), we apply the same mechanism for the broadcast link
      as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
      consist of two parts built for both the broadcast and unicast types:
      
       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      
      which is "automatically" backward-compatible.
      
      We also increase the max number of Gap ACK blocks to 128, allowing up to
      64 blocks per type (total buffer size = 516 bytes).
      
      Besides, the 'tipc_link_advance_transmq()' function is refactored so
      that it is applicable to both the unicast and broadcast cases, so some
      old functions can be removed and the code is optimized.
      
      With the patch, TIPC broadcast is more robust regardless of packet loss
      or disorder, latency, ... in the underlying network. Its performance is
      boosted significantly.
      For example, an experiment with a 5% packet loss rate gives:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    0m 42.46s
      user    0m 1.16s
      sys     0m 17.67s
      
      Without the patch:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    8m 27.94s
      user    0m 0.55s
      sys     0m 2.38s
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
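The layout in the diagram and the quoted 516-byte total can be checked with a small struct model. This is a sketch only: the type and function names are hypothetical, the in-memory field order is an assumption, and byte-order conversion is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_GACKS 128 /* up to 64 broadcast + 64 unicast blocks */

/* One Gap ACK block: an acknowledged sequence number and a gap size. */
struct gap_ack {
	uint16_t ack;
	uint16_t gap;
};

/* Header followed by the broadcast blocks, then the unicast blocks. */
struct gap_ack_blks {
	uint16_t len;        /* total length of the record in bytes */
	uint8_t ugack_cnt;   /* number of unicast blocks */
	uint8_t bgack_cnt;   /* number of broadcast blocks */
	struct gap_ack gacks[MAX_GACKS];
};

/* Size in bytes of a record carrying the given number of blocks. */
static size_t gap_ack_blks_size(unsigned int bgack_cnt, unsigned int ugack_cnt)
{
	return offsetof(struct gap_ack_blks, gacks) +
	       (bgack_cnt + ugack_cnt) * sizeof(struct gap_ack);
}

/* The unicast blocks start right after the broadcast ones. */
static const struct gap_ack *uc_gacks(const struct gap_ack_blks *b)
{
	return &b->gacks[b->bgack_cnt];
}
```

The 4-byte header plus 128 blocks of 4 bytes each gives 4 + 128 * 4 = 516 bytes, matching the figure in the message. A record with no broadcast blocks degenerates to the unicast-only form, which is one way to read the "automatically backward-compatible" remark.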
  14. 23 May, 2020 1 commit
    • tipc: block BH before using dst_cache · 13788174
      Eric Dumazet authored
      dst_cache_get() documents it must be used with BH disabled.
      
      syzbot reported:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: /21697
      caller is dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
      CPU: 0 PID: 21697 Comm:  Not tainted 5.7.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       check_preemption_disabled lib/smp_processor_id.c:47 [inline]
       debug_smp_processor_id.cold+0x88/0x9b lib/smp_processor_id.c:57
       dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
       tipc_udp_xmit.isra.0+0xb9/0xad0 net/tipc/udp_media.c:164
       tipc_udp_send_msg+0x3e6/0x490 net/tipc/udp_media.c:244
       tipc_bearer_xmit_skb+0x1de/0x3f0 net/tipc/bearer.c:526
       tipc_enable_bearer+0xb2f/0xd60 net/tipc/bearer.c:331
       __tipc_nl_bearer_enable+0x2bf/0x390 net/tipc/bearer.c:995
       tipc_nl_bearer_enable+0x1e/0x30 net/tipc/bearer.c:1003
       genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline]
       genl_family_rcv_msg net/netlink/genetlink.c:718 [inline]
       genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735
       netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:746
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
       ___sys_sendmsg+0x100/0x170 net/socket.c:2416
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x45ca29
      
      Fixes: e9c1a793 ("tipc: add dst_cache support for udp media")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 14 May, 2020 3 commits
    • tipc: fix failed service subscription deletion · 88690b10
      Tuong Lien authored
      When a service subscription expires or is canceled by the user, it
      needs to be deleted from the subscription list so that new subscriptions
      can be registered (max = 65535 per net). However, there are two issues
      in the code that can cause such an unused subscription to persist:
      
      1) 'tipc_conn_delete_sub()' loops over the subscription list but breaks
      out early when the first subscription differs from the one specified,
      so the subscription will not be deleted.
      
      2) In case a subscription is canceled, the code to remove the
      'TIPC_SUB_CANCEL' flag from the subscription filter does not work if it
      is a local subscription (i.e. the little endian isn't involved). So
      there will be no match when looking for the subscription to delete
      later.
      
      The subscription(s) will be removed eventually when the user terminates
      its topology connection, but that could be a long time later. Meanwhile,
      the number of available subscriptions may be exhausted.
      
      This commit fixes the two issues above, so that a subscription can be
      deleted correctly when needed.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
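The first issue in this commit message (a loop that breaks on the first mismatch) is easy to reproduce in isolation. The sketch below uses a plain array and hypothetical names; the kernel function works on a linked list and matches on more than an id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sub {
	unsigned int id; /* hypothetical key identifying a subscription */
};

/* Buggy shape: gives up as soon as the first entry does not match. */
static bool delete_sub_buggy(struct sub *list, size_t *n, unsigned int id)
{
	for (size_t i = 0; i < *n; i++) {
		if (list[i].id != id)
			break;
		memmove(&list[i], &list[i + 1], (*n - i - 1) * sizeof(*list));
		(*n)--;
		return true;
	}
	return false;
}

/* Fixed shape: skip non-matching entries and keep searching. */
static bool delete_sub_fixed(struct sub *list, size_t *n, unsigned int id)
{
	for (size_t i = 0; i < *n; i++) {
		if (list[i].id != id)
			continue;
		memmove(&list[i], &list[i + 1], (*n - i - 1) * sizeof(*list));
		(*n)--;
		return true;
	}
	return false;
}
```

With subscriptions 1, 2, 3 in the list, the buggy variant can only ever delete subscription 1, while the fixed variant finds and removes subscription 2.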
    • tipc: fix memory leak in service subscripting · 0771d7df
      Tuong Lien authored
      Upon receipt of a service subscription request from a user via a
      topology connection, one 'sub' object is allocated in the kernel so that
      it can later send the corresponding service events, if any, to the user.
      Also, in case of any failure, the connection is shut down and all the
      pertaining 'sub' objects are freed.
      
      However, there is a race condition as follows resulting in memory leak:
      
             receive-work       connection        send-work
                    |                |                |
              sub-1 |<------//-------|                |
              sub-2 |<------//-------|                |
                    |                |<---------------| evt for sub-x
              sub-3 |<------//-------|                |
                    :                :                :
                    :                :                :
                    |       /--------|                |
                    |       |        * peer closed    |
                    |       |        |                |
                    |       |        |<-------X-------| evt for sub-y
                    |       |        |<===============|
              sub-n |<------/        X    shutdown    |
          -> orphan |                                 |
      
      That is, the 'receive-work' may get the last subscription request while
      the 'send-work' is shutting down the connection due to peer close.
      
      We have a 'lock' on the connection, so the two actions cannot be carried
      out simultaneously. If the last subscription, e.g. 'sub-n', is allocated
      before the 'send-work' closes the connection, there is no issue at all:
      the 'sub' objects will be freed. Otherwise, the last subscription
      becomes an orphan, since the connection was closed and we released all
      references.
      
      This commit fixes the issue by simply adding a test, right after we
      obtain the connection lock, of whether the connection remains in the
      'connected' state. If so, a subscription object is created as usual;
      otherwise we ignore the request.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Reported-by: Thang Ngo <thang.h.ngo@dektech.com.au>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix large latency in smart Nagle streaming · c7268589
      Tuong Lien authored
      Currently when a connection is in Nagle mode, we set the 'ack_required'
      bit in the last sending buffer and wait for the corresponding ACK prior
      to pushing more data. However, on the receiving side, the ACK is issued
      only when the application has actually read the whole data. Even if part
      of the last buffer is received, we will not send the ACK as required.
      This might cause an unnecessary delay since the receiver does not always
      fetch the message as fast as the sender, resulting in a large latency in
      the user message sending, which is: [one RTT + the receiver processing
      time].
      
      This commit issues the Nagle ACK as soon as possible, i.e. when a
      message with 'ack_required' arrives in the receiving side's stack, even
      before it is processed or put in the socket receive queue.
      This way, we can limit the streaming latency to one RTT, as promised in
      Nagle mode.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 05 May, 2020 1 commit
    • tipc: fix partial topology connection closure · 980d6927
      Tuong Lien authored
      When an application connects to the TIPC topology server and subscribes
      to some services, a new connection is created along with some
      'tipc_subscription' objects to store the related data.
      However, there is one omission in the connection handling: when the
      connection or application is shut down in an orderly way (e.g. via
      SIGQUIT, etc.), the connection is not closed in the kernel and the
      'tipc_subscription' objects are not freed either.
      This results in:
      - The maximum number of subscriptions (65535) will soon be reached, and
      new subscriptions will be rejected;
      - The TIPC module cannot be removed (unless the objects are somehow
      forced to release first).
      
      The commit fixes the issue by closing the connection if 'recvmsg()'
      returns '0', i.e. when the peer is shut down gracefully. It also covers
      the other unexpected cases.
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 19 Apr, 2020 2 commits
    • tipc: Fix potential tipc_node refcnt leak in tipc_rcv · de058420
      Xiyu Yang authored
      tipc_rcv() invokes tipc_node_find() twice, which returns a reference of
      the specified tipc_node object to "n" with increased refcnt.
      
      When tipc_rcv() returns or a new object is assigned to "n", the original
      local reference of "n" becomes invalid, so the refcount should be
      decreased to keep refcount balanced.
      
      The issue happens in some paths of tipc_rcv(), which forget to decrease
      the refcnt increased by tipc_node_find() and will cause a refcnt leak.
      
      Fix this issue by calling tipc_node_put() before the original object
      pointed by "n" becomes invalid.
      Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
      Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
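The reference-counting rule stated in this commit message (every lookup that takes a reference needs a matching put before the local pointer is reassigned or goes out of scope) can be shown with a toy model. All names are hypothetical, and the plain integer counter stands in for the kernel's atomic reference counting.

```c
#include <stddef.h>

struct node {
	int refcnt; /* plain int for illustration; not atomic */
};

/* Model of a lookup that returns the object with an extra reference. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refcnt++;
	return n;
}

/* Drop one reference taken by node_get(). */
static void node_put(struct node *n)
{
	if (n)
		n->refcnt--;
}

/* Hypothetical receive path that looks up a node twice. */
static void rcv_balanced(struct node *first, struct node *second)
{
	struct node *n = node_get(first);

	/* ... work with the first node ... */
	node_put(n);          /* release before 'n' is reassigned */

	n = node_get(second);
	/* ... work with the second node ... */
	node_put(n);          /* release before returning */
}
```

After `rcv_balanced()` returns, both counters are back at their starting values; omitting either `node_put()` would leave one of them permanently raised, which is the leak the fix addresses.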
    • tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv · 441870ee
      Xiyu Yang authored
      tipc_crypto_rcv() invokes tipc_aead_get(), which returns a reference
      to the tipc_aead object in "aead" with its refcnt increased.

      When tipc_crypto_rcv() returns, the original local reference held in
      "aead" becomes invalid, so the refcnt must be decreased to keep it
      balanced.

      The issue happens in one error path of tipc_crypto_rcv(). When the
      TIPC message decryption status is EINPROGRESS or EBUSY, the function
      forgets to decrease the refcnt increased by tipc_aead_get(), which
      causes a refcnt leak.

      Fix this issue by calling tipc_aead_put() on the error path when the
      TIPC message decryption status is EINPROGRESS or EBUSY.
      Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
      Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      441870ee
  18. 16 Apr, 2020 1 commit
    • T
      tipc: fix incorrect increasing of link window · edadedf1
      Tuong Lien committed
      In commit 16ad3f40 ("tipc: introduce variable window congestion
      control"), we allowed the link window to change with the congestion
      avoidance algorithm. However, there is a bug: if a packet
      retransmission occurs during slow-start, the link enters the
      fast-recovery phase and sets its window to 'ssthresh', which is never
      less than 300, so the link window suddenly increases to that limit
      instead of decreasing.

      Consequently, two issues have been observed:

      - For the broadcast link: it can leave a gap between the link queues,
      so that a new packet is inserted and sent before the previous ones,
      i.e. out of order.

      - For unicast links: the algorithm does not work as expected; the
      link window jumps to the slow-start threshold even though a packet
      retransmission has occurred.

      This commit fixes the issues by preventing such a link window
      increase, while still decreasing the window if 'ssthresh' is lowered.
      
      Fixes: 16ad3f40 ("tipc: introduce variable window congestion control")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      edadedf1
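The window behaviour described in the commit above can be sketched as follows. This is a simplified user-space model, not the kernel code: `struct link_cc` and `link_on_retransmit()` are hypothetical names, and deriving 'ssthresh' by halving the window is an assumption made for illustration. Only the floor of 300 and the "shrink but never grow" rule come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SSTHRESH_FLOOR 300 /* "never less than 300", per the commit message */

/* Hypothetical subset of the link state; only these two fields matter. */
struct link_cc {
	uint16_t window;   /* current send window, in packets */
	uint16_t ssthresh; /* slow-start threshold */
};

/*
 * Reaction to a detected retransmission. Halving the window to obtain
 * ssthresh is an assumption of this sketch. The point is the last step:
 * the window may shrink to ssthresh but is never raised to it.
 */
void link_on_retransmit(struct link_cc *l)
{
	uint16_t half;

	assert(l != NULL);
	half = (uint16_t)(l->window / 2);
	l->ssthresh = half > SSTHRESH_FLOOR ? half : SSTHRESH_FLOOR;
	if (l->window > l->ssthresh)
		l->window = l->ssthresh;
}
```

With a slow-start window of 50, the window stays at 50 instead of jumping to 300. With a window of 800, it drops to the new threshold of 400.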
  19. 27 Mar, 2020 1 commit
  20. 15 Mar, 2020 2 commits
  21. 04 Mar, 2020 1 commit
  22. 10 Feb, 2020 2 commits
    • T
      tipc: fix successful connect() but timed out · 5391a877
      Tuong Lien committed
      In commit 9546a0b7 ("tipc: fix wrong connect() return code"), we
      fixed the issue of 'connect()' returning zero even though the
      connection attempt had failed, by waiting until the connection is
      really 'ESTABLISHED'. However, in conjunction with our 'lightweight'
      connection setup mechanism, that approach has one drawback: the
      following scenario can happen:
      
                (server)                        (client)
      
         +- accept()|                      |             wait_for_conn()
         |          |                      |connect() -------+
         |          |<-------[SYN]---------|                 > sleeping
         |          |                      *CONNECTING       |
         |--------->*ESTABLISHED           |                 |
                    |--------[ACK]-------->*ESTABLISHED      > wakeup()
              send()|--------[DATA]------->|\                > wakeup()
              send()|--------[DATA]------->| |               > wakeup()
                .   .          .           . |-> recvq       .
                .   .          .           . |               .
              send()|--------[DATA]------->|/                > wakeup()
             close()|--------[FIN]-------->*DISCONNECTING    |
                    *DISCONNECTING         |                 |
                    |                      ~~~~~~~~~~~~~~~~~~> schedule()
                                                             | wait again
                                                             .
                                                             .
                                                             | ETIMEDOUT
      
      Upon receipt of the server 'ACK', the client becomes 'ESTABLISHED'
      and the 'wait_for_conn()' process is woken up but not yet run.
      Meanwhile, the server sends a number of data messages followed
      shortly by a 'close()', without waiting for any response from the
      client, which forces the client socket into 'DISCONNECTING'
      immediately. When the wait process finally runs, it continues to wait
      until the timer expires because of the unexpected socket state. The
      client 'connect()' eventually gets '-ETIMEDOUT' and is forced to
      release the socket even though messages remain in its receive queue.

      Obviously the issue would not happen if the server had some delay prior
      to its 'close()' (or the number of 'DATA' messages were large enough),
      but any kind of delay would make the connection setup/shutdown "heavy".
      We solve this by simply allowing 'connect()' to return zero in this
      particular case. The socket is already 'DISCONNECTING', so any further
      write will get '-EPIPE', but the socket is still able to read the
      messages already in its receive queue.

      Note: This solution doesn't break the previous one, as it deals with a
      different situation in which the socket state is 'DISCONNECTING' but
      there is no error (i.e. sk->sk_err = 0).
      
      Fixes: 9546a0b7 ("tipc: fix wrong connect() return code")
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5391a877
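The rule introduced by the commit above (a 'DISCONNECTING' socket without a pending error also counts as a successful 'connect()') can be written as a small predicate. The enum, struct and function names below are hypothetical and only model the states named in the commit message; the real check lives inside the kernel's wait loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of connection states for this sketch. */
enum conn_state { CONN_CONNECTING, CONN_ESTABLISHED, CONN_DISCONNECTING };

struct sock_view {
	enum conn_state state;
	int sk_err; /* pending socket error, 0 if none */
};

/*
 * True when connect() may return zero: either the connection is
 * established, or the peer has already closed it cleanly, in which case
 * queued messages can still be read although writes would fail.
 */
bool connect_may_return_zero(const struct sock_view *sk)
{
	assert(sk != NULL);
	if (sk->state == CONN_ESTABLISHED)
		return true;
	return sk->state == CONN_DISCONNECTING && sk->sk_err == 0;
}
```

A socket that is 'DISCONNECTING' with a non-zero 'sk_err' is still treated as a failed connect, which is the case the earlier fix covers.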
    • C
      tipc: make three functions static · 2437fd7b
      Chen Wandun committed
      Fix the following sparse warning:
      
      net/tipc/node.c:281:6: warning: symbol 'tipc_node_free' was not declared. Should it be static?
      net/tipc/node.c:2801:5: warning: symbol '__tipc_nl_node_set_key' was not declared. Should it be static?
      net/tipc/node.c:2878:5: warning: symbol '__tipc_nl_node_flush_key' was not declared. Should it be static?
      
      Fixes: fc1b6d6d ("tipc: introduce TIPC encryption & authentication")
      Fixes: e1f32190 ("tipc: add support for AEAD key setting via netlink")
      Signed-off-by: Chen Wandun <chenwandun@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2437fd7b
  23. 09 Jan, 2020 4 commits
    • T
      tipc: fix wrong connect() return code · 9546a0b7
      Tuong Lien committed
      The current 'tipc_wait_for_connect()' function runs a wait-loop on the
      condition 'sk->sk_state != TIPC_CONNECTING' to decide whether the
      socket has finished connecting. However, when the condition is met, it
      returns '0' even if the connection attempt has actually failed and the
      socket state is set to 'TIPC_DISCONNECTING' (e.g. when the server
      socket has closed). This results in a wrong return code for the
      user's 'connect()' call, making the user believe that the connection
      is established and go ahead with building and sending a message, etc.,
      only to fail eventually with e.g. '-EPIPE'.

      This commit fixes the issue by changing the wait condition to
      'tipc_sk_connected(sk)', so the function returns '0' only when the
      connection is really established. Otherwise, either the socket
      'sk_err', if any, or '-ETIMEDOUT'/'-EINTR' is returned accordingly.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9546a0b7
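The return-code mapping described in the commit above can be sketched as a single decision function. Everything here is a user-space model under stated assumptions: `struct conn_view`, `connect_wait_result()` and `WAIT_AGAIN` are hypothetical, and the order of the checks is chosen for readability rather than taken from the kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define WAIT_AGAIN 1 /* hypothetical "keep sleeping" result of this sketch */

struct conn_view {
	bool connected;      /* models tipc_sk_connected(sk) */
	int sk_err;          /* pending error as a positive errno, 0 if none */
	long timeout_left;   /* remaining wait time, 0 once expired */
	bool signal_pending; /* the waiting task was interrupted */
};

/*
 * Returns 0 only for a really established connection. Otherwise it
 * reports the socket error, a timeout or an interruption, or asks the
 * caller to keep waiting.
 */
int connect_wait_result(const struct conn_view *c)
{
	assert(c != NULL);
	if (c->connected)
		return 0;
	if (c->sk_err)
		return -c->sk_err;
	if (c->timeout_left == 0)
		return -ETIMEDOUT;
	if (c->signal_pending)
		return -EINTR;
	return WAIT_AGAIN;
}
```

The key difference from the old condition is that leaving the connecting state is no longer enough to produce a zero return value.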
    • T
      tipc: fix link overflow issue at socket shutdown · 49afb806
      Tuong Lien committed
      When a socket is suddenly shut down or released, it rejects all the
      unreceived messages in its receive queue. This applies to a connected
      socket too, even though only one 'FIN' message needs to be sent back
      to its peer in this case.

      If there are many messages in the queue and/or several connections
      with such messages are shut down at the same time, the link layer
      easily overflows at the 'TIPC_SYSTEM_IMPORTANCE' backlog level
      because of the message rejections. As a result, the link is taken
      down. Moreover, as soon as the link is re-established, the socket
      layer can continue to reject the messages and the same issue happens
      again.

      The commit refactors the '__tipc_shutdown()' function to send only one
      'FIN' in the situation mentioned above. For the connectionless case,
      rejection is unavoidable, but usually there are no rejections for such
      socket messages because they are 'dest-droppable' by default.

      In addition, the new code makes the other socket states explicit
      (e.g. 'TIPC_LISTEN') and treats them as separate cases to avoid
      misbehavior.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      49afb806
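The effect of the shutdown change above on link load can be shown with a small counting model. It is only a sketch: `struct queued_msg` and `shutdown_msgs_sent()` are hypothetical, and the function merely counts how many messages would be sent back, following the rules stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued message; only the droppable flag matters here. */
struct queued_msg {
	bool dest_droppable; /* may be discarded silently, not rejected */
};

/*
 * Returns how many messages go back onto the link when a socket with
 * 'len' unreceived messages is shut down.
 */
size_t shutdown_msgs_sent(bool connected, const struct queued_msg *q,
			  size_t len)
{
	size_t i, sent = 0;

	assert(q != NULL || len == 0);
	if (connected)
		return 1; /* queue is purged; a single 'FIN' reaches the peer */

	for (i = 0; i < len; i++)
		if (!q[i].dest_droppable)
			sent++; /* one rejection per non-droppable message */
	return sent;
}
```

For a connected socket the result no longer grows with the queue length, which is what keeps the link backlog from overflowing when many connections close at once.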
    • M
      tipc: remove meaningless assignment in Makefile · b969fee1
      Masahiro Yamada committed
      There is no module named tipc_diag.
      
      The assignment to tipc_diag-y has no effect.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b969fee1
    • M
      tipc: do not add socket.o to tipc-y twice · ea04b445
      Masahiro Yamada committed
      net/tipc/Makefile adds socket.o twice.
      
      tipc-y	+= addr.o bcast.o bearer.o \
                 core.o link.o discover.o msg.o  \
                 name_distr.o  subscr.o monitor.o name_table.o net.o  \
                 netlink.o netlink_compat.o node.o socket.o eth_media.o \
                                                   ^^^^^^^^
                 topsrv.o socket.o group.o trace.o
                          ^^^^^^^^
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea04b445
  24. 07 Jan, 2020 1 commit
    • Y
      tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit error · a7869e5f
      Ying Xue committed
      syzbot found the following crash on:
      =====================================================
      BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline]
      BUG: KMSAN: uninit-value in nlmsg_parse_deprecated
      include/net/netlink.h:706 [inline]
      BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0
      net/tipc/netlink_compat.c:215
      CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x1c9/0x220 lib/dump_stack.c:118
        kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
        __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245
        __nlmsg_parse include/net/netlink.h:661 [inline]
        nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
        __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
        tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x444179
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179
      RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
      RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20
      R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
        kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
        kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86
        slab_alloc_node mm/slub.c:2774 [inline]
        __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382
        __kmalloc_reserve net/core/skbuff.c:141 [inline]
        __alloc_skb+0x309/0xa50 net/core/skbuff.c:209
        alloc_skb include/linux/skbuff.h:1049 [inline]
        nlmsg_new include/net/netlink.h:888 [inline]
        tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
        tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
        genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
        genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
        genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
        netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
        netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
        netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:639 [inline]
        sock_sendmsg net/socket.c:659 [inline]
        ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
        ___sys_sendmsg net/socket.c:2384 [inline]
        __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
        __do_sys_sendmsg net/socket.c:2426 [inline]
        __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
        do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      =====================================================
      
      The complaint above occurred because the memory region pointed to by
      the attrbuf variable was not initialized. To eliminate this warning,
      we use kcalloc() rather than kmalloc_array() to allocate memory for
      attrbuf.
      
      Reported-by: syzbot+b1fd2bf2c89d8407e15f@syzkaller.appspotmail.com
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a7869e5f
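The difference that the fix above relies on can be shown with a user-space analogue, where calloc() plays the role of kcalloc() (zero-filled memory) and malloc() would correspond to kmalloc_array() (contents undefined). The function names and the `max_attr + 1` sizing are assumptions made for this sketch, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Allocates a table of attribute pointers. calloc() zero-fills the
 * memory, so every slot starts out empty. Sizing the table as
 * max_attr + 1 is an assumption of this sketch.
 */
void **alloc_attr_table(size_t max_attr)
{
	return calloc(max_attr + 1, sizeof(void *));
}

/* Checks that no slot holds a stale value before the table is used. */
bool attr_table_is_clear(void *const *table, size_t max_attr)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; i <= max_attr; i++)
		if (table[i] != NULL)
			return false;
	return true;
}
```

A parser that fills in only the attributes it finds can then safely treat every remaining slot as "not present", which is not guaranteed when the table comes from an allocator that leaves memory uninitialized.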
  25. 21 Dec, 2019 1 commit
  26. 18 Dec, 2019 1 commit
    • J
      tipc: don't send gap blocks in ACK messages · b7ffa045
      Jon Maloy committed
      In the commit referred to below we eliminated sending of the 'gap'
      indicator in regular ACK messages, reserving this for explicit NACK
      messages.

      Unfortunately we failed to also eliminate building of the 'gap block'
      area in ACK messages. This area is meant to report gaps in the
      received packet sequence following the initial gap, so that lost
      packets can be retransmitted earlier and received out-of-sequence
      packets can be released earlier. However, the interpretation of those
      blocks depends on a complete and correct sequence of gaps and
      acks. Hence, when the initial gap indicator is missing, a single gap
      block will be interpreted as an acknowledgment of all preceding
      packets. This may lead to packets being released prematurely from the
      sender's transmit queue, with easily predictable consequences.

      We now fix this by not building any gap block area if there is no
      initial gap to report.
      
      Fixes: commit 02288248 ("tipc: eliminate gap indicator from ACK messages")
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7ffa045
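The rule stated in the commit above (no gap block area unless there is an initial gap to report) can be sketched as a guard at the top of the builder. The `struct gap_block` layout and `build_gap_blocks()` are hypothetical and much simpler than the real protocol header; the sketch only illustrates the guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gap block: packets up to 'ack' received, then 'gap' lost. */
struct gap_block {
	uint16_t ack;
	uint16_t gap;
};

/*
 * Copies up to 'cap' pending blocks into 'out' for an outgoing message
 * and returns how many were written. Without an initial gap the message
 * is a plain ACK and carries no gap block area at all, so the receiver
 * cannot misread a lone block as acknowledging all preceding packets.
 */
size_t build_gap_blocks(uint16_t initial_gap,
			const struct gap_block *pending, size_t n,
			struct gap_block *out, size_t cap)
{
	size_t i, count;

	if (initial_gap == 0)
		return 0;

	assert(pending != NULL && out != NULL);
	count = n < cap ? n : cap;
	for (i = 0; i < count; i++)
		out[i] = pending[i];
	return count;
}
```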