1. 27 5月, 2020 7 次提交
    • T
      tipc: add test for Nagle algorithm effectiveness · 0a3e060f
      Tuong Lien 提交于
      When streaming in Nagle mode, we try to bundle small messages from user
      as many as possible if there is one outstanding buffer, i.e. not ACK-ed
      by the receiving side, which helps boost up the overall throughput. So,
      the algorithm's effectiveness really depends on when Nagle ACK comes or
      what the specific network latency (RTT) is, compared to the user's
      message sending rate.
      
      In a bad case, the user's sending rate is low or the network latency is
      small, there will not be many bundles, so making a Nagle ACK or waiting
      for it is not meaningful.
      For example: a user sends its messages every 100ms and the RTT is 50ms,
      then for each messages, we require one Nagle ACK but then there is only
      one user message sent without any bundles.
      
      In a better case, even if we have a few bundles (e.g. the RTT = 300ms),
      but now the user sends messages in medium size, then there will not be
      any difference at all, that says 3 x 1000-byte data messages if bundled
      will still result in 3 bundles with MTU = 1500.
      
      When Nagle is ineffective, the delay in user message sending is clearly
      wasted instead of sending directly.
      
      Besides, adding Nagle ACKs will consume some processor load on both the
      sending and receiving sides.
      
      This commit adds a test on the effectiveness of the Nagle algorithm for
      an individual connection in the network on which it actually runs.
      Particularly, upon receipt of a Nagle ACK we will compare the number of
      bundles in the backlog queue to the number of user messages which would
      be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle
      mode will be kept for further message sending. Otherwise, we will leave
      Nagle and put a 'penalty' on the connection, so it will have to spend
      more 'one-way' messages before being able to re-enter Nagle.
      
      In addition, the 'ack-required' bit is only set when really needed that
      the number of Nagle ACKs will be reduced during Nagle mode.
      
      Testing with benchmark showed that with the patch, there was not much
      difference in throughput for small messages since the tool continuously
      sends messages without a break, so Nagle would still take in effect.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a3e060f
    • T
      tipc: add support for broadcast rcv stats dumping · 03b6fefd
      Tuong Lien 提交于
      This commit enables dumping the statistics of a broadcast-receiver link
      like the traditional 'broadcast-link' one (which is for broadcast-
      sender). The link dumping can be triggered via netlink (e.g. the
      iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
      indicator.
      
      The name of a broadcast-receiver link of a specific peer will be in the
      format: 'broadcast-link:<peer-id>'.
      
      For example:
      
      Link <broadcast-link:1001002>
        Window:50 packets
        RX packets:7841 fragments:2408/440 bundles:0/0
        TX packets:0 fragments:0/0 bundles:0/0
        RX naks:0 defs:124 dups:0
        TX naks:21 acks:0 retrans:0
        Congestion link:0  Send queue max:0 avg:0
      
      In addition, the broadcast-receiver link statistics can be reset in the
      usual way via netlink by specifying that link name in command.
      
      Note: the 'tipc_link_name_ext()' is removed because the link name can
      now be retrieved simply via the 'l->name'.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03b6fefd
    • T
      tipc: enable broadcast retrans via unicast · a91d55d1
      Tuong Lien 提交于
      In some environment, broadcast traffic is suppressed at high rate (i.e.
      a kind of bandwidth limit setting). When it is applied, TIPC broadcast
      can still run successfully. However, when it comes to a high load, some
      packets will be dropped first and TIPC tries to retransmit them but the
      packet retransmission is intentionally broadcast too, so making things
      worse and not helpful at all.
      
      This commit enables the broadcast retransmission via unicast which only
      retransmits packets to the specific peer that has really reported a gap
      i.e. not broadcasting to all nodes in the cluster, so will prevent from
      being suppressed, and also reduce some overheads on the other peers due
      to duplicates, finally improve the overall TIPC broadcast performance.
      
      Note: the functionality can be turned on/off via the sysctl file:
      
      echo 1 > /proc/sys/net/tipc/bc_retruni
      echo 0 > /proc/sys/net/tipc/bc_retruni
      
      Default is '0', i.e. the broadcast retransmission still works as usual.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a91d55d1
    • T
      tipc: add back link trace events · c6ed7a5c
      Tuong Lien 提交于
      In the previous commit ("tipc: add Gap ACK blocks support for broadcast
      link"), we have removed the following link trace events due to the code
      changes:
      
      - tipc_link_bc_ack
      - tipc_link_retrans
      
      This commit adds them back along with some minor changes to adapt to
      the new code.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6ed7a5c
    • T
      tipc: introduce Gap ACK blocks for broadcast link · d7626b5a
      Tuong Lien 提交于
      As achieved through commit 9195948f ("tipc: improve TIPC throughput
      by Gap ACK blocks"), we apply the same mechanism for the broadcast link
      as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
      consist of two parts built for both the broadcast and unicast types:
      
       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      
      which is "automatically" backward-compatible.
      
      We also increase the max number of Gap ACK blocks to 128, allowing upto
      64 blocks per type (total buffer size = 516 bytes).
      
      Besides, the 'tipc_link_advance_transmq()' function is refactored which
      is applicable for both the unicast and broadcast cases now, so some old
      functions can be removed and the code is optimized.
      
      With the patch, TIPC broadcast is more robust regardless of packet loss
      or disorder, latency, ... in the underlying network. Its performance is
      boost up significantly.
      For example, experiment with a 5% packet loss rate results:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    0m 42.46s
      user    0m 1.16s
      sys     0m 17.67s
      
      Without the patch:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    8m 27.94s
      user    0m 0.55s
      sys     0m 2.38s
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jmaloy@redhat.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7626b5a
    • Y
      qed: Add EDPM mode type for user-fw compatibility · ff937b91
      Yuval Basson 提交于
      In older FW versions the completion flag was treated as the ack flag in
      edpm messages. Expose the FW option of setting which mode the QP is in
      by adding a flag to the qedr <-> qed API.
      
      Flag is added for backward compatibility with libqedr.
      This flag will be set by qedr after determining whether the libqedr is
      using the updated version.
      
      Fixes: f1093940 ("qed: Add support for QP verbs")
      Signed-off-by: NYuval Basson <yuval.bason@marvell.com>
      Signed-off-by: NMichal Kalderon <michal.kalderon@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff937b91
    • E
      tcp: tcp_v4_err() icmp skb is named icmp_skb · 23917494
      Eric Dumazet 提交于
      I missed the fact that tcp_v4_err() differs from tcp_v6_err().
      
      After commit 4d1a2d9e ("Rename skb to icmp_skb in tcp_v4_err()")
      the skb argument has been renamed to icmp_skb only in one function.
      
      I will in a future patch reconciliate these functions to avoid
      this kind of confusion.
      
      Fixes: 45af29ca ("tcp: allow traceroute -Mtcp for unpriv users")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23917494
  2. 26 5月, 2020 13 次提交
  3. 25 5月, 2020 17 次提交
  4. 24 5月, 2020 3 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · caffb99b
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna
          Bhowmik.
      
       2) Nexthop attributes aren't being checked properly because of
          mis-initialized iterator, from David Ahern.
      
       3) Revert iop_idents_reserve() change as it caused performance
          regressions and was just working around what is really a UBSAN bug
          in the compiler. From Yuqi Jin.
      
       4) Read MAC address properly from ROM in bmac driver (double iteration
          proceeds past end of address array), from Jeremy Kerr.
      
       5) Add Microsoft Surface device IDs to r8152, from Marc Payne.
      
       6) Prevent reference to freed SKB in __netif_receive_skb_core(), from
          Boris Sukholitko.
      
       7) Fix ACK discard behavior in rxrpc, from David Howells.
      
       8) Preserve flow hash across packet scrubbing in wireguard, from Jason
          A. Donenfeld.
      
       9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric
          Dumazet.
      
      10) Fix encryption error checking in kTLS code, from Vadim Fedorenko.
      
      11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki.
      
      12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet.
      
      13) Fix use after free in mlxsw driver, from Jiri Pirko.
      
      14) Order kTLS key destruction properly in mlx5 driver, from Tariq
          Toukan.
      
      15) Check devm_platform_ioremap_resource() return value properly in
          several drivers, from Tiezhu Yang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
        net: smsc911x: Fix runtime PM imbalance on error
        net/mlx4_core: fix a memory leak bug.
        net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
        net: phy: mscc: fix initialization of the MACsec protocol mode
        net: stmmac: don't attach interface until resume finishes
        net: Fix return value about devm_platform_ioremap_resource()
        net/mlx5: Fix error flow in case of function_setup failure
        net/mlx5e: CT: Correctly get flow rule
        net/mlx5e: Update netdev txq on completions during closure
        net/mlx5: Annotate mutex destroy for root ns
        net/mlx5: Don't maintain a case of del_sw_func being null
        net/mlx5: Fix cleaning unmanaged flow tables
        net/mlx5: Fix memory leak in mlx5_events_init
        net/mlx5e: Fix inner tirs handling
        net/mlx5e: kTLS, Destroy key object after destroying the TIS
        net/mlx5e: Fix allowed tc redirect merged eswitch offload cases
        net/mlx5: Avoid processing commands before cmdif is ready
        net/mlx5: Fix a race when moving command interface to events mode
        net/mlx5: Add command entry handling completion
        rxrpc: Fix a memory leak in rxkad_verify_response()
        ...
      caffb99b
    • H
      ethtool: propagate get_coalesce return value · 31610711
      Heiner Kallweit 提交于
      get_coalesce returns 0 or ERRNO, but the return value isn't checked.
      The returned coalesce data may be invalid if an ERRNO is set,
      therefore better check and propagate the return value.
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31610711
    • D
      Merge branch 'net-provide-a-devres-variant-of-register_netdev' · c0096a28
      David S. Miller 提交于
      Bartosz Golaszewski says:
      
      ====================
      net: provide a devres variant of register_netdev()
      
      Using devres helpers allows to shrink the probing code, avoid memory leaks in
      error paths make sure the order in which resources are freed is the exact
      opposite of their allocation. This series proposes to add a devres variant
      of register_netdev() that will only work with net_device structures whose
      memory is also managed.
      
      First we add the missing documentation entry for the only other networking
      devres helper: devm_alloc_etherdev().
      
      Next we move devm_alloc_etherdev() into a separate source file.
      
      We then use a proxy structure in devm_alloc_etherdev() to improve readability.
      
      Last: we implement devm_register_netdev() and use it in mtk-eth-mac driver.
      
      v1 -> v2:
      - rebase on top of net-next after driver rename, no functional changes
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0096a28