1. 29 May, 2020 13 commits
  2. 28 May, 2020 12 commits
  3. 27 May, 2020 13 commits
    • net: ethtool: Allow PHY cable test TDR data to configured · f2bc8ad3
      Andrew Lunn committed
      Allow the user to configure where on the cable the TDR data should be
      retrieved, in terms of first and last sample, and the step between
      samples. Also add the ability to ask for TDR data for just one pair.
      
      If this configuration is not provided, it defaults to 1-150m at 1m
      intervals for all pairs.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      
      v3:
      Move the TDR configuration into a structure
      Add a range check on step
      Use NL_SET_ERR_MSG_ATTR() when appropriate
      Move TDR configuration into a nest
      Document attributes in the request
      Signed-off-by: David S. Miller <davem@davemloft.net>
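The defaults and the range check described in this commit can be sketched as follows. This is a minimal user-space model in Python, not the kernel code: the `TdrConfig` class, its field names, the use of metres as the unit, the 0..3 pair numbering, and the exact limits checked in `validate()` are assumptions made for illustration. Only the 1-150m range, the 1m step, and the all-pairs default come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults taken from the commit message: 1-150 m at 1 m intervals, all pairs.
DEFAULT_FIRST_M = 1
DEFAULT_LAST_M = 150
DEFAULT_STEP_M = 1
ALL_PAIRS = None  # hypothetical marker meaning "no single pair selected"


@dataclass
class TdrConfig:
    """Hypothetical container for the TDR request parameters."""
    first: int = DEFAULT_FIRST_M
    last: int = DEFAULT_LAST_M
    step: int = DEFAULT_STEP_M
    pair: Optional[int] = ALL_PAIRS  # 0..3 for a single pair (assumed numbering)

    def validate(self) -> None:
        """Reject configurations that cannot describe a sample sweep."""
        if self.step < 1:
            raise ValueError("step must be at least 1")
        if self.first > self.last:
            raise ValueError("first sample must not be after last sample")
        if self.pair is not None and not 0 <= self.pair <= 3:
            raise ValueError("pair must be in 0..3")

    def sample_points(self) -> range:
        """Distances at which TDR data would be sampled."""
        self.validate()
        return range(self.first, self.last + 1, self.step)
```

Calling `TdrConfig()` with no arguments models the case where the user supplies no configuration, while `TdrConfig(first=10, last=20, step=5, pair=1)` models a narrowed request for a single pair.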
    • net: ethtool: Add helpers for cable test TDR data · 6b4a0fc1
      Andrew Lunn committed
      Add helpers for returning raw TDR data in netlink messages.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: Add generic parts of cable test TDR · 1a644de2
      Andrew Lunn committed
      Add the generic parts of the code used to trigger a cable test and
      return raw TDR data. Any PHY driver that supports this must implement
      the new driver op.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      
      v2
      Update nxp-tja11xx for API change.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: attempt coalescing when moving skbs to mptcp rx queue · 4e637c70
      Florian Westphal committed
      We can try to coalesce skbs we take from the subflows rx queue with the
      tail of the mptcp rx queue.
      
      If successful, the skb head can be discarded early.
      
      We can also free the skb extensions, since we do not access them
      after this.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: mark smc_pnet_policy as const · 09d0310f
      Dmitry Vyukov committed
      Netlink policies are generally declared as const.
      This is safer and prevents potential bugs.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cls_flower: Support filtering on multiple MPLS Label Stack Entries · 61aec25a
      Guillaume Nault committed
      With struct flow_dissector_key_mpls now recording the first
      FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
      these LSEs independently.
      
      In order to avoid creating new netlink attributes for every possible
      depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
      that contains the list of LSEs to match. Each LSE is represented by
      another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
      the attributes representing the depth and the MPLS fields to match at
      this depth (label, TTL, etc.).
      
      For each MPLS field, the mask is always set to all-ones, as this is
      what the original API did. We could allow user configurable masks in
      the future if there is demand for more flexibility.
      
      The new API also allows specifying only an LSE depth. In that case,
      Flower only verifies that the MPLS label stack depth is greater than
      or equal to the provided depth (that is, an LSE exists at this depth).
      
      Filters that only match on one (or more) fields of the first LSE are
      dumped using the old netlink attributes, to avoid confusing user space
      programs that don't understand the new API.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
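The matching semantics described in this commit (exact match on each requested field, and a depth-only rule that merely checks an LSE exists) can be illustrated with a toy model. This Python sketch is not the Flower classifier: the function names, the dictionary representation of an LSE, and the 1-based depth numbering are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def lse_matches(stack: List[Dict[str, int]], depth: int,
                wanted: Optional[Dict[str, int]] = None) -> bool:
    """Check one filter rule against a parsed label stack.

    `stack` lists the LSEs outermost first, each as a dict with keys such
    as "label" and "ttl". `depth` is assumed to be 1-based. `wanted` maps
    field names to required values; since the mask is all-ones, matching a
    field is a plain equality test.
    """
    if depth < 1 or depth > len(stack):
        return False  # no LSE exists at this depth
    if not wanted:
        return True   # depth-only rule: existence is enough
    lse = stack[depth - 1]
    return all(lse.get(field) == value for field, value in wanted.items())


def filter_matches(stack: List[Dict[str, int]],
                   rules: Sequence[Tuple[int, Optional[Dict[str, int]]]]) -> bool:
    """A filter matches when every (depth, wanted) rule matches."""
    return all(lse_matches(stack, depth, wanted) for depth, wanted in rules)
```

For a two-entry stack, a rule list such as `[(1, {"label": 100}), (2, None)]` matches on the outer label and only requires that a second LSE is present.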
    • flow_dissector: Parse multiple MPLS Label Stack Entries · 58cff782
      Guillaume Nault committed
      The current MPLS dissector only parses the first MPLS Label Stack
      Entry (second LSE can be parsed too, but only to set a key_id).
      
      This patch adds the possibility to parse several LSEs by making
      __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
      as the Bottom Of Stack bit hasn't been seen, up to a maximum of
      FLOW_DIS_MPLS_MAX entries.
      
      FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
      many practical purposes, without wasting too much space.
      
      To record the parsed values, flow_dissector_key_mpls is modified to
      store an array of stack entries, instead of just the values of the
      first one. A bit field, "used_lses", is also added to keep track of
      the LSEs that have been set. The objective is to avoid defining a
      new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.
      
      TC flower is adapted for the new struct flow_dissector_key_mpls layout.
      Matching on several MPLS Label Stack Entries will be added in the next
      patch.
      
      The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
      mlx5's parse_tunnel() now verify that the rule only uses the first LSE
      and fail if it doesn't.
      
      Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
      slightly modified. Instead of recording the first Entropy Label, it
      now records the last one. This shouldn't have any consequences since
      there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
      in the tree. We'd probably be better off hashing all parsed MPLS
      labels instead (excluding reserved labels) anyway. That'd give better
      entropy and would probably also simplify the code. But that's not the
      purpose of this patch, so I'm keeping that as a possible future
      improvement.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
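The parsing loop described in this commit can be modelled in a few lines. The sketch below is a user-space Python illustration, not the kernel's `__skb_flow_dissect_mpls()`: the names `parse_mpls_stack` and `make_lse`, the dictionary layout, and the return format are assumptions. The LSE bit layout (20-bit label, 3-bit traffic class, 1-bit Bottom Of Stack, 8-bit TTL) follows the standard MPLS encoding, and the limit of 7 entries and the `used_lses` bit field come from the commit message.

```python
import struct

FLOW_DIS_MPLS_MAX = 7  # limit stated in the commit message


def parse_mpls_stack(data: bytes):
    """Parse up to FLOW_DIS_MPLS_MAX label stack entries from `data`.

    Returns (entries, used_lses): a list of decoded LSEs and a bit field
    with bit i set when entry i was recorded.
    """
    entries = []
    used_lses = 0
    offset = 0
    while len(entries) < FLOW_DIS_MPLS_MAX and offset + 4 <= len(data):
        (word,) = struct.unpack_from("!I", data, offset)
        lse = {
            "label": word >> 12,       # top 20 bits
            "tc": (word >> 9) & 0x7,   # 3-bit traffic class
            "bos": (word >> 8) & 0x1,  # Bottom Of Stack bit
            "ttl": word & 0xFF,        # low 8 bits
        }
        used_lses |= 1 << len(entries)
        entries.append(lse)
        offset += 4
        if lse["bos"]:
            break  # last entry of the stack reached
    return entries, used_lses


def make_lse(label: int, tc: int, bos: int, ttl: int) -> bytes:
    """Encode one LSE; a helper for building example input."""
    return struct.pack("!I", (label << 12) | (tc << 9) | (bos << 8) | ttl)
```

Parsing stops either at the entry carrying the Bottom Of Stack bit or after 7 entries, whichever comes first, which mirrors the "as long as the Bottom Of Stack bit hasn't been seen" condition in the message.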
    • tipc: add test for Nagle algorithm effectiveness · 0a3e060f
      Tuong Lien committed
      When streaming in Nagle mode, we try to bundle as many small user
      messages as possible while there is one outstanding buffer, i.e. one
      not yet ACKed by the receiving side, which helps boost the overall
      throughput. The algorithm's effectiveness therefore depends on when
      the Nagle ACK arrives, i.e. on the specific network latency (RTT),
      compared to the user's message sending rate.
      
      In a bad case, where the user's sending rate is low or the network
      latency is small, there will not be many bundles, so generating a
      Nagle ACK or waiting for one is not meaningful.
      For example, if a user sends a message every 100ms and the RTT is
      50ms, each message requires one Nagle ACK, yet only one user message
      is sent each time, without any bundling.
      
      In a better case we may get a few bundles (e.g. with RTT = 300ms),
      but if the user sends medium-sized messages there is still no
      difference at all: with MTU = 1500, three 1000-byte data messages
      will, if bundled, still result in 3 bundles.
      
      When Nagle is ineffective, the delay it adds to user message sending
      is simply wasted; the messages could have been sent directly.
      
      Besides, the additional Nagle ACKs consume processor time on both the
      sending and receiving sides.
      
      This commit adds a test on the effectiveness of the Nagle algorithm for
      an individual connection in the network on which it actually runs.
      Specifically, upon receipt of a Nagle ACK we compare the number of
      bundles in the backlog queue to the number of user messages that would
      have been sent directly without Nagle. If the ratio is good (e.g. >= 2),
      Nagle mode is kept for further message sending. Otherwise, we leave
      Nagle and put a 'penalty' on the connection, so that it has to send
      more 'one-way' messages before being able to re-enter Nagle.
      
      In addition, the 'ack-required' bit is now set only when really needed,
      so that the number of Nagle ACKs is reduced during Nagle mode.
      
      Benchmark testing showed that, with the patch, throughput for small
      messages did not change much: the tool sends messages continuously
      without a break, so Nagle still takes effect.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
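The effectiveness test described in this commit can be approximated with a small model. The Python sketch below is an illustration only, not the TIPC socket code: the `NagleState` class, the `count_bundles` helper, the greedy packing that ignores header overhead, and the penalty size are all assumptions. The ratio threshold of 2 and the 3 x 1000-byte example with MTU = 1500 come from the commit message.

```python
def count_bundles(sizes, mtu):
    """Greedily pack messages into bundles of at most `mtu` bytes.

    Header overhead is ignored and every message is assumed to fit in
    one bundle (size <= mtu).
    """
    bundles, room = 0, 0
    for size in sizes:
        if size > room:      # does not fit: start a new bundle
            bundles += 1
            room = mtu
        room -= size
    return bundles


class NagleState:
    """Toy model of the per-connection effectiveness test."""

    def __init__(self, min_ratio=2.0, penalty=4):
        self.min_ratio = min_ratio  # the commit cites >= 2 as a good ratio
        self.penalty = penalty      # assumed count of one-way messages
        self.nagle_on = True
        self.oneway_needed = 0

    def on_nagle_ack(self, user_msgs, bundles):
        """Evaluate effectiveness when a Nagle ACK arrives.

        `user_msgs` is how many messages would have been sent directly
        without Nagle; `bundles` is how many bundles they were packed into.
        """
        if bundles > 0 and user_msgs / bundles >= self.min_ratio:
            return True             # bundling pays off: stay in Nagle mode
        self.nagle_on = False       # leave Nagle and apply the penalty
        self.oneway_needed = self.penalty
        return False

    def on_oneway_msg(self):
        """Count a directly sent message towards re-entering Nagle mode."""
        if self.nagle_on:
            return
        self.oneway_needed -= 1
        if self.oneway_needed <= 0:
            self.nagle_on = True
```

With these assumptions, `count_bundles([1000] * 3, 1500)` yields 3 bundles for 3 messages, a ratio of 1, so `on_nagle_ack(3, 3)` turns Nagle off, whereas thirty 100-byte messages pack into 2 bundles and keep it on.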
    • tipc: add support for broadcast rcv stats dumping · 03b6fefd
      Tuong Lien committed
      This commit enables dumping the statistics of a broadcast-receiver link
      in the same way as the traditional 'broadcast-link' one (which is for
      the broadcast-sender). The dump can be triggered via netlink (e.g. with
      the iproute2/tipc tool), using the link flag 'TIPC_NLA_LINK_BROADCAST'
      as the indicator.
      
      The name of a broadcast-receiver link of a specific peer will be in the
      format: 'broadcast-link:<peer-id>'.
      
      For example:
      
      Link <broadcast-link:1001002>
        Window:50 packets
        RX packets:7841 fragments:2408/440 bundles:0/0
        TX packets:0 fragments:0/0 bundles:0/0
        RX naks:0 defs:124 dups:0
        TX naks:21 acks:0 retrans:0
        Congestion link:0  Send queue max:0 avg:0
      
      In addition, the broadcast-receiver link statistics can be reset in the
      usual way via netlink by specifying that link name in the command.
      
      Note: 'tipc_link_name_ext()' is removed because the link name can
      now be retrieved simply via 'l->name'.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: enable broadcast retrans via unicast · a91d55d1
      Tuong Lien committed
      In some environments, broadcast traffic is suppressed at high rates
      (i.e. a kind of bandwidth limit setting). When such a limit is applied,
      TIPC broadcast can still run successfully. However, under high load
      some packets will be dropped and TIPC tries to retransmit them, but the
      retransmissions are intentionally broadcast too, which only makes
      things worse.
      
      This commit enables broadcast retransmission via unicast, which
      retransmits packets only to the specific peer that has actually
      reported a gap, i.e. without broadcasting to all nodes in the cluster.
      This prevents the retransmissions from being suppressed, reduces the
      overhead that duplicates cause on the other peers, and ultimately
      improves the overall TIPC broadcast performance.
      
      Note: the functionality can be turned on/off via the sysctl file:
      
      echo 1 > /proc/sys/net/tipc/bc_retruni
      echo 0 > /proc/sys/net/tipc/bc_retruni
      
      Default is '0', i.e. the broadcast retransmission still works as usual.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: add back link trace events · c6ed7a5c
      Tuong Lien committed
      In the previous commit ("tipc: add Gap ACK blocks support for broadcast
      link"), we removed the following link trace events due to the code
      changes:
      
      - tipc_link_bc_ack
      - tipc_link_retrans
      
      This commit adds them back along with some minor changes to adapt to
      the new code.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: introduce Gap ACK blocks for broadcast link · d7626b5a
      Tuong Lien committed
      Following commit 9195948f ("tipc: improve TIPC throughput by Gap ACK
      blocks"), we apply the same mechanism to the broadcast link as well.
      The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will consist
      of two parts, one for the broadcast type and one for the unicast type:
      
       31                       16 15                        0
      +-------------+-------------+-------------+-------------+
      |  bgack_cnt  |  ugack_cnt  |            len            |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > bc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      |            gap            |            ack            |   |
      +-------------+-------------+-------------+-------------+    > uc gacks
      :                           :                           :   |
      +-------------+-------------+-------------+-------------+  -
      
      which is "automatically" backward-compatible.
      
      We also increase the max number of Gap ACK blocks to 128, allowing up
      to 64 blocks per type (total buffer size = 516 bytes).
      
      Besides, the 'tipc_link_advance_transmq()' function is refactored so
      that it now applies to both the unicast and broadcast cases, so some
      old functions can be removed and the code is optimized.
      
      With the patch, TIPC broadcast is more robust against packet loss,
      reordering, latency, etc. in the underlying network, and its
      performance is boosted significantly.
      For example, an experiment with a 5% packet loss rate gives:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    0m 42.46s
      user    0m 1.16s
      sys     0m 17.67s
      
      Without the patch:
      
      $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
      real    8m 27.94s
      user    0m 0.55s
      sys     0m 2.38s
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
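The record layout drawn in this commit, and the 516-byte total it quotes, can be checked with a short sketch. This is a Python illustration, not the TIPC implementation: the function names, the big-endian encoding, and the serialized field order (`len`, `ugack_cnt`, `bgack_cnt`, then `ack`/`gap` pairs) are assumptions read off the diagram. The field widths, the broadcast-before-unicast ordering, and the limit of 64 blocks per type come from the commit message.

```python
import struct

MAX_GACKS_PER_TYPE = 64         # "up to 64 blocks per type"
HEADER = struct.Struct("!HBB")  # len, ugack_cnt, bgack_cnt (assumed order)
BLOCK = struct.Struct("!HH")    # ack, gap (assumed order)


def pack_gap_acks(bc_gacks, uc_gacks) -> bytes:
    """Serialize broadcast blocks followed by unicast blocks.

    Each block is an (ack, gap) pair of 16-bit values.
    """
    if len(bc_gacks) > MAX_GACKS_PER_TYPE or len(uc_gacks) > MAX_GACKS_PER_TYPE:
        raise ValueError("too many Gap ACK blocks for one type")
    count = len(bc_gacks) + len(uc_gacks)
    total = HEADER.size + BLOCK.size * count  # 'len' covers the whole record
    out = HEADER.pack(total, len(uc_gacks), len(bc_gacks))
    for ack, gap in list(bc_gacks) + list(uc_gacks):
        out += BLOCK.pack(ack, gap)
    return out


def unpack_gap_acks(data: bytes):
    """Inverse of pack_gap_acks(); returns (bc_gacks, uc_gacks)."""
    _total, ugack_cnt, bgack_cnt = HEADER.unpack_from(data, 0)
    blocks = [BLOCK.unpack_from(data, HEADER.size + i * BLOCK.size)
              for i in range(bgack_cnt + ugack_cnt)]
    return blocks[:bgack_cnt], blocks[bgack_cnt:]
```

A 4-byte header plus 128 blocks of 4 bytes each gives 4 + 128 * 4 = 516 bytes, which matches the total buffer size stated in the message.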
    • tcp: tcp_v4_err() icmp skb is named icmp_skb · 23917494
      Eric Dumazet committed
      I missed the fact that tcp_v4_err() differs from tcp_v6_err().
      
      After commit 4d1a2d9e ("Rename skb to icmp_skb in tcp_v4_err()")
      the skb argument has been renamed to icmp_skb only in one function.
      
      In a future patch I will reconcile these functions to avoid
      this kind of confusion.
      
      Fixes: 45af29ca ("tcp: allow traceroute -Mtcp for unpriv users")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 26 May, 2020 2 commits