提交 · ddd061b8daed3ce0c01109a69c9a2a9f9669f01a · openeuler / Kernel

29 5月, 2020 13 次提交

tcp: add tcp_sock_set_quickack · ddd061b8

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the TCP_QUICKACK sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddd061b8

tcp: add tcp_sock_set_nodelay · 12abc5ee

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Acked-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12abc5ee

tcp: add tcp_sock_set_cork · db10538a

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the TCP_CORK sockopt from kernel space
without going through a fake uaccess.  Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db10538a

net: add sock_set_reuseport · fe31a326

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_REUSEPORT sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe31a326

net: add sock_set_rcvbuf · 26cfabf9

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26cfabf9

net: add sock_set_keepalive · ce3d9544

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce3d9544

net: add sock_enable_timestamps · 783da70e

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly enable timestamps instead of setting the
SO_TIMESTAMP* sockopts from kernel space and going through a fake
uaccess.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

783da70e

net: add sock_bindtoindex · 7594888c

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7594888c

net: add sock_set_sndtimeo · 76ee0785

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel
space without going through a fake uaccess.  The interface is
simplified to only pass the seconds value, as that is the only
thing needed at the moment.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76ee0785

net: add sock_set_priority · 6e434967

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_PRIORITY sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e434967

net: add sock_no_linger · c433594c

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_LINGER sockopt from kernel space
with onoff set to true and a linger time of 0 without going through a
fake uaccess.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c433594c

net: add sock_set_reuseaddr · b58f0e8f

由 Christoph Hellwig 提交于 5月 28, 2020

Add a helper to directly set the SO_REUSEADDR sockopt from kernel space
without going through a fake uaccess.

For this the iscsi target now has to formally depend on inet to avoid
a mostly theoretical compile failure.  For actual operation it already
did depend on having ipv4 or ipv6 support.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b58f0e8f

tcp: ipv6: support RFC 6069 (TCP-LD) · d2924569

由 Eric Dumazet 提交于 5月 27, 2020

Make tcp_ld_RTO_revert() helper available to IPv6, and
implement RFC 6069 :

Quoting this RFC :

3. Connectivity Disruption Indication

   For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of
   the ICMP destination unreachable message of code 0 (net unreachable)
   and of code 1 (host unreachable) is the ICMPv6 destination
   unreachable message of code 0 (no route to destination) [RFC4443].
   As with IPv4, a router should generate an ICMPv6 destination
   unreachable message of code 0 in response to a packet that cannot be
   delivered to its destination address because it lacks a matching
   entry in its routing table.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2924569

28 5月, 2020 12 次提交

net: remove kernel_getsockopt · 7a15b2e0

由 Christoph Hellwig 提交于 5月 27, 2020

No users left.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a15b2e0

sctp: fix typo sctp_ulpevent_nofity_peer_addr_change · 50ce4c09

由 Jonas Falkevik 提交于 5月 27, 2020

change typo in function name "nofity" to "notify"
sctp_ulpevent_nofity_peer_addr_change ->
sctp_ulpevent_notify_peer_addr_change
Signed-off-by: NJonas Falkevik <jonas.falkevik@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50ce4c09

net/tls: Add force_resync for driver resync · b3ae2459

由 Tariq Toukan 提交于 5月 27, 2020

This patch adds a field to the tls rx offload context which enables
drivers to force a send_resync call.

This field can be used by drivers to request a resync at the next
possible tls record. It is beneficial for hardware that provides the
resync sequence number asynchronously. In such cases, the packet that
triggered the resync does not contain the information required for a
resync. Instead, the driver requests resync for all the following
TLS record until the asynchronous notification with the resync request
TCP sequence arrives.

A following series for mlx5e ConnectX-6DX TLS RX offload support will
use this mechanism.
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Reviewed-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b3ae2459

net_sched: get rid of unnecessary dev_qdisc_reset() · 759ae57f

由 Cong Wang 提交于 5月 26, 2020

Resetting old qdisc on dev_queue->qdisc_sleeping in
dev_qdisc_reset() is redundant, because this qdisc,
even if not same with dev_queue->qdisc, is reset via
qdisc_put() right after calling dev_graft_qdisc() when
hitting refcnt 0.

This is very easy to observe with qdisc_reset() tracepoint
and stack traces.
Reported-by: NVáclav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: NVáclav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

759ae57f

net_sched: avoid resetting active qdisc for multiple times · 70f50965

由 Cong Wang 提交于 5月 26, 2020

Except for sch_mq and sch_mqprio, each dev queue points to the
same root qdisc, so when we reset the dev queues with
netdev_for_each_tx_queue() we end up resetting the same instance
of the root qdisc for multiple times.

Avoid this by checking the __QDISC_STATE_DEACTIVATED bit in
each iteration, so for sch_mq/sch_mqprio, we still reset all
of them like before, for the rest, we only reset it once.
Reported-by: NVáclav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: NVáclav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70f50965

net_sched: add a tracepoint for qdisc creation · f5a7833e

由 Cong Wang 提交于 5月 26, 2020

With this tracepoint, we could know when qdisc's are created,
especially those default qdisc's.

Sample output:

tc-736 [001] ...1 56.230107: qdisc_create: dev=ens3 kind=pfifo parent=1:0
tc-736 [001] ...1 56.230113: qdisc_create: dev=ens3 kind=hfsc parent=ffff:ffff
tc-738 [001] ...1 56.256816: qdisc_create: dev=ens3 kind=pfifo parent=1:100
tc-739 [001] ...1 56.267584: qdisc_create: dev=ens3 kind=pfifo parent=1:200
tc-740 [001] ...1 56.279649: qdisc_create: dev=ens3 kind=fq_codel parent=1:100
tc-741 [001] ...1 56.289996: qdisc_create: dev=ens3 kind=pfifo_fast parent=1:200
tc-745 [000] .N.1 111.687483: qdisc_create: dev=ens3 kind=ingress parent=ffff:fff1

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5a7833e

net_sched: add tracepoints for qdisc_reset() and qdisc_destroy() · a34dac0b

由 Cong Wang 提交于 5月 26, 2020

Add two tracepoints for qdisc_reset() and qdisc_destroy() to track
qdisc resetting and destroying.

Sample output:

tc-756 [000] ...3 138.355662: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355720: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355867: qdisc_reset: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-756 [000] ...1 138.355930: qdisc_destroy: dev=ens3 kind=pfifo_fast parent=ffff:ffff handle=0:0
tc-757 [000] ...2 143.073780: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.073878: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.074114: qdisc_reset: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0
tc-757 [000] ...1 143.074228: qdisc_destroy: dev=ens3 kind=fq_codel parent=ffff:ffff handle=8001:0

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a34dac0b

net_sched: use qdisc_reset() in qdisc_destroy() · 4909daba

由 Cong Wang 提交于 5月 26, 2020

qdisc_destroy() calls ops->reset() and cleans up qdisc->gso_skb
and qdisc->skb_bad_txq, these are nearly same with qdisc_reset(),
so just call it directly, and cosolidate the code for the next
patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4909daba

tcp: rename tcp_v4_err() skb parameter · a12daf13

由 Eric Dumazet 提交于 5月 26, 2020

This essentially reverts 4d1a2d9e ("Revert Backoff [v3]:
Rename skb to icmp_skb in tcp_v4_err()")

Now we have tcp_ld_RTO_revert() helper, we can use the usual
name for sk_buff parameter, so that tcp_v4_err() and
tcp_v6_err() use similar names.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a12daf13

tcp: add tcp_ld_RTO_revert() helper · f7456642

由 Eric Dumazet 提交于 5月 26, 2020

RFC 6069 logic has been implemented for IPv4 only so far,
right in the middle of tcp_v4_err() and was error prone.

Move this code to one helper, to make tcp_v4_err() more
readable and to eventually expand RFC 6069 to IPv6 in
the future.

Also perform sock_owned_by_user() check a bit sooner.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Tested-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7456642

bridge: mrp: Rework the MRP netlink interface · 20f6a05e

由 Horatiu Vultur 提交于 5月 27, 2020

This patch reworks the MRP netlink interface. Before, each attribute
represented a binary structure which made it hard to be extended.
Therefore update the MRP netlink interface such that each existing
attribute to be a nested attribute which contains the fields of the
binary structures.
In this way the MRP netlink interface can be extended without breaking
the backwards compatibility. It is also using strict checking for
attributes under the MRP top attribute.
Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20f6a05e

nexthop: Fix type of event_type in call_nexthop_notifiers · d8e79f1d

由 Nathan Chancellor 提交于 5月 27, 2020

Clang warns:

net/ipv4/nexthop.c:841:30: warning: implicit conversion from enumeration
type 'enum nexthop_event_type' to different enumeration type 'enum
fib_event_type' [-Wenum-conversion]
        call_nexthop_notifiers(net, NEXTHOP_EVENT_DEL, nh);
        ~~~~~~~~~~~~~~~~~~~~~~      ^~~~~~~~~~~~~~~~~
1 warning generated.

Use the right type for event_type so that clang does not warn.

Fixes: 8590ceed ("nexthop: add support for notifiers")
Link: https://github.com/ClangBuiltLinux/linux/issues/1038Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8e79f1d

27 5月, 2020 13 次提交

net: ethtool: Allow PHY cable test TDR data to configured · f2bc8ad3

由 Andrew Lunn 提交于 5月 27, 2020

Allow the user to configure where on the cable the TDR data should be
retrieved, in terms of first and last sample, and the step between
samples. Also add the ability to ask for TDR data for just one pair.

If this configuration is not provided, it defaults to 1-150m at 1m
intervals for all pairs.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>

v3:
Move the TDR configuration into a structure
Add a range check on step
Use NL_SET_ERR_MSG_ATTR() when appropriate
Move TDR configuration into a nest
Document attributes in the request
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2bc8ad3

net: ethtool: Add helpers for cable test TDR data · 6b4a0fc1

由 Andrew Lunn 提交于 5月 27, 2020

Add helpers for returning raw TDR helpers in netlink messages.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b4a0fc1

net: ethtool: Add generic parts of cable test TDR · 1a644de2

由 Andrew Lunn 提交于 5月 27, 2020

Add the generic parts of the code used to trigger a cable test and
return raw TDR data. Any PHY driver which support this must implement
the new driver op.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>

v2
Update nxp-tja11xx for API change.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a644de2

mptcp: attempt coalescing when moving skbs to mptcp rx queue · 4e637c70

由 Florian Westphal 提交于 5月 25, 2020

We can try to coalesce skbs we take from the subflows rx queue with the
tail of the mptcp rx queue.

If successful, the skb head can be discarded early.

We can also free the skb extensions, we do not access them after this.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e637c70

net/smc: mark smc_pnet_policy as const · 09d0310f

由 Dmitry Vyukov 提交于 5月 25, 2020

Netlink policies are generally declared as const.
This is safer and prevents potential bugs.
Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09d0310f

cls_flower: Support filtering on multiple MPLS Label Stack Entries · 61aec25a

由 Guillaume Nault 提交于 5月 26, 2020

With struct flow_dissector_key_mpls now recording the first
FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of
these LSEs independently.

In order to avoid creating new netlink attributes for every possible
depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute
that contains the list of LSEs to match. Each LSE is represented by
another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains
the attributes representing the depth and the MPLS fields to match at
this depth (label, TTL, etc.).

For each MPLS field, the mask is always set to all-ones, as this is
what the original API did. We could allow user configurable masks in
the future if there is demand for more flexibility.

The new API also allows to only specify an LSE depth. In that case,
Flower only verifies that the MPLS label stack depth is greater or
equal to the provided depth (that is, an LSE exists at this depth).

Filters that only match on one (or more) fields of the first LSE are
dumped using the old netlink attributes, to avoid confusing user space
programs that don't understand the new API.
Signed-off-by: NGuillaume Nault <gnault@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61aec25a

flow_dissector: Parse multiple MPLS Label Stack Entries · 58cff782

由 Guillaume Nault 提交于 5月 26, 2020

The current MPLS dissector only parses the first MPLS Label Stack
Entry (second LSE can be parsed too, but only to set a key_id).

This patch adds the possibility to parse several LSEs by making
__skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long
as the Bottom Of Stack bit hasn't been seen, up to a maximum of
FLOW_DIS_MPLS_MAX entries.

FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for
many practical purposes, without wasting too much space.

To record the parsed values, flow_dissector_key_mpls is modified to
store an array of stack entries, instead of just the values of the
first one. A bit field, "used_lses", is also added to keep track of
the LSEs that have been set. The objective is to avoid defining a
new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.

TC flower is adapted for the new struct flow_dissector_key_mpls layout.
Matching on several MPLS Label Stack Entries will be added in the next
patch.

The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and
mlx5's parse_tunnel() now verify that the rule only uses the first LSE
and fail if it doesn't.

Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is
slightly modified. Instead of recording the first Entropy Label, it
now records the last one. This shouldn't have any consequences since
there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY
in the tree. We'd probably better do a hash of all parsed MPLS labels
instead (excluding reserved labels) anyway. That'd give better entropy
and would probably also simplify the code. But that's not the purpose
of this patch, so I'm keeping that as a future possible improvement.
Signed-off-by: NGuillaume Nault <gnault@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58cff782

tipc: add test for Nagle algorithm effectiveness · 0a3e060f

由 Tuong Lien 提交于 5月 26, 2020

When streaming in Nagle mode, we try to bundle small messages from user
as many as possible if there is one outstanding buffer, i.e. not ACK-ed
by the receiving side, which helps boost up the overall throughput. So,
the algorithm's effectiveness really depends on when Nagle ACK comes or
what the specific network latency (RTT) is, compared to the user's
message sending rate.

In a bad case, the user's sending rate is low or the network latency is
small, there will not be many bundles, so making a Nagle ACK or waiting
for it is not meaningful.
For example: a user sends its messages every 100ms and the RTT is 50ms,
then for each messages, we require one Nagle ACK but then there is only
one user message sent without any bundles.

In a better case, even if we have a few bundles (e.g. the RTT = 300ms),
but now the user sends messages in medium size, then there will not be
any difference at all, that says 3 x 1000-byte data messages if bundled
will still result in 3 bundles with MTU = 1500.

When Nagle is ineffective, the delay in user message sending is clearly
wasted instead of sending directly.

Besides, adding Nagle ACKs will consume some processor load on both the
sending and receiving sides.

This commit adds a test on the effectiveness of the Nagle algorithm for
an individual connection in the network on which it actually runs.
Particularly, upon receipt of a Nagle ACK we will compare the number of
bundles in the backlog queue to the number of user messages which would
be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle
mode will be kept for further message sending. Otherwise, we will leave
Nagle and put a 'penalty' on the connection, so it will have to spend
more 'one-way' messages before being able to re-enter Nagle.

In addition, the 'ack-required' bit is only set when really needed that
the number of Nagle ACKs will be reduced during Nagle mode.

Testing with benchmark showed that with the patch, there was not much
difference in throughput for small messages since the tool continuously
sends messages without a break, so Nagle would still take in effect.
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a3e060f

tipc: add support for broadcast rcv stats dumping · 03b6fefd

由 Tuong Lien 提交于 5月 26, 2020

This commit enables dumping the statistics of a broadcast-receiver link
like the traditional 'broadcast-link' one (which is for broadcast-
sender). The link dumping can be triggered via netlink (e.g. the
iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the
indicator.

The name of a broadcast-receiver link of a specific peer will be in the
format: 'broadcast-link:<peer-id>'.

For example:

Link <broadcast-link:1001002>
  Window:50 packets
  RX packets:7841 fragments:2408/440 bundles:0/0
  TX packets:0 fragments:0/0 bundles:0/0
  RX naks:0 defs:124 dups:0
  TX naks:21 acks:0 retrans:0
  Congestion link:0  Send queue max:0 avg:0

In addition, the broadcast-receiver link statistics can be reset in the
usual way via netlink by specifying that link name in command.

Note: the 'tipc_link_name_ext()' is removed because the link name can
now be retrieved simply via the 'l->name'.
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03b6fefd

tipc: enable broadcast retrans via unicast · a91d55d1

由 Tuong Lien 提交于 5月 26, 2020

In some environment, broadcast traffic is suppressed at high rate (i.e.
a kind of bandwidth limit setting). When it is applied, TIPC broadcast
can still run successfully. However, when it comes to a high load, some
packets will be dropped first and TIPC tries to retransmit them but the
packet retransmission is intentionally broadcast too, so making things
worse and not helpful at all.

This commit enables the broadcast retransmission via unicast which only
retransmits packets to the specific peer that has really reported a gap
i.e. not broadcasting to all nodes in the cluster, so will prevent from
being suppressed, and also reduce some overheads on the other peers due
to duplicates, finally improve the overall TIPC broadcast performance.

Note: the functionality can be turned on/off via the sysctl file:

echo 1 > /proc/sys/net/tipc/bc_retruni
echo 0 > /proc/sys/net/tipc/bc_retruni

Default is '0', i.e. the broadcast retransmission still works as usual.
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a91d55d1

tipc: add back link trace events · c6ed7a5c

由 Tuong Lien 提交于 5月 26, 2020

In the previous commit ("tipc: add Gap ACK blocks support for broadcast
link"), we have removed the following link trace events due to the code
changes:

- tipc_link_bc_ack
- tipc_link_retrans

This commit adds them back along with some minor changes to adapt to
the new code.
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6ed7a5c

tipc: introduce Gap ACK blocks for broadcast link · d7626b5a

由 Tuong Lien 提交于 5月 26, 2020

As achieved through commit 9195948f ("tipc: improve TIPC throughput
by Gap ACK blocks"), we apply the same mechanism for the broadcast link
as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
consist of two parts built for both the broadcast and unicast types:

 31                       16 15                        0
+-------------+-------------+-------------+-------------+
|  bgack_cnt  |  ugack_cnt  |            len            |
+-------------+-------------+-------------+-------------+  -
|            gap            |            ack            |   |
+-------------+-------------+-------------+-------------+    > bc gacks
:                           :                           :   |
+-------------+-------------+-------------+-------------+  -
|            gap            |            ack            |   |
+-------------+-------------+-------------+-------------+    > uc gacks
:                           :                           :   |
+-------------+-------------+-------------+-------------+  -

which is "automatically" backward-compatible.

We also increase the max number of Gap ACK blocks to 128, allowing upto
64 blocks per type (total buffer size = 516 bytes).

Besides, the 'tipc_link_advance_transmq()' function is refactored which
is applicable for both the unicast and broadcast cases now, so some old
functions can be removed and the code is optimized.

With the patch, TIPC broadcast is more robust regardless of packet loss
or disorder, latency, ... in the underlying network. Its performance is
boost up significantly.
For example, experiment with a 5% packet loss rate results:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real    0m 42.46s
user    0m 1.16s
sys     0m 17.67s

Without the patch:

$ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real    8m 27.94s
user    0m 0.55s
sys     0m 2.38s
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jmaloy@redhat.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7626b5a

tcp: tcp_v4_err() icmp skb is named icmp_skb · 23917494

由 Eric Dumazet 提交于 5月 25, 2020

I missed the fact that tcp_v4_err() differs from tcp_v6_err().

After commit 4d1a2d9e ("Rename skb to icmp_skb in tcp_v4_err()")
the skb argument has been renamed to icmp_skb only in one function.

I will in a future patch reconciliate these functions to avoid
this kind of confusion.

Fixes: 45af29ca ("tcp: allow traceroute -Mtcp for unpriv users")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23917494

26 5月, 2020 2 次提交

batman-adv: Revert "disable ethtool link speed detection when auto negotiation off" · 9ad346c9

由 Sven Eckelmann 提交于 11月 25, 2019

The commit 8c46fcd7 ("batman-adv: disable ethtool link speed detection
when auto negotiation off") disabled the usage of ethtool's link_ksetting
when auto negotation was enabled due to invalid values when used with
tun/tap virtual net_devices. According to the patch, automatic measurements
should be used for these kind of interfaces.

But there are major flaws with this argumentation:

* automatic measurements are not implemented
* auto negotiation has nothing to do with the validity of the retrieved
  values

The first point has to be fixed by a longer patch series. The "validity"
part of the second point must be addressed in the same patch series by
dropping the usage of ethtool's link_ksetting (thus always doing automatic
measurements over ethernet).

Drop the patch again to have more default values for various net_device
types/configurations. The user can still overwrite them using the
batadv_hardif's BATADV_ATTR_THROUGHPUT_OVERRIDE.
Reported-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NSven Eckelmann <sven@narfation.org>
Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>

9ad346c9

bridge: mrp: Fix out-of-bounds read in br_mrp_parse · 617504c6

由 Horatiu Vultur 提交于 5月 25, 2020

The issue was reported by syzbot. When the function br_mrp_parse was
called with a valid net_bridge_port, the net_bridge was an invalid
pointer. Therefore the check br->stp_enabled could pass/fail
depending where it was pointing in memory.
The fix consists of setting the net_bridge pointer if the port is a
valid pointer.

Reported-by: syzbot+9c6f0f1f8e32223df9a4@syzkaller.appspotmail.com
Fixes: 65369933 ("bridge: mrp: Integrate MRP into the bridge")
Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

617504c6

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功