Commits · 7fa41efac14ffbe8db7660ad2da3928969d10caf · openanolis / cloud-kernel

24 Jul, 2018 4 commits

ipv6: sr: Use kmemdup instead of duplicating it in parse_nla_srh · 7fa41efa

Authored by YueHaibing on Jul 23, 2018

Replace calls to kmalloc followed by a memcpy with a direct call to
kmemdup.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7fa41efa
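
The allocate-then-copy pattern that this commit folds into kmemdup can be illustrated outside the kernel. The following is a minimal userspace sketch, not the patched code: `memdup_sketch` is a hypothetical name, and `malloc` stands in for `kmalloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's kmemdup(): allocate
 * len bytes and copy src into them. Returns NULL if allocation fails. */
static void *memdup_sketch(const void *src, size_t len)
{
	void *dst = malloc(len);

	if (dst)
		memcpy(dst, src, len);
	return dst;
}
```

A caller replaces the separate allocation, NULL check and memcpy with one call and one NULL check, which is the whole point of the cleanup.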

Merge branch 'net-bridge-add-support-for-backup-port' · f8b2990f

Authored by David S. Miller on Jul 23, 2018

Nikolay Aleksandrov says:

====================
net: bridge: add support for backup port

This set introduces a new bridge port option that allows any port to have
any other port (in the same bridge of course) as its backup and traffic
will be forwarded to the backup port when the primary goes down. This is
mainly used in MLAG and EVPN setups where we have peerlink path which is
a backup of many (or even all) ports and is a participating bridge port
itself. There's more detailed information in patch 02. Patch 01 just
prepares the port sysfs code for options that take a raw value. The main
issues that this set solves are scalability and fallback latency.

We have used similar code for over 6 months now to bring the fallback
latency of the backup peerlink down and avoid fdb notification storms.
Also, due to the nature of master devices, such a setup is currently not
possible, and last but not least, having tens of thousands of fdbs requires
thousands of calls to switch.

I've also CCed our MLAG experts who have been using a similar option.

Roopa also adds:

"Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.

br0 -- team0---eth0
      |
      -- switch-peerlink

switch-peerlink becomes the failover/backup port when, say, team0 to
the server goes down.
Today, when team0 goes down, the control plane has to withdraw all the fdb
entries pointing to team0
and re-install the fdb entries pointing to switch-peerlink, and
restore the fdb entries when team0 comes back up again.
This is the problem we are trying to solve.

This also becomes necessary when multihoming is implemented by a
standard like E-VPN https://tools.ietf.org/html/rfc8365#section-8
where the 'switch-peerlink' is an overlay vxlan port (like nikolay
mentions in his patch commit). In these implementations, the fdb scale
can be much larger.

On why bond failover cannot be used here: the point that nikolay was
alluding to is that switch-peerlink in the above example is a bridge port
and is a failover/backup port for more than one or all ports in the
bridge br0. And you cannot enslave switch-peerlink into a second-level
team with other bridge ports. Hence a multi-layered team device is not an
option (FWIW, switch-peerlink is also a teamed interface to the peer
switch)."

v3: Added Roopa's explanation and diagram
v2: In patch 01 use kstrdup/kfree to avoid casting the const buf. In order
to avoid using GFP_ATOMIC or always allocating I kept the spinlock inside
each branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f8b2990f

net: bridge: add support for backup port · 2756f68c

Authored by Nikolay Aleksandrov on Jul 23, 2018

This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
allows setting a backup port to be used for known unicast traffic if the
port has gone carrier down. The backup pointer is rcu protected and set
only under RTNL, a counter is maintained so when deleting a port we know
how many other ports reference it as a backup and we remove it from all.
Also the pointer is in the first cache line which is hot at the time of
the check and thus in the common case we only add one more test.
The backup port will be used only for the non-flooding case since
it's a part of the bridge and the flooded packets will be forwarded to it
anyway. To remove the forwarding just send a 0/non-existing backup port.
This is used to avoid numerous scalability problems when using MLAG most
notably if we have thousands of fdbs one would need to change all of them
on port carrier going down which takes too long and causes a storm of fdb
notifications (and again when the port comes back up). In a Multi-chassis
Link Aggregation setup usually hosts are connected to two different
switches which act as a single logical switch. Those switches usually have
a control and backup link between them called peerlink which might be used
for communication in case a host loses connectivity to one of them.
We need a fast way to failover in case a host port goes down and currently
none of the solutions (like bond) can fulfill the requirements because
the participating ports are actually the "master" devices and must have the
same peerlink as their backup interface and at the same time all of them
must participate in the bridge device. As Roopa noted it's normal practice
in routing called fast re-route where a precalculated backup path is used
when the main one is down.
Another use case of this is with EVPN, having a single vxlan device which
is backup of every port. Due to the nature of master devices it's not
currently possible to use one device as a backup for many and still have
all of them participate in the bridge (which is master itself).
More detailed information about MLAG is available at the link below.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG

Further explanation and a diagram by Roopa:
Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.

br0 -- team0---eth0
      |
      -- switch-peerlink

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2756f68c
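
The forwarding decision described in this commit (use the backup only for known unicast traffic, and only when the primary port has lost carrier) can be sketched as below. This is a simplified userspace model under stated assumptions: `port_sketch` and `egress_port` are hypothetical names, and the RCU protection and reference counting mentioned in the message are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bridge port: only the fields the decision needs. */
struct port_sketch {
	bool carrier_up;
	struct port_sketch *backup;	/* NULL when no backup port is set */
};

/* Pick the egress port for a known unicast frame: redirect to the backup
 * only when the destination port is carrier down and a backup exists. */
static struct port_sketch *egress_port(struct port_sketch *dst)
{
	if (!dst->carrier_up && dst->backup)
		return dst->backup;
	return dst;
}
```

In the MLAG diagram above, team0 would carry switch-peerlink as its backup, so no fdb entry needs to change when team0 goes down.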

net: bridge: add support for raw sysfs port options · a5f3ea54

Authored by Nikolay Aleksandrov on Jul 23, 2018

This patch adds a new alternative store callback for port sysfs options
which takes a raw value (buf) and can use it directly. It is needed for the
backup port sysfs support since we have to pass the device by its name.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5f3ea54

23 Jul, 2018 16 commits

net: mediatek: use dma_zalloc_coherent instead of allocator/memset · 0a78c380

Authored by YueHaibing on Jul 23, 2018

Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0a78c380

nfp: avoid buffer leak when FW communication fails · 07300f77

Authored by Jakub Kicinski on Jul 20, 2018

After device is stopped we reset the rings by moving all free buffers
to positions [0, cnt - 2], and clear the position cnt - 1 in the ring.
We then proceed to clear the read/write pointers. This means that if
we try to reset the ring again the code will assume that the next to
fill buffer is at position 0 and swap it with cnt - 1. Since we
previously cleared position cnt - 1 it will lead to leaking the first
buffer and leaving ring in a bad state.

This scenario can only happen if FW communication fails, in which case
the ring will never be used again, so the fact it's in a bad state will
not be noticed. Buffer leak is the only problem. Don't try to move
buffers in the ring if the read/write pointers indicate the ring was
never used or has already been reset.

nfp_net_clear_config_and_disable() is now fully idempotent.

Found by code inspection, FW communication failures are very rare,
and reconfiguring a live device is not common either, so it's unlikely
anyone has ever noticed the leak.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07300f77
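
The leak and its fix can be reproduced in a small model. The sketch below is an illustration only, assuming a 4-slot ring of plain pointers; `ring_sketch` and `ring_reset` are hypothetical names and not the nfp driver's structures. The early return is the guard the message describes: if both pointers are zero, the ring was never used or was already reset, so nothing is moved.

```c
#include <stddef.h>

#define RING_CNT 4

/* Hypothetical ring: holds RING_CNT - 1 buffers, one slot stays empty. */
struct ring_sketch {
	void *bufs[RING_CNT];
	unsigned int wr_p;
	unsigned int rd_p;
};

static void ring_reset(struct ring_sketch *r)
{
	unsigned int wr_idx, last_idx;

	/* Never used or already reset: moving buffers again would
	 * overwrite slot 0 with the cleared last slot and leak it. */
	if (r->wr_p == 0 && r->rd_p == 0)
		return;

	/* Move the empty entry to the end of the ring. */
	wr_idx = r->wr_p % RING_CNT;
	last_idx = RING_CNT - 1;
	r->bufs[wr_idx] = r->bufs[last_idx];
	r->bufs[last_idx] = NULL;

	r->wr_p = 0;
	r->rd_p = 0;
}
```

Calling `ring_reset()` twice in a row leaves the buffers where the first call put them, which is what makes the reset idempotent.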

nfp: bring back support for offloading shared blocks · 042f8825

Authored by Jakub Kicinski on Jul 20, 2018

Now that we have offload replay infrastructure added by
commit 32636742 ("net: sched: call reoffload op on block callback reg")
and flows are guaranteed to be removed correctly, we can revert
commit 951a8ee6 ("nfp: reject binding to shared blocks").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

042f8825

xen-netfront: fix queue name setting · 2d408c0d

Authored by Vitaly Kuznetsov on Jul 20, 2018

Commit f599c64f ("xen-netfront: Fix race between device setup and
open") changed the initialization order: xennet_create_queues() now
happens before we do register_netdev(), so using netdev->name in
xennet_init_queue() is incorrect and we end up with the following in
/proc/interrupts:

60: 139 0 xen-dyn -event eth%d-q0-tx
61: 265 0 xen-dyn -event eth%d-q0-rx
62: 234 0 xen-dyn -event eth%d-q1-tx
63: 1 0 xen-dyn -event eth%d-q1-rx

and this looks ugly. Actually, using early netdev name (even when it's
already set) is also not ideal: nowadays we tend to rename eth devices
and queue name may end up not corresponding to the netdev name.

Use nodename from xenbus device for queue naming: this can't change in VM's
lifetime. Now /proc/interrupts looks like

62: 202 0 xen-dyn -event device/vif/0-q0-tx
63: 317 0 xen-dyn -event device/vif/0-q0-rx
64: 262 0 xen-dyn -event device/vif/0-q1-tx
65: 17 0 xen-dyn -event device/vif/0-q1-rx

Fixes: f599c64f ("xen-netfront: Fix race between device setup and open")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d408c0d
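
The naming difference shown in the /proc/interrupts listings can be sketched with plain string formatting. This is an illustration under assumptions: `queue_irq_name` is a hypothetical helper, and the format string is inferred from the listed names rather than copied from the driver.

```c
#include <stdio.h>

/* Build a queue interrupt label such as "device/vif/0-q1-tx" from a base
 * name, a queue index and a direction ("tx" or "rx"). */
static int queue_irq_name(char *out, size_t len, const char *base,
			  unsigned int queue_id, const char *dir)
{
	return snprintf(out, len, "%s-q%u-%s", base, queue_id, dir);
}
```

Passing the not-yet-registered netdev name template "eth%d" as the base reproduces the ugly "eth%d-q0-tx" label, while passing the xenbus nodename "device/vif/0" gives a label that stays stable even if the interface is renamed later.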

net/dsa/realtek: add MODULE_LICENSE() · be5a8ffa

Authored by Randy Dunlap on Jul 20, 2018

Add MODULE_LICENSE() to net/dsa/realtek.o to fix build warning message.

WARNING: modpost: missing MODULE_LICENSE() in drivers/net/dsa/realtek.o
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

be5a8ffa

bonding: don't cast const buf in sysfs store · 5b3df177

Authored by Nikolay Aleksandrov on Jul 22, 2018

As was recently discussed [1], let's avoid casting the const buf in
bonding_sysfs_store_option and use kstrndup/kfree instead.

[1] http://lists.openwall.net/netdev/2018/07/22/25
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b3df177
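
The idea of working on a private copy instead of casting away const can be shown in userspace. The sketch below is an assumption-laden illustration: `clone_option` is a hypothetical helper, `malloc`/`free` stand in for `kstrndup`/`kfree`, and the newline stripping is included only to show why a writable copy is wanted.

```c
#include <stdlib.h>
#include <string.h>

/* Return a private, NUL-terminated copy of at most count bytes of buf,
 * cut at the first newline. The caller frees the result. The const
 * input is never modified. */
static char *clone_option(const char *buf, size_t count)
{
	const char *nul = memchr(buf, '\0', count);
	size_t len = nul ? (size_t)(nul - buf) : count;
	char *copy = malloc(len + 1);
	char *nl;

	if (!copy)
		return NULL;
	memcpy(copy, buf, len);
	copy[len] = '\0';

	nl = strchr(copy, '\n');
	if (nl)
		*nl = '\0';
	return copy;
}
```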

Merge branch 'TX-used-ring-batched-updating-for-vhost' · fb42c838

Authored by David S. Miller on Jul 22, 2018

Jason Wang says:

====================
TX used ring batched updating for vhost

This series implements batch updating of the used ring for TX. This helps to
reduce the cache contention on the used ring. The idea is to first split the
datacopy path from zerocopy, and do batching only for datacopy. This
is because zerocopy already supports its own batching.

TX PPS was increased 25.8% and Netperf TCP does not show obvious
differences.

The split of the datapath will also be helpful for future implementations
like in-order completion.
====================
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb42c838

vhost_net: batch update used ring for datacopy TX · 4afb52c2

Authored by Jason Wang on Jul 20, 2018

Like commit e2b3b35e ("vhost_net: batch used ring update in rx"),
this patch implements batch used ring update for datacopy TX
(zerocopy has already done some kind of batching).

Testpmd transmission from guest to host (XDP_DROP on tap) shows 25.8%
improvement (from ~3.1Mpps to ~3.9Mpps) on Broadwell i7-5600U CPU @
2.60GHz machine. Netperf TCP tests do not show obvious differences.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4afb52c2
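
Batching used ring updates means collecting completed descriptor heads locally and publishing them to the shared ring in groups rather than one at a time. The following is a rough model of that idea, not the vhost_net code: `used_batch`, `batch_complete` and `batch_flush` are hypothetical names, and the batch size of 64 is an assumption standing in for VHOST_NET_BATCH.

```c
#include <stddef.h>

#define BATCH_SKETCH 64	/* assumed stand-in for VHOST_NET_BATCH */

struct used_batch {
	unsigned int heads[BATCH_SKETCH];
	size_t done;	/* completed descriptors not yet published */
	size_t flushes;	/* how many times the shared ring was touched */
};

/* Publish everything collected so far. A real implementation would write
 * heads[0..done) to the used ring and signal the guest here. */
static void batch_flush(struct used_batch *b)
{
	if (!b->done)
		return;
	b->flushes++;
	b->done = 0;
}

/* Record one completed descriptor; publish only when the batch is full. */
static void batch_complete(struct used_batch *b, unsigned int head)
{
	b->heads[b->done++] = head;
	if (b->done == BATCH_SKETCH)
		batch_flush(b);
}
```

With 130 completions the shared ring is touched twice during the loop plus once for the final flush, instead of 130 times, which is where the reduced cache contention comes from.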

vhost_net: rename VHOST_RX_BATCH to VHOST_NET_BATCH · d0d86971

Authored by Jason Wang on Jul 20, 2018

A more generic name which could be used for TX as well.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0d86971

vhost_net: rename vhost_rx_signal_used() to vhost_net_signal_used() · 09c32489

Authored by Jason Wang on Jul 20, 2018

Rename for reusing this for TX.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09c32489

vhost_net: split out datacopy logic · 0d20bdf3

Authored by Jason Wang on Jul 20, 2018

Instead of mixing zerocopy and datacopy logic, this patch tries to
split the datacopy logic out. This results in more compact code, and
ad-hoc optimizations can be done on top more easily.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d20bdf3

vhost_net: introduce tx_can_batch() · c92a8a8c

Authored by Jason Wang on Jul 20, 2018

Introduce tx_can_batch() to determine whether TX could be
batched. This will help to reduce code duplication in the future.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c92a8a8c

vhost_net: introduce get_tx_bufs() · a2a91a13

Authored by Jason Wang on Jul 20, 2018

Factor out the logic of getting the tx buffer and iov iter
initialization. This will be used for reducing code duplication in
the future.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2a91a13

vhost_net: introduce vhost_exceeds_weight() · 272f35cb

Authored by Jason Wang on Jul 20, 2018

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

272f35cb

vhost_net: introduce helper to initialize tx iov iter · b0d0ea50

Authored by Jason Wang on Jul 20, 2018

Introduce init_iov_iter() so that it can be reused by a future patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0d0ea50

vhost_net: drop unnecessary parameter · 652e4f3e

Authored by Jason Wang on Jul 20, 2018

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

652e4f3e

22 Jul, 2018 20 commits

multicast: remove useless parameter for group add · 0ae0d60a

Authored by Hangbin Liu on Jul 20, 2018

Remove the mode parameter for igmp/igmp6_group_added as we can get it
from the first parameter.

Fixes: 6e2059b5 (ipv4/igmp: init group mode as INCLUDE when join source group)
Fixes: c7ea20c9 (ipv6/mcast: init as INCLUDE when join SSM INCLUDE group)
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ae0d60a

net: wimax: stack: fixed multi line comment issue · ef324779

Authored by Mark Railton on Jul 20, 2018

Moved end of comment to its own line per guide
Signed-off-by: Mark Railton <mark@markrailton.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef324779

net: phy: sfp: Do not use "imply HWMON" · b5293443

Authored by Guenter Roeck on Jul 19, 2018

"imply HWMON" was supposed to ensure that the SFP phy code can be built
with HWMON enabled or disabled while at the same time ensuring that
HWMON is not built as module if SFP is built into the kernel.
Unfortunately, that does not work as intended. With "allmodconfig", it
results in several unrelated HWMON drivers being disabled instead of
being built as modules as expected.

Let's use the old "depends on HWMON || HWMON=n" instead. This is slightly
different (it enforces SFP to be built as module if HWMON is built as
module), but it is better than the alternative of using "IS_REACHABLE()"
in the driver since that would disable sensor support if HWMON is built
as module and SFP is built into the kernel.

Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5293443

libcxgb: replace vmalloc and memset with vzalloc · 4c303373

Authored by YueHaibing on Jul 19, 2018

Use vzalloc instead of the vmalloc, memset combo
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4c303373
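
The same allocate-and-zero consolidation applies to this commit and to the dma_zalloc_coherent conversions in this log. As a userspace analogy only (`zalloc_sketch` is a hypothetical name, and `calloc` stands in for the kernel's zeroing allocators):

```c
#include <stdlib.h>

/* One call that returns zeroed memory, replacing an allocation followed
 * by a separate memset(ptr, 0, size). Returns NULL on failure. */
static void *zalloc_sketch(size_t size)
{
	return calloc(1, size);
}
```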

net: hix5hd2_gmac: use dma_zalloc_coherent instead of allocator/memset · c1907e53

Authored by YueHaibing on Jul 19, 2018

Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c1907e53

tipc: make some functions static · e064cce1

Authored by YueHaibing on Jul 19, 2018

Fixes the following sparse warnings:

net/tipc/link.c:376:5: warning: symbol 'link_bc_rcv_gap' was not declared. Should it be static?
net/tipc/link.c:823:6: warning: symbol 'link_prepare_wakeup' was not declared. Should it be static?
net/tipc/link.c:959:6: warning: symbol 'tipc_link_advance_backlog' was not declared. Should it be static?
net/tipc/link.c:1009:5: warning: symbol 'tipc_link_retrans' was not declared. Should it be static?
net/tipc/monitor.c:687:5: warning: symbol '__tipc_nl_add_monitor_peer' was not declared. Should it be static?
net/tipc/group.c:230:20: warning: symbol 'tipc_group_find_member' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e064cce1

net: sched: use PTR_ERR_OR_ZERO macro in tcf_block_cb_register · baa2d2b1

Authored by Gustavo A. R. Silva on Jul 18, 2018

The open-coded check here duplicates what the PTR_ERR_OR_ZERO macro
already does, so use PTR_ERR_OR_ZERO instead of the open-coded version.

This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

baa2d2b1
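
For readers unfamiliar with the error-pointer convention, the sketch below re-creates it in userspace. All names are hypothetical lowercase stand-ins for the kernel's ERR_PTR, IS_ERR, PTR_ERR and PTR_ERR_OR_ZERO, and the 4095 limit is an assumption mirroring the kernel's MAX_ERRNO; this is not the kernel header.

```c
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095	/* assumed to mirror the kernel's MAX_ERRNO */

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top MAX_ERRNO_SKETCH addresses. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The consolidated form: the errno if ptr encodes one, otherwise 0. */
static inline long ptr_err_or_zero(const void *ptr)
{
	return is_err(ptr) ? ptr_err(ptr) : 0;
}
```

The open-coded version is the `is_err(ptr) ? ptr_err(ptr) : 0` expression written out at the call site; the helper just gives it a name.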

Merge branch 'tcp-improve-setsockopt-TCP_USER_TIMEOUT-accuracy' · d1afdc51

Authored by David S. Miller on Jul 21, 2018

Jon Maxwell says:

====================
tcp: improve setsockopt() TCP_USER_TIMEOUT accuracy

The patch was becoming bigger based on feedback therefore I have
implemented a series of 3 commits instead in V4.

This series is a continuation based on V3 here and associated feedback:

https://patchwork.kernel.org/patch/10516195/

Suggestions by Neal Cardwell:

1) Fix up units mismatch regarding msec/jiffies.
2) Address possibility of time_remaining being negative.
3) Add a helper routine tcp_clamp_rto_to_user_timeout() to do the rto
calculation.
4) Move start_ts logic into helper routine tcp_retransmit_stamp() to
validate tcp_sk(sk)->retrans_stamp.
5) Some u32 declaration and return refactoring.
6) Return 0 instead of false in tcp_retransmit_stamp(), it's not a bool.

Suggestions by David Laight:

1) Don't cache rto in tcp_clamp_rto_to_user_timeout().

Suggestions by Eric Dumazet:

1) Make u32 declarations consistent.
2) Use patch series for easier review.
3) Convert icsk->icsk_user_timeout to milliseconds to avoid jiffie to
msec dance.
4) Use separate titles for each commit in the series.
5) Fix fuzzy indentation and line wrap issues.
6) Make commit titles descriptive.

Changes:

1) Call tcp_clamp_rto_to_user_timeout(sk) as an argument to
inet_csk_reset_xmit_timer() to save on rto declaration.

Every time the TCP retransmission timer fires, it checks to see if
there is a timeout before scheduling the next retransmit timer. The
retransmit interval between each retransmission increases
exponentially. The issue is that in order for the timeout to occur the
retransmit timer needs to fire again. If the user timeout check happens
after the 9th retransmit, for example, it needs to wait for the 10th
retransmit timer to fire in order to evaluate whether a timeout has
occurred or not. If the interval is large enough then the timeout will
be inaccurate.

For example with a TCP_USER_TIMEOUT of 10 seconds without patch:

1st retransmit:

22:25:18.973488 IP host1.49310 > host2.search-agent: Flags [.]

Last retransmit:

22:25:26.205499 IP host1.49310 > host2.search-agent: Flags [.]

Timeout:

send: Connection timed out
Sun Jul  1 22:25:34 EDT 2018

We can see that the last retransmit took ~7 seconds, which pushed the total
timeout to ~15 seconds instead of the expected 10 seconds. This gets
more inaccurate the larger the TCP_USER_TIMEOUT value, as the interval
increases.

Add tcp_clamp_rto_to_user_timeout() to determine if the user rto has
expired, or whether the rto interval needs to be recalculated. Use the
original interval if user rto is not set.

With the patch, the test results show the expected 10 second timeout:

1st retransmit:

01:37:59.022555 IP host1.49310 > host2.search-agent: Flags [.]

Last retransmit:

01:38:06.486558 IP host1.49310 > host2.search-agent: Flags [.]

Timeout:

send: Connection timed out
Mon Jul  2 01:38:09 EDT 2018
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d1afdc51

tcp: Add tcp_clamp_rto_to_user_timeout() helper to improve accuracy · b701a99e

Authored by Jon Maxwell on Jul 19, 2018

Create the tcp_clamp_rto_to_user_timeout() helper routine to calculate
the correct rto, so that the TCP_USER_TIMEOUT socket option is more
accurate, taking suggestions and feedback into account from
Eric Dumazet, Neal Cardwell and David Laight. Due to the 1st commit we
can avoid the msecs_to_jiffies() and jiffies_to_msecs() dance.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b701a99e
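
The clamping idea from the series can be sketched as plain millisecond arithmetic: never arm the retransmit timer for longer than the time remaining until the user timeout. The function below is a simplified model, not the kernel helper; `clamp_rto_ms` and its parameters are hypothetical, and returning 1 for an already expired timeout is an assumption meaning "fire as soon as possible".

```c
#include <stdint.h>

/* rto_ms: the interval the timer would normally use.
 * user_timeout_ms: TCP_USER_TIMEOUT value, 0 when not set.
 * elapsed_ms: time since the first retransmission of this segment. */
static uint32_t clamp_rto_ms(uint32_t rto_ms, uint32_t user_timeout_ms,
			     uint32_t elapsed_ms)
{
	uint32_t remaining;

	if (!user_timeout_ms)
		return rto_ms;	/* option not set: keep the original interval */
	if (elapsed_ms >= user_timeout_ms)
		return 1;	/* already expired: fire as soon as possible */

	remaining = user_timeout_ms - elapsed_ms;
	return rto_ms < remaining ? rto_ms : remaining;
}
```

With numbers loosely taken from the capture in the cover letter, a 10000 ms user timeout with about 7232 ms elapsed leaves 2768 ms, so a timer that would otherwise wait several more seconds is armed for 2768 ms and the connection times out close to the requested 10 seconds.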

tcp: Add tcp_retransmit_stamp() helper routine · a7fa3770

Authored by Jon Maxwell on Jul 19, 2018

Create a separate helper routine as per Neal Cardwell's suggestion, to
be used by the final commit in this series and retransmits_timed_out().
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7fa3770

tcp: convert icsk_user_timeout from jiffies to msecs · 9bcc66e1

Authored by Jon Maxwell on Jul 19, 2018

This is a preparatory commit, part of the series that improves the
socket TCP_USER_TIMEOUT option accuracy. Implement Eric Dumazet's idea
to convert icsk->icsk_user_timeout from jiffies to msecs, to eliminate
the msecs_to_jiffies() and jiffies_to_msecs() dance in future.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bcc66e1

Merge branch 's390-qeth-updates' · 975cd350

Authored by David S. Miller on Jul 21, 2018

Julian Wiedmann says:

====================
s390/qeth: updates 2018-07-19

please apply one more round of qeth patches to net-next.
This brings additional performance improvements for the transmit code,
and some refactoring to pave the way for using netdev_priv.
Also, two minor fixes for rare corner cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

975cd350

s390/qeth: speed up L2 IQD xmit · 5f89eca5

Authored by Julian Wiedmann on Jul 19, 2018

Modify the L2 OSA xmit path so that it also supports L2 IQD devices
(in particular, their HW header requirements). This allows IQD devices
to advertise NETIF_F_SG support, and eliminates the allocation overhead
for the HW header.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f89eca5

s390/qeth: add support for constrained HW headers · a7c2f4a3

Authored by Julian Wiedmann on Jul 19, 2018

Some transmit modes require that the HW header is located in the same
page as the initial protocol headers in skb->data. Let callers specify
the size of this contiguous header range, and enforce it when building
the HW header.

While at it, apply some gentle renaming to the relevant L2 code so that
it matches the L3 code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7c2f4a3

s390/qeth: merge linearize-check into HW header construction · ba86ceee

Authored by Julian Wiedmann on Jul 19, 2018

When checking whether an skb needs to be linearized to fit into an IO
buffer, it's desirable to consider the skb's final size and layout
(ie. after the HW header was added). But a subsequent linearization can
then cause the re-positioned HW header to violate its alignment
restrictions.

Dealing with this situation in two different code paths is quite tricky.
This patch integrates a) linearize-check and b) HW header construction
into one 3-step sequence:
1. evaluate how the HW header needs to be added (to identify if it takes
   up an additional buffer element), then
2. check if the required buffer elements exceed the device's limit.
   Linearize when necessary and re-evaluate the HW header placement.
3. Add the HW header in the best-possible way:
   a) push, without taking up an additional buffer element
   b) push, but consume another buffer element
   c) allocate a header object from the cache.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba86ceee

s390/qeth: add statistics for consumed buffer elements · d2a274b2

Authored by Julian Wiedmann on Jul 19, 2018

Nowadays an skb fragment typically spans multiple pages. So replace
the obsolete, SG-only 'fragments' counter with one that tracks the
consumed buffer elements. This is what actually matters for performance.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2a274b2

s390/qeth: use core MTU range checking · 72f219da

Authored by Julian Wiedmann on Jul 19, 2018

qeth's ndo_change_mtu() only applies some trivial bounds checking. Set
up dev->min_mtu properly, so that dev_set_mtu() can do this for us.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72f219da

s390/qeth: simplify max MTU handling · 8ce7a9e0

Authored by Julian Wiedmann on Jul 19, 2018

When the MPC initialization code discovers the HW-specific max MTU,
apply the resulting changes straight to the netdevice.

If this is the device's first initialization, also set its MTU
(HiperSockets: the max MTU; else: a layer-specific default value).
Then cap the current MTU by the new max MTU.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ce7a9e0
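
The update order described above (store the new max MTU, pick an initial MTU on first initialization, then cap the current MTU) can be sketched as follows. This is a hypothetical model: `netdev_sketch` and `update_max_mtu` are made-up names, and treating `mtu == 0` as "first initialization" is an assumption of the sketch rather than something the commit message states.

```c
/* Hypothetical, minimal view of a netdevice's MTU fields. */
struct netdev_sketch {
	unsigned int mtu;
	unsigned int max_mtu;
};

/* Apply a newly discovered max MTU. first_mtu is the value to use when
 * the device has no MTU yet (assumed here to be signalled by mtu == 0). */
static void update_max_mtu(struct netdev_sketch *dev, unsigned int max_mtu,
			   unsigned int first_mtu)
{
	dev->max_mtu = max_mtu;
	if (dev->mtu == 0)
		dev->mtu = first_mtu;
	if (dev->mtu > max_mtu)
		dev->mtu = max_mtu;
}
```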

s390/qeth: don't cache HW port number · 92d27209

Authored by Julian Wiedmann on Jul 19, 2018

The netdevice is always available now, so get the portno from there.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

92d27209

s390/qeth: allocate netdevice early · d3d1b205

Authored by Julian Wiedmann on Jul 19, 2018

Allocation of the netdevice is currently delayed until a qeth card first
goes online. This complicates matters in several places, where we need
to cache values instead of applying them straight to the netdevice.

Improve on this by moving the allocation up to where the qeth card
itself is created. This is also one step in the direction of eventually
placing the qeth card into netdev_priv().

In all subsequent code, remove the now redundant checks whether
card->dev is valid.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3d1b205
