提交 · c22af22cbdc206a0273d0e6d773bd3dfc99d2b02 · openeuler / Kernel

01 4月, 2018 40 次提交

ipv6: frag: remove unused field · c22af22c

由 Eric Dumazet 提交于 3月 31, 2018

csum field in struct frag_queue is not used, remove it.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c22af22c

Merge branch 'bnxt_en-next' · 5749d6af

由 David S. Miller 提交于 3月 31, 2018

Michael Chan says:

====================
bnxt_en: Update for net-next.

Misc. updates including updated firmware interface, some additional
port statistics, a new IRQ assignment scheme for the RDMA driver, support
for VF trust, and other changes and improvements for SRIOV.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5749d6af

bnxt_en: Add ULP calls to stop and restart IRQs. · ec86f14e

由 Michael Chan 提交于 3月 31, 2018

When the driver needs to re-initailize the IRQ vectors, we make the
new ulp_irq_stop() call to tell the RDMA driver to disable and free
the IRQ vectors.  After IRQ vectors have been re-initailized, we
make the ulp_irq_restart() call to tell the RDMA driver that
IRQs can be restarted.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec86f14e

bnxt_en: Reserve completion rings and MSIX for bnxt_re RDMA driver. · fbcfc8e4

由 Michael Chan 提交于 3月 31, 2018

Add additional logic to reserve completion rings for the bnxt_re driver
when it requests MSIX vectors.  The function bnxt_cp_rings_in_use()
will return the total number of completion rings used by both drivers
that need to be reserved.  If the network interface in up, we will
close and open the NIC to reserve the new set of completion rings and
re-initialize the vectors.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbcfc8e4

bnxt_en: Refactor bnxt_need_reserve_rings(). · 4e41dc5d

由 Michael Chan 提交于 3月 31, 2018

Refactor bnxt_need_reserve_rings() slightly so that __bnxt_reserve_rings()
can call it and remove some duplicated code.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e41dc5d

bnxt_en: Add IRQ remapping logic. · e5811b8c

由 Michael Chan 提交于 3月 31, 2018

Add remapping logic so that bnxt_en can use any arbitrary MSIX vectors.
This will allow the driver to reserve one range of MSIX vectors to be
used by both bnxt_en and bnxt_re.  bnxt_en can now skip over the MSIX
vectors used by bnxt_re.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5811b8c

bnxt_en: Change IRQ assignment for RDMA driver. · 08654eb2

由 Michael Chan 提交于 3月 31, 2018

In the current code, the range of MSIX vectors allocated for the RDMA
driver is disjoint from the network driver.  This creates a problem
for the new firmware ring reservation scheme.  The new scheme requires
the reserved completion rings/MSIX vectors to be in a contiguous
range.

Change the logic to allocate RDMA MSIX vectors to be contiguous with
the vectors used by bnxt_en on new firmware using the new scheme.
The new function bnxt_get_num_msix() calculates the exact number of
vectors needed by both drivers.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08654eb2

bnxt_en: Improve ring allocation logic. · 9899bb59

由 Michael Chan 提交于 3月 31, 2018

Currently, the driver code makes some assumptions about the group index
and the map index of rings.  This makes the code more difficult to
understand and less flexible.

Improve it by adding the grp_idx and map_idx fields explicitly to the
bnxt_ring_struct as a union.  The grp_idx is initialized for each tx ring
and rx agg ring during init. time.  We do the same for the map_idx for
each cmpl ring.

The grp_idx ties the tx ring to the ring group.  The map_idx is the
doorbell index of the ring.  With this new infrastructure, we can change
the ring index mapping scheme easily in the future.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9899bb59

bnxt_en: Improve valid bit checking in firmware response message. · 845adfe4

由 Michael Chan 提交于 3月 31, 2018

When firmware sends a DMA response to the driver, the last byte of the
message will be set to 1 to indicate that the whole response is valid.
The driver waits for the message to be valid before reading the message.

The firmware spec allows these response messages to increase in
length by adding new fields to the end of these messages. The
older spec's valid location may become a new field in a newer
spec. To guarantee compatibility, the driver should zero the valid
byte before interpreting the entire message so that any new fields not
implemented by the older spec will be read as zero.

For messages that are forwarded to VFs, we need to set the length
and re-instate the valid bit so the VF will see the valid response.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

845adfe4

bnxt_en: Improve resource accounting for SRIOV. · 596f9d55

由 Michael Chan 提交于 3月 31, 2018

When VFs are created, the current code subtracts the maximum VF
resources from the PF's pool.  This under-estimates the resources
remaining in the PF pool.  Instead, we should subtract the minimum
VF resources.  The VF minimum resources are guaranteed to the VFs
and only these should be subtracted from the PF's pool.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

596f9d55

bnxt_en: Check max_tx_scheduler_inputs value from firmware. · db4723b3

由 Michael Chan 提交于 3月 31, 2018

When checking for the maximum pre-set TX channels for ethtool -l, we
need to check the current max_tx_scheduler_inputs parameter from firmware.
This parameter specifies the max input for the internal QoS nodes currently
available to this function. The function's TX rings will be capped by this
parameter. By adding this logic, we provide a more accurate pre-set max
TX channels to the user.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db4723b3

bnxt_en: Add extended port statistics support · 00db3cba

由 Vasundhara Volam 提交于 3月 31, 2018

Gather periodic extended port statistics, if the device is PF and
link is up.
Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00db3cba

bnxt_en: Include additional hardware port statistics in ethtool -S. · 699efed0

由 Vasundhara Volam 提交于 3月 31, 2018

Include additional hardware port statistics in ethtool -S, which
are useful for debugging.
Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

699efed0

bnxt_en: Add support for ndo_set_vf_trust · 746df139

由 Vasundhara Volam 提交于 3月 31, 2018

Trusted VFs are allowed to modify MAC address, even when PF
has assigned one.
Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

746df139

bnxt_en: fix clear flags in ethtool reset handling · 2373d8d6

由 Scott Branden 提交于 3月 31, 2018

Clear flags when reset command processed successfully for components
specified.

Fixes: 6502ad59 ("bnxt_en: Add ETH_RESET_AP support")
Signed-off-by: NScott Branden <scott.branden@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2373d8d6

bnxt_en: Use a dedicated VNIC mode for RDMA. · abe93ad2

由 Michael Chan 提交于 3月 31, 2018

If the RDMA driver is registered, use a new VNIC mode that allows
RDMA traffic to be seen on the netdev in promiscuous mode.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abe93ad2

bnxt_en: Adjust default rings for multi-port NICs. · 1d3ef13d

由 Michael Chan 提交于 3月 31, 2018

Change the default ring logic to select default number of rings to be up to
8 per port if the default rings x NIC ports <= total CPUs.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d3ef13d

bnxt_en: Update firmware interface to 1.9.1.15. · d4f52de0

由 Michael Chan 提交于 3月 31, 2018

Minor changes, such as new extended port statistics.
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4f52de0

vlan: vlan_hw_filter_capable() can be static · eeb0a2a5

由 Wei Yongjun 提交于 3月 31, 2018

Fixes the following sparse warning:

net/8021q/vlan_core.c:168:6: warning:
 symbol 'vlan_hw_filter_capable' was not declared. Should it be static?
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eeb0a2a5

Merge tag 'mlx5-updates-2018-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 8bde261e

由 David S. Miller 提交于 3月 31, 2018

Saeed Mahameed says:

====================
mlx5-updates-2018-03-30

This series contains updates to mlx5 core and mlx5e netdev drivers.
The main highlight of this series is the RX optimizations for striding RQ path,
introduced by Tariq.

First Four patches are trivial misc cleanups.
 - Spelling mistake fix
 - Dead code removal
 - Warning messages

RX optimizations for striding RQ:

1) RX refactoring, cleanups and micro optimizations
   - MTU calculation simplifications, obsoletes some WQEs-to-packets translation
     functions and helps delete ~60 LOC.
   - Do not busy-wait a pending UMR completion.
   - post the new values of UMR WQE inline, instead of using a data pointer.
   - use pre-initialized structures to save calculations in datapath.

2) Use linear SKB in Striding RQ "build_skb", (Using linear SKB has many advantages):
    - Saves a memcpy of the headers.
    - No page-boundary checks in datapath.
    - No filler CQEs.
    - Significantly smaller CQ.
    - SKB data continuously resides in linear part, and not split to
      small amount (linear part) and large amount (fragment).
      This saves datapath cycles in driver and improves utilization
      of SKB fragments in GRO.
    - The fragments of a resulting GRO SKB follow the IP forwarding
      assumption of equal-size fragments.

    implementation details:
    HW writes the packets to the beginning of a stride,
    i.e. does not keep headroom. To overcome this we make sure we can
    extend backwards and use the last bytes of stride i-1.
    Extra care is needed for stride 0 as it has no preceding stride.
    We make sure headroom bytes are available by shifting the buffer
    pointer passed to HW by headroom bytes.

    This configuration now becomes default, whenever capable.
    Of course, this implies turning LRO off.

    Performance testing:
    ConnectX-5, single core, single RX ring, default MTU.

    UDP packet rate, early drop in TC layer:

    --------------------------------------------
    | pkt size | before    | after     | ratio |
    --------------------------------------------
    | 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
    |  500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
    |   64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
    --------------------------------------------

    TCP streams: ~20% gain

3) Support XDP over Striding RQ:
    Now that linear SKB is supported over Striding RQ,
    we can support XDP by setting stride size to PAGE_SIZE
    and headroom to XDP_PACKET_HEADROOM.

    Striding RQ is capable of a higher packet-rate than
    conventional RQ.

    Performance testing:
    ConnectX-5, 24 rings, default MTU.
    CQE compression ON (to reduce completions BW in PCI).

    XDP_DROP packet rate:
    --------------------------------------------------
    | pkt size | XDP rate   | 100GbE linerate | pct% |
    --------------------------------------------------
    |   64byte | 126.2 Mpps |      148.0 Mpps |  85% |
    |  128byte |  80.0 Mpps |       84.8 Mpps |  94% |
    |  256byte |  42.7 Mpps |       42.7 Mpps | 100% |
    |  512byte |  23.4 Mpps |       23.4 Mpps | 100% |
    --------------------------------------------------

4) Remove mlx5 page_ref bulking in Striding RQ and use page_ref_inc only when needed.
   Without this bulking, we have:
    - no atomic ops on WQE allocation or free
    - one atomic op per SKB
    - In the default MTU configuration (1500, stride size is 2K),
      the non-bulking method execute 2 atomic ops as before
    - For larger MTUs with stride size of 4K, non-bulking method
      executes only a single op.
    - For XDP (stride size of 4K, no SKBs), non-bulking have no atomic ops per packet at all.

    Performance testing:
    ConnectX-5, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

    Single core packet rate (64 bytes).

    Early drop in TC: no degradation.

    XDP_DROP:
    before: 14,270,188 pps
    after:  20,503,603 pps, 43% improvement.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8bde261e

Merge tag 'rxrpc-next-20180330' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · e2e80c02

由 David S. Miller 提交于 3月 31, 2018

David Howells says:

====================
rxrpc: Fixes and more traces

Here are some patches that add some more tracepoints to AF_RXRPC and fix
some issues therein:

 (1) Fix the use of VERSION packets to keep firewall routes open.

 (2) Fix the incorrect current time usage in a tracepoint.

 (3) Fix Tx ring annotation corruption.

 (4) Fix accidental conversion of call-level abort into connection-level
     abort.

 (5) Fix calculation of resend time.

 (6) Remove a couple of unused variables.

 (7) Fix a bunch of checker warnings and an error.  Note that not all
     warnings can be quashed as checker doesn't seem to correctly handle
     seqlocks.

 (8) Fix a potential race between call destruction and socket/net
     destruction.

 (9) Add a tracepoint to track rxrpc_local refcounting.

(10) Fix an apparent leak of rxrpc_local objects.

(11) Add a tracepoint to track rxrpc_peer refcounting.

(12) Fix a leak of rxrpc_peer objects.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2e80c02

hv_netvsc: Clean up extra parameter from rndis_filter_receive_data() · 3be9b5fd

由 Haiyang Zhang 提交于 3月 30, 2018

The variables, msg and data, have the same value. This patch removes
the extra one.
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3be9b5fd

ethernet: hisilicon: hns: hns_dsaf_mac: Use generic eth_broadcast_addr · 49b44aa2

由 Joe Perches 提交于 3月 30, 2018

Rather than use an on-stack array to copy a broadcast address, use
the generic eth_broadcast_addr function to save a trivial amount of
object code.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

49b44aa2

Merge branch 'net_rwsem-fixes' · b3834acd

由 David S. Miller 提交于 3月 31, 2018

Kirill Tkhai says:

====================
net_rwsem fixes

there is wext_netdev_notifier_call()->wireless_nlevent_flush()
netdevice notifier, which takes net_rwsem, so we can't take
net_rwsem in {,un}register_netdevice_notifier().

Since {,un}register_netdevice_notifier() is executed under
pernet_ops_rwsem, net_namespace_list can't change, while we
holding it, so there is no need net_rwsem in these functions [1/2].

The same is in [2/2]. We make callers of __rtnl_link_unregister()
take pernet_ops_rwsem, and close the race with setup_net()
and cleanup_net(), so __rtnl_link_unregister() does not need it.
This also fixes the problem of that __rtnl_link_unregister() does
not see initializing and exiting nets.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b3834acd

net: Do not take net_rwsem in __rtnl_link_unregister() · 554873e5

由 Kirill Tkhai 提交于 3月 30, 2018

This function calls call_netdevice_notifier(), which also
may take net_rwsem. So, we can't use net_rwsem here.

This patch makes callers of this functions take pernet_ops_rwsem,
like register_netdevice_notifier() does. This will protect
the modifications of net_namespace_list, and allows notifiers
to take it (they won't have to care about context).

Since __rtnl_link_unregister() is used on module load
and unload (which are not frequent operations), this looks
for me better, than make all call_netdevice_notifier()
always executing in "protected net_namespace_list" context.

Also, this fixes the problem we had a deal in 328fbe74
"Close race between {un, }register_netdevice_notifier and ...",
and guarantees __rtnl_link_unregister() does not skip
exitting net.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

554873e5

net: Remove net_rwsem from {, un}register_netdevice_notifier() · fc1dd369

由 Kirill Tkhai 提交于 3月 30, 2018

These functions take net_rwsem, while wireless_nlevent_flush()
also takes it. But down_read() can't be taken recursive,
because of rw_semaphore design, which prevents it to be occupied
by only readers forever.

Since we take pernet_ops_rwsem in {,un}register_netdevice_notifier(),
net list can't change, so these down_read()/up_read() can be removed.

Fixes: f0b07bb1 "net: Introduce net_rwsem to protect net_namespace_list"
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc1dd369

net: hns3: remove unnecessary pci_set_drvdata() and devm_kfree() · c679f6a2

由 Wei Yongjun 提交于 3月 30, 2018

There is no need for explicit calls of devm_kfree(), as the allocated
memory will be freed during driver's detach.

The driver core clears the driver data to NULL after device_release.
Thus, it is not needed to manually clear the device driver data to NULL.

So remove the unnecessary pci_set_drvdata() and devm_kfree().
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c679f6a2

netdevsim: Change nsim_devlink_setup to return error to caller · ef817102

由 David Ahern 提交于 3月 30, 2018

Change nsim_devlink_setup to return any error back to the caller and
update nsim_init to handle it.
Requested-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef817102

Merge branch 'tipc-slim-down-name-table' · 6851cf28

由 David S. Miller 提交于 3月 31, 2018

Jon Maloy says:

====================
tipc: slim down name table

We clean up and improve the name binding table:

 - Replace the memory consuming 'sub_sequence/service range' array with
   an RB tree.
 - Introduce support for overlapping service sequences/ranges

 v2: #1: Fixed a missing initialization reported by David Miller
     #4: Obsoleted and replaced a few more macros to get a consistent
         terminology in the API.
     #5: Added new commit to fix a potential string overflow bug (it
         is still only in net-next) reported by Arnd Bergmann
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6851cf28

tipc: avoid possible string overflow · 7494cfa6

由 Jon Maloy 提交于 3月 29, 2018

gcc points out that the combined length of the fixed-length inputs to
l->name is larger than the destination buffer size:

net/tipc/link.c: In function 'tipc_link_create':
net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
into a region of size between 26 and 58 [-Werror=format-overflow=]
sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);

net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
(assuming 75) into a destination of size 60
sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);

A detailed analysis reveals that the theoretical maximum length of
a link name is:
max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
Since we also need space for a trailing zero we now set MAX_LINK_NAME
to 68.

Just to be on the safe side we also replace the sprintf() call with
snprintf().

Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address
hash values")
Reported-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7494cfa6

tipc: tipc: rename address types in user api · 7a74d39c

由 Jon Maloy 提交于 3月 29, 2018

The three address type structs in the user API have names that in
reality reflect the specific, non-Linux environment where they were
originally created.

We now give them more intuitive names, in accordance with how TIPC is
described in the current documentation.

struct tipc_portid   -> struct tipc_socket_addr
struct tipc_name     -> struct tipc_service_addr
struct tipc_name_seq -> struct tipc_service_range

To avoid confusion, we also update some commmets and macro names to
 match the new terminology.

For compatibility, we add macros that map all old names to the new ones.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a74d39c

tipc: permit overlapping service ranges in name table · 37922ea4

由 Jon Maloy 提交于 3月 29, 2018

With the new RB tree structure for service ranges it becomes possible to
solve an old problem; - we can now allow overlapping service ranges in
the table.

When inserting a new service range to the tree, we use 'lower' as primary
key, and when necessary 'upper' as secondary key.

Since there may now be multiple service ranges matching an indicated
'lower' value, we must also add the 'upper' value to the functions
used for removing publications, so that the correct, corresponding
range item can be found.

These changes guarantee that a well-formed publication/withdrawal item
from a peer node never will be rejected, and make it possible to
eliminate the problematic backlog functionality we currently have for
handling such cases.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37922ea4

tipc: refactor name table translate function · f20889f7

由 Jon Maloy 提交于 3月 29, 2018

The function tipc_nametbl_translate() function is ugly and hard to
follow. This can be improved somewhat by introducing a stack variable
for holding the publication list to be used and re-ordering the if-
clauses for selection of algorithm.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f20889f7

tipc: replace name table service range array with rb tree · 218527fe

由 Jon Maloy 提交于 3月 29, 2018

The current design of the binding table has an unnecessary memory
consuming and complex data structure. It aggregates the service range
items into an array, which is expanded by a factor two every time it
becomes too small to hold a new item. Furthermore, the arrays never
shrink when the number of ranges diminishes.

We now replace this array with an RB tree that is holding the range
items as tree nodes, each range directly holding a list of bindings.

This, along with a few name changes, improves both readability and
volume of the code, as well as reducing memory consumption and hopefully
improving cache hit rate.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

218527fe

Merge branch 'bridge-mtu' · 24197ee2

由 David S. Miller 提交于 3月 31, 2018

Nikolay Aleksandrov says:

====================
net: bridge: MTU handling changes

As previously discussed the recent changes break some setups and could lead
to packet drops. Thus the first patch reverts the behaviour for the bridge
to follow the minimum MTU but also keeps the ability to set the MTU to the
maximum (out of all ports) if vlan filtering is enabled. Patch 02 is the
bigger change in behaviour - we've always had trouble when configuring
bridges and their MTU which is auto tuning on port events
(add/del/changemtu), which means config software needs to chase it and fix
it after each such event, after patch 02 we allow the user to configure any
MTU (ETH_MIN/MAX limited) but once that is done the bridge stops auto
tuning and relies on the user to keep the MTU correct.
This should be compatible with cases that don't touch the MTU (or set it
to the same value), while allowing to configure the MTU and not worry
about it changing afterwards.

The patches are intentionally split like this, so that if they get accepted
and there are any complaints patch 02 can be reverted.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24197ee2

net: bridge: disable bridge MTU auto tuning if it was set manually · 804b854d

由 Nikolay Aleksandrov 提交于 3月 30, 2018

As Roopa noted today the biggest source of problems when configuring
bridge and ports is that the bridge MTU keeps changing automatically on
port events (add/del/changemtu). That leads to inconsistent behaviour
and network config software needs to chase the MTU and fix it on each
such event. Let's improve on that situation and allow for the user to
set any MTU within ETH_MIN/MAX limits, but once manually configured it
is the user's responsibility to keep it correct afterwards.

In case the MTU isn't manually set - the behaviour reverts to the
previous and the bridge follows the minimum MTU.
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

804b854d

net: bridge: set min MTU on port events and allow user to set max · f40aa233

由 Nikolay Aleksandrov 提交于 3月 30, 2018

Recently the bridge was changed to automatically set maximum MTU on port
events (add/del/changemtu) when vlan filtering is enabled, but that
actually changes behaviour in a way which breaks some setups and can lead
to packet drops. In order to still allow that maximum to be set while being
compatible, we add the ability for the user to tune the bridge MTU up to
the maximum when vlan filtering is enabled, but that has to be done
explicitly and all port events (add/del/changemtu) lead to resetting that
MTU to the minimum as before.
Suggested-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f40aa233

Merge branch 'thunderx-DMAC-filtering' · 56c03cbf

由 David S. Miller 提交于 3月 31, 2018

Vadim Lomovtsev says:

====================
net: thunderx: implement DMAC filtering support

By default CN88XX BGX accepts all incoming multicast and broadcast
packets and filtering is disabled. The nic driver doesn't provide
an ability to change such behaviour.

This series is to implement DMAC filtering management for CN88XX
nic driver allowing user to enable/disable filtering and configure
specific MAC addresses to filter traffic.

Changes from v1:
build issues:
 - update code in order to address compiler warnings;
checkpatch.pl reported issues:
 - update code in order to fit 80 symbols length;
 - update commit descriptions in order to fit 80 symbols length;
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56c03cbf

net: thunderx: add ndo_set_rx_mode callback implementation for VF · 37c3347e

由 Vadim Lomovtsev 提交于 3月 30, 2018

The ndo_set_rx_mode() is called from atomic context which causes
messages response timeouts while VF to PF communication via MSIx.
To get rid of that we're copy passed mc list, parse flags and queue
handling of kernel request to ordered workqueue.
Signed-off-by: NVadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37c3347e

net: thunderx: add workqueue control structures for handle ndo_set_rx_mode request · 1b6d55f2

由 Vadim Lomovtsev 提交于 3月 30, 2018

The kernel calls ndo_set_rx_mode() callback from atomic context which
causes messaging timeouts between VF and PF (as they’re implemented via
MSIx). So in order to handle ndo_set_rx_mode() we need to get rid of it.

This commit implements necessary workqueue related structures to let VF
queue kernel request processing in non-atomic context later.
Signed-off-by: NVadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b6d55f2

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功