提交 · 489c25f9a375ba1e780dcf445ca74f6b3766d1b5 · openeuler / Kernel

20 12月, 2018 37 次提交

selftests: mlxsw: Add rtnetlink tests · 489c25f9

由 Ido Schimmel 提交于 12月 19, 2018

Add a new test that is focused on rtnetlink configuration. Its purpose
is to test valid and invalid (as deemed by mlxsw) configurations and
make sure that they succeed / fail without producing a trace.

Some of the test cases are derived from recent fixes in order to make
sure that the fixed bugs are not introduced again.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

489c25f9

mlxsw: spectrum_router: Hold a reference on RIF's netdev · b61cd7c6

由 Ido Schimmel 提交于 12月 19, 2018

Previous patches tried to make RIF deletion more robust and avoid
use-after-free situations.

As another precaution, hold a reference on a RIF's netdev and release it
when the RIF is deleted.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b61cd7c6

mlxsw: spectrum_router: Make RIF deletion more robust · 965fa8e6

由 Ido Schimmel 提交于 12月 19, 2018

In the past we had multiple instances where RIFs were not properly
deleted.

One of the reasons for leaking a RIF was that at the time when IP
addresses were flushed from the respective netdev (prompting the
destruction of the RIF), the netdev was no longer a mlxsw upper. This
caused the inet{,6}addr notification blocks to ignore the NETDEV_DOWN
event and leak the RIF.

Instead of checking whether the netdev is our upper when an IP address
is removed, we can instead check if the netdev has a RIF configured.

To look up a RIF we need to access mlxsw private data, so the patch
stores the notification blocks inside a mlxsw struct. This then allows
us to use container_of() and extract the required private data.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

965fa8e6

mlxsw: spectrum_router: Propagate 'struct mlxsw_sp' further · 21ffedb6

由 Ido Schimmel 提交于 12月 19, 2018

Next patch is going to make RIF deletion more robust by removing
reliance on fragile mlxsw_sp_lower_get(). This is because a netdev is
not necessarily our upper anymore when its IP addresses are flushed.

The inet{,6}addr notification blocks are going to resolve 'struct
mlxsw_sp' using container_of(), but the functions they call still use
mlxsw_sp_lower_get().

As a preparation for the next patch, propagate 'struct mlxsw_sp' down to
the functions called from the notification blocks and remove reliance on
mlxsw_sp_lower_get().
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21ffedb6

mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG · be2d6f42

由 Ido Schimmel 提交于 12月 19, 2018

When a LAG device or a VLAN device on top of it is enslaved to a bridge,
the driver propagates the CHANGEUPPER event to the LAG's slaves.

This causes each physical port to increase the reference count of the
internal representation of the bridge port by calling
mlxsw_sp_port_bridge_join().

However, when a port is removed from a LAG, the corresponding leave()
function is not called and the reference count is not decremented. This
leads to ugly hacks such as mlxsw_sp_bridge_port_should_destroy() that
try to understand if the bridge port should be destroyed even when its
reference count is not 0.

Instead, make sure that when a port is unlinked from a LAG it would see
the same events as if the LAG (or its uppers) were unlinked from a
bridge.

The above is achieved by walking the LAG's uppers when a port is
unlinked and calling mlxsw_sp_port_bridge_leave() for each upper that is
enslaved to a bridge.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be2d6f42

mlxsw: spectrum: Remove reference count from VLAN entries · 635c8c8b

由 Ido Schimmel 提交于 12月 19, 2018

Commit b3529af6 ("spectrum: Reference count VLAN entries") started
reference counting port-VLAN entries in a similar fashion to the 8021q
driver.

However, this is not actually needed and only complicates things.
Instead, the driver should forbid the creation of a VLAN on a port if
this VLAN already exists. This would also solve the issue fixed by the
mentioned commit.

Therefore, remove the get()/put() API and use create()/destroy()
instead.

One place that needs special attention is VLAN addition in a VLAN-aware
bridge via switchdev operations. In case the VLAN flags (e.g., 'pvid')
are toggled, then the VLAN entry already exists. To prevent the driver
from wrongly returning EEXIST, the driver is changed to check in the
prepare phase whether the entry already exists and only returns an error
in case it is not associated with the correct bridge port.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

635c8c8b

mlxsw: spectrum: Handle VLAN device unlinking · e149113a

由 Ido Schimmel 提交于 12月 19, 2018

In commit 993107fe ("mlxsw: spectrum_switchdev: Fix VLAN device
deletion via ioctl") I fixed a bug caused by the fact that the driver
views differently the deletion of a VLAN device when it is deleted via
an ioctl and netlink.

Instead of relying on a specific order of events (device being
unregistered vs. VLAN filter being updated), simply make sure that the
driver performs the necessary cleanup when the VLAN device is unlinked,
which always happens before the other two events.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e149113a

mlxsw: spectrum_fid: Remove unused function · f1d7c33d

由 Ido Schimmel 提交于 12月 19, 2018

This function is no longer used. Remove it.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1d7c33d

mlxsw: spectrum_router: Do not destroy RIFs based on FID's reference count · 32fd4b49

由 Ido Schimmel 提交于 12月 19, 2018

Currently, when a RIF is constructed on top of a FID, the RIF increments
the FID's reference count and the RIF is destroyed when the FID's
reference count drops to 1. This effectively means that when no local
ports are member in the FID, the FID is destroyed regardless if the
router port is a member in the FID or not.

The above can lead to the unexpected behavior in which routes using a
VLAN interface as their nexthop device are no longer offloaded after the
last local port leaves the corresponding VLAN (FID).

Example:
# ip -4 route show dev br0.10
192.0.2.0/24 proto kernel scope link src 192.0.2.1 offload
# bridge vlan del vid 10 dev swp3
# ip -4 route show dev br0.10
192.0.2.0/24 proto kernel scope link src 192.0.2.1

After the patch, the route is offloaded before and after the VLAN is
removed from local port 'swp3', as the RIF corresponding to 'br0.10'
continues to exists.

In order to remove RIFs' reliance on the underlying FID's reference
count, we need to add a reference count to sub-port RIFs, which are RIFs
that correspond to physical ports and their uppers (e.g., LAG devices).

In this case, each {Port, VID} ('struct mlxsw_sp_port_vlan') needs to
hold a reference on the RIF. For example:

                       bond0.10
                          |
                        bond0
                          |
                      +-------+
                      |       |
                    swp1    swp2

Both {Port 1, VID 10} and {Port 2, VID 10} will hold a reference on the
RIF corresponding to 'bond0.10'. When the last reference is dropped, the
RIF will be destroyed.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32fd4b49

mlxsw: spectrum: Sanitize VLAN interface's uppers · 927d0ef1

由 Ido Schimmel 提交于 12月 19, 2018

Currently, only VRF and macvlan uppers are supported on top of VLAN
device configured over a bridge, so make sure the driver forbids other
uppers.

Note that enslavement to a VRF is handled earlier in the notification
block, so there is no need to check for a VRF upper here.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

927d0ef1

tipc: fix uninitialized value for broadcast retransmission · 05572271

由 Hoang Le 提交于 12月 19, 2018

When sending broadcast message on high load system, there are a lot of
unnecessary packets restranmission. That issue was caused by missing in
initial criteria for retransmission.

To prevent this happen, just initialize this criteria for retransmission
in next 10 milliseconds.

Fixes: 31c4f4cc ("tipc: improve broadcast retransmission algorithm")
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NHoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

05572271

Merge branch 'tipc-tracepoints' · 013dc9d5

由 David S. Miller 提交于 12月 19, 2018

Tuong Lien says:

====================
tipc: tracepoints and trace_events in TIPC

The patch series is the first step of introducing a tracing framework in
TIPC, which will assist in collecting complete & plentiful data for post
analysis, even in the case of a single failure occurrence e.g. when the
failure is unreproducible.

The tracing code in TIPC utilizes the powerful kernel tracepoints, trace
events features along with particular dump functions to trace the TIPC
object data and events (incl. bearer, link, socket, node, etc.).

The tracing code should generate zero-load to TIPC when the trace events
are not enabled.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

013dc9d5

tipc: add trace_events for tipc bearer · cf5f55f7

由 Tuong Lien 提交于 12月 19, 2018

The commit adds the new trace_event for TIPC bearer, L2 device event:

trace_tipc_l2_device_event()

Also, it puts the trace at the tipc_l2_device_event() function, then
the device/bearer events and related info can be traced out during
runtime when needed.
Acked-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf5f55f7

tipc: add trace_events for tipc node · eb18a510

由 Tuong Lien 提交于 12月 19, 2018

The commit adds the new trace_events for TIPC node object:

trace_tipc_node_create()
trace_tipc_node_delete()
trace_tipc_node_lost_contact()
trace_tipc_node_timeout()
trace_tipc_node_link_up()
trace_tipc_node_link_down()
trace_tipc_node_reset_links()
trace_tipc_node_fsm_evt()
trace_tipc_node_check_state()

Also, enables the traces for the following cases:
- When a node is created/deleted;
- When a node contact is lost;
- When a node timer is timed out;
- When a node link is up/down;
- When all node links are reset;
- When node state is changed;
- When a skb comes and node state needs to be checked/updated.
Acked-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb18a510

tipc: add trace_events for tipc socket · 01e661eb

由 Tuong Lien 提交于 12月 19, 2018

The commit adds the new trace_events for TIPC socket object:

trace_tipc_sk_create()
trace_tipc_sk_poll()
trace_tipc_sk_sendmsg()
trace_tipc_sk_sendmcast()
trace_tipc_sk_sendstream()
trace_tipc_sk_filter_rcv()
trace_tipc_sk_advance_rx()
trace_tipc_sk_rej_msg()
trace_tipc_sk_drop_msg()
trace_tipc_sk_release()
trace_tipc_sk_shutdown()
trace_tipc_sk_overlimit1()
trace_tipc_sk_overlimit2()

Also, enables the traces for the following cases:
- When user creates a TIPC socket;
- When user calls poll() on TIPC socket;
- When user sends a dgram/mcast/stream message.
- When a message is put into the socket 'sk_receive_queue';
- When a message is released from the socket 'sk_receive_queue';
- When a message is rejected (e.g. due to no port, invalid, etc.);
- When a message is dropped (e.g. due to wrong message type);
- When socket is released;
- When socket is shutdown;
- When socket rcvq's allocation is overlimit (> 90%);
- When socket rcvq + bklq's allocation is overlimit (> 90%);
- When the 'TIPC_ERR_OVERLOAD/2' issue happens;

Note:
a) All the socket traces are designed to be able to trace on a specific
socket by either using the 'event filtering' feature on a known socket
'portid' value or the sysctl file:

/proc/sys/net/tipc/sk_filter

The file determines a 'tuple' for what socket should be traced:

(portid, sock type, name type, name lower, name upper)

where:
+ 'portid' is the socket portid generated at socket creating, can be
found in the trace outputs or the 'tipc socket list' command printouts;
+ 'sock type' is the socket type (1 = SOCK_TREAM, ...);
+ 'name type', 'name lower' and 'name upper' are the service name being
connected to or published by the socket.

Value '0' means 'ANY', the default tuple value is (0, 0, 0, 0, 0) i.e.
the traces happen for every sockets with no filter.

b) The 'tipc_sk_overlimit1/2' event is also a conditional trace_event
which happens when the socket receive queue (and backlog queue) is
about to be overloaded, when the queue allocation is > 90%. Then, when
the trace is enabled, the last skbs leading to the TIPC_ERR_OVERLOAD/2
issue can be traced.

The trace event is designed as an 'upper watermark' notification that
the other traces (e.g. 'tipc_sk_advance_rx' vs 'tipc_sk_filter_rcv') or
actions can be triggerred in the meanwhile to see what is going on with
the socket queue.

In addition, the 'trace_tipc_sk_dump()' is also placed at the
'TIPC_ERR_OVERLOAD/2' case, so the socket and last skb can be dumped
for post-analysis.
Acked-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01e661eb

tipc: add trace_events for tipc link · 26574db0

由 Tuong Lien 提交于 12月 19, 2018

The commit adds the new trace_events for TIPC link object:

trace_tipc_link_timeout()
trace_tipc_link_fsm()
trace_tipc_link_reset()
trace_tipc_link_too_silent()
trace_tipc_link_retrans()
trace_tipc_link_bc_ack()
trace_tipc_link_conges()

And the traces for PROTOCOL messages at building and receiving:

trace_tipc_proto_build()
trace_tipc_proto_rcv()

Note:
a) The 'tipc_link_too_silent' event will only happen when the
'silent_intv_cnt' is about to reach the 'abort_limit' value (and the
event is enabled). The benefit for this kind of event is that we can
get an early indication about TIPC link loss issue due to timeout, then
can do some necessary actions for troubleshooting.

For example: To trigger the 'tipc_proto_rcv' when the 'too_silent'
event occurs:

echo 'enable_event:tipc:tipc_proto_rcv' > \
      events/tipc/tipc_link_too_silent/trigger

And disable it when TIPC link is reset:

echo 'disable_event:tipc:tipc_proto_rcv' > \
      events/tipc/tipc_link_reset/trigger

b) The 'tipc_link_retrans' or 'tipc_link_bc_ack' event is useful to
trace TIPC retransmission issues.

In addition, the commit adds the 'trace_tipc_list/link_dump()' at the
'retransmission failure' case. Then, if the issue occurs, the link
'transmq' along with the link data can be dumped for post-analysis.
These dump events should be enabled by default since it will only take
effect when the failure happens.

The same approach is also applied for the faulty case that the
validation of protocol message is failed.
Acked-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26574db0

tipc: enable tracepoints in tipc · b4b9771b

由 Tuong Lien 提交于 12月 19, 2018

As for the sake of debugging/tracing, the commit enables tracepoints in
TIPC along with some general trace_events as shown below. It also
defines some 'tipc_*_dump()' functions that allow to dump TIPC object
data whenever needed, that is, for general debug purposes, ie. not just
for the trace_events.

The following trace_events are now available:

- trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data,
  e.g. message type, user, droppable, skb truesize, cloned skb, etc.

- trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or
  queues, e.g. TIPC link transmq, socket receive queue, etc.

- trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g.
  sk state, sk type, connection type, rmem_alloc, socket queues, etc.

- trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g.
  link state, silent_intv_cnt, gap, bc_gap, link queues, etc.

- trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g.
  node state, active links, capabilities, link entries, etc.

How to use:
Put the trace functions at any places where we want to dump TIPC data
or events.

Note:
a) The dump functions will generate raw data only, that is, to offload
the trace event's processing, it can require a tool or script to parse
the data but this should be simple.

b) The trace_tipc_*_dump() should be reserved for a failure cases only
(e.g. the retransmission failure case) or where we do not expect to
happen too often, then we can consider enabling these events by default
since they will almost not take any effects under normal conditions,
but once the rare condition or failure occurs, we get the dumped data
fully for post-analysis.

For other trace purposes, we can reuse these trace classes as template
but different events.

c) A trace_event is only effective when we enable it. To enable the
TIPC trace_events, echo 1 to 'enable' files in the events/tipc/
directory in the 'debugfs' file system. Normally, they are located at:

/sys/kernel/debug/tracing/events/tipc/

For example:

To enable the tipc_link_dump event:

echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable

To enable all the TIPC trace_events:

echo 1 > /sys/kernel/debug/tracing/events/tipc/enable

To collect the trace data:

cat trace

or

cat trace_pipe > /trace.out &

To disable all the TIPC trace_events:

echo 0 > /sys/kernel/debug/tracing/events/tipc/enable

To clear the trace buffer:

echo > trace

d) Like the other trace_events, the feature like 'filter' or 'trigger'
is also usable for the tipc trace_events.
For more details, have a look at:

Documentation/trace/ftrace.txt

MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc
Acked-by: NYing Xue <ying.xue@windriver.com>
Tested-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4b9771b

Merge branch 'sk_buff-add-extension-infrastructure' · 4a54877e

由 David S. Miller 提交于 12月 19, 2018

Florian Westphal says:

====================
sk_buff: add extension infrastructure

TL;DR:
 - objdiff shows no change if CONFIG_XFRM=n && BR_NETFILTER=n
 - small size reduction when one or both options are set
 - no changes in ipsec performance

 Changes since v1:
 - Allocate entire extension space from a kmem_cache.
 - Avoid atomic_dec_and_test operation on skb_ext_put() for refcnt == 1 case.
   (similar to kfree_skbmem() fclone_ref use).

This adds an optional extension infrastructure, with ispec (xfrm) and
bridge netfilter as first users.

The third (future) user is Multipath TCP which is still out-of-tree.
MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
numbers used by individual subflows.

This DSS mapping is read/written from tcp option space on receive and
written to tcp option space on transmitted tcp packets that are part of
and MPTCP connection.

Extending skb_shared_info or adding a private data field to skb fclones
doesn't work for incoming skb, so a different DSS propagation method would
be required for the receive side.

mptcp has same requirements as secpath/bridge netfilter:

1. extension memory is released when the sk_buff is free'd.
2. data is shared after cloning an skb (clone inherits extension)
3. adding extension to an skb will COW the extension buffer if needed.

Two new members are added to sk_buff:
1. 'active_extensions' byte (filling a hole), telling which extensions
   are available for this skb.
   This has two purposes.
   a) avoids the need to initialize the pointer.
   b) allows to "delete" an extension by clearing its bit
   value in ->active_extensions.

   While it would be possible to store the active_extensions byte
   in the extension struct instead of sk_buff, there is one problem
   with this:
    When an extension has to be disabled, we can always clear the
    bit in skb->active_extensions.  But in case it would be stored in the
    extension buffer itself, we might have to COW it first, if
    we are dealing with a cloned skb.  On kmalloc failure we would
    be unable to turn an extension off.
2. extension pointer, located at the end of the sk_buff.
   If the active_extensions byte is 0, the pointer is undefined,
   it is not initialized on skb allocation.

This adds extra code to skb clone and free paths (to deal with
refcount/free of extension area) but this replaces similar code that
manages skb->nf_bridge and skb->sp structs in the followup patches of
the series.

It is possible to add support for extensions that are not preseved on
clones/copies:

1. define a bitmask of all extensions that need copy/cow on clone
2. change __skb_ext_copy() to check
   ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE
3. set clone->active_extensions to 0 if test is false.

This isn't done here because all extensions that get added here
need the copy/cow semantics.

Last patch converts skb->sp, secpath information gets stored as
new SKB_EXT_SEC_PATH, so the 'sp' pointer is removed from skbuff.

Extra code added to skb clone and free paths (to deal with refcount/free
of extension area) replaces the existing code that does the same for
skb->nf_bridge and skb->secpath.

I don't see any other in-tree users that could benefit from this
infrastructure, it doesn't make sense to add an extension just for the sake
of a single flag bit (like skb->nf_trace).

Adding a new extension is a good fit if all of the following are true:

1. Data is related to the skb/packet aggregate
2. Data should be freed when the skb is free'd
3. Data is not going to be relevant/needed in normal case (udp, tcp,
   forwarding workloads, ...)
4. There are no fancy action(s) needed on clone/free, such as callbacks
   into kernel modules.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a54877e

net: switch secpath to use skb extension infrastructure · 4165079b

由 Florian Westphal 提交于 12月 18, 2018

Remove skb->sp and allocate secpath storage via extension
infrastructure.  This also reduces sk_buff by 8 bytes on x86_64.

Total size of allyesconfig kernel is reduced slightly, as there is
less inlined code (one conditional atomic op instead of two on
skb_clone).

No differences in throughput in following ipsec performance tests:
- transport mode with aes on 10GB link
- tunnel mode between two network namespaces with aes and null cipher
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4165079b

xfrm: prefer secpath_set over secpath_dup · a84e3f53

由 Florian Westphal 提交于 12月 18, 2018

secpath_set is a wrapper for secpath_dup that will not perform
an allocation if the secpath attached to the skb has a reference count
of one, i.e., it doesn't need to be COW'ed.

Also, secpath_dup doesn't attach the secpath to the skb, it leaves
this to the caller.

Use secpath_set in places that immediately assign the return value to
skb.

This allows to remove skb->sp without touching these spots again.

secpath_dup can eventually be removed in followup patch.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a84e3f53

drivers: chelsio: use skb_sec_path helper · a053c866

由 Florian Westphal 提交于 12月 18, 2018

reduce noise when skb->sp is removed later in the series.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a053c866

xfrm: use secpath_exist where applicable · 26912e37

由 Florian Westphal 提交于 12月 18, 2018

Will reduce noise when skb->sp is removed later in this series.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26912e37

drivers: net: netdevsim: use skb_sec_path helper · 56d1ac32

由 Florian Westphal 提交于 12月 18, 2018

... so this won't have to be changed when skb->sp goes away.

v2: no changes, preserve ack.
Acked-by: NShannon Nelson <shannon.lee.nelson@gmail.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56d1ac32

drivers: net: ethernet: mellanox: use skb_sec_path helper · 6362a6a0

由 Florian Westphal 提交于 12月 18, 2018

Will avoid touching this when sp pointer is removed from sk_buff struct.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6362a6a0

drivers: net: intel: use secpath helpers in more places · 2fdb435b

由 Florian Westphal 提交于 12月 18, 2018

Use skb_sec_path and secpath_exists helpers where possible.
This reduces noise in followup patch that removes skb->sp pointer.

v2: no changes, preseve acks from v1.
Acked-by: NShannon Nelson <shannon.lee.nelson@gmail.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fdb435b

net: use skb_sec_path helper in more places · 2294be0f

由 Florian Westphal 提交于 12月 18, 2018

skb_sec_path gains 'const' qualifier to avoid
xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type

same reasoning as previous conversions: Won't need to touch these
spots anymore when skb->sp is removed.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2294be0f

net: move secpath_exist helper to sk_buff.h · 7af8f4ca

由 Florian Westphal 提交于 12月 18, 2018

Future patch will remove skb->sp pointer.
To reduce noise in those patches, move existing helper to
sk_buff and use it in more places to ease skb->sp replacement later.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7af8f4ca

xfrm: change secpath_set to return secpath struct, not error value · 0ca64da1

由 Florian Westphal 提交于 12月 18, 2018

It can only return 0 (success) or -ENOMEM.
Change return value to a pointer to secpath struct.

This avoids direct access to skb->sp:

err = secpath_set(skb);
if (!err) ..
skb->sp-> ...

Becomes:
sp = secpath_set(skb)
if (!sp) ..
sp-> ..

This reduces noise in followup patch which is going to remove skb->sp.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ca64da1

net: convert bridge_nf to use skb extension infrastructure · de8bda1d

由 Florian Westphal 提交于 12月 18, 2018

This converts the bridge netfilter (calling iptables hooks from bridge)
facility to use the extension infrastructure.

The bridge_nf specific hooks in skb clone and free paths are removed, they
have been replaced by the skb_ext hooks that do the same as the bridge nf
allocations hooks did.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de8bda1d

sk_buff: add skb extension infrastructure · df5042f4

由 Florian Westphal 提交于 12月 18, 2018

This adds an optional extension infrastructure, with ispec (xfrm) and
bridge netfilter as first users.
objdiff shows no changes if kernel is built without xfrm and br_netfilter
support.

The third (planned future) user is Multipath TCP which is still
out-of-tree.
MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
numbers used by individual subflows.

This DSS mapping is read/written from tcp option space on receive and
written to tcp option space on transmitted tcp packets that are part of
and MPTCP connection.

Extending skb_shared_info or adding a private data field to skb fclones
doesn't work for incoming skb, so a different DSS propagation method would
be required for the receive side.

mptcp has same requirements as secpath/bridge netfilter:

1. extension memory is released when the sk_buff is free'd.
2. data is shared after cloning an skb (clone inherits extension)
3. adding extension to an skb will COW the extension buffer if needed.

The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
mapping for tx and rx processing.

Two new members are added to sk_buff:
1. 'active_extensions' byte (filling a hole), telling which extensions
   are available for this skb.
   This has two purposes.
   a) avoids the need to initialize the pointer.
   b) allows to "delete" an extension by clearing its bit
   value in ->active_extensions.

   While it would be possible to store the active_extensions byte
   in the extension struct instead of sk_buff, there is one problem
   with this:
    When an extension has to be disabled, we can always clear the
    bit in skb->active_extensions.  But in case it would be stored in the
    extension buffer itself, we might have to COW it first, if
    we are dealing with a cloned skb.  On kmalloc failure we would
    be unable to turn an extension off.

2. extension pointer, located at the end of the sk_buff.
   If the active_extensions byte is 0, the pointer is undefined,
   it is not initialized on skb allocation.

This adds extra code to skb clone and free paths (to deal with
refcount/free of extension area) but this replaces similar code that
manages skb->nf_bridge and skb->sp structs in the followup patches of
the series.

It is possible to add support for extensions that are not preseved on
clones/copies.

To do this, it would be needed to define a bitmask of all extensions that
need copy/cow semantics, and change __skb_ext_copy() to check
->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
->active_extensions to 0 on the new clone.

This isn't done here because all extensions that get added here
need the copy/cow semantics.

v2:
Allocate entire extension space using kmem_cache.
Upside is that this allows better tracking of used memory,
downside is that we will allocate more space than strictly needed in
most cases (its unlikely that all extensions are active/needed at same
time for same skb).
The allocated memory (except the small extension header) is not cleared,
so no additonal overhead aside from memory usage.

Avoid atomic_dec_and_test operation on skb_ext_put()
by using similar trick as kfree_skbmem() does with fclone_ref:
If recount is 1, there is no concurrent user and we can free right away.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df5042f4

netfilter: avoid using skb->nf_bridge directly · c4b0e771

由 Florian Westphal 提交于 12月 18, 2018

This pointer is going to be removed soon, so use the existing helpers in
more places to avoid noise when the removal happens.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4b0e771

Merge branch 'dpaa2-eth-add-QBMAN-statistics' · 8239d579

由 David S. Miller 提交于 12月 19, 2018

Ioana Ciornei says:

====================
dpaa2-eth: add QBMAN statistics

This patch set adds ethtool statistics for pending frames/bytes
in Rx/Tx conf FQs and number of buffers in pool.

The first patch adds support for the query APIs in the DPIO driver
while the latter actually exposes the statistics through ethtool.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8239d579

dpaa2-eth: Add QBMAN related stats · 610febc6

由 Ioana Radulescu 提交于 12月 18, 2018

Add statistics for pending frames in Rx/Tx conf FQs and
number of buffers in pool. Available through ethtool -S.
Signed-off-by: NIoana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: NIoana ciornei <ioana.ciornei@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

610febc6

soc: fsl: dpio: Add BP and FQ query APIs · e80081c3

由 Roy Pledge 提交于 12月 18, 2018

Add FQ (Frame Queue) and BP (Buffer Pool) query APIs that
users of QBMan can invoke to see the status of the queues
and pools that they are using.
Signed-off-by: NRoy Pledge <roy.pledge@nxp.com>
Signed-off-by: NIoana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e80081c3

net: phy: mscc: Fix the VSC 8531/41 Chip Init sequence · 7b98f63e

由 Raju Lakkaraju 提交于 12月 18, 2018

- Turn on Broadcast writes
- UNH 1.8.1 clear bias for UNH 1000BT distortion
- UNH 1.8.7 optimize pre-emphasis for 100BasTx UNH 100W fix
- Enable Token-ring during 'Coma Mode'
Signed-off-by: NRaju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7b98f63e

Merge branch 'for-upstream' of... · 29d3c047

由 David S. Miller 提交于 12月 19, 2018

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2018-12-19

Here's the main bluetooth-next pull request for 4.21:

 - Multiple fixes & improvements for Broadcom-based controllers
 - New USB ID for an Intel controller
 - Support for new Broadcom controller variants
 - Use DEFINE_SHOW_ATTRIBUTE to simplify debugfs code
 - Eliminate confusing "last event is not cmd complete" warning message
 - Added vendor suspend/resume support for H:5 (3-Wire UART) controllers
 - Various other smaller improvements & fixes

Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29d3c047

Merge tag 'mac80211-next-for-davem-2018-12-19' of... · 5a862f86

由 David S. Miller 提交于 12月 19, 2018

Merge tag 'mac80211-next-for-davem-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
This time we have too many changes to list, highlights:
 * virt_wifi - wireless control simulation on top of
   another network interface
 * hwsim configurability to test capabilities similar
   to real hardware
 * various mesh improvements
 * various radiotap vendor data fixes in mac80211
 * finally the nl_set_extack_cookie_u64() we talked
   about previously, used for
 * peer measurement APIs, right now only with FTM
   (flight time measurement) for location
 * made nl80211 radio/interface announcements more complete
 * various new HE (802.11ax) things:
   updates, TWT support, ...
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a862f86

19 12月, 2018 3 次提交

Bluetooth: Fix unnecessary error message for HCI request completion · 1629db9c

由 Johan Hedberg 提交于 11月 27, 2018

In case a command which completes in Command Status was sent using the
hci_cmd_send-family of APIs there would be a misleading error in the
hci_get_cmd_complete function, since the code would be trying to fetch
the Command Complete parameters when there are none.

Avoid the misleading error and silently bail out from the function in
case the received event is a command status.
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Acked-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

1629db9c

Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading · 22bba805

由 Jonathan Bakker 提交于 12月 19, 2018

The Broadcom controller on aries S5PV210 boards sends out a couple of
unknown packets after the firmware is loaded.  This will cause
logging of errors such as:
	Bluetooth: hci0: Frame reassembly failed (-84)

This is probably also the case with other boards, as there are related
Android userspace patches for custom ROMs such as
https://review.lineageos.org/#/c/LineageOS/android_system_bt/+/142721/
Since this appears to be intended behaviour, treated them as diagnostic
packets.

Note that this is another variant of commit 01d5e44a
("Bluetooth: hci_bcm: Handle empty packet after firmware loading")
Signed-off-by: NJonathan Bakker <xc-racer2@live.ca>
Signed-off-by: NPaweł Chmiel <pawel.mikolaj.chmiel@gmail.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

22bba805

Bluetooth: btbcm: Add entry for BCM4329B1 UART bluetooth · e3ca60d0

由 Paweł Chmiel 提交于 12月 19, 2018

This patch adds the device ID for the BCM 4329 combo module used
in the Samsung Aries based phones (Galaxy S and it's variants).

```
[   11.508980] Bluetooth: hci0: BCM: chip id 41
[   11.518975] Bluetooth: hci0: BCM: features 0x04
[   11.550132] Bluetooth: hci0: BCM4329B1
[   11.557046] Bluetooth: hci0: BCM4329B1 (002.002.023) build 0000
[   13.737071] Bluetooth: hci0: BCM4329B1 (002.002.023) build 0744
```

Output from hciconfig

```
hci0:   Type: Primary  Bus: UART
        BD Address: 43:29:B1:55:00:00  ACL MTU: 1021:6  SCO MTU: 64:1
        UP RUNNING
        RX bytes:1675 acl:0 sco:0 events:145 errors:0
        TX bytes:20426 acl:0 sco:0 commands:146 errors:0
        Features: 0xbf 0xfe 0x8f 0xfe 0x9b 0xff 0x79 0x83
        Packet type: DM1 DM3 DM5 DH1 DH3 DH5 HV1 HV2 HV3
        Link policy: RSWITCH SNIFF
        Link mode: SLAVE ACCEPT
        Name: 'aries'
        Class: 0x000000
        Service Classes: Unspecified
        Device Class: Miscellaneous,
        HCI Version: 2.1 (0x4)  Revision: 0x2e8
        LMP Version: 2.1 (0x4)  Subversion: 0x4217
        Manufacturer: Broadcom Corporation (15)
```
Signed-off-by: NPaweł Chmiel <pawel.mikolaj.chmiel@gmail.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

e3ca60d0

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功