Commits · 26912e3756d0a13b188142d1ba0ab279cd3b657a · openeuler / Kernel

20 Dec, 2018 6 commits

xfrm: use secpath_exist where applicable · 26912e37

Authored by Florian Westphal on Dec 18, 2018

Will reduce noise when skb->sp is removed later in this series.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use skb_sec_path helper in more places · 2294be0f

Authored by Florian Westphal on Dec 18, 2018

skb_sec_path gains 'const' qualifier to avoid
xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type

Same reasoning as in the previous conversions: these spots won't need
to be touched again when skb->sp is removed.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: move secpath_exist helper to sk_buff.h · 7af8f4ca

Authored by Florian Westphal on Dec 18, 2018

Future patch will remove skb->sp pointer.
To reduce noise in those patches, move existing helper to
sk_buff and use it in more places to ease skb->sp replacement later.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: change secpath_set to return secpath struct, not error value · 0ca64da1

Authored by Florian Westphal on Dec 18, 2018

It can only return 0 (success) or -ENOMEM.
Change return value to a pointer to secpath struct.

This avoids direct access to skb->sp:

    err = secpath_set(skb);
    if (!err) ..
        skb->sp-> ...

Becomes:

    sp = secpath_set(skb);
    if (!sp) ..
    sp-> ..

This reduces noise in followup patch which is going to remove skb->sp.
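
For illustration only, the calling convention described above can be sketched in userspace C. The struct definitions, `secpath_set_sketch()` and `record_state()` are hypothetical stand-ins rather than the kernel's code; the point is that the caller works with the returned pointer and never reads skb->sp itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sec_path { int len; };
struct sk_buff  { struct sec_path *sp; };

/* Return the (possibly newly allocated) secpath, or NULL on allocation failure. */
static struct sec_path *secpath_set_sketch(struct sk_buff *skb)
{
    assert(skb != NULL);
    if (!skb->sp)
        skb->sp = calloc(1, sizeof(*skb->sp));
    return skb->sp;
}

/* The caller only touches the returned pointer, never skb->sp directly. */
static int record_state(struct sk_buff *skb)
{
    struct sec_path *sp = secpath_set_sketch(skb);

    if (!sp)
        return -1;      /* stands in for -ENOMEM */
    sp->len++;
    return 0;
}
```

With this shape, a later change that moves the secpath out of the skb only has to touch the setter, not every caller.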
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert bridge_nf to use skb extension infrastructure · de8bda1d

Authored by Florian Westphal on Dec 18, 2018

This converts the bridge netfilter (calling iptables hooks from bridge)
facility to use the extension infrastructure.

The bridge_nf-specific hooks in the skb clone and free paths are removed;
they have been replaced by the skb_ext hooks, which do the same as the
bridge_nf allocation hooks did.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: avoid using skb->nf_bridge directly · c4b0e771

Authored by Florian Westphal on Dec 18, 2018

This pointer is going to be removed soon, so use the existing helpers in
more places to avoid noise when the removal happens.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Dec, 2018 4 commits

mac80211: propagate the support for TWT to the driver · 55ebd6e6

Authored by Emmanuel Grumbach on Dec 15, 2018

TWT is a feature that was added in 11ah and enhanced in
11ax. There are two bits that need to be set if we want
to use the feature in 11ax: one in the HE Capability IE
and one in the Extended Capability IE. This is because
of backward compatibility between 11ah and 11ax.

In order to simplify the flow for the low level driver
in managed mode, aggregate the two bits and add a boolean
that tells whether TWT is supported or not, but only if
11ax is supported.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: document RCU requirements for ieee80211_tx_dequeue() · fca1279f

Authored by Johannes Berg on Dec 15, 2018

In the iwlwifi conversion, we sometimes call this from outside
of the wake_tx_queue() method, and in those cases must be in an
RCU critical section. Document this requirement.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: clarify LCI/civic location documentation · 30db641e

Authored by Johannes Berg on Dec 15, 2018

The older code and current userspace assumed that this data
is the content of the Measurement Report element, starting
with the Measurement Token. Clarify this in the documentation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wireless: FTM: fix kernel-doc "cannot understand" warnings · 3453de98

Authored by Randy Dunlap on Dec 06, 2018

Fix kernel-doc warnings in FTM due to missing "struct" keyword.

Fixes 109 warnings from <net/cfg80211.h>:
../include/net/cfg80211.h:2838: warning: cannot understand function prototype: 'struct cfg80211_ftm_responder_stats '

and fixes 88 warnings from <net/mac80211.h>:
../include/net/mac80211.h:477: warning: cannot understand function prototype: 'struct ieee80211_ftm_responder_params '

Fixes: 81e54d08 ("cfg80211: support FTM responder configuration/statistics")
Fixes: bc847970 ("mac80211: support FTM responder configuration/statistics")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

17 Dec, 2018 2 commits

net: dsa: ksz: Rename NET_DSA_TAG_KSZ to _KSZ9477 · 39d6b96f

Authored by Tristram Ha on Dec 15, 2018

Rename the tag Kconfig option and related macros in preparation for
addition of new KSZ family switches with different tag formats.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbor: Add protocol attribute · df9b0e30

Authored by David Ahern on Dec 15, 2018

Similar to routes and rules, add protocol attribute to neighbor entries
for easier tracking of how each was created.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Dec, 2018 3 commits

net: use indirect call wrappers at GRO transport layer · 028e0a47

Authored by Paolo Abeni on Dec 14, 2018

This avoids an indirect call in the receive path for TCP and UDP
packets. TCP takes precedence over UDP, so that we have a single
additional conditional in the common case.

When IPv6 is built as a module, all GRO symbols except UDPv6 are
built in, while the latter belongs to the ipv6 module, so we
need some special care.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes
v2 -> v3:
 - fix build issue with CONFIG_IPV6=m
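
The idea of testing a function pointer against its most likely targets can be sketched in plain C as follows. This is a userspace illustration only: the `CALL_2_SKETCH` macro and the three handlers are hypothetical names, and the code is not the kernel's INDIRECT_CALL_ implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*gro_receive_t)(int pkt);

/* Hypothetical handlers standing in for the TCP, UDP and other GRO callbacks. */
static int tcp_gro_sketch(int pkt)   { return pkt + 1; }
static int udp_gro_sketch(int pkt)   { return pkt + 2; }
static int other_gro_sketch(int pkt) { return pkt + 3; }

/*
 * Compare the pointer against the two most likely targets and call those
 * directly; fall back to a true indirect call for anything else.
 * The first candidate (TCP here) is checked first.
 */
#define CALL_2_SKETCH(f, f1, f2, arg) \
    ((f) == (f1) ? f1(arg) : (f) == (f2) ? f2(arg) : (f)(arg))

static int gro_dispatch(gro_receive_t cb, int pkt)
{
    assert(cb != NULL);
    return CALL_2_SKETCH(cb, tcp_gro_sketch, udp_gro_sketch, pkt);
}
```

In the common TCP case this costs one extra comparison and then a direct call; only unknown callbacks take the indirect path.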
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use indirect call wrappers at GRO network layer · aaa5d90b

Authored by Paolo Abeni on Dec 14, 2018

This avoids an indirect call in the L3 GRO receive path, both
for IPv4 and IPv6, if the latter is not compiled as a module.

Note that when IPv6 is compiled as builtin, it will be checked first,
so we have a single additional compare for the more common path.

v1 -> v2:
 - adapted to INDIRECT_CALL_ changes
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbor: Improve neighbour struct layout · 4b7cd11f

Authored by David Ahern on Dec 13, 2018

Move arp_queue_len_bytes ahead of arp_queue to remove two 4-byte holes.
Ensure ha element is always 8-byte aligned.
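
The effect of this kind of reordering can be shown with a small userspace sketch. Both structs below are hypothetical and much smaller than the real neighbour struct; they only show how pairing 4-byte fields removes holes and how an explicit alignment request pins the address array to an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: each 4-byte field is followed by a pointer. */
struct neigh_before_sketch {
    uint32_t       refcnt;
    void          *queue_head;       /* pointer alignment may leave a hole above */
    uint32_t       queue_len_bytes;
    void          *timer;            /* and another hole above this one */
    unsigned char  ha[32];
};

/* Hypothetical "after" layout: 4-byte fields paired, address array aligned. */
struct neigh_after_sketch {
    uint32_t       refcnt;
    uint32_t       queue_len_bytes;
    void          *queue_head;
    void          *timer;
    _Alignas(8) unsigned char ha[32];
};

static void layout_example(void)
{
    /* The reordered struct is never larger than the original one. */
    assert(sizeof(struct neigh_after_sketch) <= sizeof(struct neigh_before_sketch));
    /* The address array starts on an 8-byte boundary. */
    assert(offsetof(struct neigh_after_sketch, ha) % 8 == 0);
}
```

On a typical 64-bit ABI the first layout carries two 4-byte holes; on 32-bit targets both layouts may end up the same size, which is why the check is written as "not larger".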
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Dec, 2018 2 commits

neighbor: Move neigh_update_ext_learned to core file · 526f1b58

Authored by David Ahern on Dec 11, 2018

neigh_update_ext_learned has one caller in neighbour.c so does not need
to be defined in the header. Move it and in the process remove the
initialization of ndm_flags and just set it based on the flags check.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: fold tcf_block_cb_call() into tc_setup_cb_call() · aeb3fecd

Authored by Cong Wang on Dec 11, 2018

After commit 69bd4840 ("net/sched: Remove egdev mechanism"),
tc_setup_cb_call() is nearly identical to tcf_block_cb_call(),
so we can just fold tcf_block_cb_call() into tc_setup_cb_call()
and remove its unused parameter 'exts'.

Fixes: 69bd4840 ("net/sched: Remove egdev mechanism")
Cc: Oz Shlomo <ozsh@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Dec, 2018 3 commits

net: switchdev: Add extack to switchdev_handle_port_obj_add() callback · 69213513

Authored by Petr Machata on Dec 12, 2018

Drivers use switchdev_handle_port_obj_add() to handle recursive descent
through lower devices. Change this function prototype to take add_cb
that itself takes an extack argument. Decode extack from
switchdev_notifier_port_obj_info and pass it to add_cb.

Update mlxsw and ocelot drivers which use this helper.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: Add extack to struct switchdev_notifier_info · 479c86dc

Authored by Petr Machata on Dec 12, 2018

In order to pass extack to the drivers that need it, add an extack field
to struct switchdev_notifier_info, and an extack argument to the
function call_switchdev_blocking_notifiers(). Also add a helper function
switchdev_notifier_info_to_extack().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: Add extack argument to switchdev_port_obj_add() · 69b7320e

Authored by Petr Machata on Dec 12, 2018

After the previous patch, bridge driver has extack argument available to
pass to switchdev. Therefore extend switchdev_port_obj_add() with this
argument, updating all callers, and passing the argument through to
switchdev_port_obj_notify().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Dec, 2018 2 commits

net/sched: Remove egdev mechanism · 69bd4840

Authored by Oz Shlomo on Nov 06, 2018

The egdev mechanism was replaced by the TC indirect block notifications
platform.
Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Cc: John Hurley <john.hurley@netronome.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net: Add netif_is_gretap()/netif_is_ip6gretap() · 0621e6fc

Authored by Oz Shlomo on Nov 21, 2018

Changed the is_gretap_dev and is_ip6gretap_dev logic from structure
comparison to string comparison of the rtnl_link_ops kind field.

This approach aligns with the current identification methods and function
names of vxlan and geneve network devices.

Convert mlxsw to use these helpers and use them in downstream mlx5 patch.
Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

10 Dec, 2018 1 commit

xfrm: clean an indentation issue, remove a space · 77990464

Authored by Colin Ian King on Dec 06, 2018

Trivial fix to clean up indentation issue, remove an extraneous
space.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

08 Dec, 2018 4 commits

neighbour: Avoid writing before skb->head in neigh_hh_output() · e6ac64d4

Authored by Stefano Brivio on Dec 06, 2018

While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.

In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.

Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.

v2:
 - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
   (Eric Dumazet)
 - if we avoid the panic, though, we need to explicitly check the headroom
   before the memcpy(), otherwise we'll have corrupted slabs on a running
   kernel, after we warn
 - use __skb_push() instead of skb_push(), as the headroom check is
   already implemented here explicitly (Eric Dumazet)
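
The arithmetic behind the 14-byte versus 16-byte case can be sketched in userspace C. The names `HH_ALIGN_SKETCH`, `hh_copy_len()` and `push_cached_header()` are hypothetical, and the 16-byte alignment unit is an assumption taken from the numbers quoted above; this is not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HH_ALIGN_SKETCH 16u  /* assumed alignment unit for the cached header copy */

/* Round the header length up to the alignment unit used for the copy. */
static size_t hh_copy_len(size_t hh_len)
{
    return (hh_len + HH_ALIGN_SKETCH - 1) & ~(size_t)(HH_ALIGN_SKETCH - 1);
}

/*
 * Copy a cached, front-padded header in front of the packet data.
 * `head` is the start of the allocation, `data` the current start of the
 * packet. Returns the new data pointer, or NULL (drop the packet) if the
 * aligned copy would start before `head`.
 */
static unsigned char *push_cached_header(unsigned char *head,
                                         unsigned char *data,
                                         const unsigned char *cached,
                                         size_t hh_len)
{
    size_t copy_len = hh_copy_len(hh_len);

    assert(head != NULL && data >= head);
    if ((size_t)(data - head) < copy_len)
        return NULL;             /* headroom too small for the aligned copy */
    memcpy(data - copy_len, cached, copy_len);
    return data - hh_len;        /* only hh_len bytes become part of the packet */
}
```

With 14 bytes of headroom and a 14-byte header the check fails, because the copy itself needs 16 bytes; with 16 bytes of headroom the copy fits and the packet grows by 14 bytes.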
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbor: Improve garbage collection · 58956317

Authored by David Ahern on Dec 07, 2018

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago, gc kicks in
   and walks the entire hash table, evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is overkill and contributes to thrashing,
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.
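
The list-based scheme in points 1 to 4 can be sketched as a small userspace model. Everything here is hypothetical (`gc_entry`, `gc_list`, `forced_gc_sketch()`, time in whole seconds); it only models the ordering and the stop condition, not locking, reference counts or entry states.

```c
#include <assert.h>
#include <stddef.h>

#define GC_MIN_AGE_SKETCH 5  /* seconds; mirrors the "5 seconds" rule above */

struct gc_entry {
    long updated;                    /* last update time, in seconds */
    struct gc_entry *prev, *next;
};

struct gc_list {
    struct gc_entry *head, *tail;
    int gc_entries;                  /* counts only collectable entries */
};

/* New entries go to the tail, so the oldest candidates sit at the front. */
static void gc_list_add_tail(struct gc_list *l, struct gc_entry *e)
{
    e->next = NULL;
    e->prev = l->tail;
    if (l->tail)
        l->tail->next = e;
    else
        l->head = e;
    l->tail = e;
    l->gc_entries++;
}

static void gc_list_del(struct gc_list *l, struct gc_entry *e)
{
    if (e->prev) e->prev->next = e->next; else l->head = e->next;
    if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
    l->gc_entries--;
}

/* Evict old entries from the front until the count drops below thresh2. */
static int forced_gc_sketch(struct gc_list *l, long now, int thresh2)
{
    struct gc_entry *e, *next;
    int shrunk = 0;

    assert(l != NULL);
    for (e = l->head; e && l->gc_entries >= thresh2; e = next) {
        next = e->next;
        if (now - e->updated > GC_MIN_AGE_SKETCH) {
            gc_list_del(l, e);       /* older than 5 seconds: evict */
            shrunk++;
        }
    }
    return shrunk;
}
```

PERMANENT entries would simply never be added to this list, so they cost nothing during the walk and do not count against the thresholds.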
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Add vxlan_fdb_clear_offload() · e5ff4b19

Authored by Petr Machata on Dec 07, 2018

When a driver unoffloads all FDB entries en bloc, it's inefficient to
send the switchdev notification one by one. Add a helper that walks the
FDB table, unsetting the offload flag on RDST with a given VNI.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Add vxlan_fdb_replay() · 4f89f5b5

Authored by Petr Machata on Dec 07, 2018

When a VXLAN device becomes relevant to a driver (such as when it is
attached to an offloaded bridge), the driver will generally need to walk
the existing FDB entries and offload them.

Add a function vxlan_fdb_replay() to call a given notifier block for
each FDB entry with a given VNI.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Dec, 2018 1 commit

net: dsa: Add overhead to tag protocol ops. · a5dd3087

Authored by Andrew Lunn on Dec 06, 2018

Each DSA tag protocol needs to add additional headers to the Ethernet
frame in order to direct it towards a specific switch egress port. It
must also remove the header from a frame received from a
switch. Indicate the maximum size of these headers in the tag protocol
ops structure, so the core can take these overheads into account.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Dec, 2018 1 commit

sctp: frag_point sanity check · afd0a800

Authored by Jakub Audykowicz on Dec 04, 2018

If for some reason an association's fragmentation point is zero,
sctp_datamsg_from_user will endlessly try to divide a message
into zero-sized chunks. This eventually causes a kernel panic due to
running out of memory.

Although this situation is quite unlikely, it has occurred before as
reported. I propose to add this simple last-ditch sanity check due to
the severity of the potential consequences.
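
Why a zero fragmentation point never terminates, and what a last-ditch check looks like, can be sketched in a few lines of userspace C. `chunk_count_sketch()` is a hypothetical helper that only counts chunks; it is not the SCTP code.

```c
#include <assert.h>
#include <stddef.h>

/* Return the number of chunks needed, or -1 if frag_point is unusable. */
static long chunk_count_sketch(size_t msg_len, size_t frag_point)
{
    long chunks = 0;

    /* Sanity check: a zero-sized chunk never consumes any of the message. */
    if (frag_point == 0)
        return -1;

    while (msg_len > 0) {
        size_t len = msg_len < frag_point ? msg_len : frag_point;

        msg_len -= len;          /* with len == 0 this loop would never end */
        chunks++;
    }
    assert(msg_len == 0);
    return chunks;
}
```

Without the early return, each iteration would allocate another empty chunk while `msg_len` stays unchanged, which matches the out-of-memory outcome described above.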
Signed-off-by: Jakub Audykowicz <jakub.audykowicz@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Dec, 2018 1 commit

tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT · a74f0fa0

Authored by Eric Dumazet on Dec 04, 2018

TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12
as a step to enable bigger tcp sndbuf limits.

It works reasonably well, but the following happens :

Once the limit is reached, TCP stack generates
an [E]POLLOUT event for every incoming ACK packet.

This causes a high number of context switches.

This patch implements the strategy David Miller added
in sock_def_write_space() :

 - If TCP socket has a notsent_lowat constraint of X bytes,
   allow sendmsg() to fill up to X bytes, but send [E]POLLOUT
   only if number of notsent bytes is below X/2

This considerably reduces TCP_NOTSENT_LOWAT overhead,
while allowing to keep the pipe full.
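
The X versus X/2 rule above amounts to a simple hysteresis, sketched here in userspace C. The two helper names are hypothetical and the functions only model the thresholds, not the socket code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sendmsg() may keep queueing while unsent bytes stay below the limit X. */
static bool may_queue_more(uint32_t notsent_bytes, uint32_t lowat)
{
    return notsent_bytes < lowat;
}

/* A write wakeup is generated only once unsent bytes drop below X/2. */
static bool should_signal_pollout(uint32_t notsent_bytes, uint32_t lowat)
{
    return notsent_bytes < lowat / 2;
}

static void lowat_example(void)
{
    const uint32_t lowat = 1000;

    assert(may_queue_more(800, lowat));          /* writer may still fill the queue */
    assert(!should_signal_pollout(800, lowat));  /* but no wakeup is generated yet */
    assert(should_signal_pollout(499, lowat));   /* wakeup once below X/2 */
}
```

The gap between the two thresholds is what keeps each incoming ACK from producing its own wakeup.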

Tested:
 100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM

A:/# cat /proc/sys/net/ipv4/tcp_wmem
4096	262144	64000000
A:/# super_netperf 100 -H B -l 1000 -- -K bbr &

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/

A:/# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 256220672  13532 694976    0    0    10     0   28   14  0  1 99  0  0
 2  0      0 256320016  13532 698480    0    0   512     0 715901 5927  0 10 90  0  0
 0  0      0 256197232  13532 700992    0    0   735    13 771161 5849  0 11 89  0  0
 1  0      0 256233824  13532 703320    0    0   512    23 719650 6635  0 11 89  0  0
 2  0      0 256226880  13532 705780    0    0   642     4 775650 6009  0 12 88  0  0

A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow

A:/# vmstat 5 5  # check that context switches have not inflated too much.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 260386512  13592 662148    0    0    10     0   17   14  0  1 99  0  0
 0  0      0 260519680  13592 604184    0    0   512    13 726843 12424  0 10 90  0  0
 1  1      0 260435424  13592 598360    0    0   512    25 764645 12925  0 10 90  0  0
 1  0      0 260855392  13592 578380    0    0   512     7 722943 13624  0 11 88  0  0
 1  0      0 260445008  13592 601176    0    0   614    34 772288 14317  0 10 90  0  0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Dec, 2018 4 commits

sctp: kfree_rcu asoc · fb6df5a6

Authored by Xin Long on Dec 01, 2018

sctp_hash_transport/sctp_epaddr_lookup_transport dereference a
transport's asoc under rcu_read_lock, but the asoc may be freed without
waiting for a grace period, which leads to a use-after-free panic.

This patch fixes it by calling kfree_rcu so that the asoc is freed after
a grace period.

Note that only the freeing of the asoc's memory is delayed by this patch;
it won't cause sk to linger longer.

Thanks to Neil and Marcelo for making this clear.

Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
Fixes: cd2b7087 ("sctp: check duplicate node before inserting a new transport")
Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com
Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l3mdev: add function to retreive upper master · 6a6d6681

Authored by Alexis Bauvin on Dec 03, 2018

Existing functions to retrieve the l3mdev of a device did not walk the
master chain to find the upper master. This patch adds a function to
find the l3mdev, even indirectly through e.g. a bridge:

+----------+
|          |
| vrf-blue |
|          |
+----+-----+
     |
     |
+----+-----+
|          |
| br-blue  |
|          |
+----+-----+
     |
     |
+----+-----+
|          |
|   eth0   |
|          |
+----------+

This will properly resolve the l3mdev of eth0 to vrf-blue.
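
The walk up the master chain can be sketched in userspace C as below. `struct dev_sketch` and `upper_l3_master()` are hypothetical simplifications for illustration; the real helper works on net_device and its own locking rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device: only the fields this sketch needs. */
struct dev_sketch {
    const char *name;
    bool is_l3_master;           /* e.g. a VRF device */
    struct dev_sketch *master;   /* upper master, or NULL */
};

/* Walk up the master chain until an L3 master device is found. */
static struct dev_sketch *upper_l3_master(struct dev_sketch *dev)
{
    while (dev && !dev->is_l3_master)
        dev = dev->master;
    return dev;                  /* NULL if no L3 master sits above dev */
}

static void l3mdev_example(void)
{
    struct dev_sketch vrf  = { "vrf-blue", true,  NULL };
    struct dev_sketch br   = { "br-blue",  false, &vrf };
    struct dev_sketch eth0 = { "eth0",     false, &br };

    /* eth0 -> br-blue -> vrf-blue, as in the diagram above. */
    assert(upper_l3_master(&eth0) == &vrf);
}
```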
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp_tunnel: add config option to bind to a device · da5095d0

Authored by Alexis Bauvin on Dec 03, 2018

UDP tunnel sockets are always opened unbound to a specific device. This
patch allows the socket to be bound to a custom device, which
incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com>
Tested-by: Amine Kherbouche <akherbouche@scaleway.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: Add 'fw_load_policy' generic parameter · 846e980a

Authored by Shalom Toledo on Dec 03, 2018

Many drivers load the device's firmware image during the initialization
flow either from the flash or from the disk. Currently this option is not
controlled by the user and the driver decides from where to load the
firmware image.

'fw_load_policy' gives the ability to control this option which allows the
user to choose between different loading policies supported by the driver.

This parameter can be useful while testing and/or debugging the device. For
example, testing a firmware bug fix.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Dec, 2018 5 commits

net: reorder flowi_common fields to avoid holes · bf1c3ab8

Authored by Paolo Abeni on Nov 28, 2018

The flowi* structures are used and memset by several functions
in the critical path. Currently flowi_common has a couple of holes that
we can eliminate by reordering the struct fields. As a side effect,
both flowi4 and flowi6 shrink by 8 bytes.

Before:
pahole -EC flowi_common
struct flowi_common {
// ...
	/* size: 40, cachelines: 1, members: 10 */
	/* sum members: 32, holes: 1, sum holes: 4 */
	/* padding: 4 */
	/* last cacheline: 40 bytes */
};
pahole -EC flowi6
struct flowi6 {
// ...
        /* size: 88, cachelines: 2, members: 6 */
        /* padding: 4 */
        /* last cacheline: 24 bytes */
};
pahole -EC flowi4
struct flowi4 {
// ...
        /* size: 56, cachelines: 1, members: 4 */
        /* padding: 4 */
        /* last cacheline: 56 bytes */
};

After:
struct flowi_common {
// ...
	/* size: 32, cachelines: 1, members: 10 */
	/* last cacheline: 32 bytes */
};
struct flowi6 {
// ...
        /* size: 80, cachelines: 2, members: 6 */
        /* padding: 4 */
        /* last cacheline: 16 bytes */
};
struct flowi4 {
// ...
        /* size: 48, cachelines: 1, members: 4 */
        /* padding: 4 */
        /* last cacheline: 48 bytes */
};
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: add tcp_md5_needed jump label · 6015c71e

Authored by Eric Dumazet on Nov 27, 2018

Most Linux hosts never set up TCP MD5 keys. We can avoid a
cache line miss (accessing tp->md5sig_info) on RX and TX
using a jump label.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: make tcp_space() aware of socket backlog · 85bdf7db

Authored by Eric Dumazet on Nov 27, 2018

Jean-Louis Dupond reported poor iscsi TCP receive performance
that we tracked to backlog drops.

Apparently we fail to send window updates reflecting the
fact that we are under stress.

Note that we might lack a proper window increase when
backlog is fully processed, since __release_sock() clears
sk->sk_backlog.len _after_ all skbs have been processed.

This should not matter in practice. If we had a significant
load through socket backlog, we are in a dangerous
situation.
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: hint compiler about sack flows · ebeef4bc

Authored by Eric Dumazet on Nov 27, 2018

Tell the compiler that most TCP flows are using SACK these days.

There is no need to add the unlikely() clause in tcp_is_reno(),
the compiler is able to infer it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/flow_dissector: correct comments on enum flow_dissector_key_id · 91c45956

Authored by Edward Cree on Nov 27, 2018

There are no such structs flow_dissector_key_flow_vlan or
flow_dissector_key_flow_tags; the actual structs used are struct
flow_dissector_key_vlan and struct flow_dissector_key_tags. So correct the
comments against FLOW_DISSECTOR_KEY_VLAN, FLOW_DISSECTOR_KEY_FLOW_LABEL and
FLOW_DISSECTOR_KEY_CVLAN to refer to those.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Nov, 2018 1 commit

netfilter: add missing error handling code for register functions · 584eab29

Authored by Taehee Yoo on Nov 22, 2018

register_{netdevice/inetaddr/inet6addr}_notifier may return an error
value, this patch adds the code to handle these error paths.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
