Commits · 22fc4c4c9fd60427bcda00878cee94e7622cfa7a · openeuler / Kernel

18 Jan, 2019 · 40 commits

netfilter: conntrack: gre: switch module to be built-in · 22fc4c4c

Committed by Florian Westphal on Jan 15, 2019

This makes the last of the modular l4 trackers 'bool'.

After this, all infrastructure to handle dynamic l4 protocol registration
becomes obsolete and can be removed in followup patches.

Old:
302824 net/netfilter/nf_conntrack.ko
 21504 net/netfilter/nf_conntrack_proto_gre.ko

New:
313728 net/netfilter/nf_conntrack.ko

Old:
   text	   data	    bss	    dec	    hex	filename
   6281	   1732	      4	   8017	   1f51	nf_conntrack_proto_gre.ko
 108356	  20613	    236	 129205	  1f8b5	nf_conntrack.ko
New:
 112095	  21381	    240	 133716	  20a54	nf_conntrack.ko

The size increase is only temporary.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: gre: convert rwlock to rcu · 202e651c

Committed by Florian Westphal on Jan 15, 2019

We can use RCU.  Lock is only needed when a new expectation is added.

In case a single spinlock proves to be problematic we can either add one
per netns or use an array of locks combined with net_hash_mix() or similar
to pick the 'correct' one.

But given this is only needed for an expectation rather than per packet
a single one should be ok.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: handle icmp pkt_to_tuple helper via direct calls · e2e48b47

Committed by Florian Westphal on Jan 15, 2019

Rather than handling them via indirect call, use a direct one instead.
This leaves GRE as the last user of this indirect call facility.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: handle builtin l4proto packet functions via direct calls · a47c5404

Committed by Florian Westphal on Jan 15, 2019

The l4 protocol trackers are invoked via indirect call: l4proto->packet().

With one exception (gre), all l4trackers are builtin, so we can make
.packet optional and use a direct call for most protocols.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: Support RULE_ID reference in new rule · 75dd48e2

Committed by Phil Sutter on Jan 14, 2019

To allow for a batch to contain rules in arbitrary ordering, introduce
NFTA_RULE_POSITION_ID attribute which works just like NFTA_RULE_POSITION
but contains the ID of another rule within the same batch. This helps
iptables-nft-restore handling dumps with mixed insert/append commands
correctly.

Note that NFTA_RULE_POSITION takes precedence over
NFTA_RULE_POSITION_ID, so if the former is present, the latter is
ignored.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: physdev: relax br_netfilter dependency · 8e2f311a

Committed by Florian Westphal on Jan 11, 2019

Following command:
  iptables -D FORWARD -m physdev ...
causes connectivity loss in some setups.

Reason is that iptables userspace will probe kernel for the module revision
of the physdev match, and physdev has an artificial dependency on
br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
is loaded).

This causes the "physdev" module to be loaded, which in turn enables the
"call-iptables" infrastructure.

Bridged packets might then get dropped by the iptables ruleset.

The better fix would be to change the "call-iptables" defaults to 0 and
enforce explicit setting to 1, but that breaks backwards compatibility.

This does the next best thing: add a request_module call to checkentry.
This way a stray '-D ... -m physdev' won't activate br_netfilter
anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: remove helper hook again · 827318fe

Committed by Florian Westphal on Jan 09, 2019

Place them into the confirm one.

Old:
 hook (300): ipv4/6_help() first call helper, then seqadj.
 hook (INT_MAX): confirm

Now:
 hook (INT_MAX): confirm, first call helper, then seqadj, then confirm

Not having the extra call is noticeable in benchmarks.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: add direct calls for all builtin expressions · 10870dd8

Committed by Florian Westphal on Jan 08, 2019

With CONFIG_RETPOLINE it's faster to add an if (ptr == &foo_func)
check and use direct calls for all the built-in expressions.

~15% improvement in pathological cases.

checkpatch doesn't like the X macro due to the embedded return statement,
but the macro has a very limited scope so I don't think it's a problem.

I would like to avoid bugs of the form
  if (e->ops->eval == (unsigned long)nft_foo_eval)
	 nft_bar_eval();

and open-coded if ()/else if()/else cascade, thus the macro.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
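
The compare-then-call-directly pattern from the commit above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct expr`, `add_eval`, `mul_eval`, `neg_eval`, `expr_call` and `run` are hypothetical names, and the macro here compares function pointers directly rather than via an integer cast. It shows why a macro helps: the function name is written once per use, so the comparison and the call cannot refer to different functions.

```c
#include <assert.h>
#include <stddef.h>

struct expr;
typedef void (*eval_fn)(const struct expr *e, int *regs);

/* Hypothetical expression: an evaluator plus one operand. */
struct expr {
	eval_fn eval;
	int operand;
};

/* Hypothetical stand-ins for built-in expression evaluators. */
void add_eval(const struct expr *e, int *regs) { regs[0] += e->operand; }
void mul_eval(const struct expr *e, int *regs) { regs[0] *= e->operand; }
void neg_eval(const struct expr *e, int *regs) { (void)e; regs[0] = -regs[0]; }

/*
 * If the stored pointer matches a known function, call that function
 * directly and return. Note the embedded return statement and the use
 * of `regs` from the enclosing scope, which is why the macro is kept
 * local to one function.
 */
#define X(e, fn) \
	do { if ((e)->eval == (fn)) { (fn)((e), regs); return; } } while (0)

void expr_call(const struct expr *e, int *regs)
{
	assert(e != NULL && e->eval != NULL);
	X(e, add_eval);
	X(e, mul_eval);
	/* Anything not listed above still goes through the indirect call. */
	e->eval(e, regs);
}
#undef X

/* Run a short program and return the final value of register 0. */
int run(const struct expr *prog, size_t n, int start)
{
	int regs[1] = { start };

	for (size_t i = 0; i < n; i++)
		expr_call(&prog[i], regs);
	return regs[0];
}
```

In this sketch `neg_eval` is deliberately left out of the macro list, so it exercises the indirect fallback; the result is the same either way, only the call mechanism differs.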

netfilter: nf_tables: handle nft_object lookups via rhltable · 4d44175a

Committed by Florian Westphal on Jan 08, 2019

Instead of linear search, use rhlist interface to look up the objects.
This fixes rulesets with thousands of named objects (quota, counters and
the like).

We only use a single table for this and consider the address of the
table we're doing the lookup in as a part of the key.

This reduces restore time of a sample ruleset with ~20k named counters
from 37 seconds to 0.8 seconds.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
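
The idea of one shared hash table whose key includes the owning table's address can be sketched as follows. This is a simplified illustration and not the rhltable API: it uses a fixed-size chained hash table, and `struct table`, `struct object`, `struct object_index`, `index_insert` and `index_lookup` are hypothetical names. The point it shows is that two objects with the same name in different tables stay distinct because the table pointer is compared along with the name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64

struct table { const char *name; };

struct object {
	const struct table *table; /* first half of the key */
	const char *name;          /* second half of the key */
	unsigned long value;
	struct object *next;       /* bucket chain */
};

/* One index shared by all tables. */
struct object_index { struct object *bucket[NBUCKETS]; };

/* FNV-1a over the name, then mix in the address of the owning table. */
static size_t key_hash(const struct table *t, const char *name)
{
	uint64_t h = 14695981039346656037ULL;

	for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
		h ^= *p;
		h *= 1099511628211ULL;
	}
	h ^= (uint64_t)(uintptr_t)t;
	h *= 1099511628211ULL;
	return (size_t)(h % NBUCKETS);
}

void index_insert(struct object_index *idx, struct object *obj)
{
	size_t b = key_hash(obj->table, obj->name);

	obj->next = idx->bucket[b];
	idx->bucket[b] = obj;
}

/* Only one bucket chain is walked instead of every object. */
struct object *index_lookup(const struct object_index *idx,
			    const struct table *t, const char *name)
{
	assert(idx != NULL && t != NULL && name != NULL);
	for (struct object *o = idx->bucket[key_hash(t, name)]; o; o = o->next)
		if (o->table == t && strcmp(o->name, name) == 0)
			return o;
	return NULL;
}
```

A resizable table, as the commit describes, would additionally grow the bucket array as objects are added; that part is omitted here.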

netfilter: nf_tables: prepare nft_object for lookups via hashtable · d152159b

Committed by Florian Westphal on Jan 08, 2019

Add a 'key' structure for object, so we can look them up by name + table
combination (the name can be the same in each table).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'tcp_openreq_child' · 435f3f26

Committed by David S. Miller on Jan 17, 2019

Eric Dumazet says:

====================
tcp: remove code from tcp_create_openreq_child()

tcp_create_openreq_child() is essentially cloning a listener, then
must initialize some fields that can not be inherited.

Listeners are either fresh sockets, or sockets that came through
tcp_disconnect() after a session that dirtied many fields.

By moving code to tcp_disconnect(), we can shorten time taken
to create a clone, since tcp_disconnect() operation is very
unlikely.
====================
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
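
The reasoning behind this series can be sketched with a toy model. It is an illustration only: `struct sock_state`, its fields, `TIMEOUT_INIT`, `sock_init`, `sock_disconnect` and `sock_clone_child` are hypothetical stand-ins, not the kernel's TCP structures. If both ways of obtaining a listener (a fresh socket, or a disconnect after a used session) leave the fields at their defaults, then a clone is a plain copy and needs no per-field re-initialisation.

```c
#include <assert.h>
#include <string.h>

#define TIMEOUT_INIT 1000u /* hypothetical default, for illustration */

struct sock_state {
	unsigned int rto;
	unsigned int app_limited;
	unsigned int retrans_out;
	int listening;
};

/* Fresh sockets start from known defaults. */
void sock_init(struct sock_state *sk)
{
	memset(sk, 0, sizeof(*sk));
	sk->rto = TIMEOUT_INIT;
	sk->app_limited = ~0U;
}

/* Rare path: restore every default a previous session may have dirtied. */
void sock_disconnect(struct sock_state *sk)
{
	sk->rto = TIMEOUT_INIT;
	sk->app_limited = ~0U;
	sk->retrans_out = 0;
	sk->listening = 0;
}

/* Hot path: a plain copy, relying on the listener's fields being clean. */
struct sock_state sock_clone_child(const struct sock_state *listener)
{
	struct sock_state child;

	assert(listener->listening);
	child = *listener;
	child.listening = 0;
	return child;
}
```

The cost moves from the clone path, which runs for every passive connection, to the disconnect path, which the cover letter describes as very unlikely.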

tcp: move rx_opt & syn_data_acked init to tcp_disconnect() · 6bcdc40d

Committed by Eric Dumazet on Jan 17, 2019

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move tp->rack init to tcp_disconnect() · 792c4354

Committed by Eric Dumazet on Jan 17, 2019

If we make sure all listeners have proper tp->rack value,
then a clone will also inherit proper initial value.

Note that fresh sockets init tp->rack from tcp_init_sock().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move app_limited init to tcp_disconnect() · 6cda8b74

Committed by Eric Dumazet on Jan 17, 2019

If we make sure all listeners have app_limited set to ~0U,
then a clone will also inherit proper initial value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move retrans_out, sacked_out, tlp_high_seq, last_oow_ack_time init to tcp_disconnect() · 5c701549

Committed by Eric Dumazet on Jan 17, 2019

If we make sure all listeners have these fields cleared, then a clone
will also inherit zero values.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not clear urg_data in tcp_create_openreq_child · 5d836764

Committed by Eric Dumazet on Jan 17, 2019

All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets also have a zero value here.

So a clone will inherit a zero value here.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move snd_cwnd & snd_cwnd_cnt init to tcp_disconnect() · 3a9a57f6

Committed by Eric Dumazet on Jan 17, 2019

Passive connections can inherit proper value by cloning,
if we make sure all listeners have the proper values there.

tcp_disconnect() was setting snd_cwnd to 2, which seems
quite obsolete since IW10 adoption.

Also remove an obsolete comment.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move mdev_us init to tcp_disconnect() · b9e2e689

Committed by Eric Dumazet on Jan 17, 2019

If we make sure a listener always has its mdev_us
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not clear srtt_us in tcp_create_openreq_child · a0070e46

Committed by Eric Dumazet on Jan 17, 2019

All listeners have this field cleared already, since tcp_disconnect()
clears it and newly created sockets also have a zero value here.

So a clone will inherit a zero value here.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not clear packets_out in tcp_create_openreq_child() · eb2c80ca

Committed by Eric Dumazet on Jan 17, 2019

New sockets have this field cleared, and tcp_disconnect()
calls tcp_write_queue_purge() which among other things
also clears tp->packets_out.

So a listener is guaranteed to have this field cleared.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move icsk_rto init to tcp_disconnect() · 6a408147

Committed by Eric Dumazet on Jan 17, 2019

If we make sure a listener always has its icsk_rto
field set to TCP_TIMEOUT_INIT, we do not need to rewrite
this field after a new clone is created.

tcp_disconnect() is very seldom used in real applications.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: do not set snd_ssthresh in tcp_create_openreq_child() · b84235e2

Committed by Eric Dumazet on Jan 17, 2019

New sockets get the field set to TCP_INFINITE_SSTHRESH in tcp_init_sock().
In case a socket had this field changed and transitions to TCP_LISTEN
state, tcp_disconnect() also makes sure snd_ssthresh is set to
TCP_INFINITE_SSTHRESH.

So a listener has this field set to TCP_INFINITE_SSTHRESH already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: remove unneeded semicolon · bec03deb

Committed by YueHaibing on Jan 17, 2019

Remove unneeded semicolon.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpsw-phy-sel: remove unneeded semicolon · 5c423d71

Committed by YueHaibing on Jan 17, 2019

Remove unneeded semicolon.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: remove unneeded semicolon in trace.c · d4fb30f6

Committed by YueHaibing on Jan 17, 2019

Remove unneeded semicolon.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: remove duplicated include from qed_if.h · 8b59bfe8

Committed by YueHaibing on Jan 17, 2019

Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sb1000: fix a couple of indentation issues and remove assignment in if statements · 6394d98d

Committed by Colin Ian King on Jan 17, 2019

There is an if statement and a return statement that are incorrectly
indented. Fix these.  Also replace the assignment-in-if statements
with an assignment followed by an if to keep to the coding style.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a route cache full diagnostic message · 22c2ad61

Committed by Peter Oskolkov on Jan 16, 2019

In some testing scenarios, dst/route cache can fill up so quickly
that even an explicit GC call occasionally fails to clean it up. This leads
to sporadically failing calls to dst_alloc and "network unreachable" errors
to the user, which is confusing.

This patch adds a diagnostic message to make the cause of the failure
easier to determine.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dpaa2-eth: Fix ndo_stop routine · 68d74315

Committed by Ioana Ciocoi Radulescu on Jan 16, 2019

In the current implementation, on interface down we disabled NAPI and
then manually drained any remaining ingress frames. This could lead
to a situation when, under heavy traffic, the data availability
notification for some of the channels would not get rearmed correctly.

Change the implementation such that we let all remaining ingress frames
be processed as usual and only disable NAPI once the hardware queues
are empty.

We also add a wait on the Tx side, to allow hardware time to process
all in-flight Tx frames before issuing the disable command.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wan: dscc4: fix various indentation issues · 5191673b

Committed by Colin Ian King on Jan 16, 2019

There are some lines that have indentation issues; fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vxlan-FDB-veto' · 039d52e1

Committed by David S. Miller on Jan 17, 2019

Petr Machata says:

====================
vxlan: Allow vetoing FDB operations

mlxsw does not implement handling of the more advanced types of VXLAN
FDB entries. In order to provide visibility to users, it is important to
be able to reject such FDB entries, ideally with an explanation passed
in extended ack. This patch set implements this.

In patches #1-#4, vxlan is gradually transformed to support vetoing of
FDB entries added (or modified) through vxlan_fdb_update(), and the
default FDB entry added in __vxlan_dev_create().

Patches #5-#7 deal with vxlan_changelink(). The existing code recognizes
that vxlan_fdb_update() may fail, but doesn't attempt to keep things
intact if it does. These patches change the function in several steps to
gracefully handle vetoes (or other failures).

Then in patches #8-#11, extack arguments are added, respectively, to
ndo_fdb_add(), mlxsw's mlxsw_sp_nve_ops.fdb_replay, the functions that
connect to the VXLAN vetoing code, and call_switchdev_notifiers(). Note
that call_switchdev_blocking_notifiers() already does support extack.

Finally in patch #12, mlxsw is extended to add extack messages to
rejected FDB entries. In patch #13, the functionality is tested.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Test veto of unsupported VXLAN FDBs · 7e1046fd

Committed by Petr Machata on Jan 16, 2019

mlxsw doesn't implement offloading of all types of FDB entries that the
VXLAN driver supports. Test that such FDB entries are rejected. That
makes sure that the decision made by the existing validation code in
mlxsw propagates up the stack. It also exercises rollback functionality
in VXLAN, and tests that extack is returned.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Add extack messages to VXLAN FDB rejection · a40313d9

Committed by Petr Machata on Jan 16, 2019

Annotate the rejections in mlxsw_sp_switchdev_vxlan_work_prepare() with
textual reasons.

Because this code ends up being invoked for FDB replay as well, drop the
default message from there, so that the more accurate error message is
not overwritten.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Add extack argument to call_switchdev_notifiers() · 6685987c

Committed by Petr Machata on Jan 16, 2019

A follow-up patch will enable vetoing of FDB entries. Make it possible
to communicate details of why an FDB entry is not acceptable back to the
user.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Add extack to switchdev operations · 4c59b7d1

Committed by Petr Machata on Jan 16, 2019

There are four sources of VXLAN switchdev notifier calls:

- the changelink() link operation, which already supports extack,
- ndo_fdb_add() which got extack support in a previous patch,
- FDB updates due to packet forwarding,
- and vxlan_fdb_replay().

Extend vxlan_fdb_switchdev_call_notifiers() to include extack in the
switchdev message that it sends, and propagate the argument upwards to
the callers. For the first two cases, pass in the extack gotten through
the operation. For case #3, pass in NULL.

To cover the last case, extend vxlan_fdb_replay() to take extack
argument, which might come from whatever operation necessitated the FDB
replay.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
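
The propagation described in this commit can be sketched in userspace C. Everything here is a hypothetical stand-in rather than the kernel's netlink extended-ack API: `struct extack`, `extack_set`, `driver_notifier`, `fdb_call_notifiers`, `fdb_add` and `fdb_learn` are illustrative names. The sketch shows one common helper that forwards the extack pointer to the notifier, a user-initiated path that passes a real extack, and a packet-forwarding path that passes NULL because there is no user to report to.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an extended ack: a single message slot. */
struct extack { const char *msg; };

/* Some callers have no user request behind them and pass NULL. */
static void extack_set(struct extack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

enum fdb_kind { FDB_SIMPLE, FDB_ADVANCED };

/* Driver-side notifier: vetoes entry kinds it cannot offload. */
static int driver_notifier(enum fdb_kind kind, struct extack *extack)
{
	if (kind == FDB_ADVANCED) {
		extack_set(extack, "advanced FDB entries are not supported");
		return -EOPNOTSUPP;
	}
	return 0;
}

/* Common helper: forwards extack from whichever caller supplied it. */
static int fdb_call_notifiers(enum fdb_kind kind, struct extack *extack)
{
	return driver_notifier(kind, extack);
}

/* User-initiated add: the extack comes from the user's request. */
int fdb_add(enum fdb_kind kind, struct extack *extack)
{
	assert(extack != NULL);
	return fdb_call_notifiers(kind, extack);
}

/* Update learned from a forwarded packet: no user, so no extack. */
int fdb_learn(enum fdb_kind kind)
{
	return fdb_call_notifiers(kind, NULL);
}
```

Both paths get the same error code; only the user-initiated one also receives a textual reason.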

mlxsw: Add extack to mlxsw_sp_nve_ops.fdb_replay · d907f58f

Committed by Petr Machata on Jan 16, 2019

A follow-up patch will extend vxlan_fdb_replay() with an extack
argument. Extend the fdb_replay callback in mlxsw likewise so that the
argument is ready for the vxlan conversion.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add extack argument to ndo_fdb_add() · 87b0984e

Committed by Petr Machata on Jan 16, 2019

Drivers may not be able to support certain FDB entries, and an error
code is insufficient to give clear hints as to the reasons of rejection.

In order to make it possible to communicate the rejection reason, extend
ndo_fdb_add() with an extack argument. Adapt the existing
implementations of ndo_fdb_add() to take the parameter (and ignore it).
Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: changelink: Delete remote after update · 1cdc98c2

Committed by Petr Machata on Jan 16, 2019

If a change in remote address prompts a change in a default FDB entry,
that change might be vetoed. If that happens, it would then be necessary
to reinstate the already-removed default FDB entry corresponding to the
previous remote address.

Instead, arrange to have the previous address removed only after the
FDB is successfully vetted.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: changelink: Postpone vxlan_config_apply() · 038a5a99

Committed by Petr Machata on Jan 16, 2019

When an FDB entry is vetoed, it is necessary to unroll the changes that
have already been done. To avoid having to unroll vxlan_config_apply(),
postpone the call after the point where the vetoing takes place. Since
the call can't fail, it doesn't necessitate any cleanups in the
preceding FDB update logic.

Correspondingly, move down the mod_timer() call as well.

References to *dst need to be replaced with references to conf.
Additionally, old_dst and old_age_interval are not necessary anymore,
and therefore drop them.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
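
The ordering argument in this commit (run the step that can be vetoed first, and the step that cannot fail last) can be sketched as follows. This is a toy model under stated assumptions, not the vxlan code: `struct dev_conf`, `struct device`, `fdb_update`, `config_apply` and `changelink` are hypothetical names, and the veto rule (negative remote values are rejected) is invented purely for the example. Note how the update logic reads from the new `conf` rather than from the device, since the device configuration has not been changed yet at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dev_conf {
	int remote;
	int age_interval;
};

struct device {
	struct dev_conf cfg;
	int fdb_default; /* default FDB entry, derived from the remote */
};

/* May be vetoed. Hypothetical rule: negative remotes are unsupported. */
static bool fdb_update(struct device *dev, int remote)
{
	if (remote < 0)
		return false;
	dev->fdb_default = remote;
	return true;
}

/* Cannot fail, so nothing after it ever needs to be unrolled. */
static void config_apply(struct device *dev, const struct dev_conf *conf)
{
	dev->cfg = *conf;
}

int changelink(struct device *dev, const struct dev_conf *conf)
{
	assert(dev != NULL && conf != NULL);

	/* The fallible step runs first and reads the new config. */
	if (conf->remote != dev->cfg.remote && !fdb_update(dev, conf->remote))
		return -1; /* vetoed: the device is still untouched */

	/* Postponed until after the last point of failure. */
	config_apply(dev, conf);
	return 0;
}
```

Had `config_apply()` run first, the veto path would need saved copies of the old values to restore them, which is the bookkeeping the commit removes.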

vxlan: changelink: Inline vxlan_dev_configure() · 8db9427d

Committed by Petr Machata on Jan 16, 2019

The changelink operation may cause change in remote address, and
therefore an FDB update, which can be vetoed. To properly handle
vetoing, vxlan_changelink() needs to be gradually updated.

In this patch simply replace vxlan_dev_configure() with the two
constituent calls.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
