Commits · ca2f18be792fddd0db2bbf6cbe1ec12d1bb32dd7 · openeuler / Kernel

18 Jul, 2018 13 commits

netfilter: nf_tables: make valid_genid callback mandatory · ca2f18be

Authored by Florian Westphal on Jul 11, 2018

Always call this function; a followup patch can use this to
acquire a per-netns transaction log to guard the entire batch
instead of using the nfnl subsys mutex (which is shared among all
namespaces).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ca2f18be

netfilter: nf_tables: add and use helper for module autoload · 452238e8

Authored by Florian Westphal on Jul 11, 2018

module autoload is problematic, it requires dropping the mutex that
protects the transaction.  Once the mutex has been dropped, another
client can start a new transaction before we had a chance to abort
current transaction log.

This helper makes sure we first zap the transaction log, then
drop mutex for module autoload.

In case autoload is successful, the caller has to replay the entire
message anyway.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
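
The ordering this commit message describes (abort the transaction log first, only then drop the mutex for the module load) can be illustrated with a small user-space model. This is a sketch, not the kernel helper: `Batch`, `request_module`, the `load_module` callback and the `"EAGAIN"` return value are all hypothetical names chosen for illustration.

```python
import threading


class Batch:
    """Toy stand-in for a transaction: a mutex plus a log of pending changes."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.log = []


def request_module(batch, load_module, name):
    """Abort the pending log first, then drop the mutex for the slow load.

    The caller must hold batch.mutex. Returns "EAGAIN" (an assumed marker)
    to signal that the caller has to replay the whole message.
    """
    batch.log.clear()          # zap the transaction log while still locked
    batch.mutex.release()      # only now let other clients in
    try:
        load_module(name)
    finally:
        batch.mutex.acquire()  # hand the mutex back to the caller
    return "EAGAIN"
```

Because the log is already empty when the mutex is released, a second client that starts a new transaction during the load never sees leftovers from the aborted one.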

452238e8

netfilter: Remove useless param helper of nf_ct_helper_ext_add · 440534d3

Authored by Gao Feng on Jul 09, 2018

The helper parameter of nf_ct_helper_ext_add is no longer used, so
remove it.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

440534d3

ipvs: drop conn templates under attack · 762c4007

Authored by Julian Anastasov on Jul 06, 2018

Before now, connection templates were ignored by the random
dropentry procedure. But Michal Koutný suggests that we
should add an exception for connections under SYN attack.
He provided a patch that implements it for TCP:

<quote>

IPVS includes protection against filling the ip_vs_conn_tab by
dropping 1/32 of feasible entries every second. The template
entries (for persistent services) are never directly deleted by
this mechanism but when a picked TCP connection entry is being
dropped (1), the respective template entry is dropped too (realized
by expiring 60 seconds after the connection entry being dropped).

There is another mechanism that removes connection entries when they
time out (2), in this case the associated template entry is not deleted.
Under SYN flood, template entries would accumulate (due to their
longer timeout).

The accumulation takes place also with drop_entry being enabled. Roughly
15% ((31/32)^60) of SYN_RECV connections survive the dropping mechanism
(1) and are removed by the timeout mechanism (2)(defaults to 60 seconds
for SYN_RECV), thus template entries would still accumulate.

The patch ensures that when a connection entry times out, we also remove
the template entry from the table. To prevent breaking persistent
services (since the connection may time out in already established state)
we add a new entry flag to protect templates that spawned at least one
established TCP connection.

</quote>
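
The "roughly 15%" figure in the quoted text follows from compounding the per-second survival chance over the SYN_RECV timeout. A minimal sketch of that arithmetic, assuming each second is an independent draw in which an entry is dropped with probability 1/32 (the constant and function names are illustrative):

```python
DROP_FRACTION = 1 / 32     # share of entries dropped per second
SYN_RECV_TIMEOUT_S = 60    # default SYN_RECV timeout quoted above


def survival_probability(drop_fraction, seconds):
    """Chance that an entry survives `seconds` consecutive drop rounds."""
    return (1 - drop_fraction) ** seconds


p = survival_probability(DROP_FRACTION, SYN_RECV_TIMEOUT_S)
print(f"share of SYN_RECV entries reaching the timeout: {p:.1%}")
```

Entries that survive this long are removed by the timeout path (2) rather than the drop path (1), which is why their templates were left behind.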

We already added ASSURED flag for the templates in previous patch, so
that we can use it now to decide which connection templates should be
dropped under attack. But we also have some cases that need special
handling.

We modify the dropentry procedure as follows:

- Linux timers currently use LIFO ordering but we cannot rely on
this to drop controlling connections. So, set cp->timeout to 0
to indicate that the connection was dropped and that on expiration we
should try to drop our controlling connections. As a result, we can
now avoid the ip_vs_conn_expire_now call.

- move the cp->n_control check above, so that it avoids restarting
the timer for controlling connections when not needed.

- drop unassured connection templates here if they are not referenced
by any connections.

On connection expiration: if connection was dropped (cp->timeout=0)
try to drop our controlling connection except if it is a template
in assured state.

In ip_vs_conn_flush change order of ip_vs_conn_expire_now calls
according to the LIFO timer expiration order. It should work
faster for controlling connections with single controlled one.
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
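
The expiration rule described in this message (a dropped connection, marked by a zero timeout, also takes down its controlling connection unless that controller is an assured template) can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the IPVS code: the `Conn` class, its fields and the `drop`/`expire` functions are hypothetical stand-ins for `cp`, `cp->control`, `cp->n_control` and the assured state bit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conn:
    name: str
    is_template: bool = False
    assured: bool = False
    n_control: int = 0                 # connections controlled by this one
    control: Optional["Conn"] = None   # our controlling connection, if any
    timeout: int = 60
    expired: bool = False


def expire(cp):
    """Expire cp; a still-referenced controller cannot expire yet."""
    if cp.n_control:
        return False
    cp.expired = True
    ct = cp.control
    if ct is not None:
        cp.control = None
        ct.n_control -= 1
        # Only a *dropped* connection (timeout == 0) pulls its controller
        # down, and never a template that is already assured.
        if cp.timeout == 0 and not (ct.is_template and ct.assured):
            ct.timeout = 0
            expire(ct)
    return True


def drop(cp):
    """dropentry path: mark the connection as dropped, then expire it."""
    cp.timeout = 0
    return expire(cp)
```

In this model a normal timeout leaves the template alone, while a drop under attack removes an unassured template as soon as its last controlled connection is gone.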

762c4007

ipvs: add assured state for conn templates · 27541143

Authored by Julian Anastasov on Jul 06, 2018

cp->state was not used for templates. Add support for state bits
and for the first "assured" bit which indicates that some
connection controlled by this template was established or assured
by the real server. In a followup patch we will use it to drop
templates under SYN attack.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

27541143

ipvs: provide just conn to ip_vs_state_name · ec1b28ca

Authored by Julian Anastasov on Jul 06, 2018

In preparation for followup patches, provide just the cp
ptr to ip_vs_state_name.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ec1b28ca

netfilter: nf_conntrack: resolve clash for matching conntracks · ed07d9a0

Authored by Martynas Pumputis on Jul 02, 2018

This patch enables the clash resolution for NAT (disabled in
"590b52e1") if clashing conntracks match (i.e. both tuples are equal)
and a protocol allows it.

The clash might happen for a connectionless protocol (e.g. UDP) when
two threads write to the same socket in parallel and subsequent calls
to "get_unique_tuple" return the same tuples (incl. reply tuples).

In this case it is safe to perform the resolution, as the losing CT
describes the same mangling as the winning CT, so no modifications to
the packet are needed, and the result of rules traversal for the loser's
packet stays valid.
Signed-off-by: Martynas Pumputis <martynas@weave.works>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
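
The condition described above (resolution is safe with NAT only when both clashing entries carry identical original and reply tuples, and the protocol permits it) might be modelled as follows. This is a sketch under assumptions: `Conntrack`, `nat_done`, `tuples_match` and `can_resolve_clash` are hypothetical names, and which entry's NAT status the kernel actually inspects is not stated in the commit message.

```python
from dataclasses import dataclass
from typing import Tuple

# (source address, source port, destination address, destination port, protocol)
FlowTuple = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class Conntrack:
    original: FlowTuple
    reply: FlowTuple
    nat_done: bool = False


def tuples_match(a, b):
    """Both directions identical: the loser describes the same mangling."""
    return a.original == b.original and a.reply == b.reply


def can_resolve_clash(winner, loser, proto_allows_clash):
    if not proto_allows_clash:
        return False
    if not (winner.nat_done or loser.nat_done):
        return True   # no NAT involved (assumed to be resolvable already)
    return tuples_match(winner, loser)
```

When the tuples match, the packet of the losing entry needs no further rewriting, which is the safety argument the message makes.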

ed07d9a0

netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search · 5c789e13

Authored by Yi-Hung Wei on Jul 02, 2018

This patch is originally from Florian Westphal.

This patch does the following 3 main tasks.

1) Add list lock to 'struct nf_conncount_list' so that we can
alter the lists containing the individual connections without holding the
main tree lock.  It would be useful when we only need to add/remove to/from
a list without allocating/removing a node in the tree.  With this change, we
update nft_connlimit accordingly since we no longer need to maintain
a list lock in nft_connlimit now.

2) Use RCU for the initial tree search to improve tree look up performance.

3) Add a garbage collection worker. This worker is scheduled when there
are excessive tree nodes that need to be recycled.

Moreover, the rbnode reclaim logic is moved from search tree to insert tree
to avoid a race condition.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

5c789e13

netfilter: nf_conncount: Split insert and traversal · 34848d5c

Authored by Yi-Hung Wei on Jul 02, 2018

This patch is originally from Florian Westphal.

When we have a very coarse grouping, e.g. by large subnets, zone id,
etc, it's likely that we do not need to do tree rotation because
we'll find a node where we can attach new entry.  Based on this
observation, we split tree traversal and insertion.

Later on, we can make traversal lockless (tree protected
by RCU), and add extra lock in the individual nodes to protect list
insertion/deletion, thereby allowing parallel insert/delete in different
tree nodes.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

34848d5c

netfilter: nf_conncount: Move locking into count_tree() · 2ba39118

Authored by Yi-Hung Wei on Jul 02, 2018

This patch is originally from Florian Westphal.

This is a preparation patch to allow lockless traversal
of the tree via RCU.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2ba39118

netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup · 976afca1

Authored by Yi-Hung Wei on Jul 02, 2018

This patch is originally from Florian Westphal.

This patch does the following three tasks.

It applies the same early exit technique for nf_conncount_lookup().

Since now we keep the number of connections in 'struct nf_conncount_list',
we no longer need to return the count in nf_conncount_lookup().

Moreover, we expose the garbage collection function nf_conncount_gc_list()
for nft_connlimit.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

976afca1

netfilter: nf_conncount: Switch to plain list · cb2b36f5

Authored by Yi-Hung Wei on Jul 02, 2018

Original patch is from Florian Westphal.

This patch switches from hlist to plain list to store the list of
connections with the same filtering key in nf_conncount. With the
plain list, we can insert new connections at the tail, so over time
the beginning of the list holds long-running connections and those that
have expired, while the newly created ones are at the end.

Later on, we could probably move checked ones to the end of the list,
so the next run has higher chance to reclaim stale entries in the front.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
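
The effect of tail insertion, combined with a garbage collector that scans from the head and stops early, can be sketched with a double-ended queue. This is an illustrative model only: `add_connection`, `gc_list`, the `is_alive` callback and the `max_collect` bound are hypothetical and do not mirror the kernel data structures.

```python
from collections import deque


def add_connection(conns, conn_id):
    """New connections always go to the tail, so the head holds the oldest."""
    conns.append(conn_id)


def gc_list(conns, is_alive, max_collect=8):
    """Scan from the head and stop once `max_collect` stale entries are gone.

    Returns the number of entries reclaimed. Live entries that were scanned
    are put back at the head in their original order.
    """
    collected = 0
    kept = deque()
    while conns and collected < max_collect:
        conn_id = conns.popleft()
        if is_alive(conn_id):
            kept.append(conn_id)
        else:
            collected += 1
    conns.extendleft(reversed(kept))
    return collected
```

Because stale entries cluster near the head, a bounded scan usually reclaims something without walking the whole list.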

cb2b36f5

netfilter: nf_conncount: Early exit for garbage collection · 2a406e8a

Authored by Yi-Hung Wei on Jul 02, 2018

This patch is originally from Florian Westphal.

We use an extra function with early exit for garbage collection.
It is not necessary to traverse the full list for every node since
it is enough to zap a couple of entries for garbage collection.
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2a406e8a

17 Jul, 2018 2 commits

netfilter: Kconfig: Change select IPv6 dependencies · 5d400a49

Authored by Máté Eckl on Jul 10, 2018

... from IPV6 to NF_TABLES_IPV6 and IP6_NF_IPTABLES.

In some cases module selects depend on IPV6, but this means that they
select another module even if e.g. NF_TABLES_IPV6 is not set, in which
case the selected module is useless due to the lack of IPv6 nf_tables
functionality.

The same applies for IP6_NF_IPTABLES and iptables.

Joint work with: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

5d400a49

netfilter: conntrack: remove l3proto abstraction · a0ae2562

Authored by Florian Westphal on Jun 29, 2018

This unifies ipv4 and ipv6 protocol trackers and removes the l3proto
abstraction.

This gets rid of all l3proto indirect calls and the need to do
a lookup on the function to call for l3 demux.

It increases the size of nf_conntrack.ko by only a small amount (12kbyte),
so overall this reduces size, because nf_conntrack.ko is useless without
either the nf_conntrack_ipv4 or the nf_conntrack_ipv6 module.

before:
   text    data     bss     dec     hex filename
   7357    1088       0    8445    20fd nf_conntrack_ipv4.ko
   7405    1084       4    8493    212d nf_conntrack_ipv6.ko
  72614   13689     236   86539   1520b nf_conntrack.ko
 19K nf_conntrack_ipv4.ko
 19K nf_conntrack_ipv6.ko
179K nf_conntrack.ko

after:
   text    data     bss     dec     hex filename
  79277   13937     236   93450   16d0a nf_conntrack.ko
  191K nf_conntrack.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
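
The size claim can be checked directly from the size(1) figures quoted in the message. The sketch below sums the text, data and bss columns; the dictionary layout and helper name are illustrative. Note that the 12kbyte figure in the message matches the on-disk file sizes (179K to 191K), not these section totals.

```python
# (text, data, bss) as reported by size(1) in the commit message
BEFORE = {
    "nf_conntrack_ipv4.ko": (7357, 1088, 0),
    "nf_conntrack_ipv6.ko": (7405, 1084, 4),
    "nf_conntrack.ko": (72614, 13689, 236),
}
AFTER = {"nf_conntrack.ko": (79277, 13937, 236)}


def total(modules, names):
    """Sum text + data + bss over the named modules."""
    return sum(sum(modules[name]) for name in names)


core_before = total(BEFORE, ["nf_conntrack.ko"])
core_after = total(AFTER, ["nf_conntrack.ko"])
with_ipv4_before = total(BEFORE, ["nf_conntrack.ko", "nf_conntrack_ipv4.ko"])
all_before = total(BEFORE, BEFORE.keys())

print("growth of nf_conntrack.ko:", core_after - core_before)
print("saved vs. core + ipv4 only:", with_ipv4_before - core_after)
print("saved vs. all three modules:", all_before - core_after)
```

Even the smallest useful configuration before the change (the core plus one l3 module) was larger than the unified module afterwards.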

a0ae2562

16 Jul, 2018 23 commits

netfilter: conntrack: remove get_timeout() indirection · c779e849

Authored by Florian Westphal on Jun 29, 2018

Not needed, we can have the l4trackers fetch it themselves.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

c779e849

netfilter: conntrack: avoid l4proto pkt_to_tuple calls · 97e08cae

Authored by Florian Westphal on Jun 29, 2018

Handle common protocols (udp, tcp, ...) in the core and only
do the call if needed by the l4proto tracker.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

97e08cae

netfilter: conntrack: avoid calls to l4proto invert_tuple · 8b3892ea

Authored by Florian Westphal on Jun 29, 2018

Handle the common cases (tcp, udp, etc.) in the core and only
do the indirect call for the protocols that need it (GRE for instance).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

8b3892ea

netfilter: conntrack: remove get_l4proto indirection from l3 protocol trackers · 6816d931

Authored by Florian Westphal on Jun 29, 2018

Handle it in the core instead.

ipv6_skip_exthdr() is built-in even if ipv6 is a module, i.e. this
doesn't create an ipv6 dependency.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6816d931

netfilter: conntrack: remove invert_tuple indirection from l3 protocol trackers · d1b6fe94

Authored by Florian Westphal on Jun 29, 2018

It's simpler to just handle it directly in nf_ct_invert_tuple().
Also gets rid of the need to pass the l3proto pointer to resolve_conntrack().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d1b6fe94

netfilter: conntrack: remove pkt_to_tuple indirection from l3 protocol trackers · 47a91b14

Authored by Florian Westphal on Jun 29, 2018

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

47a91b14

netfilter: conntrack: remove ctnetlink callbacks from l3 protocol trackers · f957be9d

Authored by Florian Westphal on Jun 29, 2018

handle everything from ctnetlink directly.

After all these years we still only support ipv4 and ipv6, so it
seems reasonable to remove l3 protocol tracker support and instead
handle ipv4/ipv6 from a common, always builtin inet tracker.

Step 1: Get rid of all the l3proto->func() calls.

Start with ctnetlink, then move on to packet-path ones.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f957be9d

netfilter: Kconfig: Make NETFILTER_XT_MATCH_SOCKET select NF_SOCKET_IPV4/6 · 7414d929

Authored by Máté Eckl on Jun 28, 2018

Instead of depending on it.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7414d929

openvswitch: use nf_ct_get_tuplepr, invert_tuplepr · 60e3be94

Authored by Florian Westphal on Jun 25, 2018

These versions deal with the l3proto/l4proto details internally.
This removes the only caller of nf_ct_get_tuple, so make it static.

After this, l3proto->get_l4proto() can be removed in a followup patch.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

60e3be94

netfilter: utils: move nf_ip6_checksum* from ipv6 to utils · ebee5a50

Authored by Florian Westphal on Jun 25, 2018

Similar to the previous change, this also allows removing it
from nf_ipv6_ops and avoiding the indirection.

It also removes the bogus dependency of nf_conntrack_ipv6 on the ipv6 module:
ipv6 checksum functions are built into the kernel even if CONFIG_IPV6=m,
but ipv6/netfilter.o isn't.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ebee5a50

netfilter: utils: move nf_ip_checksum* from ipv4 to utils · d7e5a9a5

Authored by Florian Westphal on Jun 25, 2018

This allows making nf_ip_checksum_partial static, as it no longer
has an external caller.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d7e5a9a5

netfilter: nft_tproxy: Move nf_tproxy_assign_sock() to nf_tproxy.h · f286586d

Authored by Máté Eckl on Jun 18, 2018

This function is also necessary to implement nft tproxy support.

Fixes: 45ca4e0c ("netfilter: Libify xt_TPROXY")
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f286586d

netfilter: flowtables: use fixed renew timeout on teardown · e97d9404

Authored by Florian Westphal on Jun 15, 2018

This is one of the very few external callers of ->get_timeouts().

We can use a fixed timeout instead; the conntrack core will refresh this in
case a new packet comes within this period.

Use of the ESTABLISHED timeout seems way too huge anyway.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

e97d9404

netfilter: nft_reject_bridge: remove unnecessary ttl set · 6542df2f

Authored by Taehee Yoo on Jun 12, 2018

In nft_reject_br_send_v4_tcp_reset(), the ttl is already set by
nf_reject_iphdr_put(), so the code below is unnecessary.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6542df2f

tls: Fix zerocopy_from_iter iov handling · 47187998

Authored by Boris Pismenny on Jul 13, 2018

zerocopy_from_iter iterates over the message, but it doesn't revert the
updates made by the iov iteration. This patch fixes it. Now, the iov can
be used after calling zerocopy_from_iter.

Fixes: 3c4d7559 ("tls: kernel TLS support")
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47187998

tls: Add rx inline crypto offload · 4799ac81

Authored by Boris Pismenny on Jul 13, 2018

This patch completes the generic infrastructure to offload TLS crypto to a
network device. It enables the kernel to skip decryption and
authentication of some skbs marked as decrypted by the NIC. In the fast
path, all packets received are decrypted by the NIC and the performance
is comparable to plain TCP.

This infrastructure doesn't require a TCP offload engine. Instead, the
NIC only decrypts packets that contain the expected TCP sequence number.
Out-Of-Order TCP packets are provided unmodified. As a result, in the
worst case a received TLS record consists of both plaintext and ciphertext
packets. These partially decrypted records must be re-encrypted,
only to be decrypted again.

The notable differences between SW KTLS Rx and this offload are as
follows:
1. Partial decryption - Software must handle the case of a TLS record
that was only partially decrypted by HW. This can happen due to packet
reordering.
2. Resynchronization - tls_read_size calls the device driver to
resynchronize HW after HW lost track of TLS record framing in
the TCP stream.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
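
The three cases a receiver faces per TLS record (fully decrypted by the NIC, untouched, or a mix caused by reordering) can be sketched as a small classifier. This is a model of the decision only, assuming one boolean "decrypted" flag per packet of the record; `RecordState`, `classify_record` and `rx_action` are hypothetical names, not kernel APIs.

```python
from enum import Enum


class RecordState(Enum):
    DECRYPTED = "decrypted"   # every packet was decrypted by the NIC
    ENCRYPTED = "encrypted"   # no packet was decrypted by the NIC
    MIXED = "mixed"           # reordering left the record partially decrypted


def classify_record(packet_decrypted_flags):
    """Classify one TLS record from the per-packet decrypted flags."""
    flags = list(packet_decrypted_flags)
    if not flags:
        raise ValueError("a record consists of at least one packet")
    if all(flags):
        return RecordState.DECRYPTED
    if not any(flags):
        return RecordState.ENCRYPTED
    return RecordState.MIXED


def rx_action(packet_decrypted_flags):
    """Describe what software has to do for this record."""
    state = classify_record(packet_decrypted_flags)
    if state is RecordState.DECRYPTED:
        return "skip software decryption"
    if state is RecordState.ENCRYPTED:
        return "decrypt in software"
    return "re-encrypt the decrypted packets, then decrypt the whole record"
```

Only the mixed case pays the double cost the message mentions, and it arises only when packets reach the NIC out of order.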

4799ac81

tls: Fill software context without allocation · b190a587

Authored by Boris Pismenny on Jul 13, 2018

This patch allows tls_set_sw_offload to fill the context in case it was
already allocated previously.

We will use it in TLS_DEVICE to fill the RX software context.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b190a587

tls: Split tls_sw_release_resources_rx · 39f56e1a

Authored by Boris Pismenny on Jul 13, 2018

This patch splits tls_sw_release_resources_rx into two functions: one
which releases all inner software tls structures and another that also
frees the containing structure.

In TLS_DEVICE we will need to release the software structures without
freeing the containing structure, which contains other information.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

39f56e1a

tls: Split decrypt_skb to two functions · dafb67f3

Authored by Boris Pismenny on Jul 13, 2018

Previously, decrypt_skb also updated the TLS context.
Now, decrypt_skb only decrypts the payload using the current context,
while decrypt_skb_update also updates the state.

Later, in the tls_device Rx flow, we will use decrypt_skb directly.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dafb67f3

tls: Refactor tls_offload variable names · d80a1b9d

Authored by Boris Pismenny on Jul 13, 2018

For symmetry, we rename tls_offload_context to
tls_offload_context_tx before we add tls_offload_context_rx.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d80a1b9d

tcp: Don't coalesce decrypted and encrypted SKBs · 41ed9c04

Authored by Boris Pismenny on Jul 13, 2018

Prevent coalescing of decrypted and encrypted SKBs in the GRO
and TCP layers.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41ed9c04

net: Add TLS RX offload feature · 14136564

Authored by Ilya Lesokhin on Jul 13, 2018

This patch adds a netdev feature to configure TLS RX inline crypto offload.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14136564

net: Add decrypted field to skb · 784abe24

Authored by Boris Pismenny on Jul 13, 2018

The decrypted bit is propagated to cloned/copied skbs.
This will be used later by the inline crypto receive side offload
of tls.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

784abe24

15 Jul, 2018 2 commits

bpf: Add BPF_SOCK_OPS_TCP_LISTEN_CB · f333ee0c

Authored by Andrey Ignatov on Jul 11, 2018

Add a new TCP-BPF callback that is called on listen(2) right after the
socket transitions to the TCP_LISTEN state.

It fills the gap for listening sockets in TCP-BPF. For example, a BPF
program can set BPF_SOCK_OPS_STATE_CB_FLAG when a socket becomes listening
and track the later transition from TCP_LISTEN to TCP_CLOSE with the
BPF_SOCK_OPS_STATE_CB callback.

Previously there was no way to do this with TCP-BPF, and other options were
much harder to work with. E.g. socket state tracking can be done with
tracepoints (either raw or regular), but they can't be attached to a cgroup
and their lifetime has to be managed separately.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

f333ee0c

tcp: remove redundant rcv_nxt update · ff0432e5

Authored by Yafang Shao on Jul 14, 2018

tcp_rcv_nxt_update() is already executed in tcp_data_queue().
This line is redundant.

See below:
	tcp_queue_rcv
		tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
	tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff0432e5
