提交 · af4d768ad28cbf6542ba70dba10b49127b31b762 · openanolis / cloud-kernel

29 5月, 2018 2 次提交

net/ipv4: Add support for specifying metric of connected routes · af4d768a

由 David Ahern 提交于 5月 27, 2018

Add support for IFA_RT_PRIORITY to ipv4 addresses.

If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be replaced.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af4d768a

net: bpfilter: make function bpfilter_mbox_request() static · 77ab8d5d

由 Wei Yongjun 提交于 5月 26, 2018

Fixes the following sparse warnings:

net/ipv4/bpfilter/sockopt.c:13:5: warning:
 symbol 'bpfilter_mbox_request' was not declared. Should it be static?
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77ab8d5d

25 5月, 2018 3 次提交

net/ipv4: Remove tracepoint in fib_validate_source · c949cbbb

由 David Ahern 提交于 5月 23, 2018

Tracepoint does not add value and the call to fib_lookup follows
it which shows the same information and the fib lookup result.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c949cbbb

net/ipv4: Udate fib_table_lookup tracepoint · 9f323973

由 David Ahern 提交于 5月 23, 2018

Commit 4a2d73a4 ("ipv4: fib_rules: support match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.

In addition, make the IPv4 tracepoint similar to the IPv6 one where
the lookup parameters and result are dumped in 1 event. It is much
easier to use and understand the outcome of the lookup.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f323973

ipv4: remove warning in ip_recv_error · 730c54d5

由 Willem de Bruijn 提交于 5月 23, 2018

A precondition check in ip_recv_error triggered on an otherwise benign
race. Remove the warning.

The warning triggers when passing an ipv6 socket to this ipv4 error
handling function. RaceFuzzer was able to trigger it due to a race
in setsockopt IPV6_ADDRFORM.

  ---
  CPU0
    do_ipv6_setsockopt
      sk->sk_socket->ops = &inet_dgram_ops;

  ---
  CPU1
    sk->sk_prot->recvmsg
      udp_recvmsg
        ip_recv_error
          WARN_ON_ONCE(sk->sk_family == AF_INET6);

  ---
  CPU0
    do_ipv6_setsockopt
      sk->sk_family = PF_INET;

This socket option converts a v6 socket that is connected to a v4 peer
to an v4 socket. It updates the socket on the fly, changing fields in
sk as well as other structs. This is inherently non-atomic. It races
with the lockless udp_recvmsg path.

No other code makes an assumption that these fields are updated
atomically. It is benign here, too, as ip_recv_error cares only about
the protocol of the skbs enqueued on the error queue, for which
sk_family is not a precise predictor (thanks to another isue with
IPV6_ADDRFORM).

Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr
Fixes: 7ce875e5 ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error")
Reported-by: NDaeRyong Jeong <threeearcat@gmail.com>
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

730c54d5

24 5月, 2018 4 次提交

ipv4: support sport, dport and ip_proto in RTM_GETROUTE · 404eb77e

由 Roopa Prabhu 提交于 5月 22, 2018

This is a followup to fib rules sport, dport and ipproto
match support. Only supports tcp, udp and icmp for ipproto.
Used by fib rule self tests.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

404eb77e

net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy · 2eabd764

由 Roopa Prabhu 提交于 5月 22, 2018

Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2eabd764

udp: exclude gso from xfrm paths · ff06342c

由 Willem de Bruijn 提交于 5月 22, 2018

UDP GSO delays final datagram construction to the GSO layer. This
conflicts with protocol transformations.

Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
CC: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff06342c

net: add skeleton of bpfilter kernel module · d2ba09c1

由 Alexei Starovoitov 提交于 5月 21, 2018

bpfilter.ko consists of bpfilter_kern.c (normal kernel module code)
and user mode helper code that is embedded into bpfilter.ko

The steps to build bpfilter.ko are the following:
- main.c is compiled by HOSTCC into the bpfilter_umh elf executable file
- with quite a bit of objcopy and Makefile magic the bpfilter_umh elf file
  is converted into bpfilter_umh.o object file
  with _binary_net_bpfilter_bpfilter_umh_start and _end symbols
  Example:
  $ nm ./bld_x64/net/bpfilter/bpfilter_umh.o
  0000000000004cf8 T _binary_net_bpfilter_bpfilter_umh_end
  0000000000004cf8 A _binary_net_bpfilter_bpfilter_umh_size
  0000000000000000 T _binary_net_bpfilter_bpfilter_umh_start
- bpfilter_umh.o and bpfilter_kern.o are linked together into bpfilter.ko

bpfilter_kern.c is a normal kernel module code that calls
the fork_usermode_blob() helper to execute part of its own data
as a user mode process.

Notice that _binary_net_bpfilter_bpfilter_umh_start - end
is placed into .init.rodata section, so it's freed as soon as __init
function of bpfilter.ko is finished.
As part of __init the bpfilter.ko does first request/reply action
via two unix pipe provided by fork_usermode_blob() helper to
make sure that umh is healthy. If not it will kill it via pid.

Later bpfilter_process_sockopt() will be called from bpfilter hooks
in get/setsockopt() to pass iptable commands into umh via bpfilter.ko

If admin does 'rmmod bpfilter' the __exit code bpfilter.ko will
kill umh as well.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2ba09c1

23 5月, 2018 7 次提交

netfilter: nf_nat: add nat type hooks to nat core · 9971a514

由 Florian Westphal 提交于 5月 14, 2018

Currently the packet rewrite and instantiation of nat NULL bindings
happens from the protocol specific nat backend.

Invocation occurs either via ip(6)table_nat or the nf_tables nat chain type.

Invocation looks like this (simplified):
NF_HOOK()
   |
   `---iptable_nat
	 |
	 `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
	               |
          new packet? pass skb though iptables nat chain
                       |
		       `---> iptable_nat: ipt_do_table

In nft case, this looks the same (nft_chain_nat_ipv4 instead of
iptable_nat).

This is a problem for two reasons:
1. Can't use iptables nat and nf_tables nat at the same time,
   as the first user adds a nat binding (nf_nat_l3proto_ipv4 adds a
   NULL binding if do_table() did not find a matching nat rule so we
   can detect post-nat tuple collisions).
2. If you use e.g. nft_masq, snat, redir, etc. uses must also register
   an empty base chain so that the nat core gets called fro NF_HOOK()
   to do the reverse translation, which is neither obvious nor user
   friendly.

After this change, the base hook gets registered not from iptable_nat or
nftables nat hooks, but from the l3 nat core.

iptables/nft nat base hooks get registered with the nat core instead:

NF_HOOK()
   |
   `---> nf_nat_l3proto_ipv4 -> nf_nat_packet
		|
         new packet? pass skb through iptables/nftables nat chains
                |
		+-> iptables_nat: ipt_do_table
	        +-> nft nat chain x
	        `-> nft nat chain y

The nat core deals with null bindings and reverse translation.
When no mapping exists, it calls the registered nat lookup hooks until
one creates a new mapping.
If both iptables and nftables nat hooks exist, the first matching
one is used (i.e., higher priority wins).

Also, nft users do not need to create empty nat hooks anymore,
nat core always registers the base hooks that take care of reverse/reply
translation.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

9971a514

netfilter: nf_tables: allow chain type to override hook register · 4e25ceb8

由 Florian Westphal 提交于 5月 14, 2018

Will be used in followup patch when nat types no longer
use nf_register_net_hook() but will instead register with the nat core.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

4e25ceb8

netfilter: xtables: allow table definitions not backed by hook_ops · ba7d284a

由 Florian Westphal 提交于 5月 14, 2018

The ip(6)tables nat table is currently receiving skbs from the netfilter
core, after a followup patch skbs will be coming from the netfilter nat
core instead, so the table is no longer backed by normal hook_ops.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

ba7d284a

netfilter: nf_nat: move common nat code to nat core · 1f55236b

由 Florian Westphal 提交于 5月 14, 2018

Copy-pasted, both l3 helpers almost use same code here.
Split out the common part into an 'inet' helper.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

1f55236b

tcp: do not aggressively quick ack after ECN events · 522040ea

由 Eric Dumazet 提交于 5月 21, 2018

ECN signals currently forces TCP to enter quickack mode for
up to 16 (TCP_MAX_QUICKACKS) following incoming packets.

We believe this is not needed, and only sending one immediate ack
for the current packet should be enough.

This should reduce the extra load noticed in DCTCP environments,
after congestion events.

This is part 2 of our effort to reduce pure ACK packets.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

522040ea

tcp: add max_quickacks param to tcp_incr_quickack and tcp_enter_quickack_mode · 9a9c9b51

由 Eric Dumazet 提交于 5月 21, 2018

We want to add finer control of the number of ACK packets sent after
ECN events.

This patch is not changing current behavior, it only enables following
change.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a9c9b51

ipmr: properly check rhltable_init() return value · 66fb3325

由 Eric Dumazet 提交于 5月 21, 2018

commit 8fb472c0 ("ipmr: improve hash scalability")
added a call to rhltable_init() without checking its return value.

This problem was then later copied to IPv6 and factorized in commit
0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table")

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 31552 Comm: syz-executor7 Not tainted 4.17.0-rc5+ #60
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rht_key_hashfn include/linux/rhashtable.h:277 [inline]
RIP: 0010:__rhashtable_lookup include/linux/rhashtable.h:630 [inline]
RIP: 0010:rhltable_lookup include/linux/rhashtable.h:716 [inline]
RIP: 0010:mr_mfc_find_parent+0x2ad/0xbb0 net/ipv4/ipmr_base.c:63
RSP: 0018:ffff8801826aef70 EFLAGS: 00010203
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffc90001ea0000
RDX: 0000000000000079 RSI: ffffffff8661e859 RDI: 000000000000000c
RBP: ffff8801826af1c0 R08: ffff8801b2212000 R09: ffffed003b5e46c2
R10: ffffed003b5e46c2 R11: ffff8801daf23613 R12: dffffc0000000000
R13: ffff8801826af198 R14: ffff8801cf8225c0 R15: ffff8801826af658
FS:  00007ff7fa732700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000003ffffff9c CR3: 00000001b0210000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ip6mr_cache_find_parent net/ipv6/ip6mr.c:981 [inline]
 ip6mr_mfc_delete+0x1fe/0x6b0 net/ipv6/ip6mr.c:1221
 ip6_mroute_setsockopt+0x15c6/0x1d70 net/ipv6/ip6mr.c:1698
 do_ipv6_setsockopt.isra.9+0x422/0x4660 net/ipv6/ipv6_sockglue.c:163
 ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:922
 rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1060
 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 8fb472c0 ("ipmr: improve hash scalability")
Fixes: 0bbbf0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Yuval Mintz <yuvalm@mellanox.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66fb3325

22 5月, 2018 1 次提交

net/ipv4: Add helper to return path MTU based on fib result · 50d889b1

由 David Ahern 提交于 5月 21, 2018

Determine path MTU from a FIB lookup result. Logic is a distillation of
ip_dst_mtu_maybe_forward.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

50d889b1

19 5月, 2018 1 次提交

tcp: tcp_rack_reo_wnd() can be static · 1f7455c3

由 kbuild test robot 提交于 5月 18, 2018

Fixes: 20b654df ("tcp: support DUPACK threshold in RACK")
Signed-off-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f7455c3

18 5月, 2018 16 次提交

tcp: add tcp_comp_sack_nr sysctl · 9c21d2fc