提交 · 8f4ef499c6ca56325c76600777237deaeb58a69b · openeuler / Kernel

27 2月, 2019 5 次提交

tcp: remove tcp_queue argument from tso_fragment() · 56483341

由 Eric Dumazet 提交于 2月 26, 2019

tso_fragment() is only called for packets still in write queue.

Remove the tcp_queue parameter to make this more obvious,
even if the comment clearly states this.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56483341

tcp: use tcp_md5_needed for timewait sockets · 6aedbf98

由 Eric Dumazet 提交于 2月 26, 2019

This might speedup tcp_twsk_destructor() a bit,
avoiding a cache line miss.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6aedbf98

tcp: convert tcp_md5_needed to static_branch API · 921f9a0f

由 Eric Dumazet 提交于 2月 26, 2019

We prefer static_branch_unlikely() over static_key_false() these days.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

921f9a0f

tcp: get rid of tcp_check_send_head() · 6c7b4ee7

由 Eric Dumazet 提交于 2月 26, 2019

This helper is used only once, and its name is no longer relevant.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c7b4ee7

net: remove unused struct inet_frag_queue.fragments field · d8cf757f

由 Peter Oskolkov 提交于 2月 25, 2019

Now that all users of struct inet_frag_queue have been converted
to use 'rb_fragments', remove the unused 'fragments' field.

Build with `make allyesconfig` succeeded. ip_defrag selftest passed.
Signed-off-by: NPeter Oskolkov <posk@google.com>
Acked-by: NStefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8cf757f

26 2月, 2019 2 次提交

tcp: clean up SOCK_DEBUG() · 9946b341

由 Yafang Shao 提交于 2月 25, 2019

Per discussion with Daniel[1] and Eric[2], these SOCK_DEBUG() calles in
TCP are not needed now.
We'd better clean up it.

[1] https://patchwork.ozlabs.org/patch/1035573/
[2] https://patchwork.ozlabs.org/patch/1040533/Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9946b341

tcp: remove unused parameter of tcp_sacktag_bsearch() · 4bfabc46

由 Taehee Yoo 提交于 2月 25, 2019

parameter state in the tcp_sacktag_bsearch() is not used.
So, it can be removed.
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4bfabc46

25 2月, 2019 3 次提交

ip_tunnel: Add ip tunnel tun_info type dst_cache in ip_tunnel_xmit · 186d9366

由 wenxu 提交于 2月 24, 2019

ip l add dev tun type gretap key 1000

Non-tunnel-dst ip tunnel device can send packet through lwtunnel
This patch provide the tun_inf dst cache support for this mode.
Signed-off-by: Nwenxu <wenxu@ucloud.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

186d9366

ip_tunnel: Add dst_cache support in lwtunnel_state of ip tunnel · 3d25eabb

由 wenxu 提交于 2月 23, 2019

The lwtunnel_state is not init the dst_cache Which make the
ip_md_tunnel_xmit can't use the dst_cache. It will lookup
route table every packets.
Signed-off-by: Nwenxu <wenxu@ucloud.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d25eabb

ipv4: icmp: use icmp_sk_exit() · e9128c14

由 Kefeng Wang 提交于 2月 23, 2019

Simply use icmp_sk_exit() when inet_ctl_sock_create() fail in icmp_sk_init().
Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e9128c14

24 2月, 2019 1 次提交

tcp: repaired skbs must init their tso_segs · bf50b606

由 Eric Dumazet 提交于 2月 23, 2019

syzbot reported a WARN_ON(!tcp_skb_pcount(skb))
in tcp_send_loss_probe() [1]

This was caused by TCP_REPAIR sent skbs that inadvertenly
were missing a call to tcp_init_tso_segs()

[1]
WARNING: CPU: 1 PID: 0 at net/ipv4/tcp_output.c:2534 tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc7+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 panic+0x2cb/0x65c kernel/panic.c:214
 __warn.cold+0x20/0x45 kernel/panic.c:571
 report_bug+0x263/0x2b0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 fixup_bug arch/x86/kernel/traps.c:173 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:tcp_send_loss_probe+0x771/0x8a0 net/ipv4/tcp_output.c:2534
Code: 88 fc ff ff 4c 89 ef e8 ed 75 c8 fb e9 c8 fc ff ff e8 43 76 c8 fb e9 63 fd ff ff e8 d9 75 c8 fb e9 94 f9 ff ff e8 bf 03 91 fb <0f> 0b e9 7d fa ff ff e8 b3 03 91 fb 0f b6 1d 37 43 7a 03 31 ff 89
RSP: 0018:ffff8880ae907c60 EFLAGS: 00010206
RAX: ffff8880a989c340 RBX: 0000000000000000 RCX: ffffffff85dedbdb
RDX: 0000000000000100 RSI: ffffffff85dee0b1 RDI: 0000000000000005
RBP: ffff8880ae907c90 R08: ffff8880a989c340 R09: ffffed10147d1ae1
R10: ffffed10147d1ae0 R11: ffff8880a3e8d703 R12: ffff888091b90040
R13: ffff8880a3e8d540 R14: 0000000000008000 R15: ffff888091b90860
 tcp_write_timer_handler+0x5c0/0x8a0 net/ipv4/tcp_timer.c:583
 tcp_write_timer+0x10e/0x1d0 net/ipv4/tcp_timer.c:607
 call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
 expire_timers kernel/time/timer.c:1362 [inline]
 __run_timers kernel/time/timer.c:1681 [inline]
 __run_timers kernel/time/timer.c:1649 [inline]
 run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
 __do_softirq+0x266/0x95a kernel/softirq.c:292
 invoke_softirq kernel/softirq.c:373 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:413
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x14a/0x570 arch/x86/kernel/apic/apic.c:1062
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
 </IRQ>
RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
Code: ff ff ff 48 89 c7 48 89 45 d8 e8 59 0c a1 fa 48 8b 45 d8 e9 ce fe ff ff 48 89 df e8 48 0c a1 fa eb 82 90 90 90 90 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
RSP: 0018:ffff8880a98afd78 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1125061 RBX: ffff8880a989c340 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880a989cbbc
RBP: ffff8880a98afda8 R08: ffff8880a989c340 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff889282f8 R14: 0000000000000001 R15: 0000000000000000
 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:555
 default_idle_call+0x36/0x90 kernel/sched/idle.c:93
 cpuidle_idle_call kernel/sched/idle.c:153 [inline]
 do_idle+0x386/0x570 kernel/sched/idle.c:262
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:353
 start_secondary+0x404/0x5c0 arch/x86/kernel/smpboot.c:271
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: 79861919 ("tcp: fix TCP_REPAIR xmit queue setup")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf50b606

23 2月, 2019 1 次提交

udp: fix possible user after free in error handler · 92b95364

由 Paolo Abeni 提交于 2月 21, 2019

Similar to the previous commit, this addresses the same issue for
ipv4: use a single fetch operation and use the correct rcu
annotation.

Fixes: e7cc0824 ("udp: Support for error handlers of tunnels with arbitrary destination port")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92b95364

22 2月, 2019 3 次提交

net: ip_gre: do not report erspan_ver for gre or gretap · 2bdf700e

由 Lorenzo Bianconi 提交于 2月 19, 2019

Report erspan version field to userspace in ipgre_fill_info just for
erspan tunnels. The issue can be triggered with the following reproducer:

$ip link add name gre1 type gre local 192.168.0.1 remote 192.168.1.1
$ip link set dev gre1 up
$ip -d link sh gre1
13: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/gre 192.168.0.1 peer 192.168.1.1 promiscuity 0 minmtu 0 maxmtu 0
    gre remote 192.168.1.1 local 192.168.0.1 ttl inherit erspan_ver 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1

Fixes: f551c91d ("net: erspan: introduce erspan v2 for ip_gre")
Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2bdf700e

net: remove unneeded switch fall-through · a2b5a3fa

由 Li RongQing 提交于 2月 19, 2019

This case block has been terminated by a return, so not need
a switch fall-through
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2b5a3fa

ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs · ca8d4794

由 Callum Sinclair 提交于 2月 18, 2019

Currently the only way to clear the forwarding cache was to delete the
entries one by one using the MRT_DEL_MFC socket option or to destroy and
recreate the socket.

Create a new socket option which with the use of optional flags can
clear any combination of multicast entries (static or not static) and
multicast vifs (static or not static).

Calling the new socket option MRT_FLUSH with the flags MRT_FLUSH_MFC and
MRT_FLUSH_VIFS will clear all entries and vifs on the socket except for
static entries.
Signed-off-by: NCallum Sinclair <callum.sinclair@alliedtelesis.co.nz>
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca8d4794

21 2月, 2019 1 次提交

gso: validate gso_type on ipip style tunnels · 418e897e

由 Willem de Bruijn 提交于 2月 20, 2019

Commit 121d57af ("gso: validate gso_type in GSO handlers") added
gso_type validation to existing gso_segment callback functions, to
filter out illegal and potentially dangerous SKB_GSO_DODGY packets.

Convert tunnels that now call inet_gso_segment and ipv6_gso_segment
directly to have their own callbacks and extend validation to these.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

418e897e

18 2月, 2019 2 次提交

tcp: tcp_v4_err() should be more careful · 2c4cc971

由 Eric Dumazet 提交于 2月 15, 2019

ICMP handlers are not very often stressed, we should
make them more resilient to bugs that might surface in
the future.

If there is no packet in retransmit queue, we should
avoid a NULL deref.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsoukjin bae <soukjin.bae@samsung.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c4cc971

tcp: clear icsk_backoff in tcp_write_queue_purge() · 04c03114

由 Eric Dumazet 提交于 2月 15, 2019

soukjin bae reported a crash in tcp_v4_err() handling
ICMP_DEST_UNREACH after tcp_write_queue_head(sk)
returned a NULL pointer.

Current logic should have prevented this :

  if (seq != tp->snd_una  || !icsk->icsk_retransmits ||
      !icsk->icsk_backoff || fastopen)
      break;

Problem is the write queue might have been purged
and icsk_backoff has not been cleared.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsoukjin bae <soukjin.bae@samsung.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04c03114

16 2月, 2019 1 次提交

netfilter: ipt_CLUSTERIP: make symbol 'cip_netdev_notifier' static · dddaf89e

由 Wei Yongjun 提交于 2月 16, 2019

Fixes the following sparse warnings:

net/ipv4/netfilter/ipt_CLUSTERIP.c:867:23: warning:
 symbol 'cip_netdev_notifier' was not declared. Should it be static?

Fixes: 5a86d68b ("netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine")
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

dddaf89e

13 2月, 2019 2 次提交

netfilter: reject: skip csum verification for protocols that don't support it · 7fc38225

由 Alin Nastac 提交于 2月 13, 2019

Some protocols have other means to verify the payload integrity
(AH, ESP, SCTP) while others are incompatible with nf_ip(6)_checksum
implementation because checksum is either optional or might be
partial (UDPLITE, DCCP, GRE). Because nf_ip(6)_checksum was used
to validate the packets, ip(6)tables REJECT rules were not capable
to generate ICMP(v6) errors for the protocols mentioned above.

This commit also fixes the incorrect pseudo-header protocol used
for IPv4 packets that carry other transport protocols than TCP or
UDP (pseudo-header used protocol 0 iso the proper value).
Signed-off-by: NAlin Nastac <alin.nastac@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7fc38225

inet_diag: fix reporting cgroup classid and fallback to priority · 1ec17dbd

由 Konstantin Khlebnikov 提交于 2月 09, 2019

Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.

Extension INET_DIAG_CLASS_ID has not way to request from the beginning.

This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.

Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.

Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).

So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.

Fixes: 0888e372 ("net: inet: diag: expose sockets cgroup classid")
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ec17dbd

12 2月, 2019 2 次提交

netfilter: nat: fix spurious connection timeouts · 8303b7e8

由 Florian Westphal 提交于 2月 08, 2019

Sander Eikelenboom bisected a NAT related regression down
to the l4proto->manip_pkt indirection removal.

I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
to the existing conntrack entry.

Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
ended up calling the wrong l4 manip function, as tuple->dst.protonum
is the original flows l4 protocol (TCP, UDP, etc).

Set the dst protocol field to ICMP(v6), we already have a private copy
of the tuple due to the inversion of src/dst.
Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
Tested-by: NSander Eikelenboom <linux@eikelenboom.it>
Fixes: faec18db ("netfilter: nat: remove l4proto->manip_pkt")
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

8303b7e8

netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs · c4c07b4d

由 Jann Horn 提交于 2月 06, 2019

The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.

(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)

Fixes: cc2d5863 ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Signed-off-by: NJann Horn <jannh@google.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c4c07b4d

09 2月, 2019 1 次提交

net: ipv4: use a dedicated counter for icmp_v4 redirect packets · c09551c6

由 Lorenzo Bianconi 提交于 2月 06, 2019

According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.

Fix it by using a dedicated counter for the number of ICMP redirect
packets that has been sent by the host

I have not been able to identify a given commit that introduced the
issue since ip_rt_send_redirect implements the same rate-limiting
algorithm from commit 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c09551c6

07 2月, 2019 2 次提交

net: Get rid of SWITCHDEV_ATTR_ID_PORT_PARENT_ID · bccb3025

由 Florian Fainelli 提交于 2月 06, 2019

Now that we have a dedicated NDO for getting a port's parent ID, get rid
of SWITCHDEV_ATTR_ID_PORT_PARENT_ID and convert all callers to use the
NDO exclusively. This is a preliminary change to getting rid of
switchdev_ops eventually.
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bccb3025

net: Introduce ndo_get_port_parent_id() · d6abc596

由 Florian Fainelli 提交于 2月 06, 2019

In preparation for getting rid of switchdev_ops, create a dedicated NDO
operation for getting the port's parent identifier. There are
essentially two classes of drivers that need to implement getting the
port's parent ID which are VF/PF drivers with a built-in switch, and
pure switchdev drivers such as mlxsw, ocelot, dsa etc.

We introduce a helper function: dev_get_port_parent_id() which supports
recursion into the lower devices to obtain the first port's parent ID.

Convert the bridge, core and ipv4 multicast routing code to check for
such ndo_get_port_parent_id() and call the helper function when valid
before falling back to switchdev_port_attr_get(). This will allow us to
convert all relevant drivers in one go instead of having to implement
both switchdev_port_attr_get() and ndo_get_port_parent_id() operations,
then get rid of switchdev_port_attr_get().
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6abc596

04 2月, 2019 6 次提交

net: Fix ip_mc_{dec,inc}_group allocation context · 9fb20801

由 Florian Fainelli 提交于 2月 01, 2019

After 4effd28c ("bridge: join all-snoopers multicast address"), I
started seeing the following sleep in atomic warnings:

[   26.763893] BUG: sleeping function called from invalid context at mm/slab.h:421
[   26.771425] in_atomic(): 1, irqs_disabled(): 0, pid: 1658, name: sh
[   26.777855] INFO: lockdep is turned off.
[   26.781916] CPU: 0 PID: 1658 Comm: sh Not tainted 5.0.0-rc4 #20
[   26.787943] Hardware name: BCM97278SV (DT)
[   26.792118] Call trace:
[   26.794645]  dump_backtrace+0x0/0x170
[   26.798391]  show_stack+0x24/0x30
[   26.801787]  dump_stack+0xa4/0xe4
[   26.805182]  ___might_sleep+0x208/0x218
[   26.809102]  __might_sleep+0x78/0x88
[   26.812762]  kmem_cache_alloc_trace+0x64/0x28c
[   26.817301]  igmp_group_dropped+0x150/0x230
[   26.821573]  ip_mc_dec_group+0x1b0/0x1f8
[   26.825585]  br_ip4_multicast_leave_snoopers.isra.11+0x174/0x190
[   26.831704]  br_multicast_toggle+0x78/0xcc
[   26.835887]  store_bridge_parm+0xc4/0xfc
[   26.839894]  multicast_snooping_store+0x3c/0x4c
[   26.844517]  dev_attr_store+0x44/0x5c
[   26.848262]  sysfs_kf_write+0x50/0x68
[   26.852006]  kernfs_fop_write+0x14c/0x1b4
[   26.856102]  __vfs_write+0x60/0x190
[   26.859668]  vfs_write+0xc8/0x168
[   26.863059]  ksys_write+0x70/0xc8
[   26.866449]  __arm64_sys_write+0x24/0x30
[   26.870458]  el0_svc_common+0xa0/0x11c
[   26.874291]  el0_svc_handler+0x38/0x70
[   26.878120]  el0_svc+0x8/0xc

while toggling the bridge's multicast_snooping attribute dynamically.

Pass a gfp_t down to igmpv3_add_delrec(), introduce
__igmp_group_dropped() and introduce __ip_mc_dec_group() to take a gfp_t
argument.

Similarly introduce ____ip_mc_inc_group() and __ip_mc_inc_group() to
allow caller to specify gfp_t.

IPv6 part of the patch appears fine.

Fixes: 4effd28c ("bridge: join all-snoopers multicast address")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fb20801

socket: Add SO_TIMESTAMPING_NEW · 9718475e

由 Deepa Dinamani 提交于 2月 02, 2019

Add SO_TIMESTAMPING_NEW variant of socket timestamp options.
This is the y2038 safe versions of the SO_TIMESTAMPING_OLD
for all architectures.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: chris@zankel.net
Cc: fenghua.yu@intel.com
Cc: rth@twiddle.net
Cc: tglx@linutronix.de
Cc: ubraun@linux.ibm.com
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-s390@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9718475e

socket: Add SO_TIMESTAMP[NS]_NEW · 887feae3

由 Deepa Dinamani 提交于 2月 02, 2019

Add SO_TIMESTAMP_NEW and SO_TIMESTAMPNS_NEW variants of
socket timestamp options.
These are the y2038 safe versions of the SO_TIMESTAMP_OLD
and SO_TIMESTAMPNS_OLD for all architectures.

Note that the format of scm_timestamping.ts[0] is not changed
in this patch.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

887feae3

socket: Use old_timeval types for socket timestamps · 13c6ee2a

由 Deepa Dinamani 提交于 2月 02, 2019

As part of y2038 solution, all internal uses of
struct timeval are replaced by struct __kernel_old_timeval
and struct compat_timeval by struct old_timeval32.
Make socket timestamps use these new types.

This is mainly to be able to verify that the kernel build
is y2038 safe when such non y2038 safe types are not
supported anymore.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: isdn@linux-pingi.de
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13c6ee2a

sockopt: Rename SO_TIMESTAMP* to SO_TIMESTAMP*_OLD · 7f1bc6e9

由 Deepa Dinamani 提交于 2月 02, 2019

SO_TIMESTAMP, SO_TIMESTAMPNS and SO_TIMESTAMPING options, the
way they are currently defined, are not y2038 safe.
Subsequent patches in the series add new y2038 safe versions
of these options which provide 64 bit timestamps on all
architectures uniformly.
Hence, rename existing options with OLD tag suffixes.

Also note that kernel will not use the untagged SO_TIMESTAMP*
and SCM_TIMESTAMP* options internally anymore.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Cc: deller@gmx.de
Cc: dhowells@redhat.com
Cc: jejb@parisc-linux.org
Cc: ralf@linux-mips.org
Cc: rth@twiddle.net
Cc: linux-afs@lists.infradead.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f1bc6e9

ipv4/igmp: Don't drop IGMP pkt with zeros src addr · 1d2f4ebb

由 Edward Chron 提交于 1月 31, 2019

Don't drop IGMP packets with a source address of all zeros which are
IGMP proxy reports. This is documented in Section 2.1.1 IGMP
Forwarding Rules of RFC 4541 IGMP and MLD Snooping Switches
Considerations.
Signed-off-by: NEdward Chron <echron@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d2f4ebb

02 2月, 2019 2 次提交

ipconfig: add carrier_timeout kernel parameter · 3fc46fc9

由 Martin Kepplinger 提交于 1月 31, 2019

commit 3fb72f1e ("ipconfig wait for carrier") added a
"wait for carrier" policy, with a fixed worst case maximum wait
of two minutes.

Now make the wait for carrier timeout configurable on the kernel
commandline and use the 120s as the default.

The timeout messages introduced with
commit 5e404cd6 ("ipconfig: add informative timeout messages while
waiting for carrier") are done in a fixed interval of 20 seconds, just
like they were before (240/12).
Signed-off-by: NMartin Kepplinger <martin.kepplinger@ginzinger.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fc46fc9

ipv4: fib: use struct_size() in kzalloc() · 1f533ba6

由 Gustavo A. R. Silva 提交于 1月 30, 2019

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f533ba6

31 1月, 2019 1 次提交

net: ip_gre: always reports o_key to userspace · feaf5c79

由 Lorenzo Bianconi 提交于 1月 28, 2019

Erspan protocol (version 1 and 2) relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
erspan_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:

$ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
    key 1 seq erspan_ver 1
$ip link set erspan1 up
$ip -d link sh erspan1

erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
  link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
  erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0

Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ipgre_fill_info

Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

feaf5c79

28 1月, 2019 5 次提交

netfilter: ipv4: remove useless export_symbol · 83f52928

由 Florian Westphal 提交于 1月 27, 2019

Only one caller; place it where needed and get rid of the EXPORT_SYMBOL.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

83f52928

esp: Skip TX bytes accounting when sending from a request socket · 09db5124

由 Martin Willi 提交于 1月 28, 2019

On ESP output, sk_wmem_alloc is incremented for the added padding if a
socket is associated to the skb. When replying with TCP SYNACKs over
IPsec, the associated sk is a casted request socket, only. Increasing
sk_wmem_alloc on a request socket results in a write at an arbitrary
struct offset. In the best case, this produces the following WARNING:

WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
refcount_t: addition on 0; use-after-free.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
Hardware name: Marvell Armada 380/385 (Device Tree)
[...]
[<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
[<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
[<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
[<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
[<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
[<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
[<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
[<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
[<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
[<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
[<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
[<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
[<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
[...]

The issue triggers only when not using TCP syncookies, as for syncookies
no socket is associated.

Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: NMartin Willi <martin@strongswan.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

09db5124

netfilter: ipt_CLUSTERIP: fix warning unused variable cn · 206b8cc5

由 Anders Roxell 提交于 1月 23, 2019

When CONFIG_PROC_FS isn't set the variable cn isn't used.

net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
  struct clusterip_net *cn = clusterip_pernet(net);
                        ^~

Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".

Fixes: b12f7bad ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

206b8cc5

tcp: change pingpong threshold to 3 · 4a41f453

由 Wei Wang 提交于 1月 25, 2019

In order to be more confident about an on-going interactive session, we
increment pingpong count by 1 for every interactive transaction and we
adjust TCP_PINGPONG_THRESH to 3.
This means, we only consider a session in pingpong mode after we see 3
interactive transactions, and start to activate delayed acks in quick
ack mode.
And in order to not over-count the credits, we only increase pingpong
count for the first packet sent in response for the previous received
packet.
This is mainly to prevent delaying the ack immediately after some
handshake protocol but no real interactive traffic pattern afterwards.
Signed-off-by: NWei Wang <weiwan@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a41f453

tcp: Refactor pingpong code · 31954cd8

由 Wei Wang 提交于 1月 25, 2019

Instead of using pingpong as a single bit information, we refactor the
code to treat it as a counter. When interactive session is detected,
we set pingpong count to TCP_PINGPONG_THRESH. And when pingpong count
is >= TCP_PINGPONG_THRESH, we consider the session in pingpong mode.

This patch is a pure refactor and sets foundation for the next patch.
This patch itself does not change any pingpong logic.
Signed-off-by: NWei Wang <weiwan@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31954cd8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功