提交 · fc37987265b5979129a72c672b20245119768fb8 · openeuler / Kernel

02 6月, 2020 1 次提交

tun: correct header offsets in napi frags mode · 96aa1b22

由 Willem de Bruijn 提交于 5月 30, 2020

Tun in IFF_NAPI_FRAGS mode calls napi_gro_frags. Unlike netif_rx and
netif_gro_receive, this expects skb->data to point to the mac layer.

But skb_probe_transport_header, __skb_get_hash_symmetric, and
xdp_do_generic in tun_get_user need skb->data to point to the network
header. Flow dissection also needs skb->protocol set, so
eth_type_trans has to be called.

Ensure the link layer header lies in linear as eth_type_trans pulls
ETH_HLEN. Then take the same code paths for frags as for not frags.
Push the link layer header back just before calling napi_gro_frags.

By pulling up to ETH_HLEN from frag0 into linear, this disables the
frag0 optimization in the special case when IFF_NAPI_FRAGS is used
with zero length iov[0] (and thus empty skb->linear).

Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NPetar Penkov <ppenkov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96aa1b22

15 5月, 2020 1 次提交

tun: Add XDP frame size · fb3e6e93

由 Jesper Dangaard Brouer 提交于 5月 14, 2020

The tun driver have two code paths for running XDP (bpf_prog_run_xdp).
In both cases 'buflen' contains enough tailroom for skb_shared_info.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/bpf/158945343419.97035.9594485183958037621.stgit@firesoul

fb3e6e93

13 4月, 2020 1 次提交

net: tun: record RX queue in skb before do_xdp_generic() · 3fe260e0

由 Gilberto Bertin 提交于 4月 10, 2020

This allows netif_receive_generic_xdp() to correctly determine the RX
queue from which the skb is coming, so that the context passed to the
XDP program will contain the correct RX queue index.
Signed-off-by: NGilberto Bertin <me@jibi.io>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fe260e0

07 4月, 2020 1 次提交

tun: Don't put_page() for all negative return values from XDP program · bee34890

由 Will Deacon 提交于 4月 03, 2020

When an XDP program is installed, tun_build_skb() grabs a reference to
the current page fragment page if the program returns XDP_REDIRECT or
XDP_TX. However, since tun_xdp_act() passes through negative return
values from the XDP program, it is possible to trigger the error path by
mistake and accidentally drop a reference to the fragments page without
taking one, leading to a spurious free. This is believed to be the cause
of some KASAN use-after-free reports from syzbot [1], although without a
reproducer it is not possible to confirm whether this patch fixes the
problem.

Ensure that we only drop a reference to the fragments page if the XDP
transmit or redirect operations actually fail.

[1] https://syzkaller.appspot.com/bug?id=e76a6af1be4acd727ff6bbca669833f98cbf5d95

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
CC: Eric Dumazet <edumazet@google.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Fixes: 8ae1aff0 ("tuntap: split out XDP logic")
Signed-off-by: NWill Deacon <will@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bee34890

07 3月, 2020 1 次提交

tun: reject unsupported coalescing params · e5ad00b3

由 Jakub Kicinski 提交于 3月 05, 2020

Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5ad00b3

06 3月, 2020 5 次提交

tun: drop TUN_DEBUG and tun_debug() · 5af09071

由 Michal Kubecek 提交于 3月 04, 2020

TUN_DEBUG and tun_debug() are no longer used anywhere, drop them.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5af09071

tun: replace tun_debug() by netif_info() · 3424170f

由 Michal Kubecek 提交于 3月 04, 2020

The tun driver uses custom macro tun_debug() which is only available if
TUN_DEBUG is set. Replace it by standard netif_ifinfo(). For that purpose,
rename tun_struct::debug to msg_enable and make it u32 and always present.
Finally, make tun_get_msglevel(), tun_set_msglevel() and TUNSETDEBUG ioctl
independent of TUN_DEBUG.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3424170f

tun: drop useless debugging statements · 18209434

由 Michal Kubecek 提交于 3月 04, 2020

Some of the tun_debug() statements only inform us about entering
a function which can be easily achieved with ftrace or kprobe. As
tun_debug() is no-op unless TUN_DEBUG is set which requires editing the
source and recompiling, setting up ftrace or kprobe is easier. Drop these
debug statements.

Also drop the tun_debug() statement informing about SIOCSIFHWADDR ioctl.
We can monitor these through rtnetlink and it makes little sense to log
address changes through ioctl but not changes through rtnetlink. Moreover,
this tun_debug() is called even if the actual address change fails which
makes it even less useful.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18209434

tun: get rid of DBG1() macro · 7522416d

由 Michal Kubecek 提交于 3月 04, 2020

This macro is no-op unless TUN_DEBUG is defined (which requires editing and
recompiling the source) and only does something if variable debug is 2 but
that variable is zero initialized and never set to anything else. Moreover,
the only use of the macro informs about entering function tun_chr_open()
which can be easily achieved using ftrace or kprobe.

Drop DBG1() macro, its only use and global variable debug.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7522416d

tun: fix misleading comment format · 516c512b

由 Michal Kubecek 提交于 3月 04, 2020

The comment above tun_flow_save_rps_rxhash() starts with "/**" which
makes it look like kerneldoc comment and results in warnings when
building with W=1. Fix the format to make it look like a normal comment.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

516c512b

24 2月, 2020 1 次提交

tun: Remove unnecessary BUG_ON check in tun_net_xmit · fb0b1c60

由 David Ahern 提交于 2月 21, 2020

The BUG_ON for NULL tfile is now redundant due to a recently
added null check after the rcu_dereference. Remove the BUG_ON.

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb0b1c60

23 1月, 2020 1 次提交

tun: add mutex_unlock() call and napi.skb clearing in tun_get_user() · 1efba987

由 Eric Dumazet 提交于 1月 22, 2020

If both IFF_NAPI_FRAGS mode and XDP are enabled, and the XDP program
consumes the skb, we need to clear the napi.skb (or risk
a use-after-free) and release the mutex (or risk a deadlock)

WARNING: lock held when returning to user space!
5.5.0-rc6-syzkaller #0 Not tainted
------------------------------------------------
syz-executor.0/455 is leaving the kernel with locks still held!
1 lock held by syz-executor.0/455:
 #0: ffff888098f6e748 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x1604/0x3fc0 drivers/net/tun.c:1835

Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Cc: Petar Penkov <ppenkov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1efba987

17 1月, 2020 1 次提交

xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886

由 Toke Høiland-Jørgensen 提交于 1月 16, 2020

Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
we can re-use the bulking for the non-map version of the bpf_redirect()
helper. This is a simple matter of having xdp_do_redirect_slow() queue the
frame on the bulk queue instead of sending it out with __bpf_tx_xdp().

Unfortunately we can't make the bpf_redirect() helper return an error if
the ifindex doesn't exit (as bpf_redirect_map() does), because we don't
have a reference to the network namespace of the ingress device at the time
the helper is called. So we have to leave it as-is and keep the device
lookup in xdp_do_redirect_slow().

Since this leaves less reason to have the non-map redirect code in a
separate function, so we get rid of the xdp_do_redirect_slow() function
entirely. This does lose us the tracepoint disambiguation, but fortunately
the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
entry structures. This means both can contain a map index, so we can just
amend the tracepoint definitions so we always emit the xdp_redirect(_err)
tracepoints, but with the map ID only populated if a map is present. This
means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
the definitions around in case someone is still listening for them.

With this change, the performance of the xdp_redirect sample program goes
from 5Mpps to 8.4Mpps (a 68% increase).

Since the flush functions are no longer map-specific, rename the flush()
functions to drop _map from their names. One of the renamed functions is
the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
keep from having to update all drivers, use a #define to keep the old name
working, and only update the virtual drivers in this patch.
Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk

1d233886

16 11月, 2019 1 次提交

tun: fix data-race in gro_normal_list() · c39e342a

由 Petar Penkov 提交于 11月 14, 2019

There is a race in the TUN driver between napi_busy_loop and
napi_gro_frags. This commit resolves the race by adding the NAPI struct
via netif_tx_napi_add, instead of netif_napi_add, which disables polling
for the NAPI struct.

KCSAN reported:
BUG: KCSAN: data-race in gro_normal_list.part.0 / napi_busy_loop

write to 0xffff8880b5d474b0 of 4 bytes by task 11205 on cpu 0:
 gro_normal_list.part.0+0x77/0xb0 net/core/dev.c:5682
 gro_normal_list net/core/dev.c:5678 [inline]
 gro_normal_one net/core/dev.c:5692 [inline]
 napi_frags_finish net/core/dev.c:5705 [inline]
 napi_gro_frags+0x625/0x770 net/core/dev.c:5778
 tun_get_user+0x2150/0x26a0 drivers/net/tun.c:1976
 tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
 call_write_iter include/linux/fs.h:1895 [inline]
 do_iter_readv_writev+0x487/0x5b0 fs/read_write.c:693
 do_iter_write fs/read_write.c:970 [inline]
 do_iter_write+0x13b/0x3c0 fs/read_write.c:951
 vfs_writev+0x118/0x1c0 fs/read_write.c:1015
 do_writev+0xe3/0x250 fs/read_write.c:1058
 __do_sys_writev fs/read_write.c:1131 [inline]
 __se_sys_writev fs/read_write.c:1128 [inline]
 __x64_sys_writev+0x4e/0x60 fs/read_write.c:1128
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

read to 0xffff8880b5d474b0 of 4 bytes by task 11168 on cpu 1:
 gro_normal_list net/core/dev.c:5678 [inline]
 napi_busy_loop+0xda/0x4f0 net/core/dev.c:6126
 sk_busy_loop include/net/busy_poll.h:108 [inline]
 __skb_recv_udp+0x4ad/0x560 net/ipv4/udp.c:1689
 udpv6_recvmsg+0x29e/0xe90 net/ipv6/udp.c:288
 inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
 sock_recvmsg_nosec net/socket.c:871 [inline]
 sock_recvmsg net/socket.c:889 [inline]
 sock_recvmsg+0x92/0xb0 net/socket.c:885
 sock_read_iter+0x15f/0x1e0 net/socket.c:967
 call_read_iter include/linux/fs.h:1889 [inline]
 new_sync_read+0x389/0x4f0 fs/read_write.c:414
 __vfs_read+0xb1/0xc0 fs/read_write.c:427
 vfs_read fs/read_write.c:461 [inline]
 vfs_read+0x143/0x2c0 fs/read_write.c:446
 ksys_read+0xd5/0x1b0 fs/read_write.c:587
 __do_sys_read fs/read_write.c:597 [inline]
 __se_sys_read fs/read_write.c:595 [inline]
 __x64_sys_read+0x4c/0x60 fs/read_write.c:595
 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 11168 Comm: syz-executor.0 Not tainted 5.4.0-rc6+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 94317099 ("tun: enable NAPI for TUN/TAP driver")
Signed-off-by: NPetar Penkov <ppenkov@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c39e342a

08 11月, 2019 1 次提交

tun: switch to u64_stats_t · 5260dd3e

由 Eric Dumazet 提交于 11月 07, 2019

In order to fix this data-race found by KCSAN [1],
switch to u64_stats_t helpers. They provide all
the needed annotations, without adding extra cost.

[1]
BUG: KCSAN: data-race in tun_get_user / tun_net_get_stats64

read to 0xffffe8ffffd8aca8 of 8 bytes by task 4882 on cpu 0:
 tun_net_get_stats64+0x9b/0x230 drivers/net/tun.c:1171
 dev_get_stats+0x89/0x1e0 net/core/dev.c:9103
 rtnl_fill_stats+0x56/0x370 net/core/rtnetlink.c:1177
 rtnl_fill_ifinfo+0xd3b/0x2100 net/core/rtnetlink.c:1667
 rtmsg_ifinfo_build_skb+0xb0/0x150 net/core/rtnetlink.c:3472
 rtmsg_ifinfo_event.part.0+0x4e/0xb0 net/core/rtnetlink.c:3504
 rtmsg_ifinfo_event net/core/rtnetlink.c:3515 [inline]
 rtmsg_ifinfo+0x85/0x90 net/core/rtnetlink.c:3513
 __dev_notify_flags+0x18b/0x200 net/core/dev.c:7649
 dev_change_flags+0xb8/0xe0 net/core/dev.c:7691
 dev_ifsioc+0x201/0x6a0 net/core/dev_ioctl.c:237
 dev_ioctl+0x149/0x660 net/core/dev_ioctl.c:489
 sock_do_ioctl+0xdb/0x230 net/socket.c:1061
 sock_ioctl+0x3a3/0x5e0 net/socket.c:1189
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x991/0xc60 fs/ioctl.c:696

write to 0xffffe8ffffd8aca8 of 8 bytes by task 4883 on cpu 1:
 tun_get_user+0x1d94/0x2ba0 drivers/net/tun.c:2002
 tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
 call_write_iter include/linux/fs.h:1895 [inline]
 new_sync_write+0x388/0x4a0 fs/read_write.c:483
 __vfs_write+0xb1/0xc0 fs/read_write.c:496
 __kernel_write+0xb8/0x240 fs/read_write.c:515
 write_pipe_buf+0xb6/0xf0 fs/splice.c:794
 splice_from_pipe_feed fs/splice.c:500 [inline]
 __splice_from_pipe+0x248/0x480 fs/splice.c:624
 splice_from_pipe+0xbb/0x100 fs/splice.c:659
 default_file_splice_write+0x45/0x90 fs/splice.c:806
 do_splice_from fs/splice.c:848 [inline]
 direct_splice_actor+0xa0/0xc0 fs/splice.c:1020
 splice_direct_to_actor+0x215/0x510 fs/splice.c:975
 do_splice_direct+0x161/0x1e0 fs/splice.c:1063
 do_sendfile+0x384/0x7f0 fs/read_write.c:1464

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4883 Comm: syz-executor.1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5260dd3e

10 10月, 2019 1 次提交

tun: remove possible false sharing in tun_flow_update() · 4ffdd22e

由 Eric Dumazet 提交于 10月 09, 2019

As mentioned in https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance
a C compiler can legally transform

if (e->queue_index != queue_index)
	e->queue_index = queue_index;

to :

	e->queue_index = queue_index;

Note that the code using jiffies has no issue, since jiffies
has volatile attribute.

if (e->updated != jiffies)
    e->updated = jiffies;

Fixes: 83b1bc12 ("tun: align write-heavy flow entry members to a cache line")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Zhang Yu <zhangyu31@baidu.com>
Cc: Wang Li <wangli39@baidu.com>
Cc: Li RongQing <lirongqing@baidu.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

4ffdd22e

09 10月, 2019 2 次提交

Revert "tun: call dev_get_valid_name() before register_netdevice()" · bacb7e18

由 Eric Dumazet 提交于 10月 08, 2019

This reverts commit 0ad646c8.

As noticed by Jakub, this is no longer needed after
commit 11fc7d5a ("tun: fix memory leak in error path")

This no longer exports dev_get_valid_name() for the exclusive
use of tun driver.
Suggested-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

bacb7e18

tun: fix memory leak in error path · 11fc7d5a

由 Eric Dumazet 提交于 10月 07, 2019

syzbot reported a warning [1] that triggered after recent Jiri patch.

This exposes a bug that we hit already in the past (see commit
ff244c6b ("tun: handle register_netdevice() failures properly")
for details)

tun uses priv->destructor without an ndo_init() method.

register_netdevice() can return an error, but will
not call priv->destructor() in some cases. Jiri recent
patch added one more.

A long term fix would be to transfer the initialization
of what we destroy in ->destructor() in the ndo_init()

This looks a bit risky given the complexity of tun driver.

A simpler fix is to detect after the failed register_netdevice()
if the tun_free_netdev() function was called already.

[1]
ODEBUG: free active (active state 0) object type: timer_list hint: tun_flow_cleanup+0x0/0x280 drivers/net/tun.c:457
WARNING: CPU: 0 PID: 8653 at lib/debugobjects.c:481 debug_print_object+0x168/0x250 lib/debugobjects.c:481
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 8653 Comm: syz-executor976 Not tainted 5.4.0-rc1-next-20191004 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 panic+0x2dc/0x755 kernel/panic.c:220
 __warn.cold+0x2f/0x3c kernel/panic.c:581
 report_bug+0x289/0x300 lib/bug.c:195
 fixup_bug arch/x86/kernel/traps.c:174 [inline]
 fixup_bug arch/x86/kernel/traps.c:169 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1028
RIP: 0010:debug_print_object+0x168/0x250 lib/debugobjects.c:481
Code: dd 80 b9 e6 87 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 b5 00 00 00 48 8b 14 dd 80 b9 e6 87 48 c7 c7 e0 ae e6 87 e8 80 84 ff fd <0f> 0b 83 05 e3 ee 80 06 01 48 83 c4 20 5b 41 5c 41 5d 41 5e 5d c3
RSP: 0018:ffff888095997a28 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815cb526 RDI: ffffed1012b32f37
RBP: ffff888095997a68 R08: ffff8880a92ac580 R09: ffffed1015d04101
R10: ffffed1015d04100 R11: ffff8880ae820807 R12: 0000000000000001
R13: ffffffff88fb5340 R14: ffffffff81627110 R15: ffff8880aa41eab8
 __debug_check_no_obj_freed lib/debugobjects.c:963 [inline]
 debug_check_no_obj_freed+0x2d4/0x43f lib/debugobjects.c:994
 kfree+0xf8/0x2c0 mm/slab.c:3755
 kvfree+0x61/0x70 mm/util.c:593
 netdev_freemem net/core/dev.c:9384 [inline]
 free_netdev+0x39d/0x450 net/core/dev.c:9533
 tun_set_iff drivers/net/tun.c:2871 [inline]
 __tun_chr_ioctl+0x317b/0x3f30 drivers/net/tun.c:3075
 tun_chr_ioctl+0x2b/0x40 drivers/net/tun.c:3355
 vfs_ioctl fs/ioctl.c:47 [inline]
 file_ioctl fs/ioctl.c:539 [inline]
 do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
 ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
 __do_sys_ioctl fs/ioctl.c:750 [inline]
 __se_sys_ioctl fs/ioctl.c:748 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748
 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441439
Code: e8 9c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff61c37438 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441439
RDX: 0000000020000400 RSI: 00000000400454ca RDI: 0000000000000004
RBP: 00007fff61c37470 R08: 0000000000000001 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..

Fixes: ff927412 ("net: introduce name_node struct to be used in hashlist")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

11fc7d5a

02 10月, 2019 1 次提交

netfilter: drop bridge nf reset from nf_reset · 895b5c9f

由 Florian Westphal 提交于 9月 29, 2019

commit 174e2381
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions.  The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.

Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.

This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().

In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.

I am submitting this for "net", because we're still early in the release
cycle.  The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

895b5c9f

12 9月, 2019 1 次提交

tun: fix use-after-free when register netdev failed · 77f22f92

由 Yang Yingliang 提交于 9月 10, 2019

I got a UAF repport in tun driver when doing fuzzy test:

[  466.269490] ==================================================================
[  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
[  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
[  466.271810]
[  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
[  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  466.271838] Call Trace:
[  466.271858]  dump_stack+0xca/0x13e
[  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271890]  print_address_description+0x79/0x440
[  466.271906]  ? vprintk_func+0x5e/0xf0
[  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271935]  __kasan_report+0x15c/0x1df
[  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271976]  kasan_report+0xe/0x20
[  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
[  466.272013]  do_iter_readv_writev+0x4b7/0x740
[  466.272032]  ? default_llseek+0x2d0/0x2d0
[  466.272072]  do_iter_read+0x1c5/0x5e0
[  466.272110]  vfs_readv+0x108/0x180
[  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
[  466.299020]  ? fsnotify+0x888/0xd50
[  466.299040]  ? __fsnotify_parent+0xd0/0x350
[  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
[  466.304548]  ? vfs_write+0x264/0x510
[  466.304569]  ? ksys_write+0x101/0x210
[  466.304591]  ? do_preadv+0x116/0x1a0
[  466.304609]  do_preadv+0x116/0x1a0
[  466.309829]  do_syscall_64+0xc8/0x600
[  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.309861] RIP: 0033:0x4560f9
[  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
[  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
[  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
[  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
[  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
[  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
[  466.323057]
[  466.323064] Allocated by task 2605:
[  466.335165]  save_stack+0x19/0x80
[  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
[  466.337755]  kmem_cache_alloc+0xe8/0x320
[  466.339050]  getname_flags+0xca/0x560
[  466.340229]  user_path_at_empty+0x2c/0x50
[  466.341508]  vfs_statx+0xe6/0x190
[  466.342619]  __do_sys_newstat+0x81/0x100
[  466.343908]  do_syscall_64+0xc8/0x600
[  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.347034]
[  466.347517] Freed by task 2605:
[  466.348471]  save_stack+0x19/0x80
[  466.349476]  __kasan_slab_free+0x12e/0x180
[  466.350726]  kmem_cache_free+0xc8/0x430
[  466.351874]  putname+0xe2/0x120
[  466.352921]  filename_lookup+0x257/0x3e0
[  466.354319]  vfs_statx+0xe6/0x190
[  466.355498]  __do_sys_newstat+0x81/0x100
[  466.356889]  do_syscall_64+0xc8/0x600
[  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.359567]
[  466.360050] The buggy address belongs to the object at ffff888372139100
[  466.360050]  which belongs to the cache names_cache of size 4096
[  466.363735] The buggy address is located 336 bytes inside of
[  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
[  466.367179] The buggy address belongs to the page:
[  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
[  466.371582] flags: 0x2fffff80010200(slab|head)
[  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
[  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[  466.377778] page dumped because: kasan: bad access detected
[  466.379730]
[  466.380288] Memory state around the buggy address:
[  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.388257]                                                  ^
[  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.394667] ==================================================================

tun_chr_read_iter() accessed the memory which freed by free_netdev()
called by tun_set_iff():

        CPUA                                           CPUB
  tun_set_iff()
    alloc_netdev_mqs()
    tun_attach()
                                                  tun_chr_read_iter()
                                                    tun_get()
                                                    tun_do_read()
                                                      tun_ring_recv()
    register_netdevice() <-- inject error
    goto err_detach
    tun_detach_all() <-- set RCV_SHUTDOWN
    free_netdev() <-- called from
                     err_free_dev path
      netdev_freemem() <-- free the memory
                        without check refcount
      (In this path, the refcount cannot prevent
       freeing the memory of dev, and the memory
       will be used by dev_put() called by
       tun_chr_read_iter() on CPUB.)
                                                     (Break from tun_ring_recv(),
                                                     because RCV_SHUTDOWN is set)
                                                   tun_put()
                                                     dev_put() <-- use the memory
                                                                   freed by netdev_freemem()

Put the publishing of tfile->tun after register_netdevice(),
so tun_get() won't get the tun pointer that freed by
err_detach path if register_netdevice() failed.

Fixes: eb0fb363 ("tuntap: attach queue 0 before registering netdevice")
Reported-by: NHulk Robot <hulkci@huawei.com>
Suggested-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77f22f92

26 7月, 2019 1 次提交

tun: mark small packets as owned by the tap sock · 4b663366

由 Alexis Bauvin 提交于 7月 23, 2019

- v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size

Small packets going out of a tap device go through an optimized code
path that uses build_skb() rather than sock_alloc_send_pskb(). The
latter calls skb_set_owner_w(), but the small packet code path does not.

The net effect is that small packets are not owned by the userland
application's socket (e.g. QEMU), while large packets are.
This can be seen with a TCP session, where packets are not owned when
the window size is small enough (around PAGE_SIZE), while they are once
the window grows (note that this requires the host to support virtio
tso for the guest to offload segmentation).
All this leads to inconsistent behaviour in the kernel, especially on
netfilter modules that uses sk->socket (e.g. xt_owner).

Fixes: 66ccbc9c ("tap: use build_skb() for small packet")
Signed-off-by: NAlexis Bauvin <abauvin@scaleway.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b663366

09 7月, 2019 1 次提交

coallocate socket_wq with socket itself · 333f7909

由 Al Viro 提交于 7月 05, 2019

socket->wq is assign-once, set when we are initializing both
struct socket it's in and struct socket_wq it points to.  As the
matter of fact, the only reason for separate allocation was the
ability to RCU-delay freeing of socket_wq.  RCU-delaying the
freeing of socket itself gets rid of that need, so we can just
fold struct socket_wq into the end of struct socket and simplify
the life both for sock_alloc_inode() (one allocation instead of
two) and for tun/tap oddballs, where we used to embed struct socket
and struct socket_wq into the same structure (now - embedding just
the struct socket).

Note that reference to struct socket_wq in struct sock does remain
a reference - that's unchanged.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

333f7909

19 6月, 2019 1 次提交

tun: wake up waitqueues after IFF_UP is set · 72b319dc

由 Fei Li 提交于 6月 17, 2019

Currently after setting tap0 link up, the tun code wakes tx/rx waited
queues up in tun_net_open() when .ndo_open() is called, however the
IFF_UP flag has not been set yet. If there's already a wait queue, it
would fail to transmit when checking the IFF_UP flag in tun_sendmsg().
Then the saving vhost_poll_start() will add the wq into wqh until it
is waken up again. Although this works when IFF_UP flag has been set
when tun_chr_poll detects; this is not true if IFF_UP flag has not
been set at that time. Sadly the latter case is a fatal error, as
the wq will never be waken up in future unless later manually
setting link up on purpose.

Fix this by moving the wakeup process into the NETDEV_UP event
notifying process, this makes sure IFF_UP has been set before all
waited queues been waken up.
Signed-off-by: NFei Li <lifei.shirley@bytedance.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72b319dc

31 5月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 · c942fddf

由 Thomas Gleixner 提交于 5月 27, 2019

Based on 3 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version this program is distributed in the
  hope that it will be useful but without any warranty without even
  the implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version [author] [kishon] [vijay] [abraham]
  [i] [kishon]@[ti] [com] this program is distributed in the hope that
  it will be useful but without any warranty without even the implied
  warranty of merchantability or fitness for a particular purpose see
  the gnu general public license for more details

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version [author] [graeme] [gregory]
  [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
  [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
  [hk] [hemahk]@[ti] [com] this program is distributed in the hope
  that it will be useful but without any warranty without even the
  implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NRichard Fontana <rfontana@redhat.com>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c942fddf

10 5月, 2019 2 次提交

tuntap: synchronize through tfiles array instead of tun->numqueues · 9871a9e4

由 Jason Wang 提交于 5月 08, 2019

When a queue(tfile) is detached through __tun_detach(), we move the
last enabled tfile to the position where detached one sit but don't
NULL out last position. We expect to synchronize the datapath through
tun->numqueues. Unfortunately, this won't work since we're lacking
sufficient mechanism to order or synchronize the access to
tun->numqueues.

To fix this, NULL out the last position during detaching and check
RCU protected tfile against NULL instead of checking tun->numqueues in
datapath.

Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: weiyongjun (A) <weiyongjun1@huawei.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: c8d68e6b ("tuntap: multiqueue support")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9871a9e4

tuntap: fix dividing by zero in ebpf queue selection · a35d310f

由 Jason Wang 提交于 5月 08, 2019

We need check if tun->numqueues is zero (e.g for the persist device)
before trying to use it for modular arithmetic.
Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
Fixes: 96f84061("tun: add eBPF based queue selection method")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a35d310f

24 4月, 2019 1 次提交

net: pass net_device argument to the eth_get_headlen · c43f1255

由 Stanislav Fomichev 提交于 4月 22, 2019

Update all users of eth_get_headlen to pass network device, fetch
network namespace from it and pass it down to the flow dissector.
This commit is a noop until administrator inserts BPF flow dissector
program.

Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

c43f1255

24 3月, 2019 1 次提交

net: convert rps_needed and rfs_needed to new static branch api · dc05360f

由 Eric Dumazet 提交于 3月 22, 2019

We prefer static_branch_unlikely() over static_key_false() these days.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc05360f

22 3月, 2019 2 次提交

tun: Remove unused first parameter of tun_get_iff() · 12132768

由 Kirill Tkhai 提交于 3月 20, 2019

Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12132768

tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device · 0c3e0e3b

由 Kirill Tkhai 提交于 3月 20, 2019

In commit f2780d6d "tun: Add ioctl() SIOCGSKNS cmd to allow
obtaining net ns of tun device" it was missed that tun may change
its net ns, while net ns of socket remains the same as it was
created initially. SIOCGSKNS returns net ns of socket, so it is
not suitable for obtaining net ns of device.

We may have two tun devices with the same names in two net ns,
and in this case it's not possible to determ, which of them
fd refers to (TUNGETIFF will return the same name).

This patch adds new ioctl() cmd for obtaining net ns of a device.
Reported-by: NHarald Albrecht <harald.albrecht@gmx.net>
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c3e0e3b

21 3月, 2019 1 次提交

net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce

由 Paolo Abeni 提交于 3月 20, 2019

After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.

We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such ndo. It also clean the code
a bit.

Tested with ixgbe and CONFIG_FCOE=m

With pktgen using queue xmit:
threads		vanilla 	patched
		(kpps)		(kpps)
1		2334		2428
2		4166		4278
4		7895		8100

 v1 -> v2:
 - rebased after helper's name change
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a350ecce

17 3月, 2019 1 次提交

tun: add a missing rcu_read_unlock() in error path · 9180bb4f

由 Eric Dumazet 提交于 3月 16, 2019

In my latest patch I missed one rcu_read_unlock(), in case
device is down.

Fixes: 4477138f ("tun: properly test for IFF_UP")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9180bb4f

16 3月, 2019 1 次提交

tun: properly test for IFF_UP · 4477138f

由 Eric Dumazet 提交于 3月 14, 2019

Same reasons than the ones explained in commit 4179cb5a
("vxlan: test dev->flags & IFF_UP before calling netif_rx()")

netif_rx_ni() or napi_gro_frags() must be called under a strict contract.

At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.

A similar protocol is used for gro layer.

Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP

Virtual drivers do not have this guarantee, and must
therefore make the check themselves.

Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4477138f

26 2月, 2019 1 次提交

tun: remove unnecessary memory barrier · ecef67cb

由 Timur Celik 提交于 2月 25, 2019

Replace set_current_state with __set_current_state since no memory
barrier is needed at this point.
Signed-off-by: NTimur Celik <mail@timurcelik.de>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecef67cb

25 2月, 2019 1 次提交

tun: fix blocking read · 71828b22

由 Timur Celik 提交于 2月 23, 2019

This patch moves setting of the current state into the loop. Otherwise
the task may end up in a busy wait loop if none of the break conditions
are met.
Signed-off-by: NTimur Celik <mail@timurcelik.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71828b22

23 2月, 2019 1 次提交

net: Don't set transport offset to invalid value · d2aa125d

由 Maxim Mikityanskiy 提交于 2月 21, 2019

If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
skb->protocol will be unset, __skb_flow_dissect() will fail, and
skb_probe_transport_header() will fall back to the offset_hint, making
the resulting skb_transport_offset incorrect.

If, however, there is no transport header in the packet,
transport_header shouldn't be set to an arbitrary value.

Fix it by leaving the transport offset unset if it couldn't be found, to
be explicit rather than to fill it with some wrong value. It changes the
behavior, but if some code relied on the old behavior, it would be
broken anyway, as the old one is incorrect.
Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2aa125d

31 1月, 2019 1 次提交

tun: move the call to tun_set_real_num_queues · 3a03cb84

由 George Amanakis 提交于 1月 29, 2019

Call tun_set_real_num_queues() after the increment of tun->numqueues
since the former depends on it. Otherwise, the number of queues is not
correctly accounted for, which results to warnings similar to:
"vnet0 selects TX queue 11, but real number of TX queues is 11".

Fixes: 0b7959b6 ("tun: publish tfile after it's fully initialized")
Reported-and-tested-by: NGeorge Amanakis <gamanakis@gmail.com>
Signed-off-by: NGeorge Amanakis <gamanakis@gmail.com>
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a03cb84

10 1月, 2019 1 次提交

tun: publish tfile after it's fully initialized · 0b7959b6

由 Stanislav Fomichev 提交于 1月 07, 2019

BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
Call Trace:
 ? napi_gro_frags+0xa7/0x2c0
 tun_get_user+0xb50/0xf20
 tun_chr_write_iter+0x53/0x70
 new_sync_write+0xff/0x160
 vfs_write+0x191/0x1e0
 __x64_sys_write+0x5e/0xd0
 do_syscall_64+0x47/0xf0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

I think there is a subtle race between sending a packet via tap and
attaching it:

CPU0:                    CPU1:
tun_chr_ioctl(TUNSETIFF)
  tun_set_iff
    tun_attach
      rcu_assign_pointer(tfile->tun, tun);
                         tun_fops->write_iter()
                           tun_chr_write_iter
                             tun_napi_alloc_frags
                               napi_get_frags
                                 napi->skb = napi_alloc_skb
      tun_napi_init
        netif_napi_add
          napi->skb = NULL
                              napi->skb is NULL here
                              napi_gro_frags
                                napi_frags_skb
				  skb = napi->skb
				  skb_reset_mac_header(skb)
				  panic()

Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
be the last thing we do in tun_attach(); this should guarantee that when we
call tun_get() we always get an initialized object.

v2 changes:
* remove extra napi_mutex locks/unlocks for napi operations
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b7959b6

15 12月, 2018 1 次提交

tun: replace get_cpu_ptr with this_cpu_ptr when bh disabled · 6342ca64

由 Prashant Bhole 提交于 12月 11, 2018

tun_xdp_one() runs with local bh disabled. So there is no need to
disable preemption by calling get_cpu_ptr while updating stats. This
patch replaces the use of get_cpu_ptr() with this_cpu_ptr() as a
micro-optimization. Also removes related put_cpu_ptr call.
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6342ca64

14 12月, 2018 1 次提交

net: dev: Add extack argument to dev_set_mac_address() · 3a37a963

由 Petr Machata 提交于 12月 13, 2018

A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
allows vetoing of MAC address changes. One prominent path to that
notification is through dev_set_mac_address(). Therefore give this
function an extack argument, so that it can be packed together with the
notification. Thus a textual reason for rejection (or a warning) can be
communicated back to the user.
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a37a963

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功