- 18 1月, 2018 2 次提交
-
-
由 Jason Wang 提交于
This patch allows userspace to attach eBPF filter to tun. This will allow to implement VM dataplane filtering in a more efficient way compared to cBPF filter by allowing either qemu or libvirt to attach eBPF filter to tun. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
To be reused by other eBPF program other than queue selection. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 1月, 2018 2 次提交
-
-
由 Jason Wang 提交于
This patch implements XDP transmission for TAP. Since we can't create new queues for TAP during XDP set, exist ptr_ring was reused for queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG (0x1UL) was encoded into lowest bit of xpd_buff pointer during ptr_ring_produce, and was decoded during consuming. XDP metadata was stored in the headroom of the packet which should work in most of cases since driver usually reserve enough headroom. Very minor changes were done for vhost_net: it just need to peek the length depends on the type of pointer. Tests were done on two Intel E5-2630 2.40GHz machines connected back to back through two 82599ES. Traffic were generated/received through MoonGen/testpmd(rxonly). It reports ~20% improvements when xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps to 3.05Mpps) Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
This patch switches to use ptr_ring instead of skb_array. This will be used to enqueue different types of pointers by encoding type into lower bits. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 1月, 2018 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Driver hook points for xdp_rxq_info: * reg : tun_attach * unreg: __tun_detach I've done some manual testing of this tun driver, but I would appriciate good review and someone else running their use-case tests, as I'm not 100% sure I understand the tfile->detached semantics. V2: Removed the skb_array_cleanup() call from V1 by request from Jason Wang. Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 08 12月, 2017 1 次提交
-
-
由 Jason Wang 提交于
Private destructor could be called when register_netdev() fail with rtnl lock held. This will lead deadlock in tun_free_netdev() who tries to hold rtnl_lock. Fixing this by switching to use spinlock to synchronize. Fixes: 96f84061 ("tun: add eBPF based queue selection method") Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2017 1 次提交
-
-
由 Willem de Bruijn 提交于
The statement no longer serves a purpose. Commit fa35864e ("tuntap: Fix for a race in accessing numqueues") added the ACCESS_ONCE to avoid a race condition with skb_queue_len. Commit 436acceb ("tuntap: remove unnecessary sk_receive_queue length check during xmit") removed the affected skb_queue_len check. Commit 96f84061 ("tun: add eBPF based queue selection method") split the function, reading the field a second time in the callee. The temp variable is now only read once, so just remove it. Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2017 1 次提交
-
-
由 Jason Wang 提交于
This patch introduces an eBPF based queue selection method. With this, the policy could be offloaded to userspace completely through a new ioctl TUNSETSTEERINGEBPF. Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2017 2 次提交
-
-
由 Wei Xu 提交于
tun_recvmsg() supports accepting skb by msg_control after commit ac77cfd4 ("tun: support receiving skb through msg_control"), the skb if presented should be freed no matter how far it can go along, otherwise it would be leaked. This patch fixes several missed cases. Signed-off-by: NWei Xu <wexu@redhat.com> Reported-by: NMatthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Since day one of XDP drivers had to remember to free the program on the remove path. This leads to code duplication and is error prone. Make the stack query the installed programs on unregister and if something is installed, remove the program. Freeing of program attached to XDP generic is moved from free_netdev() as well. Because the remove will now be called before notifiers are invoked, BPF offload state of the program will not get destroyed before uninstall. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 24 11月, 2017 1 次提交
-
-
由 Willem de Bruijn 提交于
Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b~1..d9d30adf. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 93991221 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: NMichal Kubecek <mkubecek@suse.cz> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 11月, 2017 1 次提交
-
-
由 Kees Cook 提交于
This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 19 11月, 2017 1 次提交
-
-
由 Xin Long 提交于
rcu_read_lock in tun_build_skb is used to rcu_dereference tun->xdp_prog safely, rcu_read_unlock should be done in every return path. Now I could see one place missing it, where it returns NULL in switch-case XDP_REDIRECT, another palce using rcu_read_lock wrongly, where it returns NULL in if (xdp_xmit) chunk. So fix both in this patch. Fixes: 761876c8 ("tap: XDP support") Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 11月, 2017 1 次提交
-
-
由 Jakub Kicinski 提交于
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 11月, 2017 1 次提交
-
-
由 Craig Gallek 提交于
Syzkaller found several variants of the lockup below by setting negative values with the TUNSETSNDBUF ioctl. This patch adds a sanity check to both the tun and tap versions of this ioctl. watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [repro:2389] Modules linked in: irq event stamp: 329692056 hardirqs last enabled at (329692055): [<ffffffff824b8381>] _raw_spin_unlock_irqrestore+0x31/0x75 hardirqs last disabled at (329692056): [<ffffffff824b9e58>] apic_timer_interrupt+0x98/0xb0 softirqs last enabled at (35659740): [<ffffffff824bc958>] __do_softirq+0x328/0x48c softirqs last disabled at (35659731): [<ffffffff811c796c>] irq_exit+0xbc/0xd0 CPU: 0 PID: 2389 Comm: repro Not tainted 4.14.0-rc7 #23 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff880009452140 task.stack: ffff880006a20000 RIP: 0010:_raw_spin_lock_irqsave+0x11/0x80 RSP: 0018:ffff880006a27c50 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff10 RAX: ffff880009ac68d0 RBX: ffff880006a27ce0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff880006a27ce0 RDI: ffff880009ac6900 RBP: ffff880006a27c60 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 000000000063ff00 R12: ffff880009ac6900 R13: ffff880006a27cf8 R14: 0000000000000001 R15: ffff880006a27cf8 FS: 00007f4be4838700(0000) GS:ffff88000cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020101000 CR3: 0000000009616000 CR4: 00000000000006f0 Call Trace: prepare_to_wait+0x26/0xc0 sock_alloc_send_pskb+0x14e/0x270 ? remove_wait_queue+0x60/0x60 tun_get_user+0x2cc/0x19d0 ? __tun_get+0x60/0x1b0 tun_chr_write_iter+0x57/0x86 __vfs_write+0x156/0x1e0 vfs_write+0xf7/0x230 SyS_write+0x57/0xd0 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x7f4be4356df9 RSP: 002b:00007ffc18101c08 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f4be4356df9 RDX: 0000000000000046 RSI: 0000000020101000 RDI: 0000000000000005 RBP: 00007ffc18101c40 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000293 R12: 0000559c75f64780 R13: 00007ffc18101d30 R14: 0000000000000000 R15: 0000000000000000 Fixes: 33dccbb0 ("tun: Limit amount of queued packets per device") Fixes: 20d29d7a ("net: macvtap driver") Signed-off-by: NCraig Gallek <kraig@google.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 10月, 2017 1 次提交
-
-
由 Jason Wang 提交于
An unaligned alloc_frag->offset caused by previous allocation will result an unaligned skb->head. This will lead unaligned skb_shared_info and then unaligned dataref which requires to be aligned for accessing on some architecture. Fix this by aligning alloc_frag->offset before the frag refilling. Fixes: 0bbd7dad ("tun: make tun_build_skb() thread safe") Cc: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Wei Wei <dotweiba@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Reported-by: NWei Wei <dotweiba@gmail.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 10月, 2017 1 次提交
-
-
由 Julien Gomes 提交于
If the name argument of dev_get_valid_name() contains "%d", it will try to assign it a unit number in __dev__alloc_name() and return either the unit number (>= 0) or an error code (< 0). Considering positive values as error values prevent tun device creations relying this mechanism, therefor we should only consider negative values as errors here. Signed-off-by: NJulien Gomes <julien@arista.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 10月, 2017 1 次提交
-
-
由 Mark Rutland 提交于
locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 10月, 2017 3 次提交
-
-
由 Eric Dumazet 提交于
Timer is properly armed on demand from tun_flow_update(), so there is no need to arm it at tun init. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
If tun_flow_cleanup() deleted all flows, no need to arm the timer again. It will be armed next time tun_flow_update() is called. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tun_flow_cleanup() being a timer callback, it is already running in BH context. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 10月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
syzkaller got crashes at dismantle time [1] It is not correct to test (tun->flags & IFF_NAPI) in tun_napi_disable() and tun_napi_del() : Each tun_file can have different mode, depending on how they were created. Similarly I have changed tun_get_user() and tun_poll_controller() to use the new tfile->napi_enabled boolean. [ 154.331360] BUG: unable to handle kernel NULL pointer dereference at (null) [ 154.339220] IP: [<ffffffff9634cad6>] hrtimer_active+0x26/0x60 [ 154.344983] PGD 0 [ 154.347009] Oops: 0000 [#1] SMP [ 154.350680] gsmi: Log Shutdown Reason 0x03 [ 154.379572] task: ffff994719150dc0 ti: ffff99475c0ae000 task.ti: ffff99475c0ae000 [ 154.387043] RIP: 0010:[<ffffffff9634cad6>] [<ffffffff9634cad6>] hrtimer_active+0x26/0x60 [ 154.395232] RSP: 0018:ffff99475c0afce8 EFLAGS: 00010246 [ 154.400542] RAX: ffff994754850ac0 RBX: ffff994753e65408 RCX: ffff994753e65388 [ 154.407666] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff994753e65408 [ 154.414790] RBP: ffff99475c0afce8 R08: 0000000000000000 R09: 0000000000000000 [ 154.421921] R10: ffff99475f6f5910 R11: 0000000000000001 R12: 0000000000000000 [ 154.429044] R13: ffff99417deab668 R14: ffff99417deaa780 R15: ffff99475f45dde0 [ 154.436174] FS: 0000000000000000(0000) GS:ffff994767a00000(0000) knlGS:0000000000000000 [ 154.444249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 154.449986] CR2: 0000000000000000 CR3: 00000005a8a0e000 CR4: 0000000000022670 [ 154.457110] Stack: [ 154.459120] ffff99475c0afd28 ffffffff9634d614 1000000000000000 0000000000000000 [ 154.466598] ffffe54240000000 ffff994753e65408 ffff994753e653a8 ffff99417deab668 [ 154.474067] ffff99475c0afd48 ffffffff9634d6fd ffff99474c2be678 ffff994753e65398 [ 154.481537] Call Trace: [ 154.483985] [<ffffffff9634d614>] hrtimer_try_to_cancel+0x24/0xf0 [ 154.490074] [<ffffffff9634d6fd>] hrtimer_cancel+0x1d/0x30 [ 154.495563] [<ffffffff96860b3c>] napi_disable+0x3c/0x70 [ 154.500875] [<ffffffff9678ae62>] __tun_detach+0xd2/0x360 [ 154.506272] [<ffffffff9678b117>] tun_chr_close+0x27/0x40 [ 154.511669] [<ffffffff9646ebe6>] __fput+0xd6/0x1e0 [ 154.516548] [<ffffffff9646ed3e>] ____fput+0xe/0x10 [ 154.521429] [<ffffffff963035a2>] task_work_run+0x72/0x90 [ 154.526827] [<ffffffff962e9407>] do_exit+0x317/0xb60 [ 154.531879] [<ffffffff962e9c8f>] do_group_exit+0x3f/0xa0 [ 154.537275] [<ffffffff962e9d07>] SyS_exit_group+0x17/0x20 [ 154.542769] [<ffffffff969784be>] entry_SYSCALL_64_fastpath+0x12/0x17 Fixes: 94317099 ("net-tun: enable NAPI for TUN/TAP driver") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 10月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
syzkaller hit the WARN() in tun_get_user(), providing skb with payload in fragments only, and nothing in skb->head GRO layer is fine with this, so relax the check. Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 10月, 2017 1 次提交
-
-
由 Cong Wang 提交于
register_netdevice() could fail early when we have an invalid dev name, in which case ->ndo_uninit() is not called. For tun device, this is a problem because a timer etc. are already initialized and it expects ->ndo_uninit() to clean them up. We could move these initializations into a ->ndo_init() so that register_netdevice() knows better, however this is still complicated due to the logic in tun_detach(). Therefore, I choose to just call dev_get_valid_name() before register_netdevice(), which is quicker and much easier to audit. And for this specific case, it is already enough. Fixes: 96442e42 ("tuntap: choose the txq based on rxq") Reported-by: NDmitry Alexeev <avekceeb@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 9月, 2017 1 次提交
-
-
由 Alexander Potapenko 提交于
KMSAN (https://github.com/google/kmsan) reported accessing uninitialized skb->data[0] in the case the skb is empty (i.e. skb->len is 0): ================================================ BUG: KMSAN: use of uninitialized memory in tun_get_user+0x19ba/0x3770 CPU: 0 PID: 3051 Comm: probe Not tainted 4.13.0+ #3140 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: ... __msan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:477 tun_get_user+0x19ba/0x3770 drivers/net/tun.c:1301 tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365 call_write_iter ./include/linux/fs.h:1743 new_sync_write fs/read_write.c:457 __vfs_write+0x6c3/0x7f0 fs/read_write.c:470 vfs_write+0x3e4/0x770 fs/read_write.c:518 SYSC_write+0x12f/0x2b0 fs/read_write.c:565 SyS_write+0x55/0x80 fs/read_write.c:557 do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:245 ... origin: ... kmsan_poison_shadow+0x6e/0xc0 mm/kmsan/kmsan.c:211 slab_alloc_node mm/slub.c:2732 __kmalloc_node_track_caller+0x351/0x370 mm/slub.c:4351 __kmalloc_reserve net/core/skbuff.c:138 __alloc_skb+0x26a/0x810 net/core/skbuff.c:231 alloc_skb ./include/linux/skbuff.h:903 alloc_skb_with_frags+0x1d7/0xc80 net/core/skbuff.c:4756 sock_alloc_send_pskb+0xabf/0xfe0 net/core/sock.c:2037 tun_alloc_skb drivers/net/tun.c:1144 tun_get_user+0x9a8/0x3770 drivers/net/tun.c:1274 tun_chr_write_iter+0x19f/0x300 drivers/net/tun.c:1365 call_write_iter ./include/linux/fs.h:1743 new_sync_write fs/read_write.c:457 __vfs_write+0x6c3/0x7f0 fs/read_write.c:470 vfs_write+0x3e4/0x770 fs/read_write.c:518 SYSC_write+0x12f/0x2b0 fs/read_write.c:565 SyS_write+0x55/0x80 fs/read_write.c:557 do_syscall_64+0x242/0x330 arch/x86/entry/common.c:284 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:245 ================================================ Make sure tun_get_user() doesn't touch skb->data[0] unless there is actual data. C reproducer below: ========================== // autogenerated by syzkaller (http://github.com/google/syzkaller) #define _GNU_SOURCE #include <fcntl.h> #include <linux/if_tun.h> #include <netinet/ip.h> #include <net/if.h> #include <string.h> #include <sys/ioctl.h> int main() { int sock = socket(PF_INET, SOCK_STREAM, IPPROTO_IP); int tun_fd = open("/dev/net/tun", O_RDWR); struct ifreq req; memset(&req, 0, sizeof(struct ifreq)); strcpy((char*)&req.ifr_name, "gre0"); req.ifr_flags = IFF_UP | IFF_MULTICAST; ioctl(tun_fd, TUNSETIFF, &req); ioctl(sock, SIOCSIFFLAGS, "gre0"); write(tun_fd, "hi", 0); return 0; } ========================== Signed-off-by: NAlexander Potapenko <glider@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 9月, 2017 1 次提交
-
-
由 Daniel Borkmann 提交于
This work enables generic transfer of metadata from XDP into skb. The basic idea is that we can make use of the fact that the resulting skb must be linear and already comes with a larger headroom for supporting bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work on a similar principle and introduce a small helper bpf_xdp_adjust_meta() for adjusting a new pointer called xdp->data_meta. Thus, the packet has a flexible and programmable room for meta data, followed by the actual packet data. struct xdp_buff is therefore laid out that we first point to data_hard_start, then data_meta directly prepended to data followed by data_end marking the end of packet. bpf_xdp_adjust_head() takes into account whether we have meta data already prepended and if so, memmove()s this along with the given offset provided there's enough room. xdp->data_meta is optional and programs are not required to use it. The rationale is that when we process the packet in XDP (e.g. as DoS filter), we can push further meta data along with it for the XDP_PASS case, and give the guarantee that a clsact ingress BPF program on the same device can pick this up for further post-processing. Since we work with skb there, we can also set skb->mark, skb->priority or other skb meta data out of BPF, thus having this scratch space generic and programmable allows for more flexibility than defining a direct 1:1 transfer of potentially new XDP members into skb (it's also more efficient as we don't need to initialize/handle each of such new members). The facility also works together with GRO aggregation. The scratch space at the head of the packet can be multiple of 4 byte up to 32 byte large. Drivers not yet supporting xdp->data_meta can simply be set up with xdp->data_meta as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out, such that the subsequent match against xdp->data for later access is guaranteed to fail. The verifier treats xdp->data_meta/xdp->data the same way as we treat xdp->data/xdp->data_end pointer comparisons. The requirement for doing the compare against xdp->data is that it hasn't been modified from it's original address we got from ctx access. It may have a range marking already from prior successful xdp->data/xdp->data_end pointer comparisons though. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 9月, 2017 3 次提交
-
-
由 yuan linyu 提交于
it seems no need to keep tun_get() and __tun_get() at same time. Signed-off-by: Nyuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Petar Penkov 提交于
Add a TUN/TAP receive mode that exercises the napi_gro_frags() interface. This mode is available only in TAP mode, as the interface expects packets with Ethernet headers. Furthermore, packets follow the layout of the iovec_iter that was received. The first iovec is the linear data, and every one after the first is a fragment. If there are more fragments than the max number, drop the packet. Additionally, invoke eth_get_headlen() to exercise flow dissector code and to verify that the header resides in the linear data. The napi_gro_frags() mode requires setting the IFF_NAPI_FRAGS option. This is imposed because this mode is intended for testing via tools like syzkaller and packetdrill, and the increased flexibility it provides can introduce security vulnerabilities. This flag is accepted only if the device is in TAP mode and has the IFF_NAPI flag set as well. This is done because both of these are explicit requirements for correct operation in this mode. Signed-off-by: NPetar Penkov <peterpenkov96@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: davem@davemloft.net Cc: ppenkov@stanford.edu Acked-by: NMahesh Bandewar <maheshb@google,com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Petar Penkov 提交于
Changes TUN driver to use napi_gro_receive() upon receiving packets rather than netif_rx_ni(). Adds flag IFF_NAPI that enables these changes and operation is not affected if the flag is disabled. SKBs are constructed upon packet arrival and are queued to be processed later. The new path was evaluated with a benchmark with the following setup: Open two tap devices and a receiver thread that reads in a loop for each device. Start one sender thread and pin all threads to different CPUs. Send 1M minimum UDP packets to each device and measure sending time for each of the sending methods: napi_gro_receive(): 4.90s netif_rx_ni(): 4.90s netif_receive_skb(): 7.20s Signed-off-by: NPetar Penkov <peterpenkov96@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: davem@davemloft.net Cc: ppenkov@stanford.edu Acked-by: NMahesh Bandewar <maheshb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 9月, 2017 2 次提交
-
-
由 Jason Wang 提交于
Rename "generic_xdp" to "skb_xdp" to avoid confusing it with the generic XDP which will be done at netif_receive_skb(). Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
We reserve headroom unconditionally which could cause unnecessary stress on socket memory accounting because of increased trusesize. Fix this by only reserve extra headroom when XDP is set. Cc: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 8月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
syzkaller reported a double free [1], caused by the fact that tun driver was not updated properly when priv_destructor was added. When/if register_netdevice() fails, priv_destructor() must have been called already. [1] BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023 CPU: 0 PID: 2919 Comm: syzkaller227220 Not tainted 4.13.0-rc4+ #23 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x7f/0x260 mm/kasan/report.c:252 kasan_report_double_free+0x55/0x80 mm/kasan/report.c:333 kasan_slab_free+0xa0/0xc0 mm/kasan/kasan.c:514 __cache_free mm/slab.c:3503 [inline] kfree+0xd3/0x260 mm/slab.c:3820 selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023 security_tun_dev_free_security+0x48/0x80 security/security.c:1512 tun_set_iff drivers/net/tun.c:1884 [inline] __tun_chr_ioctl+0x2ce6/0x3d50 drivers/net/tun.c:2064 tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x443ff9 RSP: 002b:00007ffc34271f68 EFLAGS: 00000217 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000443ff9 RDX: 0000000020533000 RSI: 00000000400454ca RDI: 0000000000000003 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000401ce0 R13: 0000000000401d70 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 2919: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x101/0x6f0 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] kzalloc include/linux/slab.h:666 [inline] selinux_tun_dev_alloc_security+0x49/0x170 security/selinux/hooks.c:5012 security_tun_dev_alloc_security+0x6d/0xa0 security/security.c:1506 tun_set_iff drivers/net/tun.c:1839 [inline] __tun_chr_ioctl+0x1730/0x3d50 drivers/net/tun.c:2064 tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 2919: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x6e/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xd3/0x260 mm/slab.c:3820 selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023 security_tun_dev_free_security+0x48/0x80 security/security.c:1512 tun_free_netdev+0x13b/0x1b0 drivers/net/tun.c:1563 register_netdevice+0x8d0/0xee0 net/core/dev.c:7605 tun_set_iff drivers/net/tun.c:1859 [inline] __tun_chr_ioctl+0x1caf/0x3d50 drivers/net/tun.c:2064 tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309 vfs_ioctl fs/ioctl.c:45 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685 SYSC_ioctl fs/ioctl.c:700 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 entry_SYSCALL_64_fastpath+0x1f/0xbe The buggy address belongs to the object at ffff8801d2843b40 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 0 bytes inside of 32-byte region [ffff8801d2843b40, ffff8801d2843b60) The buggy address belongs to the page: page:ffffea000660cea8 count:1 mapcount:0 mapping:ffff8801d2843000 index:0xffff8801d2843fc1 flags: 0x200000000000100(slab) raw: 0200000000000100 ffff8801d2843000 ffff8801d2843fc1 000000010000003f raw: ffffea0006626a40 ffffea00066141a0 ffff8801dbc00100 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801d2843a00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc ffff8801d2843a80: 00 00 00 fc fc fc fc fc fb fb fb fb fc fc fc fc >ffff8801d2843b00: 00 00 00 00 fc fc fc fc fb fb fb fb fc fc fc fc ^ ffff8801d2843b80: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc ffff8801d2843c00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc ================================================================== Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 8月, 2017 2 次提交
-
-
由 Eric Dumazet 提交于
tun_build_skb() is not thread safe since it uses per queue page frag, this will break things when multiple threads are sending through same queue. Switch to use per-thread generator (no lock involved). Fixes: 66ccbc9c ("tap: use build_skb() for small packet") Tested-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Although sizeof is an operator in C. The kernel coding style convention is to always use it like a function and add parenthesis. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 8月, 2017 2 次提交
-
-
由 Jason Wang 提交于
This patch tries to implement XDP for tun. The implementation was split into two parts: - fast path: small and no gso packet. We try to do XDP at page level before build_skb(). For XDP_TX, since creating/destroying queues were completely under control of userspace, it was implemented through generic XDP helper after skb has been built. This could be optimized in the future. - slow path: big or gso packet. We try to do it after skb was created through generic XDP helpers. Test were done through pktgen with small packets. xdp1 test shows ~41.1% improvement: Before: ~1.7Mpps After: ~2.3Mpps xdp_redirect to ixgbe shows ~60% improvement: Before: ~0.8Mpps After: ~1.38Mpps Suggested-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
We use tun_alloc_skb() which calls sock_alloc_send_pskb() to allocate skb in the past. This socket based method is not suitable for high speed userspace like virtualization which usually: - ignore sk_sndbuf (INT_MAX) and expect to receive the packet as fast as possible - don't want to be block at sendmsg() To eliminate the above overheads, this patch tries to use build_skb() for small packet. We will do this only when the following conditions are all met: - TAP instead of TUN - sk_sndbuf is INT_MAX - caller don't want to be blocked - zerocopy is not used - packet size is smaller enough to use build_skb() Pktgen from guest to host shows ~11% improvement for rx pps of tap: Before: ~1.70Mpps After : ~1.88Mpps What's more important, this makes it possible to implement XDP for tap before creating skbs. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 8月, 2017 1 次提交
-
-
由 Willem de Bruijn 提交于
Prepare the datapath for refcounted ubuf_info. Clone ubuf_info with skb_zerocopy_clone() wherever needed due to skb split, merge, resize or clone. Split skb_orphan_frags into two variants. The split, merge, .. paths support reference counted zerocopy buffers, so do not do a deep copy. Add skb_orphan_frags_rx for paths that may loop packets to receive sockets. That is not allowed, as it may cause unbounded latency. Deep copy all zerocopy copy buffers, ref-counted or not, in this path. The exact locations to modify were chosen by exhaustively searching through all code that might modify skb_frag references and/or the the SKBTX_DEV_ZEROCOPY tx_flags bit. The changes err on the safe side, in two ways. (1) legacy ubuf_info paths virtio and tap are not modified. They keep a 1:1 ubuf_info to sk_buff relationship. Calls to skb_orphan_frags still call skb_copy_ubufs and thus copy frags in this case. (2) not all copies deep in the stack are addressed yet. skb_shift, skb_split and skb_try_coalesce can be refined to avoid copying. These are not in the hot path and this patch is hairy enough as is, so that is left for future refinement. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2017 1 次提交
-
-
由 Tonghao Zhang 提交于
There is some codes of tun/tap module which did not check the return value of register_netdevice_notifier. Add the check now. Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 7月, 2017 1 次提交
-
-
由 David S. Miller 提交于
It is going away. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 6月, 2017 1 次提交
-
-
由 Matthias Schiffer 提交于
Add support for extended error reporting. Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net> Acked-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-