提交 · 0c3e0e3bb623c3735b8c9ab8aa8332f944f83a9f · openeuler / Kernel

22 3月, 2019 1 次提交

tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device · 0c3e0e3b

由 Kirill Tkhai 提交于 3月 20, 2019

In commit f2780d6d "tun: Add ioctl() SIOCGSKNS cmd to allow
obtaining net ns of tun device" it was missed that tun may change
its net ns, while net ns of socket remains the same as it was
created initially. SIOCGSKNS returns net ns of socket, so it is
not suitable for obtaining net ns of device.

We may have two tun devices with the same names in two net ns,
and in this case it's not possible to determ, which of them
fd refers to (TUNGETIFF will return the same name).

This patch adds new ioctl() cmd for obtaining net ns of a device.
Reported-by: NHarald Albrecht <harald.albrecht@gmx.net>
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c3e0e3b

21 3月, 2019 1 次提交

net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce

由 Paolo Abeni 提交于 3月 20, 2019

After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.

We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such ndo. It also clean the code
a bit.

Tested with ixgbe and CONFIG_FCOE=m

With pktgen using queue xmit:
threads		vanilla 	patched
		(kpps)		(kpps)
1		2334		2428
2		4166		4278
4		7895		8100

 v1 -> v2:
 - rebased after helper's name change
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a350ecce

17 3月, 2019 1 次提交

tun: add a missing rcu_read_unlock() in error path · 9180bb4f

由 Eric Dumazet 提交于 3月 16, 2019

In my latest patch I missed one rcu_read_unlock(), in case
device is down.

Fixes: 4477138f ("tun: properly test for IFF_UP")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9180bb4f

16 3月, 2019 1 次提交

tun: properly test for IFF_UP · 4477138f

由 Eric Dumazet 提交于 3月 14, 2019

Same reasons than the ones explained in commit 4179cb5a
("vxlan: test dev->flags & IFF_UP before calling netif_rx()")

netif_rx_ni() or napi_gro_frags() must be called under a strict contract.

At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.

A similar protocol is used for gro layer.

Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP

Virtual drivers do not have this guarantee, and must
therefore make the check themselves.

Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4477138f

26 2月, 2019 1 次提交

tun: remove unnecessary memory barrier · ecef67cb

由 Timur Celik 提交于 2月 25, 2019

Replace set_current_state with __set_current_state since no memory
barrier is needed at this point.
Signed-off-by: NTimur Celik <mail@timurcelik.de>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecef67cb

25 2月, 2019 1 次提交

tun: fix blocking read · 71828b22

由 Timur Celik 提交于 2月 23, 2019

This patch moves setting of the current state into the loop. Otherwise
the task may end up in a busy wait loop if none of the break conditions
are met.
Signed-off-by: NTimur Celik <mail@timurcelik.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71828b22

23 2月, 2019 1 次提交

net: Don't set transport offset to invalid value · d2aa125d

由 Maxim Mikityanskiy 提交于 2月 21, 2019

If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
skb->protocol will be unset, __skb_flow_dissect() will fail, and
skb_probe_transport_header() will fall back to the offset_hint, making
the resulting skb_transport_offset incorrect.

If, however, there is no transport header in the packet,
transport_header shouldn't be set to an arbitrary value.

Fix it by leaving the transport offset unset if it couldn't be found, to
be explicit rather than to fill it with some wrong value. It changes the
behavior, but if some code relied on the old behavior, it would be
broken anyway, as the old one is incorrect.
Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2aa125d

31 1月, 2019 1 次提交

tun: move the call to tun_set_real_num_queues · 3a03cb84

由 George Amanakis 提交于 1月 29, 2019

Call tun_set_real_num_queues() after the increment of tun->numqueues
since the former depends on it. Otherwise, the number of queues is not
correctly accounted for, which results to warnings similar to:
"vnet0 selects TX queue 11, but real number of TX queues is 11".

Fixes: 0b7959b6 ("tun: publish tfile after it's fully initialized")
Reported-and-tested-by: NGeorge Amanakis <gamanakis@gmail.com>
Signed-off-by: NGeorge Amanakis <gamanakis@gmail.com>
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a03cb84

10 1月, 2019 1 次提交

tun: publish tfile after it's fully initialized · 0b7959b6

由 Stanislav Fomichev 提交于 1月 07, 2019

BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
Call Trace:
 ? napi_gro_frags+0xa7/0x2c0
 tun_get_user+0xb50/0xf20
 tun_chr_write_iter+0x53/0x70
 new_sync_write+0xff/0x160
 vfs_write+0x191/0x1e0
 __x64_sys_write+0x5e/0xd0
 do_syscall_64+0x47/0xf0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

I think there is a subtle race between sending a packet via tap and
attaching it:

CPU0:                    CPU1:
tun_chr_ioctl(TUNSETIFF)
  tun_set_iff
    tun_attach
      rcu_assign_pointer(tfile->tun, tun);
                         tun_fops->write_iter()
                           tun_chr_write_iter
                             tun_napi_alloc_frags
                               napi_get_frags
                                 napi->skb = napi_alloc_skb
      tun_napi_init
        netif_napi_add
          napi->skb = NULL
                              napi->skb is NULL here
                              napi_gro_frags
                                napi_frags_skb
				  skb = napi->skb
				  skb_reset_mac_header(skb)
				  panic()

Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
be the last thing we do in tun_attach(); this should guarantee that when we
call tun_get() we always get an initialized object.

v2 changes:
* remove extra napi_mutex locks/unlocks for napi operations
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b7959b6

15 12月, 2018 1 次提交

tun: replace get_cpu_ptr with this_cpu_ptr when bh disabled · 6342ca64

由 Prashant Bhole 提交于 12月 11, 2018

tun_xdp_one() runs with local bh disabled. So there is no need to
disable preemption by calling get_cpu_ptr while updating stats. This
patch replaces the use of get_cpu_ptr() with this_cpu_ptr() as a
micro-optimization. Also removes related put_cpu_ptr call.
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6342ca64

14 12月, 2018 1 次提交

net: dev: Add extack argument to dev_set_mac_address() · 3a37a963

由 Petr Machata 提交于 12月 13, 2018

A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
allows vetoing of MAC address changes. One prominent path to that
notification is through dev_set_mac_address(). Therefore give this
function an extack argument, so that it can be packed together with the
notification. Thus a textual reason for rejection (or a warning) can be
communicated back to the user.
Signed-off-by: NPetr Machata <petrm@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a37a963

07 12月, 2018 2 次提交

tun: remove unnecessary check in tun_flow_update · 5c327f67

由 Li RongQing 提交于 12月 06, 2018

caller has guaranted that rxhash is not zero
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c327f67

tun: align write-heavy flow entry members to a cache line · 83b1bc12

由 Li RongQing 提交于 12月 06, 2018

tun flow entry 'updated' fields are written when receive
every packet. Thus if a flow is receiving packets from a
particular flow entry, it'll cause false-sharing with
all the other who has looked it up, so move it in its own
cache line

and update 'queue_index' and 'update' field only when
they are changed to reduce the cache false-sharing.
Signed-off-by: NZhang Yu <zhangyu31@baidu.com>
Signed-off-by: NWang Li <wangli39@baidu.com>
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83b1bc12

04 12月, 2018 1 次提交

tun: remove skb access after netif_receive_skb · 4e4b08e5

由 Prashant Bhole 提交于 12月 03, 2018

In tun.c skb->len was accessed while doing stats accounting after a
call to netif_receive_skb. We can not access skb after this call
because buffers may be dropped.

The fix for this bug would be to store skb->len in local variable and
then use it after netif_receive_skb(). IMO using xdp data size for
accounting bytes will be better because input for tun_xdp_one() is
xdp_buff.

Hence this patch:
- fixes a bug by removing skb access after netif_receive_skb()
- uses xdp data size for accounting bytes

[613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun]
[613.021062] Read of size 4 at addr ffff8881da9ab7c0 by task vhost-1115/1155
[613.023073]
[613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232
[613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[613.029116] Call Trace:
[613.031145]  dump_stack+0x5b/0x90
[613.032219]  print_address_description+0x6c/0x23c
[613.034156]  ? tun_sendmsg+0x77c/0xc50 [tun]
[613.036141]  kasan_report.cold.5+0x241/0x308
[613.038125]  tun_sendmsg+0x77c/0xc50 [tun]
[613.040109]  ? tun_get_user+0x1960/0x1960 [tun]
[613.042094]  ? __isolate_free_page+0x270/0x270
[613.045173]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.047127]  ? peek_head_len.part.13+0x90/0x90 [vhost_net]
[613.049096]  ? get_tx_bufs+0x5a/0x2c0 [vhost_net]
[613.051106]  ? vhost_enable_notify+0x2d8/0x420 [vhost]
[613.053139]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.053139]  ? vhost_net_buf_peek+0x340/0x340 [vhost_net]
[613.053139]  ? __mutex_lock+0x8d9/0xb30
[613.053139]  ? finish_task_switch+0x8f/0x3f0
[613.053139]  ? handle_tx+0x32/0x120 [vhost_net]
[613.053139]  ? mutex_trylock+0x110/0x110
[613.053139]  ? finish_task_switch+0xcf/0x3f0
[613.053139]  ? finish_task_switch+0x240/0x3f0
[613.053139]  ? __switch_to_asm+0x34/0x70
[613.053139]  ? __switch_to_asm+0x40/0x70
[613.053139]  ? __schedule+0x506/0xf10
[613.053139]  handle_tx+0xc7/0x120 [vhost_net]
[613.053139]  vhost_worker+0x166/0x200 [vhost]
[613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
[613.053139]  ? __kthread_parkme+0x77/0x90
[613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
[613.053139]  kthread+0x1b1/0x1d0
[613.053139]  ? kthread_park+0xb0/0xb0
[613.053139]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] Allocated by task 1155:
[613.088705]  kasan_kmalloc+0xbf/0xe0
[613.088705]  kmem_cache_alloc+0xdc/0x220
[613.088705]  __build_skb+0x2a/0x160
[613.088705]  build_skb+0x14/0xc0
[613.088705]  tun_sendmsg+0x4f0/0xc50 [tun]
[613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.088705]  handle_tx+0xc7/0x120 [vhost_net]
[613.088705]  vhost_worker+0x166/0x200 [vhost]
[613.088705]  kthread+0x1b1/0x1d0
[613.088705]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] Freed by task 1155:
[613.088705]  __kasan_slab_free+0x12e/0x180
[613.088705]  kmem_cache_free+0xa0/0x230
[613.088705]  ip6_mc_input+0x40f/0x5a0
[613.088705]  ipv6_rcv+0xc9/0x1e0
[613.088705]  __netif_receive_skb_one_core+0xc1/0x100
[613.088705]  netif_receive_skb_internal+0xc4/0x270
[613.088705]  br_pass_frame_up+0x2b9/0x2e0
[613.088705]  br_handle_frame_finish+0x2fb/0x7a0
[613.088705]  br_handle_frame+0x30f/0x6c0
[613.088705]  __netif_receive_skb_core+0x61a/0x15b0
[613.088705]  __netif_receive_skb_one_core+0x8e/0x100
[613.088705]  netif_receive_skb_internal+0xc4/0x270
[613.088705]  tun_sendmsg+0x738/0xc50 [tun]
[613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
[613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
[613.088705]  handle_tx+0xc7/0x120 [vhost_net]
[613.088705]  vhost_worker+0x166/0x200 [vhost]
[613.088705]  kthread+0x1b1/0x1d0
[613.088705]  ret_from_fork+0x35/0x40
[613.088705]
[613.088705] The buggy address belongs to the object at ffff8881da9ab740
[613.088705]  which belongs to the cache skbuff_head_cache of size 232

Fixes: 043d222f ("tuntap: accept an array of XDP buffs through sendmsg()")
Reviewed-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e4b08e5

01 12月, 2018 2 次提交

tun: forbid iface creation with rtnl ops · 35b827b6

由 Nicolas Dichtel 提交于 11月 29, 2018

It's not supported right now (the goal of the initial patch was to support
'ip link del' only).

Before the patch:
$ ip link add foo type tun
[  239.632660] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[snip]
[  239.636410] RIP: 0010:register_netdevice+0x8e/0x3a0

This panic occurs because dev->netdev_ops is not set by tun_setup(). But to
have something usable, it will require more than just setting
netdev_ops.

Fixes: f019a7a5 ("tun: Implement ip link del tunXXX")
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35b827b6

tun: implement carrier change · 26d31925

由 Nicolas Dichtel 提交于 11月 28, 2018

The userspace may need to control the carrier state.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDidier Pallard <didier.pallard@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26d31925

19 11月, 2018 2 次提交

tuntap: fix multiqueue rx · 8ebebcba

由 Matthew Cover 提交于 11月 18, 2018

When writing packets to a descriptor associated with a combined queue, the
packets should end up on that queue.

Before this change all packets written to any descriptor associated with a
tap interface end up on rx-0, even when the descriptor is associated with a
different queue.

The rx traffic can be generated by either of the following.
  1. a simple tap program which spins up multiple queues and writes packets
     to each of the file descriptors
  2. tx from a qemu vm with a tap multiqueue netdev

The queue for rx traffic can be observed by either of the following (done
on the hypervisor in the qemu case).
  1. a simple netmap program which opens and reads from per-queue
     descriptors
  2. configuring RPS and doing per-cpu captures with rxtxcpu

Alternatively, if you printk() the return value of skb_get_rx_queue() just
before each instance of netif_receive_skb() in tun.c, you will get 65535
for every skb.

Calling skb_record_rx_queue() to set the rx queue to the queue_index fixes
the association between descriptor and rx queue.
Signed-off-by: NMatthew Cover <matthew.cover@stackpath.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ebebcba

tun: use netdev_alloc_frag() in tun_napi_alloc_frags() · aa6daaca

由 Eric Dumazet 提交于 11月 18, 2018

In order to cook skbs in the same way than Ethernet drivers,
it is probably better to not use GFP_KERNEL, but rather
use the GFP_ATOMIC and PFMEMALLOC mechanisms provided by
netdev_alloc_frag().

This would allow to use tun driver even in memory stress
situations, especially if swap is used over this tun channel.

Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Petar Penkov <peterpenkov96@gmail.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa6daaca

18 11月, 2018 2 次提交

tun: Adjust on-stack tun_page initialization. · 6f0271d9

由 David S. Miller 提交于 11月 17, 2018

Instead of constantly playing with the struct initializer
syntax trying to make gcc and CLang both happy, just clear
it out using memset().

>> drivers/net/tun.c:2503:42: warning: Using plain integer as NULL pointer
Reported-by: Nkbuild test robot <lkp@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f0271d9

tuntap: free XDP dropped packets in a batch · f9e06c45

由 Jason Wang 提交于 11月 15, 2018

Thanks to the batched XDP buffs through msg_control. Instead of
calling put_page() for each page which involves a atomic operation,
let's batch them by record the last page that needs to be freed and
its refcnt count and free them in a batch.

Testpmd(virtio-user + vhost_net) + XDP_DROP shows 3.8% improvement.

Before: 4.71Mpps
After : 4.89Mpps
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f9e06c45

08 11月, 2018 1 次提交

tun: compute the RFS hash only if needed. · f29eb2a9

由 Paolo Abeni 提交于 11月 07, 2018

The tun XDP sendmsg code path, unconditionally computes the symmetric
hash of each packet for RFS's sake, even when we could skip it. e.g.
when the device has a single queue.

This change adds the check already in-place for the skb sendmsg path
to avoid unneeded hashing.

The above gives small, but measurable, performance gain for VM xmit
path when zerocopy is not enabled.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f29eb2a9

16 10月, 2018 1 次提交

tun: Consistently configure generic netdev params via rtnetlink · df52eab2

由 Serhey Popovych 提交于 10月 09, 2018

Configuring generic network device parameters on tun will fail in
presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
since tun_validate() always return failure.

This can be visualized with following ip-link(8) command sequences:

  # ip link set dev tun0 group 100
  # ip link set dev tun0 group 100 type tun
  RTNETLINK answers: Invalid argument

with contrast to dummy and veth drivers:

  # ip link set dev dummy0 group 100
  # ip link set dev dummy0 type dummy

  # ip link set dev veth0 group 100
  # ip link set dev veth0 group 100 type veth

Fix by returning zero in tun_validate() when @data is NULL that is
always in case since rtnl_link_ops->maxtype is zero in tun driver.

Fixes: f019a7a5 ("tun: Implement ip link del tunXXX")
Signed-off-by: NSerhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df52eab2

11 10月, 2018 1 次提交

net: tun: remove useless codes of tun_automq_select_queue · 4b035271

由 Wang Li 提交于 10月 09, 2018

Because the function __skb_get_hash_symmetric always returns non-zero.
Signed-off-by: NZhang Yu <zhangyu31@baidu.com>
Signed-off-by: NWang Li <wangli39@baidu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b035271

02 10月, 2018 3 次提交

tun: napi flags belong to tfile · af3fb24e

由 Eric Dumazet 提交于 9月 28, 2018

Since tun->flags might be shared by multiple tfile structures,
it is better to make sure tun_get_user() is using the flags
for the current tfile.

Presence of the READ_ONCE() in tun_napi_frags_enabled() gave a hint
of what could happen, but we need something stronger to please
syzbot.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 13647 Comm: syz-executor5 Not tainted 4.19.0-rc5+ #59
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 napi_gro_frags+0x3f4/0xc90 net/core/dev.c:5715
 tun_get_user+0x31d5/0x42a0 drivers/net/tun.c:1922
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1967
 call_write_iter include/linux/fs.h:1808 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
 vfs_write+0x1fc/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457579
Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe003614c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457579
RDX: 0000000000000012 RSI: 0000000020000000 RDI: 000000000000000a
RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0036156d4
R13: 00000000004c5574 R14: 00000000004d8e98 R15: 00000000ffffffff
Modules linked in:

RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af3fb24e

tun: initialize napi_mutex unconditionally · c7256f57

由 Eric Dumazet 提交于 9月 28, 2018

This is the first part to fix following syzbot report :

console output: https://syzkaller.appspot.com/x/log.txt?x=145378e6400000
kernel config: https://syzkaller.appspot.com/x/.config?x=443816db871edd66
dashboard link: https://syzkaller.appspot.com/bug?extid=e662df0ac1d753b57e80

Following patch is fixing the race condition, but it seems safer
to initialize this mutex at tfile creation anyway.

Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: syzbot+e662df0ac1d753b57e80@syzkaller.appspotmail.com
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7256f57

tun: remove unused parameters · 06e55add

由 Eric Dumazet 提交于 9月 28, 2018

tun_napi_disable() and tun_napi_del() do not need
a pointer to the tun_struct
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06e55add

24 9月, 2018 1 次提交

tun: remove ndo_poll_controller · 765cdc20

由 Eric Dumazet 提交于 9月 21, 2018

As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

tun uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

765cdc20

14 9月, 2018 9 次提交

tuntap: accept an array of XDP buffs through sendmsg() · 043d222f

由 Jason Wang 提交于 9月 12, 2018

This patch implement TUN_MSG_PTR msg_control type. This type allows
the caller to pass an array of XDP buffs to tuntap through ptr field
of the tun_msg_control. If an XDP program is attached, tuntap can run
XDP program directly. If not, tuntap will build skb and do a fast
receiving since part of the work has been done by vhost_net.

This will avoid lots of indirect calls thus improves the icache
utilization and allows to do XDP batched flushing when doing XDP
redirection.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

043d222f

tun: switch to new type of msg_control · fe8dd45b

由 Jason Wang 提交于 9月 12, 2018

This patch introduces to a new tun/tap specific msg_control:

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2
struct tun_msg_ctl {
       int type;
       void *ptr;
};

This allows us to pass different kinds of msg_control through
sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will
be used by the existed vhost_net zerocopy code. The second is XDP
buff, which allows vhost_net to pass XDP buff to TUN. This could be
used to implement accepting an array of XDP buffs from vhost_net in
the following patches.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe8dd45b

tuntap: move XDP flushing out of tun_do_xdp() · 1a097910

由 Jason Wang 提交于 9月 12, 2018

This will allow adding batch flushing on top.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a097910

tuntap: split out XDP logic · 8ae1aff0

由 Jason Wang 提交于 9月 12, 2018

This patch split out XDP logic into a single function. This make it to
be reused by XDP batching path in the following patch.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ae1aff0

tuntap: tweak on the path of skb XDP case in tun_build_skb() · ac1f1f6c

由 Jason Wang 提交于 9月 12, 2018

If we're sure not to go native XDP, there's no need for several things
like bh and rcu stuffs. So this patch introduces a helper to build skb
and hold page refcnt. When we found we will go through skb path, build
skb directly.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac1f1f6c

tuntap: simplify error handling in tun_build_skb() · f7053b6c

由 Jason Wang 提交于 9月 12, 2018

There's no need to duplicate page get logic in each action. So this
patch tries to get page and calculate the offset before processing XDP
actions (except for XDP_DROP), and undo them when meet errors (we
don't care the performance on errors). This will be used for factoring
out XDP logic.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7053b6c

tuntap: enable bh early during processing XDP · 291aeb2b

由 Jason Wang 提交于 9月 12, 2018

This patch move the bh enabling a little bit earlier, this will be
used for factoring out the core XDP logic of tuntap.
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

291aeb2b

tuntap: switch to use XDP_PACKET_HEADROOM · 4f23aff8

由 Jason Wang 提交于 9月 12, 2018

Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f23aff8

net: sock: introduce SOCK_XDP · e4a2a304

由 Jason Wang 提交于 9月 12, 2018

This patch introduces a new sock flag - SOCK_XDP. This will be used
for notifying the upper layer that XDP program is attached on the
lower socket, and requires for extra headroom.

TUN will be the first user.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4a2a304

05 8月, 2018 1 次提交

tun: not use hardcoded mask value · f13b5468

由 Li RongQing 提交于 8月 03, 2018

0x3ff in tun_hashfn is mask of TUN_NUM_FLOW_ENTRIES, instead
of hardcode, define a macro to setup the relationship with
TUN_NUM_FLOW_ENTRIES
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f13b5468

21 7月, 2018 1 次提交

signal: Use PIDTYPE_TGID to clearly store where file signals will be sent · 01919134

由 Eric W. Biederman 提交于 7月 16, 2017

When f_setown is called a pid and a pid type are stored.  Replace the use
of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread
group.  Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now
is only for a thread.

Update the users of __f_setown to use PIDTYPE_TGID instead of
PIDTYPE_PID.

For now the code continues to capture task_pid (when task_tgid would
really be appropriate), and iterate on PIDTYPE_PID (even when type ==
PIDTYPE_TGID) out of an abundance of caution to preserve existing
behavior.

Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID
for tgid lookup also be used to avoid taking the tasklist lock.
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

01919134

17 7月, 2018 1 次提交

tun: Fix use-after-free on XDP_TX · 6e8cfd6d

由 Toshiaki Makita 提交于 7月 13, 2018

On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
negative value. A positive value indicates that the packet is
successfully enqueued to the ptr_ring, so freeing the page causes
use-after-free.

Fixes: 735fc405 ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e8cfd6d

14 7月, 2018 1 次提交

xdp: don't make drivers report attachment mode · 6b867589

由 Jakub Kicinski 提交于 7月 11, 2018

prog_attached of struct netdev_bpf should have been superseded
by simply setting prog_id long time ago, but we kept it around
to allow offloading drivers to communicate attachment mode (drv
vs hw).  Subsequently drivers were also allowed to report back
attachment flags (prog_flags), and since nowadays only programs
attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell
the attachment mode from the flags driver reports.  Remove
prog_attached member.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

6b867589

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功