1. 08 6月, 2018 1 次提交
    • W
      net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan · fd3a8862
      Willem de Bruijn 提交于
      Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
      to communicate packet metadata to userspace.
      
      For skbuffs with vlan, the first two return the packet as it may have
      existed on the wire, inserting the VLAN tag in the user buffer.  Then
      virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
      
      Commit f09e2249 ("macvtap: restore vlan header on user read")
      added this feature to macvtap. Commit 3ce9b20f ("macvtap: Fix
      csum_start when VLAN tags are present") then fixed up csum_start.
      
      Virtio, packet and uml do not insert the vlan header in the user
      buffer.
      
      When introducing virtio_net_hdr_from_skb to deduplicate filling in
      the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
      applied uniformly, breaking csum offset for packets with vlan on
      virtio and packet.
      
      Make insertion of VLAN_HLEN optional. Convert the callers to pass it
      when needed.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Fixes: 1276f24e ("packet: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd3a8862
  2. 05 6月, 2018 2 次提交
  3. 03 6月, 2018 2 次提交
  4. 29 5月, 2018 1 次提交
    • T
      tun: Fix NULL pointer dereference in XDP redirect · 6547e387
      Toshiaki Makita 提交于
      Calling XDP redirection requires bh disabled. Softirq can call another
      XDP function and redirection functions, then the percpu static variable
      ri->map can be overwritten to NULL.
      
      This is a generic XDP case called from tun.
      
      [ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [ 3535.743974] PGD 0 P4D 0
      [ 3535.746530] Oops: 0000 [#1] SMP PTI
      [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core
      [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2
      [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
      [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
      [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246
      [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100
      [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000
      [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0
      [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000
      [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00
      [ 3535.874990] FS:  0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000
      [ 3535.883152] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0
      [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 3535.910538] PKRU: 55555554
      [ 3535.913267] Call Trace:
      [ 3535.915736]  xdp_do_generic_redirect+0x7a/0x310
      [ 3535.920310]  do_xdp_generic.part.117+0x285/0x370
      [ 3535.924970]  tun_get_user+0x5b9/0x1260 [tun]
      [ 3535.929279]  tun_sendmsg+0x52/0x70 [tun]
      [ 3535.933237]  handle_tx+0x2ad/0x5f0 [vhost_net]
      [ 3535.937721]  vhost_worker+0xa5/0x100 [vhost]
      [ 3535.942030]  kthread+0xf5/0x130
      [ 3535.945198]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
      [ 3535.950031]  ? kthread_bind+0x10/0x10
      [ 3535.953727]  ret_from_fork+0x35/0x40
      [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3
      [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58
      [ 3535.982883] CR2: 0000000000000018
      [ 3535.987096] ---[ end trace 383b299dd1430240 ]---
      [ 3536.131325] Kernel panic - not syncing: Fatal exception
      [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      And a kernel with generic case fixed still panics in tun driver XDP
      redirect, because it disabled only preemption, but not bh.
      
      [ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [ 2055.136662] PGD 0 P4D 0
      [ 2055.139219] Oops: 0000 [#1] SMP PTI
      [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core
      [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G        W         4.17.0-rc5-fix-tun+ #1
      [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017
      [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30
      [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246
      [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004
      [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000
      [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80
      [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000
      [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78
      [ 2055.269879] FS:  0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000
      [ 2055.278039] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0
      [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 2055.305424] PKRU: 55555554
      [ 2055.308153] Call Trace:
      [ 2055.310624]  xdp_do_redirect+0x7b/0x380
      [ 2055.314499]  tun_get_user+0x10fe/0x12a0 [tun]
      [ 2055.318895]  tun_sendmsg+0x52/0x70 [tun]
      [ 2055.322852]  handle_tx+0x2ad/0x5f0 [vhost_net]
      [ 2055.327337]  vhost_worker+0xa5/0x100 [vhost]
      [ 2055.331646]  kthread+0xf5/0x130
      [ 2055.334813]  ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost]
      [ 2055.339646]  ? kthread_bind+0x10/0x10
      [ 2055.343343]  ret_from_fork+0x35/0x40
      [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3
      [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0
      [ 2055.372500] CR2: 0000000000000018
      [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]---
      [ 2055.523626] Kernel panic - not syncing: Fatal exception
      [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      v2:
       - Removed preempt_disable/enable since local_bh_disable will prevent
         preemption as well, feedback from Jason Wang.
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6547e387
  5. 25 5月, 2018 1 次提交
    • J
      xdp: change ndo_xdp_xmit API to support bulking · 735fc405
      Jesper Dangaard Brouer 提交于
      This patch change the API for ndo_xdp_xmit to support bulking
      xdp_frames.
      
      When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
      Most of the slowdown is caused by DMA API indirect function calls, but
      also the net_device->ndo_xdp_xmit() call.
      
      Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
      single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
      performance improved:
       for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
       for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
      
      With frames avail as a bulk inside the driver ndo_xdp_xmit call,
      further optimizations are possible, like bulk DMA-mapping for TX.
      
      Testing without CONFIG_RETPOLINE show the same performance for
      physical NIC drivers.
      
      The virtual NIC driver tun sees a huge performance boost, as it can
      avoid doing per frame producer locking, but instead amortize the
      locking cost over the bulk.
      
      V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
      V4: Isolated ndo, driver changes and callers.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      735fc405
  6. 24 5月, 2018 1 次提交
    • J
      tuntap: correctly set SOCKWQ_ASYNC_NOSPACE · 2f3ab622
      Jason Wang 提交于
      When link is down, writes to the device might fail with
      -EIO. Userspace needs an indication when the status is resolved.  As a
      fix, tun_net_open() attempts to wake up writers - but that is only
      effective if SOCKWQ_ASYNC_NOSPACE has been set in the past. This is
      not the case of vhost_net which only poll for EPOLLOUT after it meets
      errors during sendmsg().
      
      This patch fixes this by making sure SOCKWQ_ASYNC_NOSPACE is set when
      socket is not writable or device is down to guarantee EPOLLOUT will be
      raised in either tun_chr_poll() or tun_sock_write_space() after device
      is up.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f3ab622
  7. 17 5月, 2018 1 次提交
  8. 15 5月, 2018 1 次提交
    • J
      tun: fix use after free for ptr_ring · b196d88a
      Jason Wang 提交于
      We used to initialize ptr_ring during TUNSETIFF, this is because its
      size depends on the tx_queue_len of netdevice. And we try to clean it
      up when socket were detached from netdevice. A race were spotted when
      trying to do uninit during a read which will lead a use after free for
      pointer ring. Solving this by always initialize a zero size ptr_ring
      in open() and do resizing during TUNSETIFF, and then we can safely do
      cleanup during close(). With this, there's no need for the workaround
      that was introduced by commit 4df0bfc7 ("tun: fix a memory leak
      for tfile->tx_array").
      
      Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Fixes: 1576d986 ("tun: switch to use skb array for tx")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b196d88a
  9. 11 5月, 2018 1 次提交
  10. 20 4月, 2018 1 次提交
  11. 19 4月, 2018 2 次提交
    • N
      bpf: make tun compatible w/ bpf_xdp_adjust_tail · 8fb58f1e
      Nikita V. Shirokov 提交于
      w/ bpf_xdp_adjust_tail helper xdp's data_end pointer could be changed as
      well (only "decrease" of pointer's location is going to be supported).
      changing of this pointer will change packet's size.
      for tun driver we need to adjust XDP_PASS handling by recalculating
      length of the packet if it was passed to the TCP/IP stack
      (in case if after xdp's prog run data_end pointer was adjusted)
      Reviewed-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NNikita V. Shirokov <tehnerd@tehnerd.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      8fb58f1e
    • B
      tun: fix vlan packet truncation · 81c89507
      Bjørn Mork 提交于
      Bogus trimming in tun_net_xmit() causes truncated vlan packets.
      
      skb->len is correct whether or not skb_vlan_tag_present() is true. There
      is no more reason to adjust the skb length on xmit in this driver than
      any other driver. tun_put_user() adds 4 bytes to the total for tagged
      packets because it transmits the tag inline to userspace.  This is
      similar to a nic transmitting the tag inline on the wire.
      
      Reproducing the bug by sending any tagged packet through back-to-back
      connected tap interfaces:
      
       socat TUN,tun-type=tap,iff-up,tun-name=in TUN,tun-type=tap,iff-up,tun-name=out &
       ip link add link in name in.20 type vlan id 20
       ip addr add 10.9.9.9/24 dev in.20
       ip link set in.20 up
       tshark -nxxi in -f arp -c1 2>/dev/null &
       tshark -nxxi out -f arp -c1 2>/dev/null &
       ping -c 1 10.9.9.5 >/dev/null 2>&1
      
      The output from the 'in' and 'out' interfaces are different when the
      bug is present:
      
       Capturing on 'in'
       0000  ff ff ff ff ff ff 76 cf 76 37 d5 0a 81 00 00 14   ......v.v7......
       0010  08 06 00 01 08 00 06 04 00 01 76 cf 76 37 d5 0a   ..........v.v7..
       0020  0a 09 09 09 00 00 00 00 00 00 0a 09 09 05         ..............
      
       Capturing on 'out'
       0000  ff ff ff ff ff ff 76 cf 76 37 d5 0a 81 00 00 14   ......v.v7......
       0010  08 06 00 01 08 00 06 04 00 01 76 cf 76 37 d5 0a   ..........v.v7..
       0020  0a 09 09 09 00 00 00 00 00 00                     ..........
      
      Fixes: aff3d70a ("tun: allow to attach ebpf socket filter")
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81c89507
  12. 17 4月, 2018 4 次提交
    • J
      xdp: transition into using xdp_frame for ndo_xdp_xmit · 44fa2dbd
      Jesper Dangaard Brouer 提交于
      Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct
      xdp_buff.  This brings xdp_return_frame and ndp_xdp_xmit in sync.
      
      This builds towards changing the API further to become a bulk API,
      because xdp_buff is not a queue-able object while xdp_frame is.
      
      V4: Adjust for commit 59655a5b ("tuntap: XDP_TX can use native XDP")
      V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44fa2dbd
    • J
      xdp: transition into using xdp_frame for return API · 03993094
      Jesper Dangaard Brouer 提交于
      Changing API xdp_return_frame() to take struct xdp_frame as argument,
      seems like a natural choice. But there are some subtle performance
      details here that needs extra care, which is a deliberate choice.
      
      When de-referencing xdp_frame on a remote CPU during DMA-TX
      completion, result in the cache-line is change to "Shared"
      state. Later when the page is reused for RX, then this xdp_frame
      cache-line is written, which change the state to "Modified".
      
      This situation already happens (naturally) for, virtio_net, tun and
      cpumap as the xdp_frame pointer is the queued object.  In tun and
      cpumap, the ptr_ring is used for efficiently transferring cache-lines
      (with pointers) between CPUs. Thus, the only option is to
      de-referencing xdp_frame.
      
      It is only the ixgbe driver that had an optimization, in which it can
      avoid doing the de-reference of xdp_frame.  The driver already have
      TX-ring queue, which (in case of remote DMA-TX completion) have to be
      transferred between CPUs anyhow.  In this data area, we stored a
      struct xdp_mem_info and a data pointer, which allowed us to avoid
      de-referencing xdp_frame.
      
      To compensate for this, a prefetchw is used for telling the cache
      coherency protocol about our access pattern.  My benchmarks show that
      this prefetchw is enough to compensate the ixgbe driver.
      
      V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT")
      V8: Adjust for commit bd658dda ("net/mlx5e: Separate dma base address
      and offset in dma_sync call")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03993094
    • J
      xdp: rhashtable with allocator ID to pointer mapping · 8d5d8852
      Jesper Dangaard Brouer 提交于
      Use the IDA infrastructure for getting a cyclic increasing ID number,
      that is used for keeping track of each registered allocator per
      RX-queue xdp_rxq_info.  Instead of using the IDR infrastructure, which
      uses a radix tree, use a dynamic rhashtable, for creating ID to
      pointer lookup table, because this is faster.
      
      The problem that is being solved here is that, the xdp_rxq_info
      pointer (stored in xdp_buff) cannot be used directly, as the
      guaranteed lifetime is too short.  The info is needed on a
      (potentially) remote CPU during DMA-TX completion time . In an
      xdp_frame the xdp_mem_info is stored, when it got converted from an
      xdp_buff, which is sufficient for the simple page refcnt based recycle
      schemes.
      
      For more advanced allocators there is a need to store a pointer to the
      registered allocator.  Thus, there is a need to guard the lifetime or
      validity of the allocator pointer, which is done through this
      rhashtable ID map to pointer. The removal and validity of of the
      allocator and helper struct xdp_mem_allocator is guarded by RCU.  The
      allocator will be created by the driver, and registered with
      xdp_rxq_info_reg_mem_model().
      
      It is up-to debate who is responsible for freeing the allocator
      pointer or invoking the allocator destructor function.  In any case,
      this must happen via RCU freeing.
      
      Use the IDA infrastructure for getting a cyclic increasing ID number,
      that is used for keeping track of each registered allocator per
      RX-queue xdp_rxq_info.
      
      V4: Per req of Jason Wang
      - Use xdp_rxq_info_reg_mem_model() in all drivers implementing
        XDP_REDIRECT, even-though it's not strictly necessary when
        allocator==NULL for type MEM_TYPE_PAGE_SHARED (given it's zero).
      
      V6: Per req of Alex Duyck
      - Introduce rhashtable_lookup() call in later patch
      
      V8: Address sparse should be static warnings (from kbuild test robot)
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d5d8852
    • J
      tun: convert to use generic xdp_frame and xdp_return_frame API · 1ffcbc85
      Jesper Dangaard Brouer 提交于
      The tuntap driver invented it's own driver specific way of queuing
      XDP packets, by storing the xdp_buff information in the top of
      the XDP frame data.
      
      Convert it over to use the more generic xdp_frame structure.  The
      main problem with the in-driver method is that the xdp_rxq_info pointer
      cannot be trused/used when dequeueing the frame.
      
      V3: Remove check based on feedback from Jason
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ffcbc85
  13. 12 4月, 2018 2 次提交
  14. 15 3月, 2018 1 次提交
  15. 10 3月, 2018 1 次提交
  16. 27 2月, 2018 3 次提交
  17. 17 2月, 2018 2 次提交
  18. 16 2月, 2018 1 次提交
    • K
      tun: Add ioctl() SIOCGSKNS cmd to allow obtaining net ns of tun device · f2780d6d
      Kirill Tkhai 提交于
      This patch adds possibility to get tun device's net namespace fd
      in the same way we allow to do that for sockets.
      
      Socket ioctl numbers do not intersect with tun-specific, and there
      is already SIOCSIFHWADDR used in tun code. So, SIOCGSKNS number
      is choosen instead of custom-made for this functionality.
      
      Note, that open_related_ns() uses plain get_net_ns() and it's safe
      (net can't be already dead at this moment):
      
        tun socket is allocated via sk_alloc() with zero last arg (kern = 0).
        So, each alive socket increments net::count, and the socket is definitely
        alive during ioctl syscall.
      
      Also, common variable net is introduced, so small cleanup in TUNSETIFF
      is made.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2780d6d
  19. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  20. 09 2月, 2018 1 次提交
    • J
      tuntap: add missing xdp flush · 762c330d
      Jason Wang 提交于
      When using devmap to redirect packets between interfaces,
      xdp_do_flush() is usually a must to flush any batched
      packets. Unfortunately this is missed in current tuntap
      implementation.
      
      Unlike most hardware driver which did XDP inside NAPI loop and call
      xdp_do_flush() at then end of each round of poll. TAP did it in the
      context of process e.g tun_get_user(). So fix this by count the
      pending redirected packets and flush when it exceeds NAPI_POLL_WEIGHT
      or MSG_MORE was cleared by sendmsg() caller.
      
      With this fix, xdp_redirect_map works again between two TAPs.
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      762c330d
  21. 23 1月, 2018 1 次提交
  22. 22 1月, 2018 1 次提交
  23. 18 1月, 2018 3 次提交
  24. 09 1月, 2018 2 次提交
    • J
      tuntap: XDP transmission · fc72d1d5
      Jason Wang 提交于
      This patch implements XDP transmission for TAP. Since we can't create
      new queues for TAP during XDP set, exist ptr_ring was reused for
      queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG
      (0x1UL) was encoded into lowest bit of xpd_buff pointer during
      ptr_ring_produce, and was decoded during consuming. XDP metadata was
      stored in the headroom of the packet which should work in most of
      cases since driver usually reserve enough headroom. Very minor changes
      were done for vhost_net: it just need to peek the length depends on
      the type of pointer.
      
      Tests were done on two Intel E5-2630 2.40GHz machines connected back
      to back through two 82599ES. Traffic were generated/received through
      MoonGen/testpmd(rxonly). It reports ~20% improvements when
      xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps
      to 3.05Mpps)
      
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc72d1d5
    • J
      tun/tap: use ptr_ring instead of skb_array · 5990a305
      Jason Wang 提交于
      This patch switches to use ptr_ring instead of skb_array. This will be
      used to enqueue different types of pointers by encoding type into
      lower bits.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5990a305
  25. 06 1月, 2018 1 次提交
  26. 08 12月, 2017 1 次提交
  27. 07 12月, 2017 1 次提交