1. 16 8月, 2013 1 次提交
  2. 14 8月, 2013 1 次提交
  3. 19 7月, 2013 1 次提交
    • J
      tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS · 88529176
      Jason Wang 提交于
      We try to linearize part of the skb when the number of iov is greater than
      MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
      one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
      network.
      
      Solve this problem by calculate the pages needed for iov before trying to do
      zerocopy and switch to use copy instead of zerocopy if it needs more than
      MAX_SKB_FRAGS.
      
      This is done through introducing a new helper to count the pages for iov, and
      call uarg->callback() manually when switching from zerocopy to copy to notify
      vhost.
      
      We can do further optimization on top.
      
      The bug were introduced from commit 0690899b
      (tun: experimental zero copy tx support)
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88529176
  4. 11 7月, 2013 1 次提交
  5. 26 6月, 2013 1 次提交
  6. 13 6月, 2013 2 次提交
  7. 12 6月, 2013 1 次提交
  8. 11 6月, 2013 1 次提交
  9. 29 5月, 2013 1 次提交
    • J
      tuntap: forbid changing mq flag for persistent device · 8e6d91ae
      Jason Wang 提交于
      We currently allow changing the mq flag (IFF_MULTI_QUEUE) for a persistent
      device. This will result a mismatch between the number the queues in netdev and
      tuntap. This is because we only allocate a 1q netdevice when IFF_MULTI_QUEUE was
      not specified, so when we set the IFF_MULTI_QUEUE and try to attach more queues
      later, netif_set_real_num_tx_queues() may fail which result a single queue
      netdevice with multiple sockets attached.
      
      Solve this by disallowing changing the mq flag for persistent device.
      
      Bug was introduced by commit edfb6a14
      (tuntap: reduce memory using of queues).
      Reported-by: NSriram Narasimhan <sriram.narasimhan@hp.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e6d91ae
  10. 29 4月, 2013 1 次提交
  11. 25 4月, 2013 1 次提交
  12. 13 4月, 2013 1 次提交
  13. 12 4月, 2013 1 次提交
    • J
      tuntap: initialize vlan_features · c0317998
      Jason Wang 提交于
      The vlan_features was zero which prevents vlan GSO packets to be transmitted to
      userspace. This is suboptimal so enable this by initialize vlan_features for
      tuntap.
      
      Netperf shows better performance of guest receiving since vlan TSO works for
      tuntap:
      
      before:
      netperf -H 192.168.5.4
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.4 ()
      port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.01    2786.67
      
      after:
      netperf -H 192.168.5.4
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.4 ()
      port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    8085.49
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0317998
  14. 28 3月, 2013 1 次提交
  15. 27 3月, 2013 1 次提交
    • J
      tuntap: set transport header before passing it to kernel · 38502af7
      Jason Wang 提交于
      Currently, for the packets receives from tuntap, before doing header check,
      kernel just reset the transport header in netif_receive_skb() which pretends no
      l4 header. This is suboptimal for precise packet length estimation (introduced
      in 1def9238) which needs correct l4 header for gso packets.
      
      So this patch set the transport header to csum_start for partial checksum
      packets, otherwise it first try skb_flow_dissect(), if it fails, just reset the
      transport header.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38502af7
  16. 13 3月, 2013 1 次提交
  17. 07 3月, 2013 1 次提交
    • E
      tun: add a missing nf_reset() in tun_net_xmit() · f8af75f3
      Eric Dumazet 提交于
      Dave reported following crash :
      
      general protection fault: 0000 [#1] SMP
      CPU 2
      Pid: 25407, comm: qemu-kvm Not tainted 3.7.9-205.fc18.x86_64 #1 Hewlett-Packard HP Z400 Workstation/0B4Ch
      RIP: 0010:[<ffffffffa0399bd5>]  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
      RSP: 0018:ffff880276913d78  EFLAGS: 00010206
      RAX: 50626b6b7876376c RBX: ffff88026e530d68 RCX: ffff88028d158e00
      RDX: ffff88026d0d5470 RSI: 0000000000000011 RDI: 0000000000000002
      RBP: ffff880276913d88 R08: 0000000000000000 R09: ffff880295002900
      R10: 0000000000000000 R11: 0000000000000003 R12: ffffffff81ca3b40
      R13: ffffffff8151a8e0 R14: ffff880270875000 R15: 0000000000000002
      FS:  00007ff3bce38a00(0000) GS:ffff88029fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00007fd1430bd000 CR3: 000000027042b000 CR4: 00000000000027e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process qemu-kvm (pid: 25407, threadinfo ffff880276912000, task ffff88028c369720)
      Stack:
       ffff880156f59100 ffff880156f59100 ffff880276913d98 ffffffff815534f7
       ffff880276913db8 ffffffff8151a74b ffff880270875000 ffff880156f59100
       ffff880276913dd8 ffffffff8151a5a6 ffff880276913dd8 ffff88026d0d5470
      Call Trace:
       [<ffffffff815534f7>] nf_conntrack_destroy+0x17/0x20
       [<ffffffff8151a74b>] skb_release_head_state+0x7b/0x100
       [<ffffffff8151a5a6>] __kfree_skb+0x16/0xa0
       [<ffffffff8151a666>] kfree_skb+0x36/0xa0
       [<ffffffff8151a8e0>] skb_queue_purge+0x20/0x40
       [<ffffffffa02205f7>] __tun_detach+0x117/0x140 [tun]
       [<ffffffffa022184c>] tun_chr_close+0x3c/0xd0 [tun]
       [<ffffffff8119669c>] __fput+0xec/0x240
       [<ffffffff811967fe>] ____fput+0xe/0x10
       [<ffffffff8107eb27>] task_work_run+0xa7/0xe0
       [<ffffffff810149e1>] do_notify_resume+0x71/0xb0
       [<ffffffff81640152>] int_signal+0x12/0x17
      Code: 00 00 04 48 89 e5 41 54 53 48 89 fb 4c 8b a7 e8 00 00 00 0f 85 de 00 00 00 0f b6 73 3e 0f b7 7b 2a e8 10 40 00 00 48 85 c0 74 0e <48> 8b 40 28 48 85 c0 74 05 48 89 df ff d0 48 c7 c7 08 6a 3a a0
      RIP  [<ffffffffa0399bd5>] destroy_conntrack+0x35/0x120 [nf_conntrack]
       RSP <ffff880276913d78>
      
      This is because tun_net_xmit() needs to call nf_reset()
      before queuing skb into receive_queue
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8af75f3
  18. 28 2月, 2013 1 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
  19. 14 2月, 2013 1 次提交
    • P
      net: Fix possible wrong checksum generation. · c9af6db4
      Pravin B Shelar 提交于
      Patch cef401de (net: fix possible wrong checksum
      generation) fixed wrong checksum calculation but it broke TSO by
      defining new GSO type but not a netdev feature for that type.
      net_gso_ok() would not allow hardware checksum/segmentation
      offload of such packets without the feature.
      
      Following patch fixes TSO and wrong checksum. This patch uses
      same logic that Eric Dumazet used. Patch introduces new flag
      SKBTX_SHARED_FRAG if at least one frag can be modified by
      the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
      info tx_flags rather than gso_type.
      
      tx_flags is better compared to gso_type since we can have skb with
      shared frag without gso packet. It does not link SHARED_FRAG to
      GSO, So there is no need to define netdev feature for this.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9af6db4
  20. 30 1月, 2013 2 次提交
  21. 28 1月, 2013 1 次提交
    • E
      net: fix possible wrong checksum generation · cef401de
      Eric Dumazet 提交于
      Pravin Shelar mentioned that GSO could potentially generate
      wrong TX checksum if skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested to linearize skbs but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE is impacted by the extra allocation and
      copy, and because we use order-0 pages, while the TCP_STREAM uses
      bigger pages.
      Reported-by: NPravin Shelar <pshelar@nicira.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cef401de
  22. 24 1月, 2013 2 次提交
    • J
      tuntap: limit the number of flow caches · b8732fb7
      Jason Wang 提交于
      We create new flow caches when a new flow is identified by tuntap, This may lead
      some issues:
      
      - userspace may produce a huge amount of short live flows to exhaust host memory
      - the unlimited number of flow caches may produce a long list which increase the
        time in the linear searching
      
      Solve this by introducing a limit of total number of flow caches.
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8732fb7
    • J
      tuntap: reduce memory using of queues · edfb6a14
      Jason Wang 提交于
      A MAX_TAP_QUEUES(1024) queues of tuntap device is always allocated
      unconditionally even userspace only requires a single queue device. This is
      unnecessary and will lead a very high order of page allocation when has a high
      possibility to fail. Solving this by creating a one queue net device when
      userspace only use one queue and also reduce MAX_TAP_QUEUES to
      DEFAULT_MAX_NUM_RSS_QUEUES which can guarantee the success of
      the allocation.
      Reported-by: NDirk Hohndel <dirk@hohndel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      edfb6a14
  23. 15 1月, 2013 1 次提交
    • P
      tun: fix LSM/SELinux labeling of tun/tap devices · 5dbbaf2d
      Paul Moore 提交于
      This patch corrects some problems with LSM/SELinux that were introduced
      with the multiqueue patchset.  The problem stems from the fact that the
      multiqueue work changed the relationship between the tun device and its
      associated socket; before the socket persisted for the life of the
      device, however after the multiqueue changes the socket only persisted
      for the life of the userspace connection (fd open).  For non-persistent
      devices this is not an issue, but for persistent devices this can cause
      the tun device to lose its SELinux label.
      
      We correct this problem by adding an opaque LSM security blob to the
      tun device struct which allows us to have the LSM security state, e.g.
      SELinux labeling information, persist for the lifetime of the tun
      device.  In the process we tweak the LSM hooks to work with this new
      approach to TUN device/socket labeling and introduce a new LSM hook,
      security_tun_dev_attach_queue(), to approve requests to attach to a
      TUN queue via TUNSETQUEUE.
      
      The SELinux code has been adjusted to match the new LSM hooks, the
      other LSMs do not make use of the LSM TUN controls.  This patch makes
      use of the recently added "tun_socket:attach_queue" permission to
      restrict access to the TUNSETQUEUE operation.  On older SELinux
      policies which do not define the "tun_socket:attach_queue" permission
      the access control decision for TUNSETQUEUE will be handled according
      to the SELinux policy's unknown permission setting.
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Acked-by: NEric Paris <eparis@parisplace.org>
      Tested-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dbbaf2d
  24. 12 1月, 2013 3 次提交
  25. 11 1月, 2013 2 次提交
  26. 22 12月, 2012 1 次提交
  27. 18 12月, 2012 2 次提交
  28. 15 12月, 2012 1 次提交
    • J
      tuntap: fix ambigious multiqueue API · 4008e97f
      Jason Wang 提交于
      The current multiqueue API is ambigious which may confuse both user and LSM to
      do things correctly:
      
      - Both TUNSETIFF and TUNSETQUEUE could be used to create the queues of a tuntap
        device.
      - TUNSETQUEUE were used to disable and enable a specific queue of the
        device. But since the state of tuntap were completely removed from the queue,
        it could be used to attach to another device (there's no such kind of
        requirement currently, and it needs new kind of LSM policy.
      - TUNSETQUEUE could be used to attach to a persistent device without any
        queues. This kind of attching bypass the necessary checking during TUNSETIFF
        and may lead unexpected result.
      
      So this patch tries to make a cleaner and simpler API by:
      
      - Only allow TUNSETIFF to create queues.
      - TUNSETQUEUE could be only used to disable and enabled the queues of a device,
        and the state of the tuntap device were not detachd from the queues when it
        was disabled, so TUNSETQUEUE could be only used after TUNSETIFF and with the
         same device.
      
      This is done by introducing a list which keeps track of all queues which were
      disabled. The queue would be moved between this list and tfiles[] array when it
      was enabled/disabled. A pointer of the tun_struct were also introdued to track
      the device it belongs to when it was disabled.
      
      After the change, the isolation between management and application could be done
      through: TUNSETIFF were only called by management software and TUNSETQUEUE were
      only called by application.For LSM/SELinux, the things left is to do proper
      check during tun_set_queue() if needed.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4008e97f
  29. 14 12月, 2012 1 次提交
    • E
      tuntap: dont use skb after netif_rx_ni(skb) · 49974420
      Eric Dumazet 提交于
      On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote:
      > Since todays net merge, I see this when I start openvpn..
      >
      > general protection fault: 0000 [#1] PREEMPT SMP
      > Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy
      > CPU 0
      > Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14                  /D975XBX
      > RIP: 0010:[<ffffffff815b54a4>]  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
      > RSP: 0018:ffff88007d0d9c48  EFLAGS: 00010206
      > RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a
      > RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80
      > RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000
      > R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80
      > R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80
      > FS:  00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0
      > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60)
      > Stack:
      >  ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000
      >  ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831
      >  ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000
      > Call Trace:
      >  [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0
      >  [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290
      >  [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0
      >  [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun]
      >  [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0
      >  [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0
      >  [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun]
      >  [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50
      >  [<ffffffff811db917>] do_sync_write+0xa7/0xe0
      >  [<ffffffff811dc01f>] vfs_write+0xaf/0x190
      >  [<ffffffff811dc375>] sys_write+0x55/0xa0
      >  [<ffffffff81705540>] tracesys+0xdd/0xe2
      > Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48
      > RIP  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
      >  RSP <ffff88007d0d9c48>
      > ---[ end trace 6d42c834c72c002e ]---
      >
      >
      > Faulting instruction is
      >
      >    0:	8b 03                	mov    (%rbx),%eax
      >
      > rbx is slab poison (-20) so this looks like a use-after-free here...
      >
      >                         flow->ports = *ports;
      >  314:   8b 03                   mov    (%rbx),%eax
      >  316:   41 89 46 08             mov    %eax,0x8(%r14)
      >
      > in the inlined skb_header_pointer in skb_flow_dissect
      >
      > 	Dave
      >
      
      commit 96442e42 (tuntap: choose the txq based on rxq) added
      a use after free.
      
      Cache rxhash in a temp variable before calling netif_rx_ni()
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49974420
  30. 12 12月, 2012 1 次提交
  31. 08 12月, 2012 1 次提交
  32. 04 12月, 2012 2 次提交
    • M
      tun: only queue packets on device · 5d097109
      Michael S. Tsirkin 提交于
      Historically tun supported two modes of operation:
      - in default mode, a small number of packets would get queued
        at the device, the rest would be queued in qdisc
      - in one queue mode, all packets would get queued at the device
      
      This might have made sense up to a point where we made the
      queue depth for both modes the same and set it to
      a huge value (500) so unless the consumer
      is stuck the chance of losing packets is small.
      
      Thus in practice both modes behave the same, but the
      default mode has some problems:
      - if packets are never consumed, fragments are never orphaned
        which cases a DOS for sender using zero copy transmit
      - overrun errors are hard to diagnose: fifo error is incremented
        only once so you can not distinguish between
        userspace that is stuck and a transient failure,
        tcpdump on the device does not show any traffic
      
      Userspace solves this simply by enabling IFF_ONE_QUEUE
      but there seems to be little point in not doing the
      right thing for everyone, by default.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d097109
    • J
      tuntap: attach queue 0 before registering netdevice · eb0fb363
      Jason Wang 提交于
      We attach queue 0 after registering netdevice currently. This leads to call
      netif_set_real_num_{tx|rx}_queues() after registering the netdevice. Since we
      allow tun/tap has a maximum of 1024 queues, this may lead a huge number of
      uevents to be injected to userspace since we create 2048 kobjects and then
      remove 2046. Solve this problem by attaching queue 0 and set the real number of
      queues before registering netdevice.
      Reported-by: NJiri Slaby <jslaby@suse.cz>
      Tested-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb0fb363