1. 11 1月, 2013 1 次提交
  2. 22 12月, 2012 1 次提交
  3. 18 12月, 2012 2 次提交
  4. 15 12月, 2012 1 次提交
    • J
      tuntap: fix ambigious multiqueue API · 4008e97f
      Jason Wang 提交于
      The current multiqueue API is ambigious which may confuse both user and LSM to
      do things correctly:
      
      - Both TUNSETIFF and TUNSETQUEUE could be used to create the queues of a tuntap
        device.
      - TUNSETQUEUE were used to disable and enable a specific queue of the
        device. But since the state of tuntap were completely removed from the queue,
        it could be used to attach to another device (there's no such kind of
        requirement currently, and it needs new kind of LSM policy.
      - TUNSETQUEUE could be used to attach to a persistent device without any
        queues. This kind of attching bypass the necessary checking during TUNSETIFF
        and may lead unexpected result.
      
      So this patch tries to make a cleaner and simpler API by:
      
      - Only allow TUNSETIFF to create queues.
      - TUNSETQUEUE could be only used to disable and enabled the queues of a device,
        and the state of the tuntap device were not detachd from the queues when it
        was disabled, so TUNSETQUEUE could be only used after TUNSETIFF and with the
         same device.
      
      This is done by introducing a list which keeps track of all queues which were
      disabled. The queue would be moved between this list and tfiles[] array when it
      was enabled/disabled. A pointer of the tun_struct were also introdued to track
      the device it belongs to when it was disabled.
      
      After the change, the isolation between management and application could be done
      through: TUNSETIFF were only called by management software and TUNSETQUEUE were
      only called by application.For LSM/SELinux, the things left is to do proper
      check during tun_set_queue() if needed.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4008e97f
  5. 14 12月, 2012 1 次提交
    • E
      tuntap: dont use skb after netif_rx_ni(skb) · 49974420
      Eric Dumazet 提交于
      On Wed, 2012-12-12 at 23:16 -0500, Dave Jones wrote:
      > Since todays net merge, I see this when I start openvpn..
      >
      > general protection fault: 0000 [#1] PREEMPT SMP
      > Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables xfs iTCO_wdt iTCO_vendor_support snd_emu10k1 snd_util_mem snd_ac97_codec coretemp ac97_bus microcode snd_hwdep snd_seq pcspkr snd_pcm snd_page_alloc snd_timer lpc_ich i2c_i801 snd_rawmidi mfd_core snd_seq_device snd e1000e soundcore emu10k1_gp gameport i82975x_edac edac_core vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci sata_sil firewire_core crc_itu_t radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core floppy
      > CPU 0
      > Pid: 1381, comm: openvpn Not tainted 3.7.0+ #14                  /D975XBX
      > RIP: 0010:[<ffffffff815b54a4>]  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
      > RSP: 0018:ffff88007d0d9c48  EFLAGS: 00010206
      > RAX: 000000000000055d RBX: 6b6b6b6b6b6b6b4b RCX: 1471030a0180040a
      > RDX: 0000000000000005 RSI: 00000000ffffffe0 RDI: ffff8800ba83fa80
      > RBP: ffff88007d0d9cb8 R08: 0000000000000000 R09: 0000000000000000
      > R10: 0000000000000000 R11: 0000000000000101 R12: ffff8800ba83fa80
      > R13: 0000000000000008 R14: ffff88007d0d9cc8 R15: ffff8800ba83fa80
      > FS:  00007f6637104800(0000) GS:ffff8800bf600000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 00007f563f5b01c4 CR3: 000000007d140000 CR4: 00000000000007f0
      > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > Process openvpn (pid: 1381, threadinfo ffff88007d0d8000, task ffff8800a540cd60)
      > Stack:
      >  ffff8800ba83fa80 0000000000000296 0000000000000000 0000000000000000
      >  ffff88007d0d9cc8 ffffffff815bcff4 ffff88007d0d9ce8 ffffffff815b1831
      >  ffff88007d0d9ca8 00000000703f6364 ffff8800ba83fa80 0000000000000000
      > Call Trace:
      >  [<ffffffff815bcff4>] ? netif_rx+0x114/0x4c0
      >  [<ffffffff815b1831>] ? skb_copy_datagram_from_iovec+0x61/0x290
      >  [<ffffffff815b672a>] __skb_get_rxhash+0x1a/0xd0
      >  [<ffffffffa03b9538>] tun_get_user+0x418/0x810 [tun]
      >  [<ffffffff8135f468>] ? delay_tsc+0x98/0xf0
      >  [<ffffffff8109605c>] ? __rcu_read_unlock+0x5c/0xa0
      >  [<ffffffffa03b9a41>] tun_chr_aio_write+0x81/0xb0 [tun]
      >  [<ffffffff81145011>] ? __buffer_unlock_commit+0x41/0x50
      >  [<ffffffff811db917>] do_sync_write+0xa7/0xe0
      >  [<ffffffff811dc01f>] vfs_write+0xaf/0x190
      >  [<ffffffff811dc375>] sys_write+0x55/0xa0
      >  [<ffffffff81705540>] tracesys+0xdd/0xe2
      > Code: 41 8b 44 24 68 41 2b 44 24 6c 01 de 29 f0 83 f8 03 0f 8e a0 00 00 00 48 63 de 49 03 9c 24 e0 00 00 00 48 85 db 0f 84 72 fe ff ff <8b> 03 41 89 46 08 b8 01 00 00 00 e9 43 fd ff ff 0f 1f 40 00 48
      > RIP  [<ffffffff815b54a4>] skb_flow_dissect+0x314/0x3e0
      >  RSP <ffff88007d0d9c48>
      > ---[ end trace 6d42c834c72c002e ]---
      >
      >
      > Faulting instruction is
      >
      >    0:	8b 03                	mov    (%rbx),%eax
      >
      > rbx is slab poison (-20) so this looks like a use-after-free here...
      >
      >                         flow->ports = *ports;
      >  314:   8b 03                   mov    (%rbx),%eax
      >  316:   41 89 46 08             mov    %eax,0x8(%r14)
      >
      > in the inlined skb_header_pointer in skb_flow_dissect
      >
      > 	Dave
      >
      
      commit 96442e42 (tuntap: choose the txq based on rxq) added
      a use after free.
      
      Cache rxhash in a temp variable before calling netif_rx_ni()
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49974420
  6. 12 12月, 2012 1 次提交
  7. 08 12月, 2012 1 次提交
  8. 04 12月, 2012 2 次提交
    • M
      tun: only queue packets on device · 5d097109
      Michael S. Tsirkin 提交于
      Historically tun supported two modes of operation:
      - in default mode, a small number of packets would get queued
        at the device, the rest would be queued in qdisc
      - in one queue mode, all packets would get queued at the device
      
      This might have made sense up to a point where we made the
      queue depth for both modes the same and set it to
      a huge value (500) so unless the consumer
      is stuck the chance of losing packets is small.
      
      Thus in practice both modes behave the same, but the
      default mode has some problems:
      - if packets are never consumed, fragments are never orphaned
        which cases a DOS for sender using zero copy transmit
      - overrun errors are hard to diagnose: fifo error is incremented
        only once so you can not distinguish between
        userspace that is stuck and a transient failure,
        tcpdump on the device does not show any traffic
      
      Userspace solves this simply by enabling IFF_ONE_QUEUE
      but there seems to be little point in not doing the
      right thing for everyone, by default.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d097109
    • J
      tuntap: attach queue 0 before registering netdevice · eb0fb363
      Jason Wang 提交于
      We attach queue 0 after registering netdevice currently. This leads to call
      netif_set_real_num_{tx|rx}_queues() after registering the netdevice. Since we
      allow tun/tap has a maximum of 1024 queues, this may lead a huge number of
      uevents to be injected to userspace since we create 2048 kobjects and then
      remove 2046. Solve this problem by attaching queue 0 and set the real number of
      queues before registering netdevice.
      Reported-by: NJiri Slaby <jslaby@suse.cz>
      Tested-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb0fb363
  9. 27 11月, 2012 2 次提交
  10. 24 11月, 2012 1 次提交
  11. 20 11月, 2012 1 次提交
  12. 03 11月, 2012 1 次提交
  13. 01 11月, 2012 6 次提交
    • J
      tuntap: choose the txq based on rxq · 96442e42
      Jason Wang 提交于
      This patch implements a simple multiqueue flow steering policy - tx follows rx
      for tun/tap. The idea is simple, it just choose the txq based on which rxq it
      comes. The flow were identified through the rxhash of a skb, and the hash to
      queue mapping were recorded in a hlist with an ageing timer to retire the
      mapping. The mapping were created when tun receives packet from userspace, and
      was quired in .ndo_select_queue().
      
      I run co-current TCP_CRR test and didn't see any mapping manipulation helpers in
      perf top, so the overhead could be negelected.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96442e42
    • J
      tuntap: add ioctl to attach or detach a file form tuntap device · cde8b15f
      Jason Wang 提交于
      Sometimes usespace may need to active/deactive a queue, this could be done by
      detaching and attaching a file from tuntap device.
      
      This patch introduces a new ioctls - TUNSETQUEUE which could be used to do
      this. Flag IFF_ATTACH_QUEUE were introduced to do attaching while
      IFF_DETACH_QUEUE were introduced to do the detaching.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cde8b15f
    • J
      tuntap: multiqueue support · c8d68e6b
      Jason Wang 提交于
      This patch converts tun/tap to a multiqueue devices and expose the multiqueue
      queues as multiple file descriptors to userspace. Internally, each tun_file were
      abstracted as a queue, and an array of pointers to tun_file structurs were
      stored in tun_structure device, so multiple tun_files were allowed to be
      attached to the device as multiple queues.
      
      When choosing txq, we first try to identify a flow through its rxhash, if it
      does not have such one, we could try recorded rxq and then use them to choose
      the transmit queue. This policy may be changed in the future.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8d68e6b
    • J
      tuntap: RCUify dereferencing between tun_struct and tun_file · 6e914fc7
      Jason Wang 提交于
      RCU were introduced in this patch to synchronize the dereferences between
      tun_struct and tun_file. All tun_{get|put} were replaced with RCU, the
      dereference from one to other must be done under rtnl lock or rcu read critical
      region.
      
      This is needed for the following patches since the one of the goal of multiqueue
      tuntap is to allow adding or removing queues during workload. Without RCU,
      control path would hold tx locks when adding or removing queues (which may cause
      sme delay) and it's hard to change the number of queues without stopping the net
      device. With the help of rcu, there's also no need for tun_file hold an refcnt
      to tun_struct.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e914fc7
    • J
      tuntap: move socket to tun_file · 54f968d6
      Jason Wang 提交于
      Current tuntap makes use of the socket receive queue as its tx queue. To
      implement multiple tx queues for tuntap and enable the ability of adding and
      removing queues during workload, the first step is to move the socket related
      structures to tun_file. Then we could let multiple fds/sockets to be attached to
      the tuntap.
      
      This patch removes tun_sock and moves socket related structures from tun_sock or
      tun_struct to tun_file. Two exceptions are tap_filter and sock_fprog, they are
      still kept in tun_structure since they are used to filter packets for the net
      device instead of per transmit queue (at least I see no requirements for
      them). After those changes, socket were created and destroyed during file open
      and close (instead of device creation and destroy), the socket structures could
      be dereferenced from tun_file instead of the file of tun_struct structure
      itself.
      
      For persisent device, since we purge during datching and wouldn't queue any
      packets when no interface were attached, there's no behaviod changes before and
      after this patch, so the changes were transparent to the userspace. To keep the
      attributes such as sndbuf, socket filter and vnet header, those would be
      re-initialize after a new interface were attached to an persist device.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54f968d6
    • J
      tuntap: log the unsigned informaiton with %u · 1e588338
      Jason Wang 提交于
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e588338
  14. 26 10月, 2012 2 次提交
    • D
      cgroup: net_cls: Rework update socket logic · 6a328d8c
      Daniel Wagner 提交于
      The cgroup logic part of net_cls is very similar as the one in
      net_prio. Let's stream line the net_cls logic with the net_prio one.
      
      The net_prio update logic was changed by following commit (note there
      were some changes necessary later on)
      
      commit 406a3c63
      Author: John Fastabend <john.r.fastabend@intel.com>
      Date:   Fri Jul 20 10:39:25 2012 +0000
      
          net: netprio_cgroup: rework update socket logic
      
          Instead of updating the sk_cgrp_prioidx struct field on every send
          this only updates the field when a task is moved via cgroup
          infrastructure.
      
          This allows sockets that may be used by a kernel worker thread
          to be managed. For example in the iscsi case today a user can
          put iscsid in a netprio cgroup and control traffic will be sent
          with the correct sk_cgrp_prioidx value set but as soon as data
          is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
          is updated with the kernel worker threads value which is the
          default case.
      
          It seems more correct to only update the field when the user
          explicitly sets it via control group infrastructure. This allows
          the users to manage sockets that may be used with other threads.
      
      Since classid is now updated when the task is moved between the
      cgroups, we don't have to call sock_update_classid() from various
      places to ensure we always using the latest classid value.
      
      [v2: Use iterate_fd() instead of open coding]
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc:  Li Zefan <lizefan@huawei.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <netdev@vger.kernel.org>
      Cc: <cgroups@vger.kernel.org>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a328d8c
    • D
      cgroup: net_cls: Pass in task to sock_update_classid() · fd9a08a7
      Daniel Wagner 提交于
      sock_update_classid() assumes that the update operation always are
      applied on the current task. sock_update_classid() needs to know on
      which tasks to work on in order to be able to migrate task between
      cgroups using the struct cgroup_subsys attach() callback.
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <netdev@vger.kernel.org>
      Cc: <cgroups@vger.kernel.org>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd9a08a7
  15. 15 9月, 2012 1 次提交
  16. 15 8月, 2012 1 次提交
  17. 10 8月, 2012 1 次提交
  18. 31 7月, 2012 1 次提交
  19. 30 7月, 2012 1 次提交
  20. 23 7月, 2012 2 次提交
  21. 21 7月, 2012 1 次提交
    • M
      tun: fix a crash bug and a memory leak · b09e786b
      Mikulas Patocka 提交于
      This patch fixes a crash
      tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
      sock_release -> iput(SOCK_INODE(sock))
      introduced by commit 1ab5ecb9
      
      The problem is that this socket is embedded in struct tun_struct, it has
      no inode, iput is called on invalid inode, which modifies invalid memory
      and optionally causes a crash.
      
      sock_release also decrements sockets_in_use, this causes a bug that
      "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
      creating and closing tun devices.
      
      This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
      sock_release to not free the inode and not decrement sockets_in_use,
      fixing both memory corruption and sockets_in_use underflow.
      
      It should be backported to 3.3 an 3.4 stabke.
      Signed-off-by: NMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b09e786b
  22. 17 7月, 2012 1 次提交
  23. 11 5月, 2012 1 次提交
    • J
      drivers/net: Convert compare_ether_addr to ether_addr_equal · 2e42e474
      Joe Perches 提交于
      Use the new bool function ether_addr_equal to add
      some clarity and reduce the likelihood for misuse
      of compare_ether_addr for sorting.
      
      Done via cocci script:
      
      $ cat compare_ether_addr.cocci
      @@
      expression a,b;
      @@
      -	!compare_ether_addr(a, b)
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	compare_ether_addr(a, b)
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) == 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) != 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) == 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) != 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!!ether_addr_equal(a, b)
      +	ether_addr_equal(a, b)
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e42e474
  24. 29 3月, 2012 1 次提交
  25. 13 3月, 2012 1 次提交
    • S
      tun: don't hold network namespace by tun sockets · 1ab5ecb9
      Stanislav Kinsbursky 提交于
      v3: added previously removed sock_put() to the tun_release() callback, because
      sk_release_kernel() doesn't drop the socket reference.
      
      v2: sk_release_kernel() used for socket release. Dummy tun_release() is
      required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
      call.
      
      TUN was designed to destroy it's socket on network namesapce shutdown. But this
      will never happen for persistent device, because it's socket holds network
      namespace.
      This patch removes of holding network namespace by TUN socket and replaces it
      by creating socket in init_net and then changing it's net it to desired one. On
      shutdown socket is moved back to init_net prior to final put.
      Signed-off-by: NStanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ab5ecb9
  26. 16 2月, 2012 1 次提交
  27. 23 11月, 2011 1 次提交
  28. 17 11月, 2011 2 次提交
  29. 18 8月, 2011 1 次提交