1. 04 6月, 2009 1 次提交
    • H
      tun: Only wake up writers · c722c625
      Herbert Xu 提交于
      When I added socket accounting to tun I inadvertently introduced
      spurious wake-up events that kills qemu performance.  The problem
      occurs when qemu polls on the tun fd for read, and then transmits
      packets.  For each packet transmitted, we will wake up qemu even
      if it only cares about read events.
      
      Now this affects all sockets, but it is only a new problem for
      tun.  So this patch tries to fix it for tun first and we can then
      look at the problem in general.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c722c625
  2. 10 5月, 2009 1 次提交
    • D
      tun: add tun_flags, owner, group attributes in sysfs · 980c9e8c
      David Woodhouse 提交于
      This patch adds three attribute files in /sys/class/net/$dev/ for tun
      devices; allowing userspace to obtain the information which TUNGETIFF
      offers, and more, but without having to attach to the device in question
      (which may not be possible if it's in use).
      
      It also fixes a bug which has been present in the TUNGETIFF ioctl since
      its inception, where it would never set IFF_TUN or IFF_TAP according to
      the device type. (Look carefully at the code which I remove from
      tun_get_iff() and how the new tun_flags() helper is subtly different).
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      980c9e8c
  3. 27 4月, 2009 1 次提交
    • D
      tun: add IFF_TUN_EXCL flag to avoid opening a persistent device. · f85ba780
      David Woodhouse 提交于
      When creating a certain types of VPN, NetworkManager will first attempt
      to find an available tun device by iterating through 'vpn%d' until it
      finds one that isn't already busy. Then it'll set that to be persistent
      and owned by the otherwise unprivileged user that the VPN dæmon itself
      runs as.
      
      There's a race condition here -- during the period where the vpn%d
      device is created and we're waiting for the VPN dæmon to actually
      connect and use it, if we try to create _another_ device we could end up
      re-using the same one -- because trying to open it again doesn't get
      -EBUSY as it would while it's _actually_ busy.
      
      So solve this, we add an IFF_TUN_EXCL flag which causes tun_set_iff() to
      fail if it would be opening an existing persistent tundevice -- so that
      we can make sure we're getting an entirely _new_ device.
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f85ba780
  4. 21 4月, 2009 2 次提交
  5. 20 4月, 2009 2 次提交
    • H
      tun: Fix sk_sleep races when attaching/detaching · c40af84a
      Herbert Xu 提交于
      As the sk_sleep wait queue actually lives in tfile, which may be
      detached from the tun device, bad things will happen when we use
      sk_sleep after detaching.
      
      Since the tun device is the persistent data structure here (when
      requested by the user), it makes much more sense to have the wait
      queue live there.  There is no reason to have it in tfile at all
      since the only time we can wait is if we have a tun attached.
      In fact we already have a wait queue in tun_struct, so we might
      as well use it.
      Reported-by: NEric W. Biederman <ebiederm@xmission.com>
      Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Tested-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c40af84a
    • H
      tun: Only free a netdev when all tun descriptors are closed · 9c3fea6a
      Herbert Xu 提交于
      The commit c70f1829 ("tun: Fix
      races between tun_net_close and free_netdev") fixed a race where
      an asynchronous deletion of a tun device can hose a poll(2) on
      a tun fd attached to that device.
      
      However, this came at the cost of moving the tun wait queue into
      the tun file data structure.  The problem with this is that it
      imposes restrictions on when and where the tun device can access
      the wait queue since the tun file may change at any time due to
      detaching and reattaching.
      
      In particular, now that we need to use the wait queue on the
      receive path it becomes difficult to properly synchronise this
      with the detachment of the tun device.
      
      This patch solves the original race in a different way.  Since
      the race is only because the underlying memory gets freed, we
      can prevent it simply by ensuring that we don't do that until
      all tun descriptors ever attached to the device (even if they
      have since be detached because they may still be sitting in poll)
      have been closed.
      
      This is done by using reference counting the attached tun file
      descriptors.  The refcount in tun->sk has been reappropriated
      for this purpose since it was already being used for that, albeit
      from the opposite angle.
      
      Note that we no longer zero tfile->tun since tun_get will return
      NULL anyway after the refcount on tfile hits zero.  Instead it
      represents whether this device has ever been attached to a device.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c3fea6a
  6. 14 4月, 2009 1 次提交
  7. 15 2月, 2009 1 次提交
  8. 09 2月, 2009 1 次提交
    • A
      tun: Fix unicast filter overflow · cfbf84fc
      Alex Williamson 提交于
      Tap devices can make use of a small MAC filter set via the
      TUNSETTXFILTER ioctl.  The filter has a set of exact matches
      plus a hash for imperfect filtering of additional multicast
      addresses.  The current code is unbalanced, adding unicast
      addresses to the multicast hash, but only checking the hash
      against multicast addresses.  This results in the filter
      dropping unicast addresses that overflow the exact filter.
      The fix is simply to disable the filter by leaving count set
      to zero if we find non-multicast addresses after the exact
      match table is filled.
      Signed-off-by: NAlex Williamson <alex.williamson@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfbf84fc
  9. 06 2月, 2009 1 次提交
    • H
      tun: Limit amount of queued packets per device · 33dccbb0
      Herbert Xu 提交于
      Unlike a normal socket path, the tuntap device send path does
      not have any accounting.  This means that the user-space sender
      may be able to pin down arbitrary amounts of kernel memory by
      continuing to send data to an end-point that is congested.
      
      Even when this isn't an issue because of limited queueing at
      most end points, this can also be a problem because its only
      response to congestion is packet loss.  That is, when those
      local queues at the end-point fills up, the tuntap device will
      start wasting system time because it will continue to send
      data there which simply gets dropped straight away.
      
      Of course one could argue that everybody should do congestion
      control end-to-end, unfortunately there are people in this world
      still hooked on UDP, and they don't appear to be going away
      anywhere fast.  In fact, we've always helped them by performing
      accounting in our UDP code, the sole purpose of which is to
      provide congestion feedback other than through packet loss.
      
      This patch attempts to apply the same bandaid to the tuntap device.
      It creates a pseudo-socket object which is used to account our
      packets just as a normal socket does for UDP.  Of course things
      are a little complex because we're actually reinjecting traffic
      back into the stack rather than out of the stack.
      
      The stack complexities however should have been resolved by preceding
      patches.  So this one can simply start using skb_set_owner_w.
      
      For now the accounting is essentially disabled by default for
      backwards compatibility.  In particular, we set the cap to INT_MAX.
      This is so that existing applications don't get confused by the
      sudden arrival EAGAIN errors.
      
      In future we may wish (or be forced to) do this by default.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33dccbb0
  10. 03 2月, 2009 1 次提交
    • M
      tun: Check supplemental groups in TUN/TAP driver. · 1bded710
      Michael Tokarev 提交于
      Michael Tokarev wrote:
      []
      > 2, and this is the main one: How about supplementary groups?
      >
      > Here I have a valid usage case: a group of testers running various
      > versions of windows using KVM (kernel virtual machine), 1 at a time,
      > to test some software.  kvm is set up to use bridge with a tap device
      > (there should be a way to connect to the machine).  Anyone on that group
      > has to be able to start/stop the virtual machines.
      >
      > My first attempt - pretty obvious when I saw -g option of tunctl - is
      > to add group ownership for the tun device and add a supplementary group
      > to each user (their primary group should be different).  But that fails,
      > since kernel only checks for egid, not any other group ids.
      >
      > What's the reasoning to not allow supplementary groups and to only check
      > for egid?
      Signed-off-by: NMichael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1bded710
  11. 01 2月, 2009 1 次提交
  12. 22 1月, 2009 10 次提交
  13. 05 1月, 2009 1 次提交
  14. 30 12月, 2008 1 次提交
  15. 21 11月, 2008 1 次提交
  16. 20 11月, 2008 1 次提交
  17. 14 11月, 2008 2 次提交
  18. 04 11月, 2008 1 次提交
  19. 02 11月, 2008 1 次提交
    • A
      saner FASYNC handling on file close · 233e70f4
      Al Viro 提交于
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict file from fasync lists; forgetting that
      creates a hole and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      233e70f4
  20. 28 10月, 2008 1 次提交
  21. 16 8月, 2008 2 次提交
  22. 23 7月, 2008 1 次提交
  23. 15 7月, 2008 1 次提交
    • M
      tun: Fix/rewrite packet filtering logic · f271b2cc
      Max Krasnyansky 提交于
      Please see the following thread to get some context on this
      	http://marc.info/?l=linux-netdev&m=121564433018903&w=2
      
      Basically the issue is that current multi-cast filtering stuff in
      the TUN/TAP driver is seriously broken.
      Original patch went in without proper review and ACK. It was broken and
      confusing to start with and subsequent patches broke it completely.
      To give you an idea of what's broken here are some of the issues:
      
      - Very confusing comments throughout the code that imply that the
      character device is a network interface in its own right, and that packets
      are passed between the two nics. Which is completely wrong.
      
      - Wrong set of ioctls is used for setting up filters. They look like
      shortcuts for manipulating state of the tun/tap network interface but
      in reality manipulate the state of the TX filter.
      
      - ioctls that were originally used for setting address of the the TX filter
      got "fixed" and now set the address of the network interface itself. Which
      made filter totaly useless.
      
      - Filtering is done too late. Instead of filtering early on, to avoid
      unnecessary wakeups, filtering is done in the read() call.
      
      The list goes on and on :)
      
      So the patch cleans all that up. It introduces simple and clean interface for
      setting up TX filters (TUNSETTXFILTER + tun_filter spec) and does filtering
      before enqueuing the packets.
      
      TX filtering is useful in the scenarios where TAP is part of a bridge, in
      which case it gets all broadcast, multicast and potentially other packets when
      the bridge is learning. So for example Ethernet tunnelling app may want to
      setup TX filters to avoid tunnelling multicast traffic. QEMU and other
      hypervisors can push RX filtering that is currently done in the guest into the
      host context therefore saving wakeups and unnecessary data transfer.
      Signed-off-by: NMax Krasnyansky <maxk@qualcomm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f271b2cc
  24. 11 7月, 2008 1 次提交
  25. 03 7月, 2008 3 次提交
    • R
      tun: Allow GSO using virtio_net_hdr · f43798c2
      Rusty Russell 提交于
      Add a IFF_VNET_HDR flag.  This uses the same ABI as virtio_net
      (ie. prepending struct virtio_net_hdr to packets) to indicate GSO and
      checksum information.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NMax Krasnyansky <maxk@qualcomm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f43798c2
    • R
      tun: TUNSETFEATURES to set gso features. · 5228ddc9
      Rusty Russell 提交于
      ethtool is useful for setting (some) device fields, but it's
      root-only.  Finer feature control is available through a tun-specific
      ioctl.
      
      (Includes Mark McLoughlin <markmc@redhat.com>'s fix to hold rtnl sem).
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NMax Krasnyansky <maxk@qualcomm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5228ddc9
    • R
      tun: Interface to query tun/tap features. · 07240fd0
      Rusty Russell 提交于
      The problem with introducing checksum offload and gso to tun is they
      need to set dev->features to enable GSO and/or checksumming, which is
      supposed to be done before register_netdevice(), ie. as part of
      TUNSETIFF.
      
      Unfortunately, TUNSETIFF has always just ignored flags it doesn't
      understand, so there's no good way of detecting whether the kernel
      supports new IFF_ flags.
      
      This patch implements a TUNGETFEATURES ioctl which returns all the valid IFF
      flags.  It could be extended later to include other features.
      
      Here's an example program which uses it:
      
      #include <linux/if_tun.h>
      #include <sys/types.h>
      #include <sys/ioctl.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <err.h>
      #include <stdio.h>
      
      static struct {
      	unsigned int flag;
      	const char *name;
      } known_flags[] = {
      	{ IFF_TUN, "TUN" },
      	{ IFF_TAP, "TAP" },
      	{ IFF_NO_PI, "NO_PI" },
      	{ IFF_ONE_QUEUE, "ONE_QUEUE" },
      };
      
      int main()
      {
      	unsigned int features, i;
      
      	int netfd = open("/dev/net/tun", O_RDWR);
      	if (netfd < 0)
      		err(1, "Opening /dev/net/tun");
      
      	if (ioctl(netfd, TUNGETFEATURES, &features) != 0) {
      		printf("Kernel does not support TUNGETFEATURES, guessing\n");
      		features = (IFF_TUN|IFF_TAP|IFF_NO_PI|IFF_ONE_QUEUE);
      	}
      	printf("Available features are: ");
      	for (i = 0; i < sizeof(known_flags)/sizeof(known_flags[0]); i++) {
      		if (features & known_flags[i].flag) {
      			features &= ~known_flags[i].flag;
      			printf("%s ", known_flags[i].name);
      		}
      	}
      	if (features)
      		printf("(UNKNOWN %#x)", features);
      	printf("\n");
      	return 0;
      }
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NMax Krasnyansky <maxk@qualcomm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07240fd0