1. 26 May, 2010 1 commit
    • driver core: add devname module aliases to allow module on-demand auto-loading · 578454ff
      Authored by Kay Sievers
      This adds:
        alias: devname:<name>
      to some common kernel modules, which will allow the on-demand loading
      of the kernel module when the device node is accessed.
      
      Ideally all these modules would be compiled-in, but distros seem too
      much in love with their modularization, so we need to cover the common
      cases with this new facility. It will allow us to remove a bunch of pretty
      useless init scripts and modprobes from init scripts.
      
      The static device node aliases will be carried in the module itself. The
      program depmod will extract this information to a file in the module directory:
        $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
        # Device nodes to trigger on-demand module loading.
        microcode cpu/microcode c10:184
        fuse fuse c10:229
        ppp_generic ppp c108:0
        tun net/tun c10:200
        dm_mod mapper/control c10:235
      
      Udev will pick up the depmod created file on startup and create all the
      static device nodes which the kernel modules specify, so that these modules
      get automatically loaded when the device node is accessed:
        $ /sbin/udevd --debug
        ...
        static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
        static_dev_create_from_modules: mknod '/dev/fuse' c10:229
        static_dev_create_from_modules: mknod '/dev/ppp' c108:0
        static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
        static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
        udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
        udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
      
      A few device nodes are switched to statically allocated numbers, to allow
      the static nodes to work. This might also be useful for systems which still run
      a plain static /dev, which is completely unsafe to use with any dynamic minor
      numbers.
      
      Note:
      The devname aliases must be limited to the *common* and *single-instance*
      device nodes, like the misc devices, and never be used for conceptually limited
      systems like the loop devices, which should rather get fixed properly and get a
      control node for losetup to talk to, instead of creating a random number of
      device nodes in advance, regardless if they are ever used.
      
      This facility is to hide the mess distros are creating with overly modularized
      kernels, and just to hide that these modules are not compiled-in, and not to
      paper-over broken concepts. Thanks! :)
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
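
The modules.devname format quoted in the commit message above (module name, node path relative to /dev, then a type letter followed by major:minor) can be read with a few lines of code. The following is a minimal sketch, not udev's actual implementation; `DevnameEntry` and `parse_devname_line` are hypothetical names introduced here for illustration, and the sample text is copied from the message.

```python
from typing import NamedTuple, Optional


class DevnameEntry(NamedTuple):
    module: str    # kernel module to load when the node is first accessed
    node: str      # device node path, relative to /dev
    dev_type: str  # type letter as written in the file, e.g. "c"
    major: int
    minor: int


def parse_devname_line(line: str) -> Optional[DevnameEntry]:
    """Parse one modules.devname line; return None for comments and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    module, node, devnum = line.split()
    dev_type, numbers = devnum[0], devnum[1:]
    major, minor = numbers.split(":")
    return DevnameEntry(module, node, dev_type, int(major), int(minor))


# Sample content taken from the commit message.
sample = """\
# Device nodes to trigger on-demand module loading.
microcode cpu/microcode c10:184
fuse fuse c10:229
ppp_generic ppp c108:0
tun net/tun c10:200
dm_mod mapper/control c10:235
"""

entries = [e for e in map(parse_devname_line, sample.splitlines()) if e]
for e in entries:
    print(f"/dev/{e.node} -> {e.module} ({e.dev_type} {e.major}:{e.minor})")
```

Each parsed entry carries what is needed to create the static node and to know which module the node stands in for.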
  2. 24 May, 2010 1 commit
  3. 18 May, 2010 1 commit
  4. 14 May, 2010 1 commit
    • drivers/net: Remove unnecessary returns from void function()s · a4b77097
      Authored by Joe Perches
      This patch removes from drivers/net/ all the unnecessary
      return; statements that precede the last closing brace of
      void functions.
      
      It does not remove the returns that are immediately
      preceded by a label as gcc doesn't like that.
      
      It also does not remove null void functions with return.
      
      Done via:
      $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
        xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
      
      with some cleanups by hand.
      
      Compile tested x86 allmodconfig only.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
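
The substitution in the perl one-liner above can be tried on a small input. This sketch reuses the same regular expression in Python; `strip_trailing_returns` is a hypothetical helper name. Note that the pattern alone also matches a `return;` preceded by a label or sitting alone in an empty function body, so those cases were presumably among the cleanups done by hand.

```python
import re

# Same pattern as the perl one-liner: a newline, whitespace, "return;",
# then the closing brace of the function on the next line.
TRAILING_RETURN = re.compile(r"\n[ \t\n]+return;\n}")


def strip_trailing_returns(source: str) -> str:
    """Drop a bare 'return;' that directly precedes a closing brace."""
    return TRAILING_RETURN.sub("\n}", source)


before = (
    "static void foo(void)\n"
    "{\n"
    "\tdo_work();\n"
    "\treturn;\n"
    "}\n"
)
after = strip_trailing_returns(before)
print(after)
```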
  5. 10 May, 2010 1 commit
  6. 03 May, 2010 1 commit
    • tun: add ioctl to modify vnet header size · d9d52b51
      Authored by Michael S. Tsirkin
      virtio added mergeable buffers mode where 2 bytes of extra info is put
      after vnet header but before actual data (tun does not need this data).
      In hindsight, it would have been better to add the new info *before* the
      packet: as it is, users need a lot of tricky code to skip the extra 2
      bytes in the middle of the iovec, and in fact applications seem to get
      it wrong, and only work with a specific iovec layout.  The fact we might
      need to split iovec also means we might in theory overflow iovec max
      size.
      
      This patch adds a simpler way for applications to handle this,
      and future proofs the interface against further extensions,
      by making the size of the virtio net header configurable
      from userspace. As a result, tun driver will simply
      skip the extra 2 bytes on both input and output.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: David S. Miller <davem@davemloft.net>
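
To illustrate why a configurable header size simplifies userspace, here is a rough sketch of splitting a buffer into header and payload once the header size is known up front. It assumes the 10-byte base layout of struct virtio_net_hdr (two u8 fields followed by four u16 fields) in native byte order, with the mergeable-buffers variant adding one u16 for a total of 12 bytes. `split_frame` is a hypothetical helper, not part of any tun API.

```python
import struct

# Assumed base layout of struct virtio_net_hdr: flags, gso_type (u8 each),
# then hdr_len, gso_size, csum_start, csum_offset (u16 each), native order.
VNET_HDR = struct.Struct("=BBHHHH")            # 10 bytes
VNET_HDR_MRG_RXBUF_SIZE = VNET_HDR.size + 2    # base header plus 2 extra bytes


def split_frame(buf: bytes, vnet_hdr_sz: int):
    """Split a buffer into (base header fields, payload).

    vnet_hdr_sz is the header size configured from userspace; any bytes
    beyond the base header are skipped rather than interpreted.
    """
    if vnet_hdr_sz < VNET_HDR.size or len(buf) < vnet_hdr_sz:
        raise ValueError("buffer shorter than the configured vnet header")
    fields = VNET_HDR.unpack_from(buf)
    return fields, buf[vnet_hdr_sz:]


# A zeroed 12-byte header followed by a dummy payload.
frame = VNET_HDR.pack(0, 0, 0, 0, 0, 0) + b"\x00\x00" + b"payload"
fields, payload = split_frame(frame, VNET_HDR_MRG_RXBUF_SIZE)
print(fields, payload)
```

With the size agreed in advance, the payload always starts at one fixed offset and no iovec splitting is needed to step over the extra bytes.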
  7. 02 May, 2010 1 commit
    • net: sock_def_readable() and friends RCU conversion · 43815482
      Authored by Eric Dumazet
      sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
      need two atomic operations (and associated dirtying) per incoming
      packet.
      
      RCU conversion is pretty much needed :
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer to
      "struct socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
      a rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to :
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to decide whether to wake up tasks.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions :
        macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
      instead of dynamically allocated ones. They don't need rcu freeing.
      
      Some cleanups or followups are probably needed, (possible
      sk_callback_lock conversion to a spinlock for example...).
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 21 Apr, 2010 1 commit
  9. 14 Apr, 2010 1 commit
    • tun: orphan an skb on tx · 0110d6f2
      Authored by Michael S. Tsirkin
      The following situation was observed in the field:
      tap1 sends packets, tap2 does not consume them, as a result
      tap1 can not be closed. This happens because
      tun/tap devices can hang on to skbs indefinitely.
      
      As noted by Herbert, possible solutions include a timeout followed by a
      copy/change of ownership of the skb, or always copying/changing
      ownership if we're going into a hostile device.
      
      This patch implements the second approach.
      
      Note: one issue still remaining is that since skbs
      keep reference to tun socket and tun socket has a
      reference to tun device, we won't flush backlog,
      instead simply waiting for all skbs to get transmitted.
      At least this is not user-triggerable, and
      this was not reported in practice, my assumption is
      other devices besides tap complete an skb
      within finite time after it has been queued.
      
      A possible solution for the second issue
      would be not to have the socket reference the device,
      instead, implement dev->destructor for tun, and
      wait for all skbs to complete there, but this
      needs some thought, probably too risky for 2.6.34.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Yan Vugenfirer <yvugenfi@redhat.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 18 Feb, 2010 1 commit
  11. 09 Feb, 2010 1 commit
  12. 15 Jan, 2010 1 commit
  13. 27 Dec, 2009 1 commit
  14. 07 Nov, 2009 1 commit
    • net/tun: handle compat_ioctl directly · 50857e2a
      Authored by Arnd Bergmann
      The tun driver is the only code in the kernel that operates
      on a character device with struct ifreq. Change the driver
      to handle the conversion itself so we can contain the
      remaining ifreq handling in the socket layer.
      
      This also fixes a bug in the handling of invalid ioctl
      numbers on an unbound tun device. The driver treats this
      as a TUNSETIFF in native mode, but there is no way for
      the generic compat_ioctl() function to emulate this
      behaviour. Possibly the driver was only doing this
      accidentally anyway, but if any code relies on this
      misfeature, it now also works in compat mode.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 14 Oct, 2009 1 commit
  16. 23 Sep, 2009 1 commit
  17. 20 Sep, 2009 1 commit
  18. 02 Sep, 2009 1 commit
  19. 01 Sep, 2009 2 commits
  20. 10 Aug, 2009 1 commit
    • tun: Extend RTNL lock coverage over whole ioctl · 876bfd4d
      Authored by Herbert Xu
      As it is, parts of the ioctl run under the RTNL and parts of
      it do not.  The unlocked section is still protected by the BKL,
      but there can be subtle races.  For example, Eric Biederman and
      Paul Moore observed that if two threads tried to create two tun
      devices on the same file descriptor, then unexpected results
      may occur.
      
      As there isn't anything in the ioctl that is expected to sleep
      indefinitely, we can prevent this from occurring by extending
      the RTNL lock coverage.
      
      This also allows us to get rid of the BKL.
      
      Finally, I changed tun_get_iff to take a tun device in order to
      avoid calling tun_put which would dead-lock as it also tries to
      take the RTNL lock.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 18 Jul, 2009 1 commit
  22. 08 Jul, 2009 1 commit
  23. 07 Jul, 2009 1 commit
  24. 06 Jul, 2009 2 commits
  25. 16 Jun, 2009 1 commit
  26. 08 Jun, 2009 3 commits
  27. 04 Jun, 2009 1 commit
    • tun: Only wake up writers · c722c625
      Authored by Herbert Xu
      When I added socket accounting to tun I inadvertently introduced
      spurious wake-up events that kill qemu performance.  The problem
      occurs when qemu polls on the tun fd for read, and then transmits
      packets.  For each packet transmitted, we will wake up qemu even
      if it only cares about read events.
      
      Now this affects all sockets, but it is only a new problem for
      tun.  So this patch tries to fix it for tun first and we can then
      look at the problem in general.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 10 May, 2009 1 commit
    • tun: add tun_flags, owner, group attributes in sysfs · 980c9e8c
      Authored by David Woodhouse
      This patch adds three attribute files in /sys/class/net/$dev/ for tun
      devices; allowing userspace to obtain the information which TUNGETIFF
      offers, and more, but without having to attach to the device in question
      (which may not be possible if it's in use).
      
      It also fixes a bug which has been present in the TUNGETIFF ioctl since
      its inception, where it would never set IFF_TUN or IFF_TAP according to
      the device type. (Look carefully at the code which I remove from
      tun_get_iff() and how the new tun_flags() helper is subtly different).
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 27 Apr, 2009 1 commit
    • tun: add IFF_TUN_EXCL flag to avoid opening a persistent device. · f85ba780
      Authored by David Woodhouse
      When creating certain types of VPN, NetworkManager will first attempt
      to find an available tun device by iterating through 'vpn%d' until it
      finds one that isn't already busy. Then it'll set that to be persistent
      and owned by the otherwise unprivileged user that the VPN dæmon itself
      runs as.
      
      There's a race condition here -- during the period where the vpn%d
      device is created and we're waiting for the VPN dæmon to actually
      connect and use it, if we try to create _another_ device we could end up
      re-using the same one -- because trying to open it again doesn't get
      -EBUSY as it would while it's _actually_ busy.
      
      To solve this, we add an IFF_TUN_EXCL flag which causes tun_set_iff() to
      fail if it would be opening an existing persistent tun device -- so that
      we can make sure we're getting an entirely _new_ device.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
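
The behaviour described above can be modelled in a few lines. This is a toy model only, not the kernel's tun_set_iff(): `TunRegistry` is a hypothetical class standing in for the set of existing persistent devices, and the flag values are assumed to match linux/if_tun.h.

```python
import errno

IFF_TUN = 0x0001
IFF_TUN_EXCL = 0x8000  # assumed to match the value in linux/if_tun.h


class TunRegistry:
    """Toy stand-in for the kernel's view of persistent tun devices."""

    def __init__(self):
        self.persistent = set()

    def set_iff(self, name: str, flags: int) -> int:
        """Return 0 on success or a negative errno, kernel style."""
        if name in self.persistent:
            if flags & IFF_TUN_EXCL:
                # Caller asked for an entirely new device: refuse to reuse.
                return -errno.EBUSY
            return 0  # re-attach to the existing persistent device
        self.persistent.add(name)
        return 0


reg = TunRegistry()
first = reg.set_iff("vpn0", IFF_TUN | IFF_TUN_EXCL)    # creates vpn0
second = reg.set_iff("vpn0", IFF_TUN | IFF_TUN_EXCL)   # refused: already exists
third = reg.set_iff("vpn0", IFF_TUN)                   # plain open still reuses it
print(first, second, third)
```

A caller iterating over 'vpn%d' names can treat the refusal as "taken, try the next one", which closes the window described in the commit message.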
  30. 21 Apr, 2009 2 commits
  31. 20 Apr, 2009 2 commits
    • tun: Fix sk_sleep races when attaching/detaching · c40af84a
      Authored by Herbert Xu
      As the sk_sleep wait queue actually lives in tfile, which may be
      detached from the tun device, bad things will happen when we use
      sk_sleep after detaching.
      
      Since the tun device is the persistent data structure here (when
      requested by the user), it makes much more sense to have the wait
      queue live there.  There is no reason to have it in tfile at all
      since the only time we can wait is if we have a tun attached.
      In fact we already have a wait queue in tun_struct, so we might
      as well use it.
      Reported-by: Eric W. Biederman <ebiederm@xmission.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Tested-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: Only free a netdev when all tun descriptors are closed · 9c3fea6a
      Authored by Herbert Xu
      The commit c70f1829 ("tun: Fix
      races between tun_net_close and free_netdev") fixed a race where
      an asynchronous deletion of a tun device can hose a poll(2) on
      a tun fd attached to that device.
      
      However, this came at the cost of moving the tun wait queue into
      the tun file data structure.  The problem with this is that it
      imposes restrictions on when and where the tun device can access
      the wait queue since the tun file may change at any time due to
      detaching and reattaching.
      
      In particular, now that we need to use the wait queue on the
      receive path it becomes difficult to properly synchronise this
      with the detachment of the tun device.
      
      This patch solves the original race in a different way.  Since
      the race is only because the underlying memory gets freed, we
      can prevent it simply by ensuring that we don't do that until
      all tun descriptors ever attached to the device (even if they
      have since been detached because they may still be sitting in poll)
      have been closed.
      
      This is done by reference counting the attached tun file
      descriptors.  The refcount in tun->sk has been reappropriated
      for this purpose since it was already being used for that, albeit
      from the opposite angle.
      
      Note that we no longer zero tfile->tun since tun_get will return
      NULL anyway after the refcount on tfile hits zero.  Instead it
      represents whether this device has ever been attached to a device.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 14 Apr, 2009 1 commit
  33. 15 Feb, 2009 1 commit
  34. 09 Feb, 2009 1 commit
    • tun: Fix unicast filter overflow · cfbf84fc
      Authored by Alex Williamson
      Tap devices can make use of a small MAC filter set via the
      TUNSETTXFILTER ioctl.  The filter has a set of exact matches
      plus a hash for imperfect filtering of additional multicast
      addresses.  The current code is unbalanced, adding unicast
      addresses to the multicast hash, but only checking the hash
      against multicast addresses.  This results in the filter
      dropping unicast addresses that overflow the exact filter.
      The fix is simply to disable the filter by leaving count set
      to zero if we find non-multicast addresses after the exact
      match table is filled.
      Signed-off-by: Alex Williamson <alex.williamson@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
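
The fix described above can be sketched as a small model: an exact-match table, a multicast-only hash for overflow, and a count of zero meaning "filter disabled". This is an illustration under stated assumptions, not the driver's code: `TapFilter`, the table size of 8, and the use of `zlib.crc32` as the hash are all stand-ins chosen here.

```python
import zlib

EXACT_COUNT = 8   # illustrative size of the exact-match table
HASH_BITS = 64    # illustrative width of the multicast hash mask


def is_multicast(addr: bytes) -> bool:
    """An Ethernet address is multicast when the low bit of byte 0 is set."""
    return bool(addr[0] & 0x01)


def hash_bit(addr: bytes) -> int:
    # Stand-in hash for the sketch; the driver uses its own Ethernet CRC.
    return zlib.crc32(addr) % HASH_BITS


class TapFilter:
    def __init__(self):
        self.count = 0   # zero means the filter is disabled (accept everything)
        self.exact = []
        self.mask = 0

    def update(self, addrs):
        self.count, self.mask = 0, 0
        self.exact = list(addrs[:EXACT_COUNT])
        for addr in addrs[EXACT_COUNT:]:
            if not is_multicast(addr):
                # Unicast overflow: leave count at zero so nothing is dropped.
                return
            self.mask |= 1 << hash_bit(addr)
        self.count = len(self.exact)

    def accepts(self, dst: bytes) -> bool:
        if self.count == 0:
            return True
        if dst in self.exact:
            return True
        if is_multicast(dst):
            # Only multicast destinations are ever checked against the hash.
            return bool(self.mask & (1 << hash_bit(dst)))
        return False


unicast = [bytes([0x02, 0, 0, 0, 0, i]) for i in range(9)]
mcast = bytes([0x01, 0x00, 0x5E, 0x00, 0x00, 0x01])

overflowed = TapFilter()
overflowed.update(unicast)            # ninth unicast address overflows the table
print(overflowed.count, overflowed.accepts(unicast[8]))
```

Because the hash is consulted only for multicast destinations, hashing an overflowing unicast address would silently drop its traffic; disabling the filter trades precision for correctness.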