1. 21 7月, 2012 1 次提交
    • M
      tun: fix a crash bug and a memory leak · b09e786b
      Mikulas Patocka 提交于
      This patch fixes a crash
      tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
      sock_release -> iput(SOCK_INODE(sock))
      introduced by commit 1ab5ecb9
      
      The problem is that this socket is embedded in struct tun_struct, it has
      no inode, iput is called on invalid inode, which modifies invalid memory
      and optionally causes a crash.
      
      sock_release also decrements sockets_in_use, this causes a bug that
      "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
      creating and closing tun devices.
      
      This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
      sock_release to not free the inode and not decrement sockets_in_use,
      fixing both memory corruption and sockets_in_use underflow.
      
      It should be backported to 3.3 an 3.4 stabke.
      Signed-off-by: NMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b09e786b
  2. 17 7月, 2012 1 次提交
  3. 11 5月, 2012 1 次提交
    • J
      drivers/net: Convert compare_ether_addr to ether_addr_equal · 2e42e474
      Joe Perches 提交于
      Use the new bool function ether_addr_equal to add
      some clarity and reduce the likelihood for misuse
      of compare_ether_addr for sorting.
      
      Done via cocci script:
      
      $ cat compare_ether_addr.cocci
      @@
      expression a,b;
      @@
      -	!compare_ether_addr(a, b)
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	compare_ether_addr(a, b)
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) == 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!ether_addr_equal(a, b) != 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) == 0
      +	!ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	ether_addr_equal(a, b) != 0
      +	ether_addr_equal(a, b)
      
      @@
      expression a,b;
      @@
      -	!!ether_addr_equal(a, b)
      +	ether_addr_equal(a, b)
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e42e474
  4. 29 3月, 2012 1 次提交
  5. 13 3月, 2012 1 次提交
    • S
      tun: don't hold network namespace by tun sockets · 1ab5ecb9
      Stanislav Kinsbursky 提交于
      v3: added previously removed sock_put() to the tun_release() callback, because
      sk_release_kernel() doesn't drop the socket reference.
      
      v2: sk_release_kernel() used for socket release. Dummy tun_release() is
      required for sk_release_kernel() ---> sock_release() ---> sock->ops->release()
      call.
      
      TUN was designed to destroy it's socket on network namesapce shutdown. But this
      will never happen for persistent device, because it's socket holds network
      namespace.
      This patch removes of holding network namespace by TUN socket and replaces it
      by creating socket in init_net and then changing it's net it to desired one. On
      shutdown socket is moved back to init_net prior to final put.
      Signed-off-by: NStanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ab5ecb9
  6. 16 2月, 2012 1 次提交
  7. 23 11月, 2011 1 次提交
  8. 17 11月, 2011 2 次提交
  9. 18 8月, 2011 1 次提交
  10. 28 7月, 2011 1 次提交
    • N
      net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared · 550fd08c
      Neil Horman 提交于
      After the last patch, We are left in a state in which only drivers calling
      ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
      hardware call ether_setup for their net_devices and don't hold any state in
      their skbs.  There are a handful of drivers that violate this assumption of
      course, and need to be fixed up.  This patch identifies those drivers, and marks
      them as not being able to support the safe transmission of skbs by clearning the
      IFF_TX_SKB_SHARING flag in priv_flags
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Karsten Keil <isdn@linux-pingi.de>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Krzysztof Halasa <khc@pm.waw.pl>
      CC: "John W. Linville" <linville@tuxdriver.com>
      CC: Greg Kroah-Hartman <gregkh@suse.de>
      CC: Marcel Holtmann <marcel@holtmann.org>
      CC: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      550fd08c
  11. 17 6月, 2011 1 次提交
    • N
      tun: teach the tun/tap driver to support netpoll · bebd097a
      Neil Horman 提交于
      Commit 8d8fc29d changed the behavior of slave
      devices in regards to netpoll.  Specifically it created a mutually exclusive
      relationship between being a slave and a netpoll-capable device.  This creates
      problems for KVM because guests relied on needing netconsole active on a slave
      device to a bridge.  Ideally libvirtd could just attach netconsole to the bridge
      device instead, but thats currently infeasible, because while the bridge device
      supports netpoll, it requires that all slave interface also support it, but the
      tun/tap driver currently does not.  The most direct solution is to teach tun/tap
      to support netpoll, which is implemented by the patch below.
      
      I've not tested this yet, but its pretty straightforward.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NRik van Riel <riel@redhat.com>
      CC: Rik van Riel <riel@redhat.com>
      CC: Maxim Krasnyansky <maxk@qualcomm.com>
      CC: Cong Wang <amwang@redhat.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Tested-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NWANG Cong <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@conan.davemloft.net>
      bebd097a
  12. 12 6月, 2011 1 次提交
    • J
      virtio_net: introduce VIRTIO_NET_HDR_F_DATA_VALID · 10a8d94a
      Jason Wang 提交于
      There's no need for the guest to validate the checksum if it have been
      validated by host nics. So this patch introduces a new flag -
      VIRTIO_NET_HDR_F_DATA_VALID which is used to bypass the checksum
      examing in guest. The backend (tap/macvtap) may set this flag when
      met skbs with CHECKSUM_UNNECESSARY to save cpu utilization.
      
      No feature negotiation is needed as old driver just ignore this flag.
      
      Iperf shows 12%-30% performance improvement for UDP traffic. For TCP,
      when gro is on no difference as it produces skb with partial
      checksum. But when gro is disabled, 20% or even higher improvement
      could be measured by netperf.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10a8d94a
  13. 09 6月, 2011 3 次提交
  14. 06 6月, 2011 1 次提交
  15. 06 5月, 2011 1 次提交
  16. 30 4月, 2011 1 次提交
    • D
      ethtool: cosmetic: Use ethtool ethtool_cmd_speed API · 70739497
      David Decotigny 提交于
      This updates the network drivers so that they don't access the
      ethtool_cmd::speed field directly, but use ethtool_cmd_speed()
      instead.
      
      For most of the drivers, these changes are purely cosmetic and don't
      fix any problem, such as for those 1GbE/10GbE drivers that indirectly
      call their own ethtool get_settings()/mii_ethtool_gset(). The changes
      are meant to enforce code consistency and provide robustness with
      future larger throughputs, at the expense of a few CPU cycles for each
      ethtool operation.
      
      All drivers compiled with make allyesconfig ion x86_64 have been
      updated.
      
      Tested: make allyesconfig on x86_64 + e1000e/bnx2x work
      Signed-off-by: NDavid Decotigny <decot@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70739497
  17. 20 4月, 2011 1 次提交
    • M
      net: tun: convert to hw_features · 88255375
      Michał Mirosław 提交于
      This changes offload setting behaviour to what I think is correct:
       - offloads set via ethtool mean what admin wants to use (by default
         he wants 'em all)
       - offloads set via ioctl() mean what userspace is expecting to get
         (this limits which admin wishes are granted)
       - TUN_NOCHECKSUM is ignored, as it might cause broken packets when
         forwarded (ip_summed == CHECKSUM_UNNECESSARY means that checksum
         was verified, not that it can be ignored)
      
      If TUN_NOCHECKSUM is implemented, it should set skb->csum_* and
      skb->ip_summed (= CHECKSUM_PARTIAL) for known protocols and let others
      be verified by kernel when necessary.
      
      TUN_NOCHECKSUM handling was introduced by commit
      f43798c2:
      
          tun: Allow GSO using virtio_net_hdr
      Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88255375
  18. 04 3月, 2011 1 次提交
  19. 25 1月, 2011 1 次提交
  20. 17 12月, 2010 1 次提交
  21. 02 11月, 2010 1 次提交
  22. 31 7月, 2010 1 次提交
  23. 25 7月, 2010 1 次提交
  24. 26 5月, 2010 1 次提交
    • K
      driver core: add devname module aliases to allow module on-demand auto-loading · 578454ff
      Kay Sievers 提交于
      This adds:
        alias: devname:<name>
      to some common kernel modules, which will allow the on-demand loading
      of the kernel module when the device node is accessed.
      
      Ideally all these modules would be compiled-in, but distros seems too
      much in love with their modularization that we need to cover the common
      cases with this new facility. It will allow us to remove a bunch of pretty
      useless init scripts and modprobes from init scripts.
      
      The static device node aliases will be carried in the module itself. The
      program depmod will extract this information to a file in the module directory:
        $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
        # Device nodes to trigger on-demand module loading.
        microcode cpu/microcode c10:184
        fuse fuse c10:229
        ppp_generic ppp c108:0
        tun net/tun c10:200
        dm_mod mapper/control c10:235
      
      Udev will pick up the depmod created file on startup and create all the
      static device nodes which the kernel modules specify, so that these modules
      get automatically loaded when the device node is accessed:
        $ /sbin/udevd --debug
        ...
        static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
        static_dev_create_from_modules: mknod '/dev/fuse' c10:229
        static_dev_create_from_modules: mknod '/dev/ppp' c108:0
        static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
        static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
        udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
        udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
      
      A few device nodes are switched to statically allocated numbers, to allow
      the static nodes to work. This might also useful for systems which still run
      a plain static /dev, which is completely unsafe to use with any dynamic minor
      numbers.
      
      Note:
      The devname aliases must be limited to the *common* and *single*instance*
      device nodes, like the misc devices, and never be used for conceptually limited
      systems like the loop devices, which should rather get fixed properly and get a
      control node for losetup to talk to, instead of creating a random number of
      device nodes in advance, regardless if they are ever used.
      
      This facility is to hide the mess distros are creating with too modualized
      kernels, and just to hide that these modules are not compiled-in, and not to
      paper-over broken concepts. Thanks! :)
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-Off-By: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      578454ff
  25. 24 5月, 2010 1 次提交
  26. 18 5月, 2010 1 次提交
  27. 14 5月, 2010 1 次提交
    • J
      drivers/net: Remove unnecessary returns from void function()s · a4b77097
      Joe Perches 提交于
      This patch removes from drivers/net/ all the unnecessary
      return; statements that precede the last closing brace of
      void functions.
      
      It does not remove the returns that are immediately
      preceded by a label as gcc doesn't like that.
      
      It also does not remove null void functions with return.
      
      Done via:
      $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
        xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
      
      with some cleanups by hand.
      
      Compile tested x86 allmodconfig only.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4b77097
  28. 10 5月, 2010 1 次提交
  29. 03 5月, 2010 1 次提交
    • M
      tun: add ioctl to modify vnet header size · d9d52b51
      Michael S. Tsirkin 提交于
      virtio added mergeable buffers mode where 2 bytes of extra info is put
      after vnet header but before actual data (tun does not need this data).
      In hindsight, it would have been better to add the new info *before* the
      packet: as it is, users need a lot of tricky code to skip the extra 2
      bytes in the middle of the iovec, and in fact applications seem to get
      it wrong, and only work with specific iovec layout.  The fact we might
      need to split iovec also means we might in theory overflow iovec max
      size.
      
      This patch adds a simpler way for applications to handle this,
      and future proofs the interface against further extensions,
      by making the size of the virtio net header configurable
      from userspace. As a result, tun driver will simply
      skip the extra 2 bytes on both input and output.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      d9d52b51
  30. 02 5月, 2010 1 次提交
    • E
      net: sock_def_readable() and friends RCU conversion · 43815482
      Eric Dumazet 提交于
      sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
      need two atomic operations (and associated dirtying) per incoming
      packet.
      
      RCU conversion is pretty much needed :
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
      socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
      a rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to :
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to eventually wakeup tasks.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions :
        macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
      instead of dynamically allocated ones. They dont need rcu freeing.
      
      Some cleanups or followups are probably needed, (possible
      sk_callback_lock conversion to a spinlock for example...).
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43815482
  31. 21 4月, 2010 1 次提交
  32. 14 4月, 2010 1 次提交
    • M
      tun: orphan an skb on tx · 0110d6f2
      Michael S. Tsirkin 提交于
      The following situation was observed in the field:
      tap1 sends packets, tap2 does not consume them, as a result
      tap1 can not be closed. This happens because
      tun/tap devices can hang on to skbs undefinitely.
      
      As noted by Herbert, possible solutions include a timeout followed by a
      copy/change of ownership of the skb, or always copying/changing
      ownership if we're going into a hostile device.
      
      This patch implements the second approach.
      
      Note: one issue still remaining is that since skbs
      keep reference to tun socket and tun socket has a
      reference to tun device, we won't flush backlog,
      instead simply waiting for all skbs to get transmitted.
      At least this is not user-triggerable, and
      this was not reported in practice, my assumption is
      other devices besides tap complete an skb
      within finite time after it has been queued.
      
      A possible solution for the second issue
      would not to have socket reference the device,
      instead, implement dev->destructor for tun, and
      wait for all skbs to complete there, but this
      needs some thought, probably too risky for 2.6.34.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Tested-by: NYan Vugenfirer <yvugenfi@redhat.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0110d6f2
  33. 18 2月, 2010 1 次提交
  34. 09 2月, 2010 1 次提交
  35. 15 1月, 2010 1 次提交
  36. 27 12月, 2009 1 次提交
  37. 07 11月, 2009 1 次提交
    • A
      net/tun: handle compat_ioctl directly · 50857e2a
      Arnd Bergmann 提交于
      The tun driver is the only code in the kernel that operates
      on a character device with struct ifreq. Change the driver
      to handle the conversion itself so we can contain the
      remaining ifreq handling in the socket layer.
      
      This also fixes a bug in the handling of invalid ioctl
      numbers on an unbound tun device. The driver treats this
      as a TUNSETIFF in native mode, but there is no way for
      the generic compat_ioctl() function to emulate this
      behaviour. Possibly the driver was only doing this
      accidentally anyway, but if any code relies on this
      misfeature, it now also works in compat mode.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50857e2a