1. 23 3月, 2010 1 次提交
  2. 17 3月, 2010 1 次提交
    • T
      rps: Receive Packet Steering · 0a9627f2
      Tom Herbert 提交于
      This patch implements software receive side packet steering (RPS).  RPS
      distributes the load of received packet processing across multiple CPUs.
      
      Problem statement: Protocol processing done in the NAPI context for received
      packets is serialized per device queue and becomes a bottleneck under high
      packet load.  This substantially limits pps that can be achieved on a single
      queue NIC and provides no scaling with multiple cores.
      
      This solution queues packets early on in the receive path on the backlog queues
      of other CPUs.   This allows protocol processing (e.g. IP and TCP) to be
      performed on packets in parallel.   For each device (or each receive queue in
      a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
      process packets. A CPU is selected on a per packet basis by hashing contents
      of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
      into the CPU mask.  The IPI mechanism is used to raise networking receive
      softirqs between CPUs.  This effectively emulates in software what a multi-queue
      NIC can provide, but is generic requiring no device support.
      
      Many devices now provide a hash over the 4-tuple on a per packet basis
      (e.g. the Toeplitz hash).  This patch allow drivers to set the HW reported hash
      in an skb field, and that value in turn is used to index into the RPS maps.
      Using the HW generated hash can avoid cache misses on the packet when
      steering it to a remote CPU.
      
      The CPU mask is set on a per device and per queue basis in the sysfs variable
      /sys/class/net/<device>/queues/rx-<n>/rps_cpus.  This is a set of canonical
      bit maps for receive queues in the device (numbered by <n>).  If a device
      does not support multi-queue, a single variable is used for the device (rx-0).
      
      Generally, we have found this technique increases pps capabilities of a single
      queue device with good CPU utilization.  Optimal settings for the CPU mask
      seem to depend on architectures and cache hierarcy.  Below are some results
      running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
      Results show cumulative transaction rate and system CPU utilization.
      
      e1000e on 8 core Intel
         Without RPS: 108K tps at 33% CPU
         With RPS:    311K tps at 64% CPU
      
      forcedeth on 16 core AMD
         Without RPS: 156K tps at 15% CPU
         With RPS:    404K tps at 49% CPU
      
      bnx2x on 16 core AMD
         Without RPS  567K tps at 61% CPU (4 HW RX queues)
         Without RPS  738K tps at 96% CPU (8 HW RX queues)
         With RPS:    854K tps at 76% CPU (4 HW RX queues)
      
      Caveats:
      - The benefits of this patch are dependent on architecture and cache hierarchy.
      Tuning the masks to get best performance is probably necessary.
      - This patch adds overhead in the path for processing a single packet.  In
      a lightly loaded server this overhead may eliminate the advantages of
      increased parallelism, and possibly cause some relative performance degradation.
      We have found that masks that are cache aware (share same caches with
      the interrupting CPU) mitigate much of this.
      - The RPS masks can be changed dynamically, however whenever the mask is changed
      this introduces the possibility of generating out of order packets.  It's
      probably best not change the masks too frequently.
      Signed-off-by: NTom Herbert <therbert@google.com>
      
       include/linux/netdevice.h |   32 ++++-
       include/linux/skbuff.h    |    3 +
       net/core/dev.c            |  335 +++++++++++++++++++++++++++++++++++++--------
       net/core/net-sysfs.c      |  225 ++++++++++++++++++++++++++++++-
       net/core/skbuff.c         |    2 +
       5 files changed, 538 insertions(+), 59 deletions(-)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a9627f2
  3. 20 2月, 2010 1 次提交
  4. 26 11月, 2009 1 次提交
  5. 31 10月, 2009 1 次提交
  6. 28 10月, 2009 1 次提交
  7. 08 10月, 2009 1 次提交
    • J
      wext: refactor · 3d23e349
      Johannes Berg 提交于
      Refactor wext to
       * split out iwpriv handling
       * split out iwspy handling
       * split out procfs support
       * allow cfg80211 to have wireless extensions compat code
         w/o CONFIG_WIRELESS_EXT
      
      After this, drivers need to
       - select WIRELESS_EXT	- for wext support
       - select WEXT_PRIV	- for iwpriv support
       - select WEXT_SPY	- for iwspy support
      
      except cfg80211 -- which gets new hooks in wext-core.c
      and can then get wext handlers without CONFIG_WIRELESS_EXT.
      
      Wireless extensions procfs support is auto-selected
      based on PROC_FS and anything that requires the wext core
      (i.e. WIRELESS_EXT or CFG80211_WEXT).
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      3d23e349
  8. 05 10月, 2009 2 次提交
    • J
      wext: let get_wireless_stats() sleep · a160ee69
      Johannes Berg 提交于
      A number of drivers (recently including cfg80211-based ones)
      assume that all wireless handlers, including statistics, can
      sleep and they often also implicitly assume that the rtnl is
      held around their invocation. This is almost always true now
      except when reading from sysfs:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:280
        in_atomic(): 1, irqs_disabled(): 0, pid: 10450, name: head
        2 locks held by head/10450:
         #0:  (&buffer->mutex){+.+.+.}, at: [<c10ceb99>] sysfs_read_file+0x24/0xf4
         #1:  (dev_base_lock){++.?..}, at: [<c12844ee>] wireless_show+0x1a/0x4c
        Pid: 10450, comm: head Not tainted 2.6.32-rc3 #1
        Call Trace:
         [<c102301c>] __might_sleep+0xf0/0xf7
         [<c1324355>] mutex_lock_nested+0x1a/0x33
         [<f8cea53b>] wdev_lock+0xd/0xf [cfg80211]
         [<f8cea58f>] cfg80211_wireless_stats+0x45/0x12d [cfg80211]
         [<c13118d6>] get_wireless_stats+0x16/0x1c
         [<c12844fe>] wireless_show+0x2a/0x4c
      
      Fix this by using the rtnl instead of dev_base_lock.
      Reported-by: NMiles Lane <miles.lane@gmail.com>
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a160ee69
    • A
      net: export device speed and duplex via sysfs · d519e17e
      Andy Gospodarek 提交于
      This patch exports the link-speed (in Mbps) and duplex of an interface
      via sysfs.  This eliminates the need to use ethtool just to check the
      link-speed.  Not requiring 'ethtool' and not relying on the SIOCETHTOOL
      ioctl should be helpful in an embedded environment where space is at a
      premium as well.
      
      NOTE: This patch also intentionally allows non-root users to check the link
      speed and duplex -- something not possible with ethtool.
      
      Here's some sample output:
      
      # cat /sys/class/net/eth0/speed
      100
      # cat /sys/class/net/eth0/duplex
      half
      # ethtool eth0
      Settings for eth0:
              Supported ports: [ TP ]
              Supported link modes:   10baseT/Half 10baseT/Full
                                      100baseT/Half 100baseT/Full
                                      1000baseT/Half 1000baseT/Full
              Supports auto-negotiation: Yes
              Advertised link modes:  Not reported
              Advertised auto-negotiation: No
              Speed: 100Mb/s
              Duplex: Half
              Port: Twisted Pair
              PHYAD: 1
              Transceiver: internal
              Auto-negotiation: off
              Supports Wake-on: g
              Wake-on: g
              Current message level: 0x000000ff (255)
              Link detected: yes
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d519e17e
  9. 29 9月, 2009 1 次提交
  10. 16 9月, 2009 1 次提交
  11. 06 8月, 2009 1 次提交
  12. 27 5月, 2009 1 次提交
  13. 19 5月, 2009 1 次提交
  14. 10 3月, 2009 1 次提交
  15. 03 3月, 2009 1 次提交
  16. 26 11月, 2008 1 次提交
  17. 20 11月, 2008 1 次提交
  18. 11 11月, 2008 1 次提交
  19. 28 10月, 2008 1 次提交
  20. 23 9月, 2008 1 次提交
    • S
      net: network device name ifalias support · 0b815a1a
      Stephen Hemminger 提交于
      This patch add support for keeping an additional character alias
      associated with an network interface. This is useful for maintaining
      the SNMP ifAlias value which is a user defined value. Routers use this
      to hold information like which circuit or line it is connected to. It
      is just an arbitrary text label on the network device.
      
      There are two exposed interfaces with this patch, the value can be
      read/written either via netlink or sysfs.
      
      This could be maintained just by the snmp daemon, but it is more
      generally useful for other management tools, and the kernel is good
      place to act as an agreed upon interface to store it.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b815a1a
  21. 15 7月, 2008 1 次提交
  22. 18 6月, 2008 1 次提交
    • J
      bonding: Allow setting max_bonds to zero · b8a9787e
      Jay Vosburgh 提交于
      	Permit bonding to function rationally if max_bonds is set to
      zero.  This will load the module, but create no master devices (which can
      be created via sysfs).
      
      	Requires some change to bond_create_sysfs; currently, the
      netdev sysfs directory is determined from the first bonding device created,
      but this is no longer possible.  Instead, an interface from net/core is
      created to create and destroy files in net_class.
      
      	Based on a patch submitted by Phil Oester <kernel@linuxaces.com>.
      Modified by Jay Vosburgh to fix the sysfs issue mentioned above and to
      update the documentation.
      Signed-off-by: NPhil Oester <kernel@linuxace.com>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      b8a9787e
  23. 22 5月, 2008 1 次提交
  24. 03 5月, 2008 1 次提交
    • D
      netns: Fix device renaming for sysfs · aaf8cdc3
      Daniel Lezcano 提交于
      When a netdev is moved across namespaces with the
      'dev_change_net_namespace' function, the 'device_rename' function is
      used to fixup kobject and refresh the sysfs tree. The device_rename
      function will call kobject_rename and this one will check if there is
      an object with the same name and this is the case because we are
      renaming the object with the same name.
      
      The use of 'device_rename' seems for me wrong because we usually don't
      rename it but just move it across namespaces. As we just want to do a
      mini "netdev_[un]register", IMO the functions
      'netdev_[un]register_kobject' should be used instead, like an usual
      network device [un]registering.
      
      This patch replace device_rename by netdev_unregister_kobject,
      followed by netdev_register_kobject.
      
      The netdev_register_kobject will call device_initialize and will raise
      a warning indicating the device was already initialized. In order to
      fix that, I split the device initialization into a separate function
      and use it together with 'netdev_register_kobject' into
      register_netdevice. So we can safely call 'netdev_register_kobject' in
      'dev_change_net_namespace'.
      
      This fix will allow to properly use the sysfs per namespace which is
      coming from -mm tree.
      Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: NBenjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aaf8cdc3
  25. 21 4月, 2008 1 次提交
  26. 29 1月, 2008 2 次提交
  27. 24 10月, 2007 1 次提交
  28. 13 10月, 2007 1 次提交
    • K
      Driver core: change add_uevent_var to use a struct · 7eff2e7a
      Kay Sievers 提交于
      This changes the uevent buffer functions to use a struct instead of a
      long list of parameters. It does no longer require the caller to do the
      proper buffer termination and size accounting, which is currently wrong
      in some places. It fixes a known bug where parts of the uevent
      environment are overwritten because of wrong index calculations.
      
      Many thanks to Mathieu Desnoyers for finding bugs and improving the
      error handling.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      
      7eff2e7a
  29. 11 10月, 2007 2 次提交
    • E
      [NET]: Fix running without sysfs · 8b41d188
      Eric W. Biederman 提交于
      When sysfs support is compiled out the kernel still keeps and maintains
      the kobject tree.  So it is not safe to skip our kobject reference counting or
      to avoid becoming members of the kobject tree.  It is safe to not add
      the networking specific sysfs attributes.
      
      This patch removes the sysfs special cases from net/core/dev.c
      renames functions from netdev_sysfs_xxxx to netdev_kobject_xxxx
      and always compiles in net-sysfs.c
      
      net-sysfs.c is modified with a CONFIG_SYSFS guard around the parts
      that are actually sysfs specific.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b41d188
    • S
      [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Stephen Hemminger 提交于
      Several devices have multiple independant RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independant from the net
      device itself.
      
      The signature of the ->poll() call back goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in it's ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bea3348e
  30. 20 5月, 2007 1 次提交
  31. 28 4月, 2007 2 次提交
  32. 26 4月, 2007 1 次提交
  33. 11 2月, 2007 1 次提交
  34. 10 2月, 2007 1 次提交
  35. 08 2月, 2007 1 次提交
  36. 26 9月, 2006 1 次提交
    • J
      [PATCH] WE-21 support (core API) · baef1865
      John W. Linville 提交于
      This is version 21 of the Wireless Extensions. Changelog :
      	o finishes migrating the ESSID API (remove the +1)
      	o netdev->get_wireless_stats is no more
      	o long/short retry
      
      This is a redacted version of a patch originally submitted by Jean
      Tourrilhes.  I removed most of the additions, in order to minimize
      future support requirements for nl80211 (or other WE successor).
      
      CC: Jean Tourrilhes <jt@hpl.hp.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      baef1865