1. 06 4月, 2010 2 次提交
  2. 04 4月, 2010 2 次提交
  3. 03 4月, 2010 1 次提交
  4. 02 4月, 2010 3 次提交
  5. 31 3月, 2010 2 次提交
  6. 29 3月, 2010 2 次提交
  7. 28 3月, 2010 2 次提交
  8. 26 3月, 2010 1 次提交
  9. 24 3月, 2010 1 次提交
  10. 23 3月, 2010 1 次提交
  11. 22 3月, 2010 5 次提交
  12. 19 3月, 2010 3 次提交
  13. 17 3月, 2010 4 次提交
    • J
      net: core: add IFLA_STATS64 support · 10708f37
      Jan Engelhardt 提交于
      `ip -s link` shows interface counters truncated to 32 bit. This is
      because interface statistics are transported only in 32-bit quantity
      to userspace. This commit adds a new IFLA_STATS64 attribute that
      exports them in full 64 bit.
      
      References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.htmlSigned-off-by: NJan Engelhardt <jengelh@medozas.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10708f37
    • E
      net: remove rcu locking from fib_rules_event() · 2fb3573d
      Eric Dumazet 提交于
      We hold RTNL at this point and dont use RCU variants of list traversals,
      we dont need rcu_read_lock()/rcu_read_unlock()
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2fb3573d
    • T
      rps: Receive Packet Steering · 0a9627f2
      Tom Herbert 提交于
      This patch implements software receive side packet steering (RPS).  RPS
      distributes the load of received packet processing across multiple CPUs.
      
      Problem statement: Protocol processing done in the NAPI context for received
      packets is serialized per device queue and becomes a bottleneck under high
      packet load.  This substantially limits pps that can be achieved on a single
      queue NIC and provides no scaling with multiple cores.
      
      This solution queues packets early on in the receive path on the backlog queues
      of other CPUs.   This allows protocol processing (e.g. IP and TCP) to be
      performed on packets in parallel.   For each device (or each receive queue in
      a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
      process packets. A CPU is selected on a per packet basis by hashing contents
      of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
      into the CPU mask.  The IPI mechanism is used to raise networking receive
      softirqs between CPUs.  This effectively emulates in software what a multi-queue
      NIC can provide, but is generic requiring no device support.
      
      Many devices now provide a hash over the 4-tuple on a per packet basis
      (e.g. the Toeplitz hash).  This patch allow drivers to set the HW reported hash
      in an skb field, and that value in turn is used to index into the RPS maps.
      Using the HW generated hash can avoid cache misses on the packet when
      steering it to a remote CPU.
      
      The CPU mask is set on a per device and per queue basis in the sysfs variable
      /sys/class/net/<device>/queues/rx-<n>/rps_cpus.  This is a set of canonical
      bit maps for receive queues in the device (numbered by <n>).  If a device
      does not support multi-queue, a single variable is used for the device (rx-0).
      
      Generally, we have found this technique increases pps capabilities of a single
      queue device with good CPU utilization.  Optimal settings for the CPU mask
      seem to depend on architectures and cache hierarcy.  Below are some results
      running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
      Results show cumulative transaction rate and system CPU utilization.
      
      e1000e on 8 core Intel
         Without RPS: 108K tps at 33% CPU
         With RPS:    311K tps at 64% CPU
      
      forcedeth on 16 core AMD
         Without RPS: 156K tps at 15% CPU
         With RPS:    404K tps at 49% CPU
      
      bnx2x on 16 core AMD
         Without RPS  567K tps at 61% CPU (4 HW RX queues)
         Without RPS  738K tps at 96% CPU (8 HW RX queues)
         With RPS:    854K tps at 76% CPU (4 HW RX queues)
      
      Caveats:
      - The benefits of this patch are dependent on architecture and cache hierarchy.
      Tuning the masks to get best performance is probably necessary.
      - This patch adds overhead in the path for processing a single packet.  In
      a lightly loaded server this overhead may eliminate the advantages of
      increased parallelism, and possibly cause some relative performance degradation.
      We have found that masks that are cache aware (share same caches with
      the interrupting CPU) mitigate much of this.
      - The RPS masks can be changed dynamically, however whenever the mask is changed
      this introduces the possibility of generating out of order packets.  It's
      probably best not change the masks too frequently.
      Signed-off-by: NTom Herbert <therbert@google.com>
      
       include/linux/netdevice.h |   32 ++++-
       include/linux/skbuff.h    |    3 +
       net/core/dev.c            |  335 +++++++++++++++++++++++++++++++++++++--------
       net/core/net-sysfs.c      |  225 ++++++++++++++++++++++++++++++-
       net/core/skbuff.c         |    2 +
       5 files changed, 538 insertions(+), 59 deletions(-)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a9627f2
    • J
      NET: netpoll, fix potential NULL ptr dereference · 21edbb22
      Jiri Slaby 提交于
      Stanse found that one error path in netpoll_setup dereferences npinfo
      even though it is NULL. Avoid that by adding new label and go to that
      instead.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Daniel Borkmann <danborkmann@googlemail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Acked-by: chavey@google.com
      Acked-by: NMatt Mackall <mpm@selenic.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21edbb22
  14. 10 3月, 2010 2 次提交
  15. 09 3月, 2010 1 次提交
  16. 08 3月, 2010 1 次提交
  17. 06 3月, 2010 4 次提交
    • J
      ethtool: Add direct access to ops->get_sset_count · d17792eb
      Jeff Garzik 提交于
      On 03/04/2010 09:26 AM, Ben Hutchings wrote:
      > On Thu, 2010-03-04 at 00:51 -0800, Jeff Kirsher wrote:
      >> From: Jeff Garzik<jgarzik@redhat.com>
      >>
      >> This patch is an alternative approach for accessing string
      >> counts, vs. the drvinfo indirect approach.  This way the drvinfo
      >> space doesn't run out, and we don't break ABI later.
      > [...]
      >> --- a/net/core/ethtool.c
      >> +++ b/net/core/ethtool.c
      >> @@ -214,6 +214,10 @@ static noinline int ethtool_get_drvinfo(struct net_device *dev, void __user *use
      >>   	info.cmd = ETHTOOL_GDRVINFO;
      >>   	ops->get_drvinfo(dev,&info);
      >>
      >> +	/*
      >> +	 * this method of obtaining string set info is deprecated;
      >> +	 * consider using ETHTOOL_GSSET_INFO instead
      >> +	 */
      >
      > This comment belongs on the interface (ethtool.h) not the
      > implementation.
      
      Debatable -- the current comment is located at the callsite of
      ops->get_sset_count(), which is where an implementor might think to add
      a new call.  Not all the numeric fields in ethtool_drvinfo are obtained
      from ->get_sset_count().
      
      Hence the "some" in the attached patch to include/linux/ethtool.h,
      addressing your comment.
      
      > [...]
      >> +static noinline int ethtool_get_sset_info(struct net_device *dev,
      >> +                                          void __user *useraddr)
      >> +{
      > [...]
      >> +	/* calculate size of return buffer */
      >> +	for (i = 0; i<  64; i++)
      >> +		if (sset_mask&  (1ULL<<  i))
      >> +			n_bits++;
      > [...]
      >
      > We have a function for this:
      >
      > 	n_bits = hweight64(sset_mask);
      
      Agreed.
      
      I've attached a follow-up patch, which should enable my/Jeff's kernel
      patch to be applied, followed by this one.
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d17792eb
    • J
      ethtool: Add direct access to ops->get_sset_count · 723b2f57
      Jeff Garzik 提交于
      This patch is an alternative approach for accessing string
      counts, vs. the drvinfo indirect approach.  This way the drvinfo
      space doesn't run out, and we don't break ABI later.
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      Signed-off-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      723b2f57
    • Z
      net: backlog functions rename · a3a858ff
      Zhu Yi 提交于
      sk_add_backlog -> __sk_add_backlog
      sk_add_backlog_limited -> sk_add_backlog
      Signed-off-by: NZhu Yi <yi.zhu@intel.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3a858ff
    • Z
      net: add limit for socket backlog · 8eae939f
      Zhu Yi 提交于
      We got system OOM while running some UDP netperf testing on the loopback
      device. The case is multiple senders sent stream UDP packets to a single
      receiver via loopback on local host. Of course, the receiver is not able
      to handle all the packets in time. But we surprisingly found that these
      packets were not discarded due to the receiver's sk->sk_rcvbuf limit.
      Instead, they are kept queuing to sk->sk_backlog and finally ate up all
      the memory. We believe this is a secure hole that a none privileged user
      can crash the system.
      
      The root cause for this problem is, when the receiver is doing
      __release_sock() (i.e. after userspace recv, kernel udp_recvmsg ->
      skb_free_datagram_locked -> release_sock), it moves skbs from backlog to
      sk_receive_queue with the softirq enabled. In the above case, multiple
      busy senders will almost make it an endless loop. The skbs in the
      backlog end up eat all the system memory.
      
      The issue is not only for UDP. Any protocols using socket backlog is
      potentially affected. The patch adds limit for socket backlog so that
      the backlog size cannot be expanded endlessly.
      Reported-by: NAlex Shi <alex.shi@intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru
      Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
      Cc: Sridhar Samudrala <sri@us.ibm.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Signed-off-by: NZhu Yi <yi.zhu@intel.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8eae939f
  18. 01 3月, 2010 1 次提交
  19. 28 2月, 2010 1 次提交
  20. 27 2月, 2010 1 次提交
    • P
      rtnetlink: support specifying device flags on device creation · 3729d502
      Patrick McHardy 提交于
      commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f
      Author: Patrick McHardy <kaber@trash.net>
      Date:   Tue Feb 23 20:41:30 2010 +0100
      
      Support specifying the initial device flags when creating a device though
      rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING
      in order to surpress netlink registration notifications. To complete setup,
      rtnl_configure_link() must be called, which performs the device flag changes
      and invokes the deferred notifiers if everything went well.
      
      Two examples:
      
      # add macvlan to eth0
      #
      $ ip link add link eth0 up allmulticast on type macvlan
      
      [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
          link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff
      [ROUTE]ff00::/8 dev macvlan0  table local  metric 256  mtu 1500 advmss 1440 hoplimit 0
      [ROUTE]fe80::/64 dev macvlan0  proto kernel  metric 256  mtu 1500 advmss 1440 hoplimit 0
      [LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500
          link/ether 26:f8:84:02:f9:2a
      [ADDR]11: macvlan0    inet6 fe80::24f8:84ff:fe02:f92a/64 scope link
             valid_lft forever preferred_lft forever
      [ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo  table local  proto none  metric 0  mtu 16436 advmss 16376 hoplimit 0
      [ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0  proto kernel  metric 1024  mtu 1500 advmss 1440 hoplimit 0
      [NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE
      [ROUTE]2001:6f8:974::/64 dev macvlan0  proto kernel  metric 256  expires 0sec mtu 1500 advmss 1440 hoplimit 0
      [PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084
      [ADDR]11: macvlan0    inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic
             valid_lft 86399sec preferred_lft 14399sec
      
      # add VLAN to eth1, eth1 is down
      #
      $ ip link add link eth1 up type vlan id 1000
      RTNETLINK answers: Network is down
      
      <no events>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3729d502