1. 25 11月, 2010 1 次提交
    • T
      xps: Transmit Packet Steering · 1d24eb48
      Tom Herbert 提交于
      This patch implements transmit packet steering (XPS) for multiqueue
      devices.  XPS selects a transmit queue during packet transmission based
      on configuration.  This is done by mapping the CPU transmitting the
      packet to a queue.  This is the transmit side analogue to RPS-- where
      RPS is selecting a CPU based on receive queue, XPS selects a queue
      based on the CPU (previously there was an XPS patch from Eric
      Dumazet, but that might more appropriately be called transmit completion
      steering).
      
      Each transmit queue can be associated with a number of CPUs which will
      use the queue to send packets.  This is configured as a CPU mask on a
      per queue basis in:
      
      /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus
      
      The mappings are stored per device in an inverted data structure that
      maps CPUs to queues.  In the netdevice structure this is an array of
      num_possible_cpu structures where each structure holds and array of
      queue_indexes for queues which that CPU can use.
      
      The benefits of XPS are improved locality in the per queue data
      structures.  Also, transmit completions are more likely to be done
      nearer to the sending thread, so this should promote locality back
      to the socket on free (e.g. UDP).  The benefits of XPS are dependent on
      cache hierarchy, application load, and other factors.  XPS would
      nominally be configured so that a queue would only be shared by CPUs
      which are sharing a cache, the degenerative configuration woud be that
      each CPU has it's own queue.
      
      Below are some benchmark results which show the potential benfit of
      this patch.  The netperf test has 500 instances of netperf TCP_RR test
      with 1 byte req. and resp.
      
      bnx2x on 16 core AMD
         XPS (16 queues, 1 TX queue per CPU)  1234K at 100% CPU
         No XPS (16 queues)                   996K at 100% CPU
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d24eb48
  2. 16 11月, 2010 3 次提交
  3. 09 11月, 2010 2 次提交
  4. 26 10月, 2010 4 次提交
  5. 21 10月, 2010 4 次提交
    • S
      napi: unexport napi_reuse_skb · d0c2b0d2
      stephen hemminger 提交于
      The function napi_reuse_skb is only used inside core.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0c2b0d2
    • J
      vlan: Centralize handling of hardware acceleration. · 3701e513
      Jesse Gross 提交于
      Currently each driver that is capable of vlan hardware acceleration
      must be aware of the vlan groups that are configured and then pass
      the stripped tag to a specialized receive function.  This is
      
      different from other types of hardware offload in that it places a
      significant amount of knowledge in the driver itself rather keeping
      it in the networking core.
      
      This makes vlan offloading function more similarly to other forms
      of offloading (such as checksum offloading or TSO) by doing the
      following:
      * On receive, stripped vlans are passed directly to the network
      core, without attempting to check for vlan groups or reconstructing
      the header if no group
      * vlans are made less special by folding the logic into the main
      receive routines
      * On transmit, the device layer will add the vlan header in software
      if the hardware doesn't support it, instead of spreading that logic
      out in upper layers, such as bonding.
      
      There are a number of advantages to this:
      * Fixes all bugs with drivers incorrectly dropping vlan headers at once.
      * Avoids having to disable VLAN acceleration when in promiscuous mode
      (good for bridging since it always puts devices in promiscuous mode).
      * Keeps VLAN tag separate until given to ultimate consumer, which
      avoids needing to do header reconstruction as in tg3 unless absolutely
      necessary.
      * Consolidates common code in core networking.
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3701e513
    • J
      vlan: Avoid hash table lookup to find group. · 65ac6a5f
      Jesse Gross 提交于
      A struct net_device always maps to zero or one vlan groups and we
      always know the device when we are looking up a group.  We currently
      do a hash table lookup on the device to find the group but it is
      much simpler to just store a pointer.
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65ac6a5f
    • J
      vlan: Enable software emulation for vlan accleration. · 7b9c6090
      Jesse Gross 提交于
      Currently users of hardware vlan accleration need to know whether
      the device supports it before generating packets.  However, vlan
      acceleration will soon be available in a more flexible manner so
      knowing ahead of time becomes much more difficult.  This adds
      a software fallback path for vlan packets on devices without the
      necessary offloading support, similar to other types of hardware
      accleration.
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b9c6090
  6. 20 10月, 2010 1 次提交
  7. 13 10月, 2010 1 次提交
    • E
      net: percpu net_device refcount · 29b4433d
      Eric Dumazet 提交于
      We tried very hard to remove all possible dev_hold()/dev_put() pairs in
      network stack, using RCU conversions.
      
      There is still an unavoidable device refcount change for every dst we
      create/destroy, and this can slow down some workloads (routers or some
      app servers, mmap af_packet)
      
      We can switch to a percpu refcount implementation, now dynamic per_cpu
      infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes
      per device.
      
      On x86, dev_hold(dev) code :
      
      before
              lock    incl 0x280(%ebx)
      after:
              movl    0x260(%ebx),%eax
              incl    fs:(%eax)
      
      Stress bench :
      
      (Sending 160.000.000 UDP frames,
      IP route cache disabled, dual E5540 @2.53GHz,
      32bit kernel, FIB_TRIE)
      
      Before:
      
      real    1m1.662s
      user    0m14.373s
      sys     12m55.960s
      
      After:
      
      real    0m51.179s
      user    0m15.329s
      sys     10m15.942s
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29b4433d
  8. 12 10月, 2010 1 次提交
    • E
      neigh: speedup neigh_hh_init() · 34d101dd
      Eric Dumazet 提交于
      When a new dst is used to send a frame, neigh_resolve_output() tries to
      associate an struct hh_cache to this dst, calling neigh_hh_init() with
      the neigh rwlock write locked.
      
      Most of the time, hh_cache is already known and linked into neighbour,
      so we find it and increment its refcount.
      
      This patch changes the logic so that we call neigh_hh_init() with
      neighbour lock read locked only, so that fast path can be run in
      parallel by concurrent cpus.
      
      This brings part of the speedup we got with commit c7d4426a
      (introduce DST_NOCACHE flag) for non cached dsts, even for cached ones,
      removing one of the contention point that routers hit on multiqueue
      enabled machines.
      
      Further improvements would need to use a seqlock instead of an rwlock to
      protect neigh->ha[], to not dirty neigh too often and remove two atomic
      ops.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34d101dd
  9. 06 10月, 2010 1 次提交
    • E
      net: add a core netdev->rx_dropped counter · caf586e5
      Eric Dumazet 提交于
      In various situations, a device provides a packet to our stack and we
      drop it before it enters protocol stack :
      - softnet backlog full (accounted in /proc/net/softnet_stat)
      - bad vlan tag (not accounted)
      - unknown/unregistered protocol (not accounted)
      
      We can handle a per-device counter of such dropped frames at core level,
      and automatically adds it to the device provided stats (rx_dropped), so
      that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)
      
      This is a generalization of commit 8990f468 (net: rx_dropped
      accounting), thus reverting it.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      caf586e5
  10. 05 10月, 2010 1 次提交
  11. 30 9月, 2010 2 次提交
  12. 28 9月, 2010 3 次提交
  13. 27 9月, 2010 1 次提交
  14. 24 9月, 2010 1 次提交
  15. 20 9月, 2010 1 次提交
  16. 17 9月, 2010 1 次提交
    • E
      net: shrinks struct net_device · cd13539b
      Eric Dumazet 提交于
      commit ab95bfe0 (net: replace hooks in __netif_receive_skb) added
      rx_handler at wrong place, between two cache line aligned objects,
      creating a big hole (a full cache line)
      
      Move rx_handler and rx_handler_data before rx_queue, filling existing
      hole.
      
      Move master field in the cache line(s) used in receive path.
      
      This saves 64 bytes (or L1_CACHE_BYTES), and avoids two possible
      cache misses in receive path.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd13539b
  17. 16 9月, 2010 1 次提交
  18. 02 9月, 2010 1 次提交
  19. 01 9月, 2010 1 次提交
  20. 24 8月, 2010 1 次提交
  21. 23 8月, 2010 1 次提交
  22. 17 8月, 2010 1 次提交
  23. 25 7月, 2010 1 次提交
    • S
      sysfs: add attribute to indicate hw address assignment type · c1f79426
      Stefan Assmann 提交于
      Add addr_assign_type to struct net_device and expose it via sysfs.
      This new attribute has the purpose of giving user-space the ability to
      distinguish between different assignment types of MAC addresses.
      
      For example user-space can treat NICs with randomly generated MAC
      addresses differently than NICs that have permanent (locally assigned)
      MAC addresses.
      For the former udev could write a persistent net rule by matching the
      device path instead of the MAC address.
      There's also the case of devices that 'steal' MAC addresses from slave
      devices. In which it is also be beneficial for user-space to be aware
      of the fact.
      
      This patch also introduces a helper function to assist adoption of
      drivers that generate MAC addresses randomly.
      Signed-off-by: NStefan Assmann <sassmann@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1f79426
  24. 20 7月, 2010 1 次提交
  25. 19 7月, 2010 2 次提交
  26. 10 7月, 2010 2 次提交
    • B
      net: Document that dev_get_stats() returns the given pointer · d7753516
      Ben Hutchings 提交于
      Document that dev_get_stats() returns the same stats pointer it was
      given.  Remove const qualification from the returned pointer since the
      caller may do what it likes with that structure.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7753516
    • B
      net: Get rid of rtnl_link_stats64 / net_device_stats union · 3cfde79c
      Ben Hutchings 提交于
      In commit be1f3c2c "net: Enable 64-bit
      net device statistics on 32-bit architectures" I redefined struct
      net_device_stats so that it could be used in a union with struct
      rtnl_link_stats64, avoiding the need for explicit copying or
      conversion between the two.  However, this is unsafe because there is
      no locking required and no lock consistently held around calls to
      dev_get_stats() and use of the statistics structure it returns.
      
      In commit 28172739 "net: fix 64 bit
      counters on 32 bit arches" Eric Dumazet dealt with that problem by
      requiring callers of dev_get_stats() to provide storage for the
      result.  This means that the net_device::stats64 field and the padding
      in struct net_device_stats are now redundant, so remove them.
      
      Update the comment on net_device_ops::ndo_get_stats64 to reflect its
      new usage.
      
      Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since
      that is what all its callers are really using and it is no longer
      going to be compatible with struct net_device_stats.
      
      Eric Dumazet suggested the separate function for the structure
      conversion.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3cfde79c