1. 30 Jan, 2013 15 commits
    • C
      gianfar: Optimize struct gfar_priv_tx_q for two cache lines · 0cd3fdea
      Claudiu Manoil committed
      Resize and regroup structure members to eliminate memory holes and
      to pack the structure into 2 cache lines (down from 3).
      tx_ring_size was resized from 4 to 2 bytes and a few members were regrouped
      in order to eliminate byte holes and achieve compactness.
      Where possible, members were grouped according to their usage and access
      order (i.e. start_xmit vs. clean_tx_ring members); less important members
      were pushed to the end.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0cd3fdea
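The regrouping described in the gianfar commit above can be illustrated with a small sketch. Apart from `tx_ring_size`, the struct and member names below are hypothetical and are not the real `struct gfar_priv_tx_q` layout; the point is only that interleaving narrow members with pointers leaves padding holes, while grouping them (and narrowing a 4-byte size to 2 bytes) shrinks the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: narrow members interleaved with pointers,
 * so the compiler has to insert padding after each narrow member. */
struct demo_txq_loose {
	uint8_t  qindex;
	void    *tx_skbuff;
	uint32_t tx_ring_size;   /* 4 bytes, as before the change */
	void    *cur_tx;
	uint16_t skb_curtx;
	void    *dirty_tx;
};

/* Same members regrouped: pointers first, narrow members together,
 * and the ring size narrowed to 2 bytes. */
struct demo_txq_packed {
	void    *tx_skbuff;
	void    *cur_tx;
	void    *dirty_tx;
	uint16_t tx_ring_size;   /* 2 bytes, as after the change */
	uint16_t skb_curtx;
	uint8_t  qindex;
};

/* Number of cache lines a structure of `size` bytes spans,
 * assuming 64-byte lines and a line-aligned start. */
static inline size_t demo_cache_lines(size_t size)
{
	return (size + 63) / 64;
}
```

Comparing `sizeof(struct demo_txq_loose)` with `sizeof(struct demo_txq_packed)` shows the saving on a given ABI; a tool such as pahole reports the same holes for real kernel structures.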
    • E
      ipv6: Fix inet6_csk_bind_conflict so it builds with user namespaces enabled · 243bb4c6
      Eric W. Biederman committed
      When attempting to build linux-next with user namespaces enabled I ran
      into this fun build error.
      
        CC      net/ipv6/inet6_connection_sock.o
      .../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’:
      .../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using
       type ‘kuid_t’
      .../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’
      .../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’
      make[3]: *** [net/ipv6/inet6_connection_sock.o] Error 1
      make[2]: *** [net/ipv6] Error 2
      make[2]: *** Waiting for unfinished jobs....
      
      Using kuid_t instead of int to hold the uid fixes this.
      
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      243bb4c6
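The build error in the commit above comes from a uid type that is deliberately not interchangeable with `int`. A minimal sketch of that idea follows; `demo_kuid_t` and `demo_uid_eq` are hypothetical stand-ins for illustration, not the kernel's definitions in `include/linux/uidgid.h`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a strictly typed uid: wrapping the value in a
 * struct means it cannot be assigned to or from a plain int by accident. */
typedef struct {
	unsigned int val;
} demo_kuid_t;

/* Comparison has to go through a helper, since structs cannot be
 * compared with == in C. */
static inline bool demo_uid_eq(demo_kuid_t a, demo_kuid_t b)
{
	return a.val == b.val;
}

/*
 * With this type, a line such as
 *     int uid = some_function_returning_demo_kuid_t();
 * fails to compile, which is the class of error shown in the log above.
 * Declaring the variable as demo_kuid_t resolves it.
 */
```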
    • C
      pktgen: support net namespace · 4e58a027
      Cong Wang committed
      v3: make pktgen_threads list per-namespace
      v2: remove a useless check
      
      This patch adds net namespace support to pktgen, so that
      pktgen can be used in different namespaces.
      
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e58a027
    • F
      net: fec: add napi support to improve performance · dc975382
      Frank Li committed
      Add NAPI support.
      
      Before this patch
      
       iperf -s -i 1
       ------------------------------------------------------------
       Server listening on TCP port 5001
       TCP window size: 85.3 KByte (default)
       ------------------------------------------------------------
       [  4] local 10.192.242.153 port 5001 connected with 10.192.242.138 port 50004
       [ ID] Interval       Transfer     Bandwidth
       [  4]  0.0- 1.0 sec  41.2 MBytes   345 Mbits/sec
       [  4]  1.0- 2.0 sec  43.7 MBytes   367 Mbits/sec
       [  4]  2.0- 3.0 sec  42.8 MBytes   359 Mbits/sec
       [  4]  3.0- 4.0 sec  43.7 MBytes   367 Mbits/sec
       [  4]  4.0- 5.0 sec  42.7 MBytes   359 Mbits/sec
       [  4]  5.0- 6.0 sec  43.8 MBytes   367 Mbits/sec
       [  4]  6.0- 7.0 sec  43.0 MBytes   361 Mbits/sec
      
      After this patch
       [  4]  2.0- 3.0 sec  51.6 MBytes   433 Mbits/sec
       [  4]  3.0- 4.0 sec  51.8 MBytes   435 Mbits/sec
       [  4]  4.0- 5.0 sec  52.2 MBytes   438 Mbits/sec
       [  4]  5.0- 6.0 sec  52.1 MBytes   437 Mbits/sec
       [  4]  6.0- 7.0 sec  52.1 MBytes   437 Mbits/sec
       [  4]  7.0- 8.0 sec  52.3 MBytes   439 Mbits/sec
      Signed-off-by: Frank Li <Frank.Li@freescale.com>
      Signed-off-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dc975382
    • B
      ethoc: Cleanup driver format · 72aa8e1b
      Barry Grussling committed
      Clean up the format of ethoc.c to meet network driver style as
      per checkpatch.pl.
      Signed-off-by: Barry Grussling <barry@grussling.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72aa8e1b
    • D
      ip_gre: When TOS is inherited, use configured TOS value for non-IP packets · 040468a0
      David Ward committed
      A GRE tunnel can be configured so that outgoing tunnel packets inherit
      the value of the TOS field from the inner IP header. In doing so, when
      a non-IP packet is transmitted through the tunnel, the TOS field will
      always be set to 0.
      
      Instead, the user should be able to configure a different TOS value as
      the fallback to use for non-IP packets. This is helpful when the non-IP
      packets are all control packets and should be handled by routers outside
      the tunnel as having Internet Control precedence. One example of this is
      the NHRP packets that control a DMVPN-compatible mGRE tunnel; they are
      encapsulated directly by GRE and do not contain an inner IP header.
      
      Under the existing behavior, the IFLA_GRE_TOS parameter must be set to
      '1' for the TOS value to be inherited. Now, only the least significant
      bit of this parameter must be set to '1', and when a non-IP packet is
      sent through the tunnel, the upper 6 bits of this same parameter will be
      copied into the TOS field. (The ECN bits get masked off as before.)
      
      This behavior is backwards-compatible with existing configurations and
      iproute2 versions.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      040468a0
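The TOS selection rule described in the ip_gre commit above can be sketched as a small pure function. This is an illustration under stated assumptions, not the ip_gre code: `demo_outer_tos` and its parameters are hypothetical, and ECN handling is simplified to masking off the low two bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * configured  : the tunnel's configured TOS parameter (hypothetical name);
 *               bit 0 set means "inherit from the inner IP header".
 * inner_is_ip : whether the encapsulated packet has an inner IP header.
 * inner_tos   : TOS of the inner IP header (ignored for non-IP packets).
 */
static inline uint8_t demo_outer_tos(uint8_t configured, bool inner_is_ip,
				     uint8_t inner_tos)
{
	uint8_t tos = configured;

	if (configured & 0x01) {
		if (inner_is_ip)
			tos = inner_tos;                      /* inherit */
		else
			tos = (uint8_t)(configured & ~0x01);  /* fallback */
	}

	/* Simplification: drop the two ECN bits, keep the upper 6 bits. */
	return tos & 0xFC;
}
```

For example, a configured value of 0xC1 inherits the inner TOS for IP packets and falls back to 0xC0 for non-IP packets such as NHRP, while a configured value of exactly 1 keeps the old behavior of 0 for non-IP packets.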
    • J
      ipv4: introduce address lifetime · 5c766d64
      Jiri Pirko committed
      There are some use cases where a lifetime for ipv4 addresses is helpful.
      For example:
      1) initramfs networkmanager uses a DHCP daemon to learn network
      configuration parameters
      2) initramfs networkmanager configures addresses, routes and DNS
      3) initramfs networkmanager is requested to stop
      4) initramfs networkmanager stops all daemons including dhclient
      5) there are addresses and routes configured but no daemon running. If
      the system doesn't start networkmanager for some reason, addresses and
      routes will be used forever, which violates RFC 2131.
      
      This patch is essentially a backport of the ipv6 address lifetime mechanism
      for ipv4 addresses.
      
      The current "ip" tool supports this without any patch (since it does not
      distinguish between ipv4 and ipv6 addresses in this respect).
      
      Also, this should be backward-compatible with all current netlink users.
      Reported-by: Pavel Šimerda <psimerda@redhat.com>
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5c766d64
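As a rough illustration of what an address lifetime check does, here is a sketch modeled on the general preferred/valid lifetime idea. All names are hypothetical, the all-ones "infinite" value is an assumption of this sketch, and it is not the code added by the commit above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed convention for this sketch: all-ones means "never expires". */
#define DEMO_INFINITE_LIFETIME 0xFFFFFFFFu

enum demo_addr_state {
	DEMO_ADDR_PREFERRED,   /* usable for new connections */
	DEMO_ADDR_DEPRECATED,  /* still valid, but should not be preferred */
	DEMO_ADDR_EXPIRED      /* should be removed */
};

/* Classify an address by its age (seconds since it was last refreshed)
 * against its preferred and valid lifetimes (also in seconds). */
static inline enum demo_addr_state
demo_classify_addr(uint32_t age, uint32_t preferred_lft, uint32_t valid_lft)
{
	if (valid_lft != DEMO_INFINITE_LIFETIME && age >= valid_lft)
		return DEMO_ADDR_EXPIRED;
	if (preferred_lft != DEMO_INFINITE_LIFETIME && age >= preferred_lft)
		return DEMO_ADDR_DEPRECATED;
	return DEMO_ADDR_PREFERRED;
}
```

In the initramfs scenario above, an address whose DHCP lease is never renewed would eventually be classified as expired and could be removed, instead of being used forever.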
    • D
      Merge branch 'ipfrags' · 5a1dc317
      David S. Miller committed
      Jesper Dangaard Brouer says:
      
      ====================
      This patchset is V2, with some trivial code fixes, which were noticed
      by DaveM. It is still a partial respin of my fragmentation optimization
      patches: http://thread.gmane.org/gmane.linux.network/250914
      
      This is not the complete patchset from the gmane link above. In this
      patchset, I primarily focus on adjusting cachelines for better SMP/NUMA
      performance.
      
      Once this patchset has been agreed upon, I will continue and respin
      the rest of my patches.
      
      This time around, I have created a frag DoS generator, via the tool
      trafgen (http://netsniff-ng.org/), to create a stable DoS scenario
      (no longer relying on frame dropping due to disabled flow control).
      
      Two 10G interfaces are under test and use Ethernet flow control.  A
      third interface is used for generating the DoS attack (this interface
      is also 10G, but it does not need to be, as 500Kpps DoS is enough).
      
      Test types summary (netperf):
       Test-20G64K     == 2x10G with 65K fragments
       Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
       Test-20G64K+DoS == Same as 20G64K with frag DoS
       Test-20G3F+DoS  == Same as 20G3F  with frag DoS
      
      Patch list:
       Patch-01 - net: cacheline adjust struct netns_frags for better frag performance
       Patch-02 - net: cacheline adjust struct inet_frags for better frag performance
       Patch-03 - net: cacheline adjust struct inet_frag_queue
       Patch-04 - net: frag helper functions for mem limit tracking
       Patch-05 - net: use lib/percpu_counter API for fragmentation mem accounting
       Patch-06 - net: frag, move LRU list maintenance outside of rwlock
      
      Performance table summary (all values in Mbit/s):
      
       Test-type:  Test-20G64K  Test-20G3F  20G64K+DoS  20G3F+DoS
       ----------  -----------  ----------  ----------  ---------
        net-next:      15114.5     8954.21     2444.28    3918.01
        Patch-01:      16075.8     8976.18     2621.49    4072.79
        Patch-02:      17806.9     9280.32     2478.62    4274.59
        Patch-03:      17317.4     9308.62     2546.05    4336.59
        Patch-04:      17635.9     9256.16     2535.25    4327.63
        Patch-05:      18027.0     9918.99     2492.62    3621.68
        Patch-06:      18486.7    10723.20     3657.85    4560.64
      
       I cannot explain the under-DoS regression that patch-05/percpu_counter
       introduces.  But patch-06/LRU-lock corrects the situation again.
      
      Below is a testlab setup description, with links to the trafgen DoS
      packet config used.
      
      Testlab
      =======
      
      Server setup
      ------------
      The machine acting as a server:
       - 2x CPU (E5-2630)
       - Thus a NUMA arch/machine
       - 4x 10Gbit/s ports
       - NICs 2x Intel Dual port 82599 based (driver ixgbe)
      
      Setup:
       - Interfaces use Ethernet flow control
       - Flush all iptables
       - Remove all iptables related modules
       - Kill irqbalance
       - Pin each 10G NIC port to a *single* CPU
      
      Pinning can easily be done by command hacks::
      
       for x in /proc/irq/*/eth8*/../smp_affinity_list ; do echo 1 > $x; done
       for x in /proc/irq/*/eth9*/../smp_affinity_list ; do echo 3 > $x; done
       for x in /proc/irq/*/eth31*/../smp_affinity_list; do echo 6 > $x; done
       for x in /proc/irq/*/eth32*/../smp_affinity_list; do echo 8 > $x; done
      
      Notice the NUMA setting: The CPU to NIC tying is carefully chosen
      according to the NUMA node setup.  Thus, a NIC in a PCI-e slot that
      is connected to a physical CPU socket is tied to a CPU on that socket.
      
      Choosing only a single CPU per NIC (port) is just to ease provoking
      and debugging this performance issue. (In real setups, you can choose
      more CPUs, just remember to take the NUMA node into account.)
      
      Tools
      -----
      
      Netperf is used, with option -T to ensure CPU binding.
      The netserver processes are NUMA pinned::
      
       numactl -m0 -c0 netserver
       numactl -m1 -c 1 netserver -p 1337
      
      I now have a frag DoS generator, created via the tool:
        trafgen (see: http://netsniff-ng.org/)
      
      Trafgen packet config file:
       http://people.netfilter.org/hawk/frag_work/trafgen/frag_packet03_small_frag.txf
      
      Notice, I'm using features of trafgen, recently developed by Daniel
      Borkmann, thus you need the latest git tree to use my trafgen packet
      config.
      
       git://github.com/borkmann/netsniff-ng.git
      
      Command line:
       trafgen --dev eth51 --conf frag_packet03_small_frag.txf -V -k 100 --cpus 2
      
      Test types
      -----------
      
      Test(20G64K) UDP-64K 2x 10Gbit/s with no DoS traffic:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
       export SIZE=$((65507)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
       netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
       netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
       wait $! && tail -n3 ${LOG}.* && \
       tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
      
      Test(20G3F) UDP-3xfrags 2x 10Gbit/s with no DoS traffic:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
       export SIZE=$((3*1472)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
       netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
       netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
       wait $! && tail -n3 ${LOG}.* && \
       tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
      
      Awk script for summing results:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a1dc317
    • J
      net: frag, move LRU list maintenance outside of rwlock · 3ef0eb0d
      Jesper Dangaard Brouer committed
      Updating the fragmentation queues' LRU (Least-Recently-Used) list
      required taking the hash writer lock.  However, the LRU list isn't
      tied to the hash at all, so we can use a separate lock for it.
      Original-idea-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ef0eb0d
    • J
      net: use lib/percpu_counter API for fragmentation mem accounting · 6d7b857d
      Jesper Dangaard Brouer committed
      Replace the per network namespace shared atomic "mem" accounting
      variable, in the fragmentation code, with a lib/percpu_counter.
      
      Getting percpu_counter to scale to the fragmentation code usage
      requires some tweaks.
      
      At first glance, percpu_counter looks superfast, but it does not
      scale on multi-CPU/NUMA machines, because the default batch size
      is too small for frag code usage.  Thus, I have adjusted the
      batch size by using __percpu_counter_add() directly, instead of
      percpu_counter_sub() and percpu_counter_add().
      
      The batch size is increased to 130,000, based on the largest 64K
      fragment memory usage.  This does introduce some imprecise
      memory accounting, but it does not need to be strict for this
      use-case.
      
      It is also essential that the percpu_counter does not
      share a cacheline with other writers, to make this scale.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d7b857d
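The effect of the batch size in the percpu_counter commit above can be shown with a single-threaded model of a batched counter. This is a sketch, not lib/percpu_counter: the `demo_` names and the fixed CPU count are hypothetical, and the locking that a real implementation needs around the shared value is only noted in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4  /* assumed CPU count for the model */

struct demo_percpu_counter {
	int64_t count;                /* shared, approximate global value */
	int32_t local[DEMO_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Add `amount` on behalf of `cpu`. The shared value is only touched when
 * the local delta reaches +/- batch, so a larger batch means fewer writes
 * to the contended shared cacheline, at the cost of precision. */
static inline void demo_counter_add(struct demo_percpu_counter *c, int cpu,
				    int32_t amount, int32_t batch)
{
	int64_t sum = (int64_t)c->local[cpu] + amount;

	if (sum >= batch || sum <= -batch) {
		/* A real implementation would take a lock here. */
		c->count += sum;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)sum;
	}
}
```

With a batch of 130000, adding memory for fragments that stay below that total never touches the shared value; this is the imprecision the commit message accepts in exchange for scalability.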
    • J
      net: frag helper functions for mem limit tracking · d433673e
      Jesper Dangaard Brouer committed
      This change is primarily a preparation to ease the extension of memory
      limit tracking.
      
      The change does reduce the number of atomic operations during freeing of
      a frag queue.  This does introduce some performance improvement, as
      these atomic operations are at the core of the performance problems
      seen on NUMA systems.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d433673e
    • J
      net: cacheline adjust struct inet_frag_queue · 6e34a8b3
      Jesper Dangaard Brouer committed
      Fragmentation code cacheline adjusting of struct inet_frag_queue.
      
      Take advantage of the size of struct timer_list, and move all but
      spinlock_t lock below the timer struct.  On 64-bit, 'lru_list',
      'list' and 'refcnt' fit exactly into the next cacheline, and a
      new cacheline starts at 'fragments'.
      
      The netns_frags *net pointer is moved to the end of the struct,
      because it is used in a compare together with "next/close-by" elements
      of the struct into which this struct is embedded.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e34a8b3
    • J
      net: cacheline adjust struct inet_frags for better frag performance · 5f8e1e8b
      Jesper Dangaard Brouer committed
      The globally shared rwlock of struct inet_frags shares a
      cacheline with the 'rnd' number, which is used by the hash
      calculations.  Fix this, because every write to the lock causes
      unnecessary cache-misses when accessing the 'rnd' number.
      
      Also note that the function pointer (*match) is moved up in the
      struct to keep it from landing on the next cacheline (on 64-bit).
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5f8e1e8b
    • J
      net: cacheline adjust struct netns_frags for better frag performance · cd39a789
      Jesper Dangaard Brouer committed
      This small cacheline adjustment of struct netns_frags improves
      performance significantly for the fragmentation code.
      
      Struct members 'lru_list' and 'mem' are both hot elements, and when
      they share a cacheline, performance suffers from cacheline bouncing
      at every call point.  Also notice how 'mem' is placed together with
      'high_thresh' and 'low_thresh', as they are used together in the
      compare operations.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd39a789
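The cacheline separation described in the netns_frags commit above can be sketched in portable C11 with `alignas`. The struct below is a hypothetical model rather than the kernel's `struct netns_frags`: the 64-byte line size is an assumption, and a plain `long` stands in for the kernel's counter type.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define DEMO_CACHELINE 64  /* assumed cache line size in bytes */

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

struct demo_netns_frags {
	int nqueues;
	struct demo_list_head lru_list;    /* hot: written on every LRU update */

	/* Force 'mem' onto its own cache line so that writes to lru_list
	 * do not bounce the line holding the memory accounting. */
	alignas(DEMO_CACHELINE) long mem;  /* hot: written on every alloc/free */

	/* Kept next to 'mem' because they are compared against it. */
	int high_thresh;
	int low_thresh;
	int timeout;
};
```

Checking `offsetof()` for `lru_list`, `mem` and `high_thresh` confirms that the two hot members land on different lines while the thresholds stay on the same line as `mem`.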
    • F
      net: ks8851: convert to threaded IRQ · 656a05c8
      Felipe Balbi committed
      Convert to a threaded IRQ, just as it should have been. This also
      allows removing the now-unnecessary workqueue.
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      656a05c8
  2. 29 Jan, 2013 13 commits
  3. 28 Jan, 2013 12 commits