Commits · 8258bd2713c3e42bc0e5664cbede0e07587c125f · openanolis / cloud-kernel

01 Feb, 2013: 5 commits

net/mlx4_en: Fix vlan mask for ethtool steering rules · 8258bd27

Committed by Hadar Hen Zion on Jan 30, 2013

The VLAN mask field should be validated and assigned according to the field
size, which is 12 bits. Also replace the numeric 0xfff mask with the existing
kernel macro.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
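The mask rule in 8258bd27 can be illustrated with a small C sketch. It assumes the rule accepts either no VLAN match (mask 0) or an exact match on all 12 VID bits; the helper `validate_vlan_mask()` is hypothetical, and `VLAN_VID_MASK` is defined locally with the same value (0x0fff) as the kernel macro in <linux/if_vlan.h>.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-in with the same value as the kernel macro: the VLAN ID
 * occupies the low 12 bits of the 16-bit TCI. */
#define VLAN_VID_MASK 0x0fff

/* Hypothetical helper: accept "no VLAN match" (mask 0) or an exact match
 * on all 12 VID bits; reject partial masks and masks that reach outside
 * the 12-bit field. On success *hw_mask holds the mask to program. */
static int validate_vlan_mask(uint16_t user_mask, uint16_t *hw_mask)
{
	if (user_mask == 0) {
		*hw_mask = 0;
		return 0;
	}
	if (user_mask != VLAN_VID_MASK)
		return -EINVAL;
	*hw_mask = VLAN_VID_MASK;
	return 0;
}
```

Programming the mask from the named constant rather than a literal keeps it tied to the field width.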

net/mlx4_en: Validate VLAN IDs provided in ethtool flow steering rules · 69d7126b

Committed by Hadar Hen Zion on Jan 30, 2013

When attaching flow steering rules via ethtool, accept only valid VLAN IDs,
i.e. those in the range [0, 4095].
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
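A minimal sketch of the range check from 69d7126b, assuming the ID arrives as a plain 16-bit host-order value. A 12-bit field can only encode 0..4095, so anything at or above 4096 is rejected. `validate_vlan_id()` is a hypothetical name; `VLAN_N_VID` mirrors the kernel constant of the same name.

```c
#include <errno.h>
#include <stdint.h>

/* Number of distinct 12-bit VLAN IDs; valid IDs are 0..4095. */
#define VLAN_N_VID 4096

/* Hypothetical check run before a steering rule is attached. */
static int validate_vlan_id(uint16_t vid)
{
	return vid < VLAN_N_VID ? 0 : -EINVAL;
}
```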

net/mlx4_en: Fix ip/udp steering rules multicast mac when attached via ethtool · f90a3673

Committed by Hadar Hen Zion on Jan 30, 2013

Destination MAC is a mandatory specification for ip/udp steering rules.
When attaching multicast steering rules via ethtool, the unicast MAC of the
interface was added to the rule specification instead of the multicast MAC.
This commit sets the multicast MAC corresponding to the rule's multicast IP.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
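For IPv4, the multicast MAC that f90a3673 needs follows a fixed rule (RFC 1112). It is the prefix 01:00:5e followed by the low 23 bits of the group address. The sketch below shows that mapping in plain C; the function names are hypothetical, and inside the kernel a helper such as ip_eth_mc_map() serves the same purpose. Addresses are taken in host byte order here for readability.

```c
#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to its Ethernet
 * multicast MAC: 01:00:5e followed by the low 23 bits of the group. */
static void ipv4_mc_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* top bit of this byte is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}

/* 224.0.0.0/4 is the IPv4 multicast range. */
static int ipv4_is_mc(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}
```

Because 5 bits of the group address are discarded, 32 IPv4 groups share each MAC (239.1.2.3 and 224.129.2.3 both map to 01:00:5e:01:02:03), so a MAC match is only a first-stage filter.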

net/mlx4_core: Set correctly allow_loopback flag · 248c62aa

Committed by Hadar Hen Zion on Jan 30, 2013

The allow_loopback flag was wrongly set using an arithmetic bit operation;
change the code to use a logical bit operation.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Directly expose fields of HW flow steering rule control segment · 015465f8

Committed by Hadar Hen Zion on Jan 30, 2013

Some of the fields of struct mlx4_net_trans_rule_hw_ctrl were packed into a u32
and accessed through bit field operations. Expose them and access them directly
as u8.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
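The access change in 015465f8 can be pictured with two hypothetical layouts. In the first, two 8-bit values share a 32-bit word and need shifts and masks. In the second, each is its own u8 member. The field names and bit positions are illustrative assumptions, not the real struct mlx4_net_trans_rule_hw_ctrl, and byte-order conversion for hardware-visible data is left out.

```c
#include <stdint.h>

/* Hypothetical "before": two 8-bit values packed into one 32-bit word. */
struct ctrl_packed {
	uint32_t prio_type;	/* bits 31..24: type, bits 7..0: prio (assumed) */
};

static uint8_t packed_type(const struct ctrl_packed *c)
{
	return (uint8_t)(c->prio_type >> 24);
}

static uint8_t packed_prio(const struct ctrl_packed *c)
{
	return (uint8_t)(c->prio_type & 0xff);
}

/* Hypothetical "after": the same values as named byte fields, read and
 * written without any shifting or masking. */
struct ctrl_direct {
	uint8_t type;
	uint8_t rsvd[2];
	uint8_t prio;
};
```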

31 Jan, 2013: 15 commits

net/vxlan: Add ethtool drvinfo · 1b13c97f

Committed by Yan Burman on Jan 29, 2013

Implement ethtool get_drvinfo.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 anycast: Convert ipv6_sk_ac_lock to spinlock. · c33e7b05

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 30, 2013

Since all users take the lock for writing, it does not make sense to use
an rwlock here. Use a simple spinlock.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 flowlabel: Convert np->ipv6_fl_list to RCU. · 18367681

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 30, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 flowlabel: Convert hash list to RCU. · d3aedd5e

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 30, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 flowlabel: Ensure to take lock when modifying np->ip6_sk_fl_list. · f256dc59

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 30, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: bpf_jit_comp: add pkt_type support · 3b58908a

Committed by Eric Dumazet on Jan 30, 2013

Supporting access to skb->pkt_type is a bit tricky if we want generic
code that keeps working even if pkt_type is moved within struct sk_buff.

pkt_type is a bit field, so the compiler cannot really help us find
its offset. Let's use a helper for this: it will print a one-time
message if pkt_type no longer starts at a byte boundary or is
no longer a 3-bit field.
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
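The probing idea in 3b58908a can be sketched in user space. Zero an object, set every bit of the bit field, and scan for the byte that changed. `struct fake_skb`, its members, and `pkt_type_offset()` are hypothetical stand-ins. The sketch assumes little-endian bit-field allocation (as GCC uses on x86), where the first bit field sits in the low bits of its byte; a big-endian-bitfield target would expect the value in the top bits instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: pkt_type is a 3-bit field,
 * and offsetof() cannot be applied to bit fields. */
struct fake_skb {
	unsigned char  cb[5];
	unsigned char  pkt_type:3,
	               other_bits:5;
	unsigned short protocol;
};

/* Byte value expected when a 3-bit pkt_type is all ones and occupies the
 * low bits of its byte (little-endian bit-field allocation assumed). */
#define PKT_TYPE_MAX 7

/* Returns the byte offset of pkt_type, or -1 after printing a one-time
 * message if the field is no longer 3 bits wide at a byte boundary. */
static int pkt_type_offset(void)
{
	static int warned;
	struct fake_skb probe;
	const unsigned char *p = (const unsigned char *)&probe;
	size_t off;

	memset(&probe, 0, sizeof(probe));
	probe.pkt_type--;	/* 0 - 1 wraps to all ones, whatever the width */

	for (off = 0; off < sizeof(probe); off++)
		if (p[off] == PKT_TYPE_MAX)
			return (int)off;

	if (!warned) {
		warned = 1;
		fprintf(stderr, "pkt_type is no longer a 3-bit field at a byte boundary\n");
	}
	return -1;
}
```

Setting the field by wrap-around rather than assigning 7 is what lets the scan notice a width change: a 4-bit field would produce 15 and the lookup would fail.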

qlcnic: Bump up the version to 5.1.33 · 45acd3a0

Committed by Jitendra Kalsaria on Jan 30, 2013

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: make pci_error_handlers const · fec9dd15

Committed by Stephen Hemminger on Jan 30, 2013

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix RX/TX checksum setting for some adapter types · 22fd5ab4

Committed by Manish chopra on Jan 30, 2013

Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix minidump in NPAR mode · 4d53f40f

Committed by Shahed Shaikh on Jan 30, 2013

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: driver LRO bug fix · 283c1c68

Committed by Manish chopra on Jan 30, 2013

o The IPv4 address was not being programmed properly because of an
  improper byte order conversion
Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
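283c1c68 does not say which conversion was wrong, so the following is only a generic illustration of this class of bug. An IPv4 address already in network byte order (as filled in by inet_pton()) must be passed through unchanged; converting it again with htonl() reverses its bytes on a little-endian host. Both helper functions are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical helpers; be_addr is already in network byte order. */
static uint32_t program_addr_correct(uint32_t be_addr)
{
	return be_addr;		/* copy as-is: no conversion needed */
}

static uint32_t program_addr_buggy(uint32_t be_addr)
{
	return htonl(be_addr);	/* second conversion: bytes reversed on LE */
}
```

On a big-endian host both helpers return the same value, which is one way such a bug can go unnoticed until the code runs on a little-endian machine.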

qlcnic: Free irq for mailbox interrupts · cdc84dda

Committed by Manish chopra on Jan 30, 2013

Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix bug in reading HW reset template · 1403f43a

Committed by Manish chopra on Jan 30, 2013

Signed-off-by: Manish chopra <manish.chopra@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlcnic: Fix sparse check endian warnings · 069048f1

Committed by Shahed Shaikh on Jan 30, 2013

Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bond: have random dev address by default instead of zeroes · 409cc1f8

Committed by Jiri Pirko on Jan 30, 2013

It makes more sense to have a randomly generated address by default than to
have all zeroes. It also allows the user to, for example, put the bond into a
bridge without needing to have any slaves in it.

Also note that this changes only the behaviour of bonds with no slaves. Once
the first slave device is enslaved, its address will be used (no change
here).

Also, fix dev_assign_type values along the way.
Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
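A user-space sketch of the default address described in 409cc1f8. Fill six random bytes, then clear the multicast (I/G) bit and set the locally administered (U/L) bit, so the result is a valid unicast address outside the vendor-assigned space. rand() stands in for a proper entropy source here, and make_random_mac() is a hypothetical name; in the kernel, helpers such as eth_hw_addr_random() do this job.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: random unicast, locally administered MAC. */
static void make_random_mac(uint8_t addr[6])
{
	for (int i = 0; i < 6; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe;	/* clear the multicast (I/G) bit */
	addr[0] |= 0x02;	/* set the locally administered (U/L) bit */
}
```

Since the U/L bit is always set, the result can never be the all-zeroes address the commit replaces.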

30 Jan, 2013: 20 commits

net: disallow drivers with buggy VLAN accel to register_netdevice() · d2ed273d

Committed by Michał Mirosław on Jan 29, 2013

Instead of working around bugs that are easily fixed, just don't let them in:
affected drivers should either be fixed or have NETIF_F_HW_VLAN_FILTER
removed from their advertised features.

A quick grep in drivers/net shows two drivers that have NETIF_F_HW_VLAN_FILTER
but not ndo_vlan_rx_add/kill_vid(), but those are false positives (the features
are commented out).

On the other hand, two drivers implement ndo_vlan_rx_add/kill_vid() but don't
advertise NETIF_F_HW_VLAN_FILTER. Those are:

+ethernet/cisco/enic/enic_main.c
+ethernet/qlogic/qlcnic/qlcnic_main.c
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
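The registration-time check from d2ed273d amounts to one condition: if a device advertises VLAN filtering, both VLAN callbacks must be present. The sketch below models it with hypothetical `fake_netdev` types. The feature bit value, the no-op callback and the -EINVAL return are illustrative choices, not taken from the kernel source.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative feature bit; the kernel's real value differs. */
#define NETIF_F_HW_VLAN_FILTER (1u << 0)

struct fake_netdev_ops {
	int (*ndo_vlan_rx_add_vid)(void *dev, unsigned short vid);
	int (*ndo_vlan_rx_kill_vid)(void *dev, unsigned short vid);
};

struct fake_netdev {
	unsigned int features;
	const struct fake_netdev_ops *ops;
};

/* Example callback a driver might supply. */
static int noop_vid(void *dev, unsigned short vid)
{
	(void)dev;
	(void)vid;
	return 0;
}

/* Refuse registration when VLAN filtering is advertised but either
 * callback is missing, instead of checking for NULL at every call site. */
static int check_vlan_filter_ops(const struct fake_netdev *dev)
{
	if ((dev->features & NETIF_F_HW_VLAN_FILTER) &&
	    (!dev->ops->ndo_vlan_rx_add_vid || !dev->ops->ndo_vlan_rx_kill_vid))
		return -EINVAL;
	return 0;
}
```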

netfilter ipset: Use ipv6_addr_equal() where appropriate. · 29e3b160

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 29, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter ip6table_mangle: Use ipv6_addr_equal() where appropriate. · d9e85655

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 29, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Convert xfrm_addr_cmp() to boolean xfrm_addr_equal(). · 70e94e66

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 29, 2013

All users of xfrm_addr_cmp() use its result as a boolean.
Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp())
and convert all users.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
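The refactoring in 70e94e66 follows a common C pattern: keep the memcmp()-style comparator, where 0 means equal, and add a boolean wrapper so call sites read naturally. The union below is a simplified, hypothetical stand-in for xfrm_address_t, and xaddr_cmp()/xaddr_equal() are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified stand-in for xfrm_address_t: room for an IPv6 address,
 * with IPv4 using only the first 32-bit word. */
typedef union {
	uint32_t a4;
	uint32_t a6[4];
} xaddr_t;

/* memcmp()-style comparator: 0 means equal, non-zero means different. */
static int xaddr_cmp(const xaddr_t *a, const xaddr_t *b, int family)
{
	if (family == AF_INET6)
		return memcmp(a->a6, b->a6, sizeof(a->a6));
	return a->a4 != b->a4;
}

/* Boolean wrapper, equivalent to !xaddr_cmp(). */
static bool xaddr_equal(const xaddr_t *a, const xaddr_t *b, int family)
{
	return !xaddr_cmp(a, b, family);
}
```

Call sites then read `if (xaddr_equal(a, b, family))` instead of the easy-to-misread `if (!xaddr_cmp(a, b, family))`.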

xfrm: Use ipv6_addr_equal() where appropriate. · ff88b30c

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 29, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 mcast: Use ipv6_addr_equal() in ip6_mc_source(). · 07c2fecc

Committed by YOSHIFUJI Hideaki / 吉藤英明 on Jan 29, 2013

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · b53c47dd

Committed by David S. Miller on Jan 29, 2013

Included changes:
- fix recently introduced output behaviour
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · f1e7b73a

Committed by David S. Miller on Jan 29, 2013

Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add anti-spoofing checks for 6to4 and 6rd · 218774dc

Committed by Hannes Frederic Sowa on Jan 29, 2013

This patch adds anti-spoofing checks in sit.c as specified in RFC3964
section 5.2 for 6to4 and RFC5969 section 12 for 6rd. I left out the
checks which could easily be implemented with netfilter.

Specifically, this patch adds the following logic (based loosely on the
pseudocode in RFC3964 section 5.2):

if prefix (inner_src_v6) == rd6_prefix (2002::/16 is the default)
        and outer_src_v4 != embedded_ipv4 (inner_src_v6)
                drop
if prefix (inner_dst_v6) == rd6_prefix (2002::/16 is the default)
        and outer_dst_v4 != embedded_ipv4 (inner_dst_v6)
                drop
accept

To accomplish the security checks proposed by the above RFCs, it is
still necessary to employ uRPF filters with netfilter. These new checks
only kick in if the addresses in use are within 2002::/16 or another
range specified by the 6rd-prefix (which defaults to 2002::/16).

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
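The 6to4 half of the pseudocode in 218774dc can be written out in C as below. This sketch hard-codes the default 2002::/16 prefix and does not model a configurable 6rd prefix; the function names are hypothetical. Addresses are raw network-order byte arrays, and for a 6to4 address 2002:AABB:CCDD::/48 the embedded IPv4 address AA.BB.CC.DD occupies bytes 2 to 5.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if the IPv6 address lies in 2002::/16. */
static bool is_6to4(const uint8_t v6[16])
{
	return v6[0] == 0x20 && v6[1] == 0x02;
}

/* Compare the IPv4 address embedded in bytes 2..5 with an outer address. */
static bool embedded_v4_matches(const uint8_t v6[16], const uint8_t v4[4])
{
	return memcmp(&v6[2], v4, 4) == 0;
}

/* Returns true if the decapsulated packet should be dropped. */
static bool sit_6to4_spoofed(const uint8_t inner_src[16],
			     const uint8_t inner_dst[16],
			     const uint8_t outer_src[4],
			     const uint8_t outer_dst[4])
{
	if (is_6to4(inner_src) && !embedded_v4_matches(inner_src, outer_src))
		return true;
	if (is_6to4(inner_dst) && !embedded_v4_matches(inner_dst, outer_dst))
		return true;
	return false;	/* "accept" in the pseudocode */
}
```

Inner addresses outside 2002::/16 skip both checks, which matches the note that the new checks only kick in for addresses within the prefix.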

gianfar: Pack struct gfar_priv_grp into three cachelines · ee873fda

Committed by Claudiu Manoil on Jan 29, 2013

* remove unused members(!): imask, ievent
* move space-consuming interrupt name strings (int_name_* members) to
external structures, non-essential for the driver's hot path
* keep high-priority hot-path data within the first 2 cache lines

This reduces struct gfar_priv_grp from 6 to 3 cache lines.
(Also fixed checkpatch warnings for the old code in the process.)
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Cleanup gfar_parse_group() code · 5fedcc14

Committed by Claudiu Manoil on Jan 29, 2013

Factor out redundant code (improve readability, source code size).
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Optimize struct gfar_priv_tx_q for two cache lines · 0cd3fdea

Committed by Claudiu Manoil on Jan 29, 2013

Resize and regroup structure members to eliminate memory holes and
to pack the structure into 2 cache lines (from 3).
tx_ring_size was resized from 4 to 2 bytes, and a few members were regrouped
to eliminate byte holes and achieve compactness.
Where possible, members were grouped according to their usage and access
order (i.e. start_xmit vs. clean_tx_ring members); less important members
were pushed to the end.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
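The hole elimination described in 0cd3fdea can be seen with a toy example. The member names are hypothetical (not gianfar's), and the sizes checked below assume a common ABI where a 4-byte integer is 4-byte aligned, such as x86-64. Tools like pahole report such holes in real structures.

```c
#include <stdint.h>

/* Interleaving small and large members forces padding after each
 * 1-byte member so the next 4-byte member stays aligned. */
struct txq_holes {
	uint8_t  stopped;	/* 1 byte + 3 bytes padding */
	uint32_t ring_size;	/* 4 bytes */
	uint8_t  qindex;	/* 1 byte + 3 bytes padding */
	uint32_t num_txbdfree;	/* 4 bytes */
};				/* 16 bytes on x86-64 */

/* Same information with ring_size shrunk to 16 bits (as tx_ring_size was)
 * and members ordered by size, so the small ones share one 4-byte slot. */
struct txq_packed {
	uint32_t num_txbdfree;
	uint16_t ring_size;
	uint8_t  stopped;
	uint8_t  qindex;
};				/* 8 bytes on x86-64 */
```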

ipv6: Fix inet6_csk_bind_conflict so it builds with user namespaces enabled · 243bb4c6

Committed by Eric W. Biederman on Jan 29, 2013

When attempting to build linux-next with user namespaces enabled I ran
into this fun build error.

  CC      net/ipv6/inet6_connection_sock.o
.../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’:
.../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using type ‘kuid_t’
.../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’
.../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’
make[3]: *** [net/ipv6/inet6_connection_sock.o] Error 1
make[2]: *** [net/ipv6] Error 2
make[2]: *** Waiting for unfinished jobs....

Using kuid_t instead of int to hold the uid fixes this.

Cc: Tom Herbert <therbert@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
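As the error messages in 243bb4c6 show, kuid_t is not an integer type in that configuration, so it cannot initialize an int. The sketch below reproduces the idea in user space: a one-member struct wraps the raw uid so the compiler refuses to mix it with plain integers. It is modelled on, but not copied from, <linux/uidgid.h>; make_kuid_raw() is a hypothetical constructor.

```c
#include <stdbool.h>
#include <sys/types.h>

/* A one-member struct is a distinct type: assigning it to an int, or an
 * int to it, is a compile error rather than a silent conversion. */
typedef struct {
	uid_t val;
} kuid_t;

/* Hypothetical constructor for this sketch. */
static inline kuid_t make_kuid_raw(uid_t v)
{
	kuid_t k = { v };
	return k;
}

/* Comparison must go through a helper, since == does not work on structs. */
static inline bool uid_eq(kuid_t left, kuid_t right)
{
	return left.val == right.val;
}
```

With this definition, `int uid = k;` fails to compile exactly as in the build log; holding the value in a kuid_t and comparing with uid_eq() is the fix the commit describes.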

pktgen: support net namespace · 4e58a027

Committed by Cong Wang on Jan 28, 2013

v3: make pktgen_threads list per-namespace
v2: remove a useless check

This patch adds net namespace support to pktgen, so that
we can use pktgen in different namespaces.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: add napi support to improve proformance · dc975382

Committed by Frank Li on Jan 28, 2013

Add NAPI support.

Before this patch

 iperf -s -i 1
 ------------------------------------------------------------
 Server listening on TCP port 5001
 TCP window size: 85.3 KByte (default)
 ------------------------------------------------------------
 [  4] local 10.192.242.153 port 5001 connected with 10.192.242.138 port 50004
 [ ID] Interval       Transfer     Bandwidth
 [  4]  0.0- 1.0 sec  41.2 MBytes   345 Mbits/sec
 [  4]  1.0- 2.0 sec  43.7 MBytes   367 Mbits/sec
 [  4]  2.0- 3.0 sec  42.8 MBytes   359 Mbits/sec
 [  4]  3.0- 4.0 sec  43.7 MBytes   367 Mbits/sec
 [  4]  4.0- 5.0 sec  42.7 MBytes   359 Mbits/sec
 [  4]  5.0- 6.0 sec  43.8 MBytes   367 Mbits/sec
 [  4]  6.0- 7.0 sec  43.0 MBytes   361 Mbits/sec

After this patch
 [  4]  2.0- 3.0 sec  51.6 MBytes   433 Mbits/sec
 [  4]  3.0- 4.0 sec  51.8 MBytes   435 Mbits/sec
 [  4]  4.0- 5.0 sec  52.2 MBytes   438 Mbits/sec
 [  4]  5.0- 6.0 sec  52.1 MBytes   437 Mbits/sec
 [  4]  6.0- 7.0 sec  52.1 MBytes   437 Mbits/sec
 [  4]  7.0- 8.0 sec  52.3 MBytes   439 Mbits/sec
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethoc: Cleanup driver format · 72aa8e1b

Committed by Barry Grussling on Jan 27, 2013

Clean up the format of ethoc.c to meet network driver style as
per checkpatch.pl.
Signed-off-by: Barry Grussling <barry@grussling.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_gre: When TOS is inherited, use configured TOS value for non-IP packets · 040468a0

Committed by David Ward on Jan 27, 2013

A GRE tunnel can be configured so that outgoing tunnel packets inherit
the value of the TOS field from the inner IP header. In doing so, when
a non-IP packet is transmitted through the tunnel, the TOS field will
always be set to 0.

Instead, the user should be able to configure a different TOS value as
the fallback to use for non-IP packets. This is helpful when the non-IP
packets are all control packets and should be handled by routers outside
the tunnel as having Internet Control precedence. One example of this is
the NHRP packets that control a DMVPN-compatible mGRE tunnel; they are
encapsulated directly by GRE and do not contain an inner IP header.

Under the existing behavior, the IFLA_GRE_TOS parameter must be set to
'1' for the TOS value to be inherited. Now, only the least significant
bit of this parameter must be set to '1', and when a non-IP packet is
sent through the tunnel, the upper 6 bits of this same parameter will be
copied into the TOS field. (The ECN bits get masked off as before.)

This behavior is backwards-compatible with existing configurations and
iproute2 versions.
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
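The TOS selection described in 040468a0 can be sketched as a pure function. The names are hypothetical. The sketch copies an inner IP packet's TOS unchanged and does not model the ECN encapsulation rules a real tunnel also applies; when inheritance is off it simply returns the configured value.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOS_INHERIT_FLAG 0x01	/* LSB of the configured value */
#define TOS_ECN_MASK     0x03	/* low two bits of the TOS byte */

/* Hypothetical helper returning the outer TOS for a GRE packet. */
static uint8_t gre_outer_tos(uint8_t cfg_tos, bool inner_is_ip,
			     uint8_t inner_tos)
{
	if (!(cfg_tos & TOS_INHERIT_FLAG))
		return cfg_tos;		/* fixed TOS: used as configured */
	if (inner_is_ip)
		return inner_tos;	/* inherit from the inner IP header */
	/* Non-IP payload: fall back to the upper 6 bits, ECN bits masked. */
	return cfg_tos & (uint8_t)~TOS_ECN_MASK;
}
```

For the NHRP case above, configuring 0xC1 (Internet Control precedence 0xC0 plus the inherit bit) lets IP packets keep their own TOS while non-IP packets get 0xC0. The old setting of plain 1 still yields 0 for non-IP packets, consistent with the backward-compatibility claim.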

ipv4: introduce address lifetime · 5c766d64

Committed by Jiri Pirko on Jan 24, 2013

There are some use cases where a lifetime for IPv4 addresses might be helpful.
For example:
1) initramfs NetworkManager uses a DHCP daemon to learn network
configuration parameters
2) initramfs NetworkManager sets up addresses, routes and DNS configuration
3) initramfs NetworkManager is requested to stop
4) initramfs NetworkManager stops all daemons including dhclient
5) there are addresses and routes configured but no daemon running. If
the system doesn't start NetworkManager for some reason, addresses and
routes will be used forever, which violates RFC 2131.

This patch is essentially a backport of the IPv6 address lifetime mechanism
for IPv4 addresses.

The current "ip" tool supports this without any patch (since it does not
distinguish between IPv4 and IPv6 addresses in this respect).

Also, this should be backward-compatible with all current netlink users.
Reported-by: Pavel Šimerda <psimerda@redhat.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
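The lifetime semantics borrowed from IPv6 in 5c766d64 come down to two elapsed-time checks. An address is deprecated once its preferred lifetime has elapsed and removed once its valid lifetime has elapsed, with all-ones meaning forever. The struct and helpers below are a hypothetical sketch of those checks in seconds; INFINITY_LIFE_TIME uses the same all-ones value as the kernel's IPv6 constant.

```c
#include <stdbool.h>
#include <stdint.h>

#define INFINITY_LIFE_TIME 0xFFFFFFFFu	/* "never expires" */

/* Hypothetical per-address lifetime record. */
struct addr_life {
	uint32_t valid_lft;	/* address removed after this many seconds */
	uint32_t preferred_lft;	/* address deprecated after this many seconds */
	uint64_t tstamp;	/* when the lifetimes were last (re)set */
};

static bool addr_expired(const struct addr_life *a, uint64_t now)
{
	return a->valid_lft != INFINITY_LIFE_TIME &&
	       now - a->tstamp >= a->valid_lft;
}

static bool addr_deprecated(const struct addr_life *a, uint64_t now)
{
	return a->preferred_lft != INFINITY_LIFE_TIME &&
	       now - a->tstamp >= a->preferred_lft;
}
```

With the iproute2 ip tool, lifetimes are given as `valid_lft` and `preferred_lft`, e.g. `ip addr add 192.0.2.10/24 dev eth0 valid_lft 3600 preferred_lft 1800`. In the initramfs scenario above, the address would then go away an hour after the last renewal instead of living forever.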

Merge branch 'ipfrags' · 5a1dc317

Committed by David S. Miller on Jan 29, 2013

Jesper Dangaard Brouer says:

====================
This patchset is V2, with some trivial code fixes that were noticed
by DaveM. It is still a partial respin of my fragmentation optimization
patches: http://thread.gmane.org/gmane.linux.network/250914

This is not the complete patchset from the gmane link above. In this
patchset, I primarily focus on adjusting cachelines for better SMP/NUMA
performance.

Once this patchset has been agreed upon, I will continue and respin
the rest of my patches.

This time around, I have created a frag DoS generator, via the tool
trafgen (http://netsniff-ng.org/), to create a stable DoS scenario
(no longer relying on frame dropping due to disabled flow control).

Two 10G interfaces are under test and use Ethernet flow control. A
third interface is used for generating the DoS attack (this interface
is also 10G, but it does not need to be, as 500Kpps DoS is enough).

Test types summary (netperf):
 Test-20G64K     == 2x10G with 65K fragments
 Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
 Test-20G64K+DoS == Same as 20G64K with frag DoS
 Test-20G3F+DoS  == Same as 20G3F  with frag DoS

Patch list:
 Patch-01 - net: cacheline adjust struct netns_frags for better frag performance
 Patch-02 - net: cacheline adjust struct inet_frags for better frag performance
 Patch-03 - net: cacheline adjust struct inet_frag_queue
 Patch-04 - net: frag helper functions for mem limit tracking
 Patch-05 - net: use lib/percpu_counter API for fragmentation mem accounting
 Patch-06 - net: frag, move LRU list maintenance outside of rwlock

Performance table summary (throughput in Mbit/s for all columns):

 Test-type:  Test-20G64K    Test-20G3F  20G64K+DoS   20G3F+DoS
 ----------  -----------    ----------  ----------   ---------
  net-next:  15114.5 Mbit/s   8954.21     2444.28     3918.01 Mbit/s
  Patch-01:  16075.8 Mbit/s   8976.18     2621.49     4072.79 Mbit/s
  Patch-02:  17806.9 Mbit/s   9280.32     2478.62     4274.59 Mbit/s
  Patch-03:  17317.4 Mbit/s   9308.62     2546.05     4336.59 Mbit/s
  Patch-04:  17635.9 Mbit/s   9256.16     2535.25     4327.63 Mbit/s
  Patch-05:  18027.0 Mbit/s   9918.99     2492.62     3621.68 Mbit/s
  Patch-06:  18486.7 Mbit/s  10723.20     3657.85     4560.64 Mbit/s

 I cannot explain the under-DoS regression that patch-05/percpu_counter
 introduces.  But patch-06/LRU-lock corrects the situation again.

Below is a testlab setup description, with links to the trafgen DoS
packet config used.

Testlab
=======

Server setup
------------
The machine acting as a server:
 - 2x CPU (E5-2630)
 - Thus a NUMA arch/machine
 - 4x 10Gbit/s ports
 - NICs 2x Intel Dual port 82599 based (driver ixgbe)

Setup:
 - Interfaces use Ethernet flow control
 - Flush all iptables
 - Remove all iptables-related modules
 - Kill irqbalance
 - Pin each 10G NIC port to a *single* CPU

Pinning can easily be done by command hacks::

 for x in /proc/irq/*/eth8*/../smp_affinity_list ; do echo 1 > $x; done
 for x in /proc/irq/*/eth9*/../smp_affinity_list ; do echo 3 > $x; done
 for x in /proc/irq/*/eth31*/../smp_affinity_list; do echo 6 > $x; done
 for x in /proc/irq/*/eth32*/../smp_affinity_list; do echo 8 > $x; done

Notice the NUMA setting: the CPU-to-NIC tying is carefully chosen
according to the NUMA node setup. Thus, NICs connected to a PCIe
slot that is connected to a physical CPU socket are tied together.

Choosing only a single CPU per NIC (port) is just to ease provoking
and debugging this performance issue. (In real setups, you can choose
more CPUs; just remember the NUMA node in the equation.)

Tools
-----

Netperf is used, with option -T to ensure CPU binding.
The netserver processes are NAPI pinned::

 numactl -m0 -c0 netserver
 numactl -m1 -c 1 netserver -p 1337

I now have a frag DoS generator, created via the tool:
  trafgen (see: http://netsniff-ng.org/)

Trafgen packet config file:
 http://people.netfilter.org/hawk/frag_work/trafgen/frag_packet03_small_frag.txf

Notice: I'm using features of trafgen recently developed by Daniel
Borkmann, so you need the latest git tree to use my trafgen packet
config.

 git://github.com/borkmann/netsniff-ng.git

Command line:
 trafgen --dev eth51 --conf frag_packet03_small_frag.txf -V -k 100 --cpus 2

Test types
----------

Test(20G64K) UDP-64K 2x 10Gbit/s with no DoS traffic:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 export SIZE=$((65507)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
 netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
 netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
 wait $! && tail -n3 ${LOG}.* && \
 tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'

Test(20G3F) UDP-3xfrags 2x 10Gbit/s with no DoS traffic:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 export SIZE=$((3*1472)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
 netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
 netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
 wait $! && tail -n3 ${LOG}.* && \
 tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'

Awk script for summing results:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: frag, move LRU list maintenance outside of rwlock · 3ef0eb0d

Committed by Jesper Dangaard Brouer on Jan 28, 2013

Updating the fragmentation queues' LRU (Least-Recently-Used) list
required taking the hash writer lock. However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.
Original-idea-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openanolis / cloud-kernel: synced successfully more than 1 year ago
