Commits · e61a4b634a15c11725eac8e66b457ba411168c7f · openeuler / raspberrypi-kernel

11 Jun, 2009 · 1 commit

net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e

Submitted by Eric Dumazet on Jun 11, 2009

One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() calls for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on the socket and might run on a different
cpu than the transmit path or the transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways, for example),
where sock_wfree() can be among the top five functions in profiles.

We use sock_hold()/sock_put() so that freeing the socket
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement the initial offset and atomically check whether any packets
are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore.

sock_wfree() doesn't call sock_put() anymore, but checks whether sk_wmem_alloc
reached 0 to perform the final freeing.

The drawback is that an skb->truesize error could lead to unfreeable sockets or,
even worse, to prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench, for example, goes from 2691 MB/s to 2711 MB/s
on my 8-cpu dev machine, even though tbench was not really hitting the sk_refcnt
contention point. 5% speedup on a UDP transmit workload (depending
on the number of flows), lowering TX completion cpu usage.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
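The offset scheme described in this commit can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's code. It uses C11 `<stdatomic.h>` instead of the kernel's atomic_t helpers. `struct sock_model`, `model_set_owner_w()`, `model_wfree()` and `model_sk_free()` are hypothetical names that only mirror the roles of skb_set_owner_w(), sock_wfree() and sk_free().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; not kernel code. */
struct sock_model {
    atomic_int wmem_alloc; /* queued tx bytes, plus 1 while the owner holds the socket */
    bool freed;            /* set once the final free has run */
};

static void model_sock_init(struct sock_model *sk)
{
    atomic_init(&sk->wmem_alloc, 1); /* the one-unit offset taken at init time */
    sk->freed = false;
}

static void model_final_free(struct sock_model *sk)
{
    assert(!sk->freed); /* freeing twice would be the "premature free" bug */
    sk->freed = true;
}

/* Like skb_set_owner_w(): charge truesize, no per-packet refcount taken. */
static void model_set_owner_w(struct sock_model *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* Like sock_wfree(): uncharge; whoever brings the counter to 0 frees. */
static void model_wfree(struct sock_model *sk, int truesize)
{
    /* atomic_fetch_sub returns the old value: old == truesize means new == 0 */
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        model_final_free(sk);
}

/* Like sk_free(): drop the initial offset and check for in-flight packets. */
static void model_sk_free(struct sock_model *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        model_final_free(sk);
}
```

The last in-flight completion or sk_free() performs the final free, whichever brings the counter to zero. This is also why a wrong skb->truesize breaks the scheme. Uncharging more than was charged frees the socket early, and uncharging less means it is never freed.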


09 Jun, 2009 · 2 commits

netfilter: Use frag list abstraction interfaces. · 343a9972

Submitted by David S. Miller on Jun 09, 2009

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Use frag list abstraction interfaces. · 4d9092bb

Submitted by David S. Miller on Jun 09, 2009

Signed-off-by: David S. Miller <davem@davemloft.net>

08 Jun, 2009 · 1 commit

netfilter: nf_ct_icmp: keep the ICMP ct entries longer · f87fb666

Submitted by Jan Kasprzak on Jun 08, 2009

Current conntrack code kills the ICMP conntrack entry as soon as
the first reply is received. This is incorrect, as we then see only
the first ICMP echo reply out of several possible duplicates as
ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
increases the conntrackd traffic on H-A firewalls.

Make all the ICMP conntrack entries (including the replied ones)
last for the default of nf_conntrack_icmp{,v6}_timeout seconds.

Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>


04 Jun, 2009 · 1 commit

netfilter: x_tables: added hook number into match extension parameter structure. · a5e78820

Submitted by Evgeniy Polyakov on Jun 04, 2009

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>

03 Jun, 2009 · 2 commits

net: skb->dst accessors · adf30907

Submitted by Eric Dumazet on Jun 02, 2009

Define three accessors to get/set the dst attached to a skb:

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)

This one should replace occurrences of:

dst_release(skb->dst);
skb->dst = NULL;

Delete the skb->dst field.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
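The following is a minimal model of the three accessors, built on simplified stand-in types. The real struct sk_buff and struct dst_entry carry far more state. The private field name `_dst` and this reduced dst_release() are hypothetical. The model shows how skb_dst_drop() folds the open-coded release-and-clear pair into a single call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the refcount matters here. */
struct dst_entry {
    int refcnt;
};

struct sk_buff {
    struct dst_entry *_dst; /* private: callers go through the accessors */
};

/* Reduced model of dst_release(): drop one reference, tolerate NULL. */
static void dst_release(struct dst_entry *dst)
{
    if (dst)
        dst->refcnt--;
}

static struct dst_entry *skb_dst(const struct sk_buff *skb)
{
    return skb->_dst;
}

static void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
{
    skb->_dst = dst;
}

/* Replaces "dst_release(skb->dst); skb->dst = NULL;" at every call site. */
static void skb_dst_drop(struct sk_buff *skb)
{
    dst_release(skb->_dst);
    skb->_dst = NULL;
}
```

Because the field is only reached through the accessors, its representation can later change without touching callers.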


netfilter: conntrack: simplify event caching system · 17e6e4ea

Submitted by Pablo Neira Ayuso on Jun 02, 2009

This patch simplifies the conntrack event caching system by removing
several events:

 * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted
   since they have no clients.
 * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter
   days.
 * IPCT_REFRESH which is not of any use since we always include the
   timeout in the messages.

After this patch, the existing events are:

 * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which are used to identify
   addition and deletion of entries.
 * IPCT_STATUS, which notes that the status bits have changed,
   e.g. IPS_SEEN_REPLY and IPS_ASSURED.
 * IPCT_PROTOINFO, which reports that internal protocol information has
   changed, e.g. the TCP, DCCP and SCTP protocol state.
 * IPCT_HELPER, which reports that a helper has been assigned to or
   unassigned from this entry.
 * IPCT_MARK and IPCT_SECMARK, which report that the mark has changed; this
   covers the case when a mark is set to zero.
 * IPCT_NATSEQADJ, which reports that there are updates in the NAT sequence
   adjustment.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
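One way to cache events is to coalesce them into a bitmask until they are delivered. The sketch below takes that approach for the events this patch keeps. The `CT_*` names and their bit positions are hypothetical and are not the kernel's IPCT_* values.

```c
#include <assert.h>

/* Hypothetical event numbering; the kernel's IPCT_* values are not reproduced. */
enum ct_event {
    CT_NEW, CT_RELATED, CT_DESTROY, CT_STATUS, CT_PROTOINFO,
    CT_HELPER, CT_MARK, CT_SECMARK, CT_NATSEQADJ,
};

struct ct_event_cache {
    unsigned int pending; /* one bit per event; repeats coalesce into one bit */
};

static void ct_event_cache_add(struct ct_event_cache *c, enum ct_event ev)
{
    c->pending |= 1u << ev;
}

/* Return the pending events and clear the cache, as a delivery step would. */
static unsigned int ct_event_cache_flush(struct ct_event_cache *c)
{
    unsigned int events = c->pending;
    c->pending = 0;
    return events;
}
```

Fewer event types means fewer bits to track and fewer distinct notifications that can be generated for the same entry.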


02 Jun, 2009 · 1 commit

IPv6: Print error value when skb allocation fails · dae9de8e

Submitted by Brian Haley on Jun 02, 2009

Print out the error value when sock_alloc_send_skb() fails in
the IPv6 neighbor discovery code; this can be useful for debugging.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


01 Jun, 2009 · 1 commit

IPv6: Add 'autoconf' and 'disable_ipv6' module parameters · 56d417b1

Submitted by Brian Haley on Jun 01, 2009

Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module.

The first controls whether IPv6 addresses are autoconfigured from
prefixes received in Router Advertisements. The IPv6 loopback
(::1) and link-local addresses are still configured.

The second controls whether IPv6 addresses are desired at all. When it
is set, no IPv6 addresses will be added to any interface.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


27 May, 2009 · 1 commit

gro: Avoid unnecessary comparison after skb_gro_header · a5b1cf28

Submitted by Herbert Xu on May 26, 2009

For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL.  Yet we must check it because of its current
form.  This patch splits it up into multiple functions in order
to avoid this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
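The split can be sketched as a fast/hard/slow trio. Only the slow path can fail, so only its result needs a NULL check. This is a hypothetical userspace model: `struct gro_pkt`, the fixed 256-byte `linear` buffer and all helper names are assumptions that loosely follow the idea in the commit message. It is not the kernel's GRO API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: frag0 is a directly readable prefix of the packet,
 * full is the whole packet, and the slow path copies headers into linear. */
struct gro_pkt {
    const unsigned char *frag0;
    size_t frag0_len;
    const unsigned char *full;
    size_t full_len;
    unsigned char linear[256];
};

/* Fast path: cannot fail, so callers need no NULL check. */
static const void *gro_header_fast(const struct gro_pkt *p, size_t off)
{
    return p->frag0 + off;
}

/* Nonzero when hlen bytes are NOT available in the fast window. */
static int gro_header_hard(const struct gro_pkt *p, size_t hlen)
{
    return p->frag0_len < hlen;
}

/* Slow path: the only one that may return NULL. */
static const void *gro_header_slow(struct gro_pkt *p, size_t hlen, size_t off)
{
    if (hlen > p->full_len || hlen > sizeof(p->linear))
        return NULL;
    memcpy(p->linear, p->full, hlen);
    return p->linear + off;
}

/* Typical caller: take the fast path unless the window is too short. */
static const unsigned char *gro_pull(struct gro_pkt *p, size_t off, size_t len)
{
    size_t hlen = off + len;

    if (!gro_header_hard(p, hlen))
        return gro_header_fast(p, off);
    return gro_header_slow(p, hlen, off);
}
```

In the common case the caller pays for one length comparison and no NULL test on the returned pointer.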


22 May, 2009 · 1 commit

tcp: Unexport TCPv6 GRO functions · 36990673

Submitted by Herbert Xu on May 22, 2009

Since the TCPv6 GRO functions are only used in the same file where
they are defined, we do not need to export them. This was a
cut-n-paste from the IPv4 code, which does need to export them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


21 May, 2009 · 2 commits

IPv6: set RTPROT_KERNEL to initial route · 4f724279

Submitted by Jean-Mickael Guerin on May 20, 2009

The use of an unspecified protocol in the IPv6 initial route prevents quagga from
installing the IPv6 default route:
# show ipv6 route
S   ::/0 [1/0] via fe80::1, eth1_0
K>* ::/0 is directly connected, lo, rej
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0

# ip -6 route
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
unreachable default dev lo  proto none  metric -1  error -101 hoplimit 255

The attached patch sets RTPROT_KERNEL on the default initial route,
which fixes the problem for quagga.
This is similar to "ipv6: protocol for address routes"
(f410a1fb).

# show ipv6 route
S>* ::/0 [1/0] via fe80::1, eth1_0
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0

# ip -6 route
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
fe80::/64 dev eth1_0  proto kernel  metric 256  mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0  metric 256  mtu 1500 advmss 1440 hoplimit -1
default via fe80::1 dev eth1_0  proto zebra  metric 1024  mtu 1500
advmss 1440 hoplimit -1
unreachable default dev lo  proto kernel  metric -1  error -101 hoplimit 255

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: Remove unused parameter from fill method in fib_rules_ops. · 04af8cf6

Submitted by Rami Rosen on May 20, 2009

The netlink message header (struct nlmsghdr) is an unused parameter in
the fill method of the fib_rules_ops struct. This patch removes this
parameter from the method and fixes the places where the method is
called.

(include/net/fib_rules.h)

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


20 May, 2009 · 5 commits

sit: stateless autoconf for isatap · 64506929

Submitted by Sascha Hlusiak on May 19, 2009

be sent periodically. The rs_delay can be specified when adding the
PRL entry and defaults to 15 minutes.

The RS is sent from every link-local address that's assigned to the
tunnel interface. It's directed to the (guessed) link-local address
of the router and is sent through the tunnel.

Better: send to ff02::2 encapsulated in unicast directed to router-v4.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


addrconf: refuse isatap eui64 for INADDR_ANY · 9af28511

Submitted by Sascha Hlusiak on May 19, 2009

A tunnel with no local IPv4 endpoint would otherwise use the
ISATAP link-local address fe80::5efe:0:0, which is invalid. Instead, do not
add a link-local address at all.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
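The rule can be sketched as an interface-identifier builder that refuses the unspecified IPv4 address. The layout, 00-00-5E-FE followed by the IPv4 address, follows RFC 5214. The function name `isatap_ifid()` is hypothetical. For simplicity, the sketch always writes 0x00 in the first byte and ignores the universal/local bit that RFC 5214 sets for globally unique IPv4 addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 8-byte ISATAP interface identifier for v4.
 * Returns -1 for 0.0.0.0 (INADDR_ANY) so no link-local address is added. */
static int isatap_ifid(uint8_t eui[8], const uint8_t v4[4])
{
    static const uint8_t any[4] = { 0, 0, 0, 0 };

    if (memcmp(v4, any, sizeof(any)) == 0)
        return -1; /* fe80::5efe:0:0 would be invalid */

    eui[0] = 0x00; /* simplified: u/l bit handling omitted */
    eui[1] = 0x00;
    eui[2] = 0x5E;
    eui[3] = 0xFE;
    memcpy(eui + 4, v4, 4); /* IPv4 address in network byte order */
    return 0;
}
```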


sit: ipip6_tunnel_del_prl: return err · 4b279601

Submitted by Sascha Hlusiak on May 19, 2009

Typo. When deleting a PRL entry, return the status to userspace
instead of success.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


sit: strictly restrict incoming traffic to tunnel link device · 4fddbf5d

Submitted by Sascha Hlusiak on May 19, 2009

Check the link device when looking up a tunnel. When a tunnel is
linked to an interface, traffic from a different interface must not
reach the tunnel.

This also allows creating multiple tunnels with the same
endpoints, if the link device differs.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


sit: Fail to create tunnel, if it already exists · 8db99e57

Submitted by Sascha Hlusiak on May 19, 2009

When locating the tunnel, do not continue if it is found. Otherwise
a different tunnel with a similar configuration would be returned and
parts of it could be overwritten.

Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
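Two of the sit changes above, matching on the link device and refusing duplicates, can be sketched together with a toy table. All names here are hypothetical. A fixed array replaces the kernel's hash chains, and a link value of 0 stands for a tunnel that is not bound to any interface. The kernel's sit code is structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel table. */
struct tunnel {
    uint32_t remote, local;
    int link;   /* ifindex the tunnel is bound to; 0 = not bound */
    int in_use;
};

#define MAX_TUNNELS 8
static struct tunnel table[MAX_TUNNELS];

/* Receive-side lookup: a tunnel bound to a link only accepts traffic
 * arriving on that link; an unbound tunnel accepts any interface. */
static struct tunnel *tunnel_rx_lookup(uint32_t remote, uint32_t local, int ifindex)
{
    for (size_t i = 0; i < MAX_TUNNELS; i++) {
        struct tunnel *t = &table[i];
        if (t->in_use && t->remote == remote && t->local == local &&
            (t->link == 0 || t->link == ifindex))
            return t;
    }
    return NULL;
}

/* Configuration lookup: exact match, including the link device. */
static struct tunnel *tunnel_find_exact(uint32_t remote, uint32_t local, int link)
{
    for (size_t i = 0; i < MAX_TUNNELS; i++) {
        struct tunnel *t = &table[i];
        if (t->in_use && t->remote == remote && t->local == local &&
            t->link == link)
            return t;
    }
    return NULL;
}

/* Fail instead of returning (and later overwriting) an existing tunnel. */
static struct tunnel *tunnel_create(uint32_t remote, uint32_t local, int link)
{
    if (tunnel_find_exact(remote, local, link))
        return NULL; /* already exists */
    for (size_t i = 0; i < MAX_TUNNELS; i++) {
        if (!table[i].in_use) {
            table[i] = (struct tunnel){ remote, local, link, 1 };
            return &table[i];
        }
    }
    return NULL; /* table full */
}
```

Including the link in the exact-match key is what allows two tunnels with the same endpoints to coexist on different link devices.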


19 May, 2009 · 1 commit

net: FIX ipv6_forward sysctl restart · 5007392d

Submitted by Eric W. Biederman on May 13, 2009

Just returning -ERESTARTSYS without a signal pending is not
good; that will just leak it to userspace. We need to return
-ERESTARTNOINTR so that we always restart, and set a signal pending
so that we fall off the fast path of syscall return and set up
the system call restart.

So use restart_syscall(), which does all of this for us.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


18 May, 2009 · 2 commits

net: remove needless (now buggy) & from dev->dev_addr · 3a6d54c5

Submitted by Jiri Pirko on May 11, 2009

Patch fixes issues with dev->dev_addr changing from array to pointer.
Hopefully there are no others.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv4: remove an unused parameter from configure method of fib_rules_ops. · 8b3521ee

Submitted by Rami Rosen on May 11, 2009

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 May, 2009 · 10 commits

netfilter: xtables: consolidate comefrom debug cast access · bb70dfa5

Submitted by Jan Engelhardt on Apr 15, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: xtables: remove another level of indent · 7a6b1c46

Submitted by Jan Engelhardt on Apr 15, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: xtables: remove some goto · 9452258d

Submitted by Jan Engelhardt on Apr 15, 2009

By combining two ifs, the goto is easily removed.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: xtables: reduce indent level by one · a1ff4ac8

Submitted by Jan Engelhardt on Apr 15, 2009

Cosmetic only. Transformation applied:

```diff
-if (foo) { long block; } else { short block; }
+if (!foo) { short block; continue; } long block;
```

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

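Here is the cosmetic transformation from this commit applied to a made-up loop, written both ways. The function names and loop body are illustrative only and are not taken from x_tables. Both versions compute the same result, which is what makes the change purely cosmetic.

```c
#include <assert.h>

/* Before: if (foo) { long block; } else { short block; } */
static int before_transform(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] > 0) {
            /* "long block" */
            sum += v[i];
            sum *= 2;
        } else {
            /* "short block" */
            sum -= 1;
        }
    }
    return sum;
}

/* After: if (!foo) { short block; continue; } long block; */
static int after_transform(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (!(v[i] > 0)) {
            sum -= 1;
            continue;
        }
        sum += v[i];
        sum *= 2;
    }
    return sum;
}
```

The "long block" moves out one indent level, which is the only effect of the rewrite.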
netfilter: xtables: consolidate open-coded logic · 98e86403

Submitted by Jan Engelhardt on Apr 15, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: xtables: fix const inconsistency · 4f2f6f23

Submitted by Jan Engelhardt on Apr 15, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: xtables: remove redundant casts · ccf5bd8c

Submitted by Jan Engelhardt on Apr 15, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: xtables: use NFPROTO_ in standard targets · 4ba351cf

Submitted by Jan Engelhardt on Apr 14, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: queue: use NFPROTO_ for queue callsites · 4b1e27e9

Submitted by Jan Engelhardt on Apr 14, 2009

af is an nfproto.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

netfilter: xtables: use NFPROTO_ for xt_proto_init callsites · 383ca5b8

Submitted by Jan Engelhardt on Apr 14, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

05 May, 2009 · 1 commit

netfilter: ip6t_ipv6header: fix match on packets ending with NEXTHDR_NONE · b98b4947

Submitted by Christoph Paasch on May 05, 2009

As packets ending with NEXTHDR_NONE don't have a last extension header,
the length check needs to come after the check for NEXTHDR_NONE.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
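The ordering issue can be sketched with a simplified extension-header walk. This sketch treats every extension header as a fixed 8 bytes and follows only hop-by-hop headers. `walk_exthdrs()` is a hypothetical name. Only the NEXTHDR_HOP (0) and NEXTHDR_NONE (59) values match the IPv6 protocol numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NEXTHDR_HOP  0   /* hop-by-hop options header */
#define NEXTHDR_NONE 59  /* no next header */

/* Hypothetical simplified walker. Byte 0 of each 8-byte extension header
 * is its next-header field; len is the number of bytes left to parse.
 * Returns true if the chain is well formed; *saw_none reports whether the
 * chain ended in NEXTHDR_NONE. */
static bool walk_exthdrs(int nexthdr, const unsigned char *p, size_t len,
                         bool *saw_none)
{
    *saw_none = false;
    while (nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_NONE) {
        /* Test NEXTHDR_NONE first: nothing follows it, so running the
         * length check below first would wrongly reject the packet. */
        if (nexthdr == NEXTHDR_NONE) {
            *saw_none = true;
            return true;
        }
        if (len < 8)
            return false; /* truncated extension header */
        nexthdr = p[0];
        p += 8;
        len -= 8;
    }
    return true;
}
```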


29 Apr, 2009 · 1 commit

netfilter: revised locking for x_tables · 942e4a2b

Submitted by Stephen Hemminger on Apr 28, 2009

The x_tables are organized with a table structure and per-cpu copies
of the counters and rules. On older kernels there was a reader/writer
lock per table, which was a performance bottleneck. In 2.6.30-rc, this
was converted to use RCU for the counters/rules, which solved the performance
problems for do_table but made replacing rules much slower because of
the necessary RCU grace period.

This version uses a per-cpu set of spinlocks and counters to allow
table processing to proceed without the cache thrashing of a global
reader lock, and keeps the same performance for table updates.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


27 Apr, 2009 · 2 commits

gro: Fix COMPLETE checksum handling · 36e7b1b8

Submitted by Herbert Xu on Apr 27, 2009

On a brand new GRO skb, we cannot call ip_hdr since the header
may lie in the non-linear area.  This patch adds the helper
skb_gro_network_header to handle this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>


snmp: add missing counters for RFC 4293 · edf391ff

Submitted by Neil Horman on Apr 27, 2009

The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that's easy to separate from other
protocols. This patch adds those missing counters to the stats file. Tested
successfully by me.

With help from Eric Dumazet.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


20 Apr, 2009 · 1 commit

syncookies: remove last_synq_overflow from struct tcp_sock · a0f82f64

Submitted by Florian Westphal on Apr 19, 2009

last_synq_overflow eats 4 or 8 bytes in struct tcp_sock, even
though it is only used when a listening socket's syn queue
is full.

We can (ab)use rx_opt.ts_recent_stamp to store the same information;
it is not used otherwise as long as a socket is in listen state.

Move linger2 around to avoid splitting struct mtu_probe
across a cacheline boundary on 32-bit arches.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
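The reuse of rx_opt.ts_recent_stamp can be sketched as follows. The struct and helper names are hypothetical. The timestamps are plain integers rather than jiffies, and the 3-unit window (`OVERFLOW_WINDOW`) is an arbitrary value chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical window during which an overflow counts as "recent". */
#define OVERFLOW_WINDOW 3

struct rx_opt_model {
    unsigned long ts_recent_stamp; /* idle while the socket is listening */
};

struct listen_sock_model {
    struct rx_opt_model rx_opt;
};

/* Record a SYN-queue overflow in the otherwise unused timestamp field. */
static void synq_overflow(struct listen_sock_model *sk, unsigned long now)
{
    sk->rx_opt.ts_recent_stamp = now;
}

/* True when no overflow happened within the last OVERFLOW_WINDOW units. */
static bool synq_no_recent_overflow(const struct listen_sock_model *sk,
                                    unsigned long now)
{
    return now - sk->rx_opt.ts_recent_stamp > OVERFLOW_WINDOW;
}
```

Because a listening socket never uses the timestamp field for its own purpose, no dedicated last_synq_overflow member is needed.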


14 Apr, 2009 · 1 commit

ipv6:remove useless check · ce8632ba

Submitted by Yang Hongyang on Apr 13, 2009

After switch (rthdr->type) {...}, the check below is completely useless, because:
if the type is 2, then hdrlen must be 2 and segments_left must be 1, so the
check is clearly redundant; if the type is not 2, then we goto sticky_done, so
the check is useless too.

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


11 Apr, 2009 · 1 commit

ipv6: Fix NULL pointer dereference with time-wait sockets · 499923c7

Submitted by Vlad Yasevich on Apr 09, 2009

Commit b2f5e7cd
(ipv6: Fix conflict resolutions during ipv6 binding)
introduced a regression where time-wait sockets were
not treated correctly.  This resulted in the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
...
Call Trace:
[<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
[<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
[<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
[<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
[<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
[<ffffffff8056ed49>] sys_bind+0x89/0x100
[<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b

Tested-by: Brian Haley <brian.haley@hp.com>
Tested-by: Ed Tomlinson <edt@aei.ca>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


07 Apr, 2009 · 1 commit

xfrm: fix fragmentation on inter family tunnels · d1d88e5d

Submitted by Steffen Klassert on Apr 06, 2009

If an IPv4 packet (not locally generated, and with the IP_DF flag not set) bigger
than the MTU is supposed to go via an xfrm IPv6 tunnel, the packet size
check in xfrm4_tunnel_check_size() is omitted and IPv6 drops the packet
without sending a notice to the original sender of the IPv4 packet.

Another issue is that IPv4 connection tracking reassembles
incoming fragmented packets. If such a reassembled packet is supposed to
go via an xfrm IPv6 tunnel, it will be dropped, even if the original sender
did proper fragmentation.

According to RFC 2473 (section 7), tunnel IPv6 packets resulting from the
encapsulation of an original packet are considered locally generated
packets. If such a packet passed the checks in xfrm{4,6}_tunnel_check_size(),
fragmentation is allowed according to RFC 2473 (sections 7.1/7.2).

This patch sets skb->local_df in xfrm6_prepare_output() to allow
fragmentation in this case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


06 Apr, 2009 · 1 commit

netfilter: ip6tables regression fix · 49a88d18

Submitted by Eric Dumazet on Apr 06, 2009

Commit 78454473 (netfilter: iptables: lock free counters) broke
ip6_tables by unconditionally returning ENOMEM in alloc_counters().

Reported-by: Graham Murray <graham@gmurray.org.uk>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
