Commits · 6ce9e7b5fe3195d1ae6e3a0753d4ddcac5cd699e · openeuler / raspberrypi-kernel

03 Sep, 2009 1 commit

ip: Report qdisc packet drops · 6ce9e7b5

Authored by Eric Dumazet on Sep 02, 2009

Christoph Lameter pointed out that packet drops at qdisc level were not
accounted in SNMP counters. Only if the application sets IP_RECVERR are drops
reported to the user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after a UDP tx flood:
# netstat -s
...
IP:
    1487048 outgoing packets dropped
...
Udp:
...
    SndbufErrors: 1487048


send() syscalls do, however, still return an OK status, so as not to
break applications.

Note: the send() manual page explicitly says for the ENOBUFS error:

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This is not true for IP_RECVERR-enabled sockets: a send() syscall
that hits a qdisc drop returns an ENOBUFS error.
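
As a user-space illustration of the behaviour described above, the following is a minimal sketch of a UDP sender that opts in to IP_RECVERR and counts ENOBUFS results itself. The helper names (open_udp_recverr, send_counting_drops) are hypothetical and not part of this patch; only the socket calls and the IP_RECVERR option are standard.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket with IP_RECVERR enabled.
 * Returns the file descriptor, or -1 on failure. */
int open_udp_recverr(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: send one datagram and count local drops.
 * Returns 0 if the send was accepted, 1 if it failed with ENOBUFS
 * (the drop counter is incremented), and -1 on any other error. */
int send_counting_drops(int fd, const struct sockaddr_in *dst,
                        const void *buf, size_t len, unsigned long *drops)
{
    ssize_t n;

    assert(dst != NULL && drops != NULL);
    n = sendto(fd, buf, len, 0, (const struct sockaddr *)dst, sizeof(*dst));
    if (n >= 0)
        return 0;
    if (errno == ENOBUFS) {
        (*drops)++;
        return 1;
    }
    return -1;
}
```

A sender without IP_RECVERR would see every send() succeed and would have to rely on the system-wide counters shown by netstat -s instead.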

Many thanks to Christoph, David, and last but not least, Alexey!
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ce9e7b5

12 Jul, 2009 1 commit

net: ip_push_pending_frames() fix · e51a67a9

Authored by Eric Dumazet on Jul 08, 2009

After commit 2b85a34e
(net: No more expensive sock_hold()/sock_put() on each tx)
we do not take any more references on sk->sk_refcnt on outgoing packets.

I forgot to delete two __sock_put() from ip_push_pending_frames()
and ip6_push_pending_frames().
Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e51a67a9

11 Jun, 2009 1 commit

net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e

Authored by Eric Dumazet on Jun 11, 2009

One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement the initial offset and atomically check if any packets
are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore.

sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
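
The offset-by-one scheme described above can be modelled in user space. The following is a minimal sketch using C11 atomics; struct toy_sock and the toy_* functions are hypothetical stand-ins for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock; not the kernel's layout. */
struct toy_sock {
    atomic_uint wmem_alloc; /* bytes in flight, plus 1 while the socket is alive */
    bool freed;             /* set once the final free has run */
};

static void toy_final_free(struct toy_sock *sk)
{
    assert(!sk->freed);     /* a second free would mean the accounting is off */
    sk->freed = true;
}

void toy_sock_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);  /* the initial offset of one unit */
    sk->freed = false;
}

/* Charge a packet to the socket; no separate reference is taken. */
void toy_set_owner_w(struct toy_sock *sk, unsigned int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and free if this was the last unit. */
void toy_wfree(struct toy_sock *sk, unsigned int truesize)
{
    /* fetch_sub returns the old value; old == truesize means we hit 0 */
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        toy_final_free(sk);
}

/* Owner releases the socket: drop the initial offset. */
void toy_sk_free(struct toy_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        toy_final_free(sk);
}
```

Whichever of toy_sk_free() or the last toy_wfree() brings the counter to zero performs the final free, which is why a wrong truesize on either side would leave the model stuck above zero or free it too early.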
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b85a34e

09 Jun, 2009 1 commit

ipv4: Use frag list abstraction interfaces. · d7fcf1a5

Authored by David S. Miller on Jun 09, 2009

Signed-off-by: David S. Miller <davem@davemloft.net>

d7fcf1a5

03 Jun, 2009 2 commits

net: skb->dst accessors · adf30907

Authored by Eric Dumazet on Jun 02, 2009

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of:
    dst_release(skb->dst);
    skb->dst = NULL;

Delete skb->dst field
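
To show how callers are expected to use the three accessors, here is a minimal user-space sketch. The toy_* types and functions are hypothetical simplifications: the real accessors operate on struct sk_buff and struct dst_entry, and the real dst_release() does more than decrement a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_dst_entry {
    int refcnt;
};

struct toy_sk_buff {
    struct toy_dst_entry *dst_private; /* only touched through accessors */
};

static inline struct toy_dst_entry *toy_skb_dst(const struct toy_sk_buff *skb)
{
    return skb->dst_private;
}

static inline void toy_skb_dst_set(struct toy_sk_buff *skb,
                                   struct toy_dst_entry *dst)
{
    skb->dst_private = dst;
}

/* Simplified release: just drops one reference, tolerating NULL. */
static inline void toy_dst_release(struct toy_dst_entry *dst)
{
    if (dst) {
        assert(dst->refcnt > 0);
        dst->refcnt--;
    }
}

/* Replaces the open-coded "release, then set to NULL" pair. */
static inline void toy_skb_dst_drop(struct toy_sk_buff *skb)
{
    toy_dst_release(toy_skb_dst(skb));
    toy_skb_dst_set(skb, NULL);
}
```

Once every user goes through such accessors, the underlying field can be renamed or re-encoded without touching the callers.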
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adf30907

net: skb->rtable accessor · 511c3f92

Authored by Eric Dumazet on Jun 02, 2009

Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

Delete skb->rtable field

Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

511c3f92

27 Apr, 2009 1 commit

snmp: add missing counters for RFC 4293 · edf391ff

Authored by Neil Horman on Apr 27, 2009

The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that is easy to separate from other
protocols.  This patch adds those missing counters to the stats file. Tested
successfully by me.

With help from Eric Dumazet.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

edf391ff

16 Feb, 2009 1 commit

ip: support for TX timestamps on UDP and RAW sockets · 51f31cab

Authored by Patrick Ohly on Feb 12, 2009

Instructions for time stamping outgoing packets are taken from the
socket layer and later copied into the new skb.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51f31cab

25 Nov, 2008 2 commits

net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() · a21bba94

Authored by Eric Dumazet on Nov 24, 2008

We can reduce pressure on the dst entry refcount that slows down the UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to media gateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_push_pending_frames() steal the refcount its
callers had to take when filling inet->cork.dst.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a21bba94

net: avoid a pair of dst_hold()/dst_release() in ip_append_data() · 2e77d89b

Authored by Eric Dumazet on Nov 24, 2008

We can reduce pressure on the dst entry refcount that slows down the UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to media gateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2e77d89b

03 Nov, 2008 1 commit

net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c · d9319100

Authored by Jianjun Kong on Nov 03, 2008

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9319100

01 Oct, 2008 1 commit

ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible · 86b08d86

Authored by KOVACS Krisztian on Oct 01, 2008

Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

86b08d86

26 Jul, 2008 1 commit

net: convert BUG_TRAP to generic WARN_ON · 547b792c

Authored by Ilpo Järvinen on Jul 25, 2008

Removes a legacy reinvent-the-wheel type of thing. The generic
machinery integrates much better with automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuitively, BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON(), though some might actually be
promoted to BUG_ON(); I left that to the future.

I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

547b792c

17 Jul, 2008 1 commit

mib: add net to IP_INC_STATS · 5e38e270

Authored by Pavel Emelyanov on Jul 16, 2008

All the callers already have either the net itself, or the place
where to get it from.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e38e270

15 Jul, 2008 1 commit

icmp: add struct net argument to icmp_out_count · 0388b004

Authored by Pavel Emelyanov on Jul 14, 2008

This routine deals with ICMP statistics, but doesn't have a
struct net at hand, so add one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0388b004

12 Jun, 2008 1 commit

net: remove CVS keywords · 0b040829

Authored by Adrian Bunk on Jun 10, 2008

This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b040829

30 Apr, 2008 1 commit

[IPv4] UFO: prevent generation of chained skb destined to UFO device · be9164e7

Authored by Kostya B on Apr 29, 2008

Problem: ip_append_data() could wrongly generate a chained skb for
devices which support UFO.  When sk_write_queue is not empty
(e.g. MSG_MORE), __instead__ of appending data into the next nr_frag
of the queued skb, a new chained skb is created.

I would normally assume a UFO device should get data in nr_frags and not
in frag_list.  Later, udp4_hwcsum_outgoing() resets csum to NONE
and skb_gso_segment() oopses.

Proposal:
1. Even if the length is less than the MTU, employ ip_ufo_append_data()
and append data to the __existing__ skb in the sk_write_queue.

2. ip_ufo_append_data() is fixed due to a wrong manipulation of
peek-ing and later enqueue-ing of the same skb.  Now, enqueuing is
always performed, because on error the further
ip_flush_pending_frames() would release the queued skb.
Signed-off-by: Kostya B <bkostya@hotmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

be9164e7

26 Mar, 2008 1 commit

[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. · 3b1e0a65

Authored by YOSHIFUJI Hideaki on Mar 26, 2008

Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
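
The idea of compiling the per-sock namespace pointer away can be sketched as follows. This is a simplified user-space illustration: TOY_CONFIG_NET_NS, struct toy_net, struct toy_sock and the toy_* inlines are hypothetical names chosen to mirror the description, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; define TOY_CONFIG_NET_NS to model CONFIG_NET_NS=y. */
struct toy_net {
    int id;
};

static struct toy_net toy_init_net; /* the only namespace without the option */

struct toy_sock {
    int sk_family;
#ifdef TOY_CONFIG_NET_NS
    struct toy_net *sk_net;         /* field exists only with namespaces */
#endif
};

static inline struct toy_net *toy_sock_net(const struct toy_sock *sk)
{
    assert(sk != NULL);
#ifdef TOY_CONFIG_NET_NS
    return sk->sk_net;
#else
    return &toy_init_net;           /* constant, so the compiler can fold it */
#endif
}

static inline void toy_sock_net_set(struct toy_sock *sk, struct toy_net *net)
{
    assert(sk != NULL);
#ifdef TOY_CONFIG_NET_NS
    sk->sk_net = net;
#else
    (void)net;                      /* nothing to store */
#endif
}
```

With the option disabled, every toy_sock_net() call reduces to the address of a single global, and the structure does not carry the extra pointer.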
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

3b1e0a65

25 Mar, 2008 2 commits

[IPV4,IPV6]: Share cork.rt between IPv4 and IPv6. · c8cdaf99

Authored by YOSHIFUJI Hideaki on Mar 10, 2008

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

c8cdaf99

[NETNS]: Process IP layer in the context of the correct namespace. · cb84663e

Authored by Denis V. Lunev on Mar 24, 2008

Replace all the rest of the init_net with a proper net on the IP layer.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cb84663e

06 Mar, 2008 1 commit

[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts · ee6b9673

Authored by Eric Dumazet on Mar 05, 2008

(Anonymous) unions can help us to avoid ugly casts.

A common cast is the (struct rtable *)skb->dst one.

Defining a union like:
union {
     struct dst_entry *dst;
     struct rtable *rtable;
};
permits using skb->rtable in its place.
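
A compilable user-space sketch of the aliasing follows, comparing the cast with the union member. The toy_* structures are hypothetical and heavily reduced; the sketch assumes, as the cast itself does, that the dst entry is the first member of the route structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for dst_entry, rtable and sk_buff. */
struct toy_dst_entry {
    int obsolete;
};

struct toy_rtable {
    struct toy_dst_entry dst;   /* first member: both pointers share an address */
    unsigned int rt_gateway;
};

struct toy_sk_buff {
    union {                     /* anonymous union: two names, one pointer slot */
        struct toy_dst_entry *dst;
        struct toy_rtable *rtable;
    };
};

/* Old style: explicit cast at every use site. */
unsigned int toy_gateway_cast(const struct toy_sk_buff *skb)
{
    assert(skb->dst != NULL);
    return ((struct toy_rtable *)skb->dst)->rt_gateway;
}

/* New style: the union member carries the right type already. */
unsigned int toy_gateway_union(const struct toy_sk_buff *skb)
{
    assert(skb->rtable != NULL);
    return skb->rtable->rt_gateway;
}
```

Both functions read the same pointer; the union only moves the type conversion from each call site into the structure definition.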
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee6b9673

01 Feb, 2008 2 commits

[NET]: Introducing socket mark socket option. · 4a19ec58

Authored by Laszlo Attila Toth on Jan 30, 2008

A userspace program may wish to set the mark for each packet it sends
without using the netfilter MARK target. Changing the mark can be used
for mark-based routing without netfilter or for packet filtering.

It requires CAP_NET_ADMIN capability.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

4a19ec58

[INET]: Prevent out-of-sync truesize on ip_fragment slow path · 29ffe1a5

Authored by Herbert Xu on Jan 28, 2008

When ip_fragment has to hit the slow path the value of skb->truesize
may go out of sync because we would have updated it without changing
the packet length. This violates the constraints on truesize.

This patch postpones the update of skb->truesize to prevent this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

29ffe1a5

29 Jan, 2008 6 commits

[NETNS]: Add namespace for ICMP replying code. · dde1bc0e

Authored by Denis V. Lunev on Jan 22, 2008

All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.

Other protocols will follow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dde1bc0e

[NETNS]: Add namespace parameter to ip_route_output_key. · f206351a

Authored by Denis V. Lunev on Jan 22, 2008

Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f206351a

[NETNS]: Add namespace parameter to ip_route_output_flow. · f1b050bf

Authored by Denis V. Lunev on Jan 22, 2008

Needed to propagate it down to the __ip_route_output_key.

Signed_off_by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1b050bf

[NET]: Remove obsolete comment · a067d9ac

Authored by Ilpo Järvinen on Jan 05, 2008

It seems that ip_build_xmit is no longer used in here and
ip_append_data is used.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

a067d9ac

[NETFILTER]: Introduce NF_INET_ hook values · 6e23ae2a

Authored by Patrick McHardy on Nov 19, 2007

The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e23ae2a

[IPV4]: Add ip_local_out · c439cb2e

Authored by Herbert Xu on Jan 11, 2008

Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so.  They also share the same output
function dst_output.

This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.
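
The two preparation steps named above (setting the packet length and the header checksum) can be illustrated on a raw IPv4 header in user space. This is only a sketch: toy_ip_checksum and toy_ip_local_out_prep are hypothetical names, the checksum is the standard Internet checksum, and the hand-off to the output function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over an even-length header in network byte order. */
uint16_t toy_ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)                   /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical model of the shared preparation: write the total length,
 * then recompute the header checksum over the whole header. */
void toy_ip_local_out_prep(uint8_t *hdr, uint16_t total_len)
{
    size_t ihl = (size_t)(hdr[0] & 0x0f) * 4;  /* header length in bytes */
    uint16_t csum;

    assert(ihl >= 20);
    hdr[2] = (uint8_t)(total_len >> 8);
    hdr[3] = (uint8_t)(total_len & 0xff);
    hdr[10] = 0;                        /* checksum field is zero while summing */
    hdr[11] = 0;
    csum = toy_ip_checksum(hdr, ihl);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

Doing both steps in one helper means a caller cannot update the length and forget the checksum, which is the kind of duplication the patch removes.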
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c439cb2e

23 Jan, 2008 2 commits

[INET]: Fix truesize setting in ip_append_data · f945fa7a

Authored by Herbert Xu on Jan 22, 2008

As it is, ip_append_data only counts page fragments against the skb that
allocated them.  This means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it, while
all subsequent skbs that use the same page get away with no charge
at all.

This bug was exposed by the UDP accounting patch.

[ The wmem_alloc bumping needs to be moved with the truesize,
  noticed by Takahiro Yasui.  -DaveM ]
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

f945fa7a

[IPV4]: Add missing skb->truesize increment in ip_append_page(). · 1e34a11d

Authored by David S. Miller on Jan 22, 2008

And as noted by Takahiro Yasui, we thus need to bump the
sk->sk_wmem_alloc at this spot as well.
Signed-off-by: David S. Miller <davem@davemloft.net>

1e34a11d

07 Nov, 2007 1 commit

[IPV4]: Consolidate the ip cork destruction in ip_output.c · 429f08e9

Authored by Pavel Emelyanov on Nov 05, 2007

The ip_push_pending_frames and ip_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~80 bytes from the .text
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

429f08e9

24 Oct, 2007 1 commit

[NET]: Treat the sign of the result of skb_headroom() consistently · c2636b4d

Authored by Chuck Lever on Oct 23, 2007

In some places, the result of skb_headroom() is compared to an unsigned
integer, and in others, the result is compared to a signed integer.  Make
the comparisons consistent and correct.
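
The general hazard behind mixed signed/unsigned comparisons can be shown with a small stand-alone example. It is not taken from the patch: the function names are hypothetical, and the headroom value is simply modelled as an unsigned int.

```c
#include <assert.h>
#include <stdbool.h>

/* What a mixed comparison effectively does: C converts the signed
 * operand to unsigned, so a negative "needed" becomes a huge value.
 * The cast spells out that implicit conversion. */
bool toy_needs_expand_mixed(unsigned int headroom, int needed)
{
    return headroom < (unsigned int)needed;
}

/* A consistent version: handle non-positive requests first, then
 * compare two unsigned quantities. */
bool toy_needs_expand_consistent(unsigned int headroom, int needed)
{
    if (needed <= 0)
        return false;
    return headroom < (unsigned int)needed;
}

/* Small self-check contrasting the two versions. */
void toy_headroom_demo(void)
{
    assert(toy_needs_expand_mixed(16, -1));        /* -1 turned into UINT_MAX */
    assert(!toy_needs_expand_consistent(16, -1));  /* nothing is needed */
    assert(toy_needs_expand_consistent(16, 32));
    assert(!toy_needs_expand_consistent(32, 16));
}
```

The two versions agree for positive requests and differ only when the signed operand is negative, which is why such bugs tend to stay hidden.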
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2636b4d

16 Oct, 2007 1 commit

[IPV4]: Uninline netfilter okfns · 861d0486

Authored by Patrick McHardy on Oct 15, 2007

Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc
can generate tail calls for some of the netfilter hook okfn invocations,
so there is no need to inline the functions anymore. This caused huge
code bloat since we ended up with one inlined version and one out-of-line
version since we pass the address to nf_hook_slow.

Before:
   text    data     bss     dec     hex filename
8997385 1016524  524652 10538561         a0ce41 vmlinux

After:
   text    data     bss     dec     hex filename
8994009 1016524  524652 10535185         a0c111 vmlinux
-------------------------------------------------------
  -3376

All cases have been verified to generate tail-calls with and without
netfilter. The okfns in ipmr and xfrm4_input still remain inline because
gcc can't generate tail-calls for them.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

861d0486

11 Oct, 2007 2 commits

[NET]: Move hardware header operations out of netdevice. · 3b04ddde

Authored by Stephen Hemminger on Oct 09, 2007

Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b04ddde

[IPV4]: Add ICMPMsgStats MIB (RFC 4293) · 96793b48

Authored by David L Stevens on Sep 17, 2007

Background: RFC 4293 deprecates existing individual, named ICMP
type counters to be replaced with the ICMPMsgStatsTable. This table
includes entries for both IPv4 and IPv6, and requires counting of all
ICMP types, whether or not the machine implements the type.

These patches "remove" (but not really) the existing counters, and
replace them with the ICMPMsgStats tables for v4 and v6.
It includes the named counters in the /proc places they were, but gets the
values for them from the new tables. It also counts packets generated
from raw socket output (e.g., OutEchoes, MLD queries, RA's from
radvd, etc).

Changes:
1) create icmpmsg_statistics mib
2) create icmpv6msg_statistics mib
3) modify existing counters to use these
4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types
        listed by number for easy SNMP parsing
5) modify /proc/net/snmp printing for "Icmp" to get the named data
        from new counters.
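
The relationship between the per-type table and the older named counters can be sketched as below. The structure and function names are hypothetical and ignore per-CPU and per-namespace details; the only facts relied on are that ICMP types fit in one byte and that echo request and echo reply are types 8 and 0.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ICMP_ECHOREPLY 0    /* ICMP type of an echo reply */
#define TOY_ICMP_ECHO      8    /* ICMP type of an echo request */
#define TOY_ICMPMSG_TYPES  256  /* every possible 8-bit type gets a slot */

/* Hypothetical per-type table: one in/out counter per ICMP type,
 * whether or not the host implements that type. */
struct toy_icmpmsg_mib {
    unsigned long in[TOY_ICMPMSG_TYPES];
    unsigned long out[TOY_ICMPMSG_TYPES];
};

void toy_icmpmsg_in(struct toy_icmpmsg_mib *mib, uint8_t type)
{
    assert(mib != 0);
    mib->in[type]++;
}

void toy_icmpmsg_out(struct toy_icmpmsg_mib *mib, uint8_t type)
{
    assert(mib != 0);
    mib->out[type]++;
}

/* Named counters become views onto the table instead of separate fields. */
unsigned long toy_icmp_in_echos(const struct toy_icmpmsg_mib *mib)
{
    return mib->in[TOY_ICMP_ECHO];
}

unsigned long toy_icmp_out_echo_reps(const struct toy_icmpmsg_mib *mib)
{
    return mib->out[TOY_ICMP_ECHOREPLY];
}
```

Unknown or unimplemented types are still counted in their own slot, which is what the per-type table requires and what the fixed set of named counters could not express.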
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96793b48

14 Aug, 2007 1 commit

[IPV4]: Clean up duplicate includes in net/ipv4/ · f49f9967

Authored by Jesper Juhl on Aug 10, 2007

This patch cleans up duplicate includes in
	net/ipv4/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f49f9967

11 Jul, 2007 2 commits

[NETFILTER]: x_tables: add TRACE target · ba9dda3a

Authored by Jozsef Kadlecsik on Jul 07, 2007

The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba9dda3a

[NET]: IPV6 checksum offloading in network devices · d212f87b

Authored by Stephen Hemminger on Jun 27, 2007

The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.

This patch:
 * adds NETIF_F_IPV6_CSUM for those devices
 * fixes bnx2 and tg3 devices that need it
 * add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
 * fixes assumptions about NETIF_F_ALL_CSUM in nat
 * adjusts bridge union of checksumming computation
Signed-off-by: David S. Miller <davem@davemloft.net>

d212f87b

08 Jun, 2007 1 commit

[TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack · f0e48dbf

Authored by Patrick McHardy on Jun 04, 2007

A time_wait socket inherits sk_bound_dev_if from the original socket,
but it is not used when sending ACK packets using ip_send_reply.

Fix by passing the oif to ip_send_reply in struct ip_reply_arg and
use it for output routing.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f0e48dbf