Commits · 5df65e5567a497a28067019b8ff08f98fb026629 · openanolis / cloud-kernel

02 Mar, 2011 3 commits

net: Add FLOWI_FLAG_CAN_SLEEP. · 5df65e55

Authored by David S. Miller on Mar 01, 2011

And set it in contexts where the route resolution can sleep.
Signed-off-by: David S. Miller <davem@davemloft.net>

5df65e55

ipv6: Consolidate route lookup sequences. · 68d0c6d3

Authored by David S. Miller on Mar 01, 2011

Route lookups follow a general pattern in the ipv6 code wherein
we first find the non-IPSEC route, potentially override the
flow destination address due to ipv6 options settings, and then
finally make an IPSEC search using either xfrm_lookup() or
__xfrm_lookup().

__xfrm_lookup() is used when we want to generate a blackhole route
if the key manager needs to resolve the IPSEC rules (in this case
-EREMOTE is returned and the original 'dst' is left unchanged).

Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC
resolution is necessary, we simply fail the lookup completely.

All of these cases are encapsulated into two routines,
ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow.  The latter of which
handles unconnected UDP datagram sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>

68d0c6d3

inet: Remove unused sk_sndmsg_* from UFO · 5a2ef920

Authored by Herbert Xu on Mar 01, 2011

UFO doesn't really use the sk_sndmsg_* parameters so touching
them is pointless.  It can't use them anyway since the whole
point of UFO is to use the original pages without copying.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a2ef920

01 Mar, 2011 1 commit

net: TX timestamps for IPv6 UDP packets · a693e698

Authored by Anders Berggren on Feb 28, 2011

Enabling TX timestamps (SO_TIMESTAMPING) for IPv6 UDP packets, in
the same fashion as for IPv4. Necessary in order for NICs such as
Intel 82580 to timestamp IPv6 packets.
Signed-off-by: Anders Berggren <anders@halon.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

a693e698

26 Feb, 2011 1 commit

ipv6: totlen is declared and assigned but not used · a5f5e368

Authored by Hagen Paul Pfeifer on Feb 25, 2011

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5f5e368

05 Feb, 2011 1 commit

inetpeer: Move ICMP rate limiting state into inet_peer entries. · 92d86829

Authored by David S. Miller on Feb 04, 2011

Like metrics, the ICMP rate limiting bits are cached state about
a destination.  So move it into the inet_peer entries.

If an inet_peer cannot be bound (the reason is memory allocation
failure or similar), the policy is to allow.
Signed-off-by: David S. Miller <davem@davemloft.net>

92d86829

13 Jan, 2011 1 commit

inet6: prevent network storms caused by linux IPv6 routers · 72b43d08

Authored by Alexey Kuznetsov on Jan 12, 2011

Linux IPv6 forwards unicast packets, which are link layer multicasts...
The hole was present since day one. I was 100% sure this check was there, but it is not.

The problem shows itself, e.g., when Microsoft Network Load Balancer runs on a network.
This software resolves IPv6 unicast addresses to multicast MAC addresses.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

72b43d08

20 Dec, 2010 1 commit

ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed. · ad0081e4

Authored by David Stevens on Dec 17, 2010

This patch modifies IPsec6 to fragment IPv6 packets that are
locally generated as needed.

This version of the patch only fragments in tunnel mode, so that fragment
headers will not be obscured by ESP in transport mode.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad0081e4

24 Sep, 2010 1 commit

net: return operator cleanup · a02cec21

Authored by Eric Dumazet on Sep 22, 2010

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a02cec21

22 Sep, 2010 1 commit

ip: fix truesize mismatch in ip fragmentation · 3d13008e

Authored by Eric Dumazet on Sep 21, 2010

Special care should be taken when slow path is hit in ip_fragment() :

When walking through frags, we transfer truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.

Many thanks to Nick Bowler, who provided a very clean bug report and
test program.

Thanks to Jarek for reviewing my first patch and providing a V2.

While Nick's bisection pointed to commit 2b85a34e (net: No more
expensive sock_hold()/sock_put() on each tx), the underlying bug is older
(2.6.12-rc5).

A side effect is to extend work done in commit b2722b1c
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

3d13008e

23 Aug, 2010 1 commit

net: Rename skb_has_frags to skb_has_frag_list · 21dc3301

Authored by David S. Miller on Aug 23, 2010

SKBs can be "fragmented" in two ways, via a page array (called
skb_shinfo(skb)->frags[]) and via a list of SKBs (called
skb_shinfo(skb)->frag_list).

Since skb_has_frags() tests the latter, its name is confusing
since it sounds more like it's testing the former.
Signed-off-by: David S. Miller <davem@davemloft.net>

21dc3301
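
The distinction drawn in 21dc3301 can be illustrated with a toy model. This is only a sketch: `struct toy_skb`, `struct toy_shinfo` and both helper functions are hypothetical stand-ins written for this illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields needed here. */
struct toy_skb;

struct toy_shinfo {
    unsigned int nr_frags;      /* entries in use in the page array (frags[]) */
    struct toy_skb *frag_list;  /* head of a chained list of further buffers */
};

struct toy_skb {
    struct toy_skb *next;
    struct toy_shinfo shinfo;
};

/* True when the buffer carries paged data in the page array. */
static bool toy_skb_has_page_frags(const struct toy_skb *skb)
{
    return skb->shinfo.nr_frags != 0;
}

/* True when the buffer chains other buffers through frag_list.
 * This is the test that the new name makes explicit. */
static bool toy_skb_has_frag_list(const struct toy_skb *skb)
{
    return skb->shinfo.frag_list != NULL;
}
```

A buffer can satisfy either test independently of the other, which is why a name that does not say which one is meant is easy to misread.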

11 Jun, 2010 1 commit

net-next: remove useless union keyword · d8d1f30b

Authored by Changli Gao on Jun 10, 2010

Remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d8d1f30b

28 May, 2010 1 commit

ipv6: Add GSO support on forwarding path · 0aa68271

Authored by Herbert Xu on May 27, 2010

Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.

Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether,
when they really should be checking gso_size + header instead.

I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message.  Someone
should add that to be 100% correct.
Reported-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

0aa68271

11 May, 2010 1 commit

ipv6: ip6mr: support multiple tables · d1db275d

Authored by Patrick McHardy on May 11, 2010

This patch adds support for multiple independent multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123
Signed-off-by: Patrick McHardy <kaber@trash.net>

d1db275d

01 May, 2010 1 commit

ipv6: cleanup: remove unneeded null check · 83d7eb29

Authored by Dan Carpenter on Apr 30, 2010

We dereference "sk" unconditionally elsewhere in the function.

This was left over from:  b30bd282 "ip6_xmit: remove unnecessary NULL
ptr check".  According to that commit message, "the sk argument to
ip6_xmit is never NULL nowadays since the skb->priority assignment
expects a valid socket."
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83d7eb29

24 Apr, 2010 2 commits

IPv6: Complete IPV6_DONTFRAG support · 4b340ae2

Authored by Brian Haley on Apr 23, 2010

Finally add support to detect a local IPV6_DONTFRAG event
and return the relevant data to the user if they've enabled
IPV6_RECVPATHMTU on the socket.  The next recvmsg() will
return no data, but have an IPV6_PATHMTU as ancillary data.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b340ae2

IPv6: Add dontfrag argument to relevant functions · 13b52cd4

Authored by Brian Haley on Apr 23, 2010

Add dontfrag argument to relevant functions for
IPV6_DONTFRAG support, as well as allowing the value
to be passed in via ancillary cmsg data.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13b52cd4

22 Apr, 2010 1 commit

ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU... · f2228f78

Authored by Shan Wei on Apr 18, 2010

ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than IPV6_MIN_MTU

According to RFC2460, when a node receives a Too Big message
reporting a PMTU smaller than the IPv6 Minimum Link MTU, the
PMTU is set to the IPv6 Minimum Link MTU (1280) and a fragment
header should always be included.

After receiving an ICMPv6 Too Big message reporting a PMTU
smaller than the IPv6 Minimum Link MTU, sctp *can't* send any
data/control chunk whose total length, including the IPv6 header
and IPv6 extension headers, is less than IPV6_MIN_MTU (1280 bytes).

The failure occurred in ip6_fragment(); the reason is shown
below (taking a SHUTDOWN chunk as an example):
sctp_packet_transmit (SHUTDOWN chunk, len=16 byte)
|------sctp_v6_xmit (local_df=0)
   |------ip6_xmit
       |------ip6_output (dst_allfrag is true)
           |------ip6_fragment

In ip6_fragment(), for local_df=0, the packet is dropped
and EMSGSIZE is returned.

The patch fixes this by adding a check on skb->len.
In this case, IPv6 does not fragment the upper protocol data;
it only adds a fragment header before it.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f2228f78
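
The behaviour change described in f2228f78 can be sketched as a small decision function. This is an illustration only: it assumes the added check compares `skb->len` against the MTU, which the message states only loosely, and `toy_frag_precheck` and `TOY_EMSGSIZE` are hypothetical names.

```c
#include <stdbool.h>

#define TOY_EMSGSIZE 90   /* hypothetical stand-in for the errno value */

/*
 * Decide whether a locally generated packet is rejected before
 * fragmentation.  Returns 0 to continue, -TOY_EMSGSIZE to drop.
 *
 * local_df mirrors skb->local_df from the call chain above.
 */
static int toy_frag_precheck(unsigned int skb_len, unsigned int mtu, bool local_df)
{
    /* As described above, every packet with local_df == 0 used to be
     * dropped here.  With the length check, only packets that really
     * exceed the MTU are dropped; a short packet continues and merely
     * gains a fragment header. */
    if (!local_df && skb_len > mtu)
        return -TOY_EMSGSIZE;
    return 0;
}
```

For example, `toy_frag_precheck(56, 1280, false)` lets a small SHUTDOWN-sized packet through, while `toy_frag_precheck(2000, 1280, false)` still reports the size error.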

19 Apr, 2010 2 commits

netfilter: xt_TEE: have cloned packet travel through Xtables too · cd58bcd9

Authored by Jan Engelhardt on Apr 19, 2010

Since Xtables is now reentrant/nestable, the cloned packet can also go
through Xtables and be subject to rules itself.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

cd58bcd9

netfilter: xtables: inclusion of xt_TEE · e281b198

Authored by Jan Engelhardt on Apr 19, 2010

xt_TEE can be used to clone and reroute a packet. This can for
example be used to copy traffic at a router for logging purposes
to another dedicated machine.

References: http://www.gossamer-threads.com/lists/iptables/devel/68781
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

e281b198

16 Apr, 2010 4 commits

ipv6: fix the comment of ip6_xmit() · b5d43998

Authored by Shan Wei on Apr 15, 2010

ip6_xmit() is used by upper transport protocols.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5d43998

net: replace ipfragok with skb->local_df · 4e15ed4d

Authored by Shan Wei on Apr 15, 2010

As Herbert Xu said: we should be able to simply replace ipfragok
with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function)
has dropped ipfragok and set the local_df value properly.

The patch kills the ipfragok parameter of .queue_xmit().
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e15ed4d

ipv6: cancel to setting local_df in ip6_xmit() · 0eecb784

Authored by Shan Wei on Apr 15, 2010

commit f88037(sctp: Drop ipfargok in sctp_xmit function)
has dropped ipfragok and set the local_df value properly.

So the change of commit 77e2f1(ipv6: Fix ip6_xmit to
send fragments if ipfragok is true) is not needed.
So the patch removes it.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0eecb784

ip: Fix ip_dev_loopback_xmit() · e30b38c2

Authored by Eric Dumazet on Apr 15, 2010

Eric Paris got the following trace with a linux-next kernel

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b

While the current linux-2.6 kernel doesn't emit this warning, the bug is latent
and might cause unexpected failures.

ip_dev_loopback_xmit() runs in process context, preemption enabled, so
must call netif_rx_ni() instead of netif_rx(), to make sure that we
process pending software interrupts.

Same change for ip6_dev_loopback_xmit()
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e30b38c2

13 Apr, 2010 2 commits

netfilter: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation · 9c6eb28a

Authored by Jan Engelhardt on Apr 13, 2010

Similar to how IPv4's ip_output.c works, have ip6_output also check
the IPSKB_REROUTED flag. It will be set from xt_TEE for cloned packets
since Xtables can currently only deal with a single packet in flight
at a time.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Acked-by: David S. Miller <davem@davemloft.net>
[Patrick: changed to use an IP6SKB value instead of IPSKB]
Signed-off-by: Patrick McHardy <kaber@trash.net>

9c6eb28a

netfilter: ipv6: move POSTROUTING invocation before fragmentation · 9e508490

Authored by Jan Engelhardt on Apr 13, 2010

Patrick McHardy notes: "We used to invoke IPv4 POST_ROUTING after
fragmentation as well just to defragment the packets in conntrack
immediately afterwards, but that got changed during the
netfilter-ipsec integration. Ideally IPv6 would behave like IPv4."

This patch makes it so. Sending an oversized frame (e.g. `ping6
-s64000 -c1 ::1`) will now show up in POSTROUTING as a single skb
rather than multiple ones.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

9e508490

30 Mar, 2010 1 commit

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

Authored by Tejun Heo on Mar 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

25 Mar, 2010 1 commit

netfilter: ipv6: use NFPROTO values for NF_HOOK invocation · b2e0b385

Authored by Jan Engelhardt on Mar 23, 2010

The semantic patch that was used:
// <smpl>
@@
@@
(NF_HOOK
|NF_HOOK_THRESH
|nf_hook
)(
-PF_INET6,
+NFPROTO_IPV6,
 ...)
// </smpl>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

b2e0b385

26 Feb, 2010 1 commit

ipv6: Use 1280 as min MTU for ipv6 forwarding · 14f3ad6f

Authored by Ulrich Weber on Feb 26, 2010

Clients will set their MTU to 1280 if they receive an
ICMPV6_PKT_TOOBIG message with an MTU less than 1280.

To allow encapsulating of packets over a 1280 link
we should always accept packets with a size of 1280
for forwarding even if the path has a lower MTU and
fragment the encapsulated packets afterwards.

In case a forwarded packet is not going to be encapsulated
an ICMPV6_PKT_TOOBIG msg will still be sent by ip6_fragment()
with the correct MTU.
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14f3ad6f
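
The forwarding rule described in 14f3ad6f amounts to clamping the MTU used for the size check to the IPv6 minimum. A minimal sketch, assuming the check compares the packet length with that clamped value; `toy_forward_mtu`, `toy_forward_too_big` and `TOY_IPV6_MIN_MTU` are hypothetical names, not the kernel's.

```c
#include <stdbool.h>

#define TOY_IPV6_MIN_MTU 1280u   /* the 1280-byte minimum named above */

/* MTU used for the forwarding size check: never below the IPv6 minimum. */
static unsigned int toy_forward_mtu(unsigned int path_mtu)
{
    return path_mtu < TOY_IPV6_MIN_MTU ? TOY_IPV6_MIN_MTU : path_mtu;
}

/* True when the forwarding path would answer with a Packet Too Big error. */
static bool toy_forward_too_big(unsigned int pkt_len, unsigned int path_mtu)
{
    return pkt_len > toy_forward_mtu(path_mtu);
}
```

With a path MTU of 1000, a 1280-byte packet is accepted for forwarding and left to later encapsulation and fragmentation, while a 1281-byte packet is still reported as too big.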

19 Feb, 2010 1 commit

ipv6: drop unused "dev" arg of icmpv6_send() · 3ffe533c

Authored by Alexey Dobriyan on Feb 18, 2010

Dunno, what was the idea, it wasn't used for a long time.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ffe533c

07 Jan, 2010 1 commit

ip: fix mc_loop checks for tunnels with multicast outer addresses · 7ad6848c

Authored by Octavian Purdila on Jan 06, 2010

When we have L3 tunnels with different inner/outer families
(i.e. IPV4/IPV6) which use a multicast address as the outer tunnel
destination address, multicast packets will be looped back to the
sending socket even if IP*_MULTICAST_LOOP is set to disabled.

The mc_loop flag is present in the family specific part of the socket
(e.g. the IPv4 or IPv6 specific part).  setsockopt sets the inner
family mc_loop flag. When the packet is pushed through the L3 tunnel
it will eventually be processed by the outer family which, if different,
will check the flag in a different part of the socket than it was set.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ad6848c

03 Sep, 2009 1 commit

ip: Report qdisc packet drops · 6ce9e7b5

Authored by Eric Dumazet on Sep 02, 2009

Christoph Lameter pointed out that packet drops at qdisc level were not
accounted in SNMP counters. Only if application sets IP_RECVERR, drops
are reported to user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after an UDP tx flood
# netstat -s 
...
IP:
    1487048 outgoing packets dropped
...
Udp:
...
    SndbufErrors: 1487048


send() syscalls do, however, still return an OK status, so as not to
break applications.

Note : send() manual page explicitly says for -ENOBUFS error :

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This is not true for IP_RECVERR enabled sockets : a send() syscall
that hit a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey !
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6ce9e7b5

02 Sep, 2009 1 commit

ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS · 06254914

Authored by Eric Dumazet on Sep 01, 2009

qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06254914

14 Aug, 2009 1 commit

inet6: Conversion from u8 to int · e651f03a

Authored by Gerrit Renker on Aug 09, 2009

This replaces assignments of the type "int on LHS" = "u8 on RHS" with
simpler code. The LHS can express all of the unsigned right hand side
values, hence the assigned value can not be negative.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

e651f03a

13 Jul, 2009 2 commits

udpv6: Remove unused skb argument of ipv6_select_ident() · 7ea2f2c5

Authored by Sridhar Samudrala on Jul 09, 2009

- move ipv6_select_ident() inline function to ipv6.h and remove the unused
  skb argument
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

7ea2f2c5

udpv6: Fix gso_size setting in ip6_ufo_append_data · c31d5326

Authored by Sridhar Samudrala on Jul 09, 2009

- fix gso_size setting for ipv6 fragment to be a multiple of 8 bytes.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c31d5326
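
IPv6 fragment offsets are expressed in 8-byte units, so every fragment except the last must carry a payload that is a multiple of 8 bytes; that is the constraint c31d5326 applies to gso_size. The rounding can be sketched as below. The function name, its parameters and `TOY_FRAG_HDR_LEN` are assumptions made for this illustration, and the exact expression used by the patch is not reproduced here.

```c
#define TOY_FRAG_HDR_LEN 8u   /* size of the IPv6 fragment header in bytes */

/*
 * Largest per-fragment payload that fits in mtu and is a multiple of 8.
 * fragheaderlen is the length of the headers repeated in every fragment.
 * Assumes mtu > fragheaderlen + TOY_FRAG_HDR_LEN.
 */
static unsigned int toy_ufo_gso_size(unsigned int mtu, unsigned int fragheaderlen)
{
    /* Clearing the low three bits rounds down to a multiple of 8. */
    return (mtu - fragheaderlen - TOY_FRAG_HDR_LEN) & ~7u;
}
```

For an MTU of 1500 and 40 bytes of per-fragment headers, 1500 - 40 - 8 = 1452 bytes are available, which rounds down to 1448.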

12 Jul, 2009 1 commit

net: ip_push_pending_frames() fix · e51a67a9

Authored by Eric Dumazet on Jul 08, 2009

After commit 2b85a34e
(net: No more expensive sock_hold()/sock_put() on each tx)
we do not take any more references on sk->sk_refcnt on outgoing packets.

I forgot to delete two __sock_put() from ip_push_pending_frames()
and ip6_push_pending_frames().
Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e51a67a9

11 Jun, 2009 1 commit

net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e

Authored by Eric Dumazet on Jun 11, 2009

One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomically check if any packets
are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore

sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b85a34e
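
The biased-counter scheme described in 2b85a34e can be modelled in user space with C11 atomics. This is a toy sketch under stated assumptions: `struct toy_sock` and the `toy_*` functions are hypothetical, a boolean flag stands in for the real freeing of the socket, and none of this is the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy socket: wmem_alloc starts at 1 (the bias) instead of 0. */
struct toy_sock {
    atomic_uint wmem_alloc;
    bool freed;              /* set once the final free has run */
};

static void toy_sk_init(struct toy_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);
    sk->freed = false;
}

/* Charge a packet to the socket; no separate reference is taken. */
static void toy_set_owner_w(struct toy_sock *sk, unsigned int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and free if the counter just reached zero. */
static void toy_sock_wfree(struct toy_sock *sk, unsigned int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        sk->freed = true;
}

/* Owner releases the socket: drop the bias, free only if nothing is in flight. */
static void toy_sk_free(struct toy_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        sk->freed = true;
}
```

While the bias is in place, a completion can never bring the counter to zero, so the socket stays alive; once the owner has dropped the bias, whichever decrement reaches zero performs the free. The sketch also shows the drawback noted in the message: if the uncharged size differs from the charged size, the counter either never reaches zero or reaches it too early.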

09 Jun, 2009 1 commit

ipv6: Use frag list abstraction interfaces. · 4d9092bb

Authored by David S. Miller on Jun 09, 2009

Signed-off-by: David S. Miller <davem@davemloft.net>

4d9092bb

03 Jun, 2009 1 commit

net: skb->dst accessors · adf30907

Authored by Eric Dumazet on Jun 02, 2009

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adf30907
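
The accessor pattern introduced in adf30907 can be illustrated with a toy model. Everything here is a hypothetical stand-in: `struct toy_pkt`, `struct toy_dst` and the `toy_*` functions only mirror the shape of the three accessors listed in the message, and a plain integer replaces the real reference counting.

```c
#include <stddef.h>

struct toy_dst { int refcnt; };

/* The dst pointer is reached only through the accessors below. */
struct toy_pkt { struct toy_dst *dst; };

static struct toy_dst *toy_skb_dst(const struct toy_pkt *skb)
{
    return skb->dst;
}

static void toy_skb_dst_set(struct toy_pkt *skb, struct toy_dst *dst)
{
    skb->dst = dst;
}

/* Toy release: drops one reference; a NULL dst is ignored. */
static void toy_dst_release(struct toy_dst *dst)
{
    if (dst)
        dst->refcnt--;
}

/* Replaces the open-coded pair: release the dst, then clear the pointer. */
static void toy_skb_dst_drop(struct toy_pkt *skb)
{
    toy_dst_release(skb->dst);
    skb->dst = NULL;
}
```

Routing every access through such helpers is what makes it possible to remove or change the underlying field without touching each caller.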

openanolis / cloud-kernel synced successfully about 1 year ago