Commits · e281b19897dc21c1071802808d461627d747a877 · openanolis / cloud-kernel

19 Apr, 2010 1 commit

netfilter: xtables: inclusion of xt_TEE · e281b198

Authored by Jan Engelhardt on Apr 19, 2010

xt_TEE can be used to clone and reroute a packet. This can for
example be used to copy traffic at a router for logging purposes
to another dedicated machine.

References: http://www.gossamer-threads.com/lists/iptables/devel/68781
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

25 Mar, 2010 1 commit

netfilter: ipv4: use NFPROTO values for NF_HOOK invocation · 9bbc768a

Authored by Jan Engelhardt on Mar 23, 2010

The semantic patch that was used:
// <smpl>
@@
@@
(NF_HOOK
|NF_HOOK_COND
|nf_hook
)(
-PF_INET,
+NFPROTO_IPV4,
 ...)
// </smpl>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

07 Jan, 2010 1 commit

ip: fix mc_loop checks for tunnels with multicast outer addresses · 7ad6848c

Authored by Octavian Purdila on Jan 06, 2010

When we have L3 tunnels with different inner/outer families
(i.e. IPv4/IPv6) which use a multicast address as the outer tunnel
destination address, multicast packets will be looped back to the
sending socket even if IP*_MULTICAST_LOOP is disabled.

The mc_loop flag is present in the family-specific part of the socket
(e.g. the IPv4- or IPv6-specific part).  setsockopt sets the inner
family's mc_loop flag. When the packet is pushed through the L3 tunnel
it will eventually be processed by the outer family, which, if different,
will check the flag in a different part of the socket than where it was set.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
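
To illustrate why the flag can be read from the wrong place, here is a minimal userspace sketch (not part of the patch) showing that the multicast-loop option is set through a different option level and name for each family. The helper names `set_mcast_loop` and `get_mcast_loop` are hypothetical; only the standard Linux socket options `IP_MULTICAST_LOOP` and `IPV6_MULTICAST_LOOP` are assumed.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: set the multicast-loop flag for one address
 * family. The option level and name are family specific, so the value
 * is stored in that family's part of the socket.
 * Returns 0 on success or a negative errno value.
 */
int set_mcast_loop(int fd, int family, int enable)
{
	int rc;

	if (family == AF_INET)
		rc = setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
				&enable, sizeof(enable));
	else if (family == AF_INET6)
		rc = setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_LOOP,
				&enable, sizeof(enable));
	else
		return -EAFNOSUPPORT;

	return rc == 0 ? 0 : -errno;
}

/*
 * Hypothetical helper: read the flag back for one family.
 * Returns the flag value (0 or 1) or a negative errno value.
 */
int get_mcast_loop(int fd, int family)
{
	int val = 0;
	socklen_t len = sizeof(val);
	int rc;

	if (family == AF_INET)
		rc = getsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &val, &len);
	else if (family == AF_INET6)
		rc = getsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_LOOP, &val, &len);
	else
		return -EAFNOSUPPORT;

	return rc == 0 ? val : -errno;
}
```

An application that disables looping on the inner family only has changed one of the two flags; the tunnel case described above is the one where the outer family consults the other.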

02 Dec, 2009 1 commit

ip_fragment: also adjust skb->truesize for packets not owned by a socket · b2722b1c

Authored by Patrick McHardy on Dec 01, 2009

When a large packet gets reassembled by ip_defrag(), the head skb
accounts for all the fragments in skb->truesize. If this packet is
refragmented, skb->truesize is not re-adjusted to reflect only
the head size since it is not owned by a socket. If the head fragment
then gets recycled and reused for another received fragment, it might
exceed the defragmentation limits due to its large truesize value.

skb_recycle_check() explicitly checks for linear skbs, so any recycled
skb should reflect its true size in skb->truesize. Change ip_fragment()
to also adjust the truesize value of skbs not owned by a socket.
Reported-and-tested-by: Ben Menchaca <ben@bigfootnetworks.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Nov, 2009 1 commit

net/ipv4: Move && and || to end of previous line · 9d4fb27d

Authored by Joe Perches on Nov 23, 2009

On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote:
> It should be of the form:
> 	if (x &&
> 	    y)
> 
> or:
> 	if (x && y)
> 
> Fix patches, rather than complaints, for existing cases where things
> do not follow this pattern are certainly welcome.

Also collapsed some multiple tabs to single space.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Oct, 2009 1 commit

inet: rename some inet_sock fields · c720c7e8

Authored by Eric Dumazet on Oct 15, 2009

In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

The goal is to transfer fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by the rx path).

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Oct, 2009 1 commit

net: Use sk_mark for routing lookup in more places · 914a9ab3

Authored by Atis Elsts on Oct 01, 2009

This patch against v2.6.31 adds support for route lookup using sk_mark in some 
more places. The benefits from this patch are the following.
First, SO_MARK option now has effect on UDP sockets too.
Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing 
lookup correctly if TCP sockets with SO_MARK were used.
Signed-off-by: Atis Elsts <atis@mikrotik.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

03 Sep, 2009 1 commit

ip: Report qdisc packet drops · 6ce9e7b5

Authored by Eric Dumazet on Sep 02, 2009

Christoph Lameter pointed out that packet drops at qdisc level were not
accounted in SNMP counters. Only if the application sets IP_RECVERR are
drops reported to the user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after a UDP tx flood:
# netstat -s 
...
IP:
    1487048 outgoing packets dropped
...
Udp:
...
    SndbufErrors: 1487048

send() syscalls do, however, still return an OK status, so as not to
break applications.

Note : send() manual page explicitly says for -ENOBUFS error :

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This is not true for IP_RECVERR enabled sockets : a send() syscall
that hit a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey !
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
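
As a rough illustration of the userspace side of this behaviour, the following sketch (not taken from the patch) opts a socket into IP_RECVERR and classifies the result of a send() call, treating ENOBUFS as a local qdisc drop. The names `enable_recverr`, `classify_send` and the `send_outcome` enum are hypothetical; only the Linux `IP_RECVERR` socket option is assumed.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical outcome labels for one send() call. */
enum send_outcome { SEND_OK, SEND_LOCAL_DROP, SEND_OTHER_ERROR };

/*
 * Opt in to extended error reporting on an IPv4 socket.
 * Returns 0 on success or a negative errno value.
 */
int enable_recverr(int fd)
{
	int on = 1;

	if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) != 0)
		return -errno;
	return 0;
}

/*
 * Classify a send() result. rc is the return value of send() and err
 * is the errno value captured immediately afterwards. Per the text
 * above, ENOBUFS is only expected on sockets with IP_RECVERR enabled.
 */
enum send_outcome classify_send(ssize_t rc, int err)
{
	if (rc >= 0)
		return SEND_OK;
	return err == ENOBUFS ? SEND_LOCAL_DROP : SEND_OTHER_ERROR;
}
```

A sender without IP_RECVERR would see SEND_OK for a dropped packet and would have to rely on the SNMP counters shown in the netstat output instead.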

28 Aug, 2009 1 commit

ipv4: make ip_append_data() handle NULL routing table · 788d908f

Authored by Julien TINNES on Aug 27, 2009

Add a check in ip_append_data() for NULL *rtp to prevent future bugs in
callers from being exploitable.
Signed-off-by: Julien Tinnes <julien@cr0.org>
Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Jul, 2009 1 commit

net: ip_push_pending_frames() fix · e51a67a9

Authored by Eric Dumazet on Jul 08, 2009

After commit 2b85a34e
(net: No more expensive sock_hold()/sock_put() on each tx)
we do not take any more references on sk->sk_refcnt on outgoing packets.

I forgot to delete two __sock_put() from ip_push_pending_frames()
and ip6_push_pending_frames().
Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Jun, 2009 1 commit

net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e

Authored by Eric Dumazet on Jun 11, 2009

One of the problems with sock memory accounting is that it uses
a pair of sock_hold()/sock_put() for each transmitted packet.

This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.

We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.

We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.

As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement the initial offset and atomically check if any packets
are in flight.

skb_set_owner_w() doesn't call sock_hold() anymore.

sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
reached 0 to perform the final freeing.

Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.

Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
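
The biased-counter idea described above can be modelled in userspace with C11 atomics. This is only a sketch of the technique, not the kernel code: `toy_sock` and every `toy_*` function are hypothetical stand-ins for struct sock, skb_set_owner_w(), sock_wfree() and sk_free().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct sock. */
struct toy_sock {
	atomic_int wmem_alloc;	/* bytes in flight, plus 1 while the socket is alive */
	bool freed;		/* set once the final free has run */
};

/* Start the counter at 1: this is the "offset by one unit" bias. */
void toy_sock_init(struct toy_sock *sk)
{
	atomic_init(&sk->wmem_alloc, 1);
	sk->freed = false;
}

static void toy_final_free(struct toy_sock *sk)
{
	sk->freed = true;
}

/* Charge a packet to the socket; no separate refcount is taken. */
void toy_set_owner_w(struct toy_sock *sk, int truesize)
{
	atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and free if the counter reached zero. */
void toy_wfree(struct toy_sock *sk, int truesize)
{
	/* fetch_sub returns the old value; old == truesize means new == 0 */
	if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
		toy_final_free(sk);
}

/* Owner releases the socket: drop the bias, free if nothing is in flight. */
void toy_sk_free(struct toy_sock *sk)
{
	if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
		toy_final_free(sk);
}
```

Whichever of toy_sk_free() and the last toy_wfree() runs second performs the free. The model also makes the stated drawback visible: if the uncharged size does not match the charged size, the counter either never reaches zero or reaches it too early.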

09 Jun, 2009 1 commit

ipv4: Use frag list abstraction interfaces. · d7fcf1a5

Authored by David S. Miller on Jun 09, 2009

Signed-off-by: David S. Miller <davem@davemloft.net>

03 Jun, 2009 2 commits

net: skb->dst accessors · adf30907

Authored by Eric Dumazet on Jun 02, 2009

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
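
The shape of the three accessors can be sketched with toy types in userspace. This is an illustrative model only: `toy_sk_buff`, `toy_dst_entry`, the `dst_private` field and `toy_dst_release()` are hypothetical, and the real kernel storage and refcounting differ.

```c
#include <stddef.h>

/* Toy stand-ins for struct dst_entry and struct sk_buff. */
struct toy_dst_entry {
	int refcnt;
};

struct toy_sk_buff {
	struct toy_dst_entry *dst_private;	/* only touched via accessors */
};

/* Toy release: drop one reference if there is an entry. */
void toy_dst_release(struct toy_dst_entry *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Getter, analogous in shape to skb_dst(). */
struct toy_dst_entry *toy_skb_dst(const struct toy_sk_buff *skb)
{
	return skb->dst_private;
}

/* Setter, analogous in shape to skb_dst_set(). */
void toy_skb_dst_set(struct toy_sk_buff *skb, struct toy_dst_entry *dst)
{
	skb->dst_private = dst;
}

/* Release and clear in one step, replacing the open-coded two-line idiom. */
void toy_skb_dst_drop(struct toy_sk_buff *skb)
{
	toy_dst_release(skb->dst_private);
	skb->dst_private = NULL;
}
```

Routing every access through functions like these is what lets the underlying field be removed or re-typed later without touching the callers.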

net: skb->rtable accessor · 511c3f92

Authored by Eric Dumazet on Jun 02, 2009

Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb

Delete skb->rtable field

Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Apr, 2009 1 commit

snmp: add missing counters for RFC 4293 · edf391ff

Authored by Neil Horman on Apr 27, 2009

The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and
OutMcastOctets:
http://tools.ietf.org/html/rfc4293
But it seems we don't track those in any way that is easy to separate from other
protocols.  This patch adds those missing counters to the stats file. Tested
successfully by me.

With help from Eric Dumazet.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Feb, 2009 1 commit

ip: support for TX timestamps on UDP and RAW sockets · 51f31cab

Authored by Patrick Ohly on Feb 12, 2009

Instructions for time stamping outgoing packets are taken from the
socket layer and later copied into the new skb.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Nov, 2008 2 commits

net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() · a21bba94

Authored by Eric Dumazet on Nov 24, 2008

We can reduce pressure on the dst entry refcount that slows down the UDP
transmit path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_push_pending_frames() steal the refcount its
callers had to take when filling inet->cork.dst.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: avoid a pair of dst_hold()/dst_release() in ip_append_data() · 2e77d89b

Authored by Eric Dumazet on Nov 24, 2008

We can reduce pressure on the dst entry refcount that slows down the UDP
transmit path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousands of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.

This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.

This doesn't avoid all refcounting, but still gives speedups on SMP,
on the UDP/RAW transmit path.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Nov, 2008 1 commit

net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c · d9319100

Authored by Jianjun Kong on Nov 03, 2008

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Oct, 2008 1 commit

ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible · 86b08d86

Authored by KOVACS Krisztian on Oct 01, 2008

Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Jul, 2008 1 commit

net: convert BUG_TRAP to generic WARN_ON · 547b792c

Authored by Ilpo Järvinen on Jul 25, 2008

Removes a legacy reinvent-the-wheel construct. The generic
machinery integrates much better with automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuitively, BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON(), though some might actually be
promoted to BUG_ON(); I left that for the future.

I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Jul, 2008 1 commit

mib: add net to IP_INC_STATS · 5e38e270

Authored by Pavel Emelyanov on Jul 16, 2008

All the callers already have either the net itself, or the place
where to get it from.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Jul, 2008 1 commit

icmp: add struct net argument to icmp_out_count · 0388b004

Authored by Pavel Emelyanov on Jul 14, 2008

This routine deals with ICMP statistics, but doesn't have a
struct net at hand, so add one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Jun, 2008 1 commit

net: remove CVS keywords · 0b040829

Authored by Adrian Bunk on Jun 10, 2008

This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Apr, 2008 1 commit

[IPv4] UFO: prevent generation of chained skb destined to UFO device · be9164e7

Authored by Kostya B on Apr 29, 2008

Problem: ip_append_data() could wrongly generate a chained skb for
devices which support UFO.  When sk_write_queue is not empty
(e.g. MSG_MORE), __instead__ of appending data into the next nr_frag
of the queued skb, a new chained skb is created.

I would normally assume a UFO device should get data in nr_frags and not
in frag_list.  Later, udp4_hwcsum_outgoing() resets csum to NONE
and skb_gso_segment() oopses.

Proposal:
1. Even if the length is less than the MTU, employ ip_ufo_append_data()
and append data to the __existing__ skb in the sk_write_queue.

2. ip_ufo_append_data() is fixed due to a wrong manipulation of
peek-ing and later enqueue-ing of the same skb.  Now, enqueuing is
always performed, because on error the further
ip_flush_pending_frames() would release the queued skb.
Signed-off-by: Kostya B <bkostya@hotmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Mar, 2008 1 commit

[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. · 3b1e0a65

Authored by YOSHIFUJI Hideaki on Mar 26, 2008

Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
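
The pattern of compiling the namespace pointer out of the structure can be sketched as follows. This is a toy model, not the kernel code: `TOY_NET_NS` stands in for CONFIG_NET_NS, and `toy_net`, `toy_sock`, `toy_init_net`, `toy_sock_net()` and `toy_sock_net_set()` are hypothetical names.

```c
/* Toy stand-in for struct net and the initial namespace. */
struct toy_net {
	int id;
};

struct toy_net toy_init_net = { 0 };

/* Toy stand-in for struct sock: the pointer exists only with namespaces. */
struct toy_sock {
#ifdef TOY_NET_NS
	struct toy_net *sk_net;
#endif
	int dummy;
};

/* Read accessor: a compile-time constant when namespaces are disabled. */
static inline struct toy_net *toy_sock_net(const struct toy_sock *sk)
{
#ifdef TOY_NET_NS
	return sk->sk_net;
#else
	(void)sk;
	return &toy_init_net;
#endif
}

/* Write accessor: a no-op when namespaces are disabled. */
static inline void toy_sock_net_set(struct toy_sock *sk, struct toy_net *net)
{
#ifdef TOY_NET_NS
	sk->sk_net = net;
#else
	(void)sk;
	(void)net;
#endif
}
```

With `TOY_NET_NS` undefined, every caller of toy_sock_net() sees the address of a single global, so comparisons against it can be folded away by the compiler.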

25 Mar, 2008 2 commits

[IPV4,IPV6]: Share cork.rt between IPv4 and IPv6. · c8cdaf99

Authored by YOSHIFUJI Hideaki on Mar 10, 2008

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NETNS]: Process IP layer in the context of the correct namespace. · cb84663e

Authored by Denis V. Lunev on Mar 24, 2008

Replace all the rest of the init_net with a proper net on the IP layer.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Mar, 2008 1 commit

[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts · ee6b9673

Authored by Eric Dumazet on Mar 05, 2008

(Anonymous) unions can help us to avoid ugly casts.

A common cast is the (struct rtable *)skb->dst one.

Defining a union like:
union {
     struct dst_entry *dst;
     struct rtable *rtable;
};
permits using skb->rtable instead.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
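
A small self-contained sketch of the anonymous-union aliasing described above, using toy types rather than the kernel structures. `toy_dst_entry`, `toy_rtable`, `toy_sk_buff` and the two helper functions are hypothetical; the sketch assumes, as an illustration, that the dst entry is the first member of the routing entry so both pointers refer to the same address.

```c
/* Toy stand-ins for struct dst_entry and struct rtable. */
struct toy_dst_entry {
	int obsolete;
};

struct toy_rtable {
	struct toy_dst_entry dst;	/* must stay the first member */
	unsigned int rt_flags;
};

/* Toy stand-in for struct sk_buff with the anonymous union. */
struct toy_sk_buff {
	union {
		struct toy_dst_entry *dst;
		struct toy_rtable *rtable;
	};
};

/* The old style: an explicit cast at every use. */
unsigned int flags_with_cast(const struct toy_sk_buff *skb)
{
	return ((struct toy_rtable *)skb->dst)->rt_flags;
}

/* The new style: the union member gives the typed view directly. */
unsigned int flags_with_union(const struct toy_sk_buff *skb)
{
	return skb->rtable->rt_flags;
}
```

Both helpers read the same pointer value; the union only changes the type through which it is seen.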

01 Feb, 2008 2 commits

[NET]: Introducing socket mark socket option. · 4a19ec58

Authored by Laszlo Attila Toth on Jan 30, 2008

A userspace program may wish to set the mark for each packet it sends
without using the netfilter MARK target. Changing the mark can be used
for mark-based routing without netfilter or for packet filtering.

It requires the CAP_NET_ADMIN capability.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Prevent out-of-sync truesize on ip_fragment slow path · 29ffe1a5

Authored by Herbert Xu on Jan 28, 2008

When ip_fragment has to hit the slow path the value of skb->truesize
may go out of sync because we would have updated it without changing
the packet length. This violates the constraints on truesize.

This patch postpones the update of skb->truesize to prevent this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Jan, 2008 6 commits

[NETNS]: Add namespace for ICMP replying code. · dde1bc0e

Authored by Denis V. Lunev on Jan 22, 2008

All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.

Other protocols will follow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Add namespace parameter to ip_route_output_key. · f206351a

Authored by Denis V. Lunev on Jan 22, 2008

Needed to propagate it down to the ip_route_output_flow.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Add namespace parameter to ip_route_output_flow. · f1b050bf

Authored by Denis V. Lunev on Jan 22, 2008

Needed to propagate it down to the __ip_route_output_key.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Remove obsolete comment · a067d9ac

Authored by Ilpo Järvinen on Jan 05, 2008

It seems that ip_build_xmit is no longer used in here and
ip_append_data is used.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: Introduce NF_INET_ hook values · 6e23ae2a

Authored by Patrick McHardy on Nov 19, 2007

The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Add ip_local_out · c439cb2e

Authored by Herbert Xu on Jan 11, 2008

Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so.  They also share the same output
function dst_output.

This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
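
The two header fix-ups that callers used to repeat (total length and header checksum) can be sketched in userspace on a raw byte buffer. This illustrates the standard IPv4 header checksum only and is not the kernel implementation; `toy_ip_checksum` and `toy_finish_header` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of 16-bit big-endian words over the header.
 * len is the header length in bytes; an IPv4 header is always a
 * multiple of 4 bytes, so no odd trailing byte needs handling.
 */
uint16_t toy_ip_checksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/*
 * Fill in the total-length field (bytes 2-3) and the header checksum
 * (bytes 10-11) in place, in network byte order.
 */
void toy_finish_header(uint8_t *hdr, uint16_t total_len)
{
	size_t ihl = (size_t)(hdr[0] & 0x0f) * 4;	/* header length in bytes */
	uint16_t csum;

	hdr[2] = (uint8_t)(total_len >> 8);
	hdr[3] = (uint8_t)(total_len & 0xff);

	hdr[10] = 0;	/* checksum field must be zero while summing */
	hdr[11] = 0;
	csum = toy_ip_checksum(hdr, ihl);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}
```

Recomputing the checksum over a finished header yields zero, which is a convenient self-check for code like this.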

23 Jan, 2008 2 commits

[INET]: Fix truesize setting in ip_append_data · f945fa7a

Authored by Herbert Xu on Jan 22, 2008

As it is, ip_append_data only charges page fragments to the skb that
allocated them.  As such it means that the first skb gets hit with a
4K charge even though it might have only used a fraction of it, while
all subsequent skbs that use the same page get away with no charge
at all.

This bug was exposed by the UDP accounting patch.

[ The wmem_alloc bumping needs to be moved with the truesize,
  noticed by Takahiro Yasui.  -DaveM ]
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Add missing skb->truesize increment in ip_append_page(). · 1e34a11d

Authored by David S. Miller on Jan 22, 2008

And as noted by Takahiro Yasui, we thus need to bump the
sk->sk_wmem_alloc at this spot as well.
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Nov, 2007 1 commit

[IPV4]: Consolidate the ip cork destruction in ip_output.c · 429f08e9

Authored by Pavel Emelyanov on Nov 05, 2007

The ip_push_pending_frames and ip_flush_pending_frames do the
same things to flush the sock's cork. Move this into a separate
function and save ~80 bytes from .text.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openanolis / cloud-kernel synced successfully about 1 year ago
