1. 03 Aug 2014 · 3 commits
  2. 28 Jul 2014 · 9 commits
  3. 05 May 2014 · 1 commit
  4. 18 Dec 2013 · 1 commit
    • net: Add utility functions to clear rxhash · 7539fadc
      Tom Herbert committed
      In several places 'skb->rxhash = 0' is being done to clear the
      rxhash value in an skb. This does not clear l4_rxhash, which could
      still be set, so the rxhash would not be recalculated on a subsequent
      call to skb_get_rxhash. This patch adds an explicit function to
      properly clear all the rxhash-related information in the skb.
      
      skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as
      l4_rxhash.
      
      Fixed up the places where 'skb->rxhash = 0' was being done.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
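      The helpers this commit describes live in include/linux/skbuff.h; a
      minimal sketch of their shape, assuming the rxhash/l4_rxhash field
      names of this era (they were renamed later), looks like:

          static inline void skb_clear_hash(struct sk_buff *skb)
          {
                  /* Zero the stored hash and forget that it was an L4 hash,
                   * so a later skb_get_rxhash() recomputes it. */
                  skb->rxhash = 0;
                  skb->l4_rxhash = 0;
          }

          static inline void skb_clear_hash_if_not_l4(struct sk_buff *skb)
          {
                  /* An L4 (4-tuple) hash stays valid across header rewrites,
                   * so only drop the hash when it is not L4-derived. */
                  if (!skb->l4_rxhash)
                          skb_clear_hash(skb);
          }

      Call sites that previously wrote 'skb->rxhash = 0' by hand can then
      use these helpers instead.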
  5. 24 Oct 2013 · 1 commit
    • ipv4: initialize ip4_frags hash secret as late as possible · e7b519ba
      Hannes Frederic Sowa committed
      Defer the generation of the first hash secret for the ipv4 fragmentation
      cache as late as possible.
      
      ip4_frags.rnd gets initially seeded by inet_frags_init and regularly
      reseeded by inet_frag_secret_rebuild. Either ipqhashfn is called
      directly from ip_fragment.c, in which case we initialize the secret
      there, or we first get called by inet_frag_secret_rebuild, which
      installs a new secret via a manual call to get_random_bytes. That
      secret will be overwritten as soon as the first call to ipqhashfn
      happens. This is safe because we won't race with anyone else while
      publishing the new secrets.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
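      A plausible sketch of the resulting lazy seeding in
      net/ipv4/ip_fragment.c, assuming the net_get_random_once() helper
      from the same patch series:

          static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr,
                                        u8 prot)
          {
                  /* Seed the hash secret on first use; net_get_random_once()
                   * guarantees the initialization runs exactly once. */
                  net_get_random_once(&ip4_frags.rnd, sizeof(ip4_frags.rnd));
                  return jhash_3words((__force u32)id << 16 | prot,
                                      (__force u32)saddr, (__force u32)daddr,
                                      ip4_frags.rnd) & (INETFRAGS_HASHSZ - 1);
          }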
  6. 17 Apr 2013 · 1 commit
    • net: drop dst before queueing fragments · 97599dc7
      Eric Dumazet committed
      Commit 4a94445c (net: Use ip_route_input_noref() in input path)
      added a bug in IP defragmentation handling, as a non-refcounted
      dst could escape an RCU-protected section.
      
      Commit 64f3b9e2 (net: ip_expire() must revalidate route) fixed
      the case of timeouts, but not the general problem.
      
      Tom Parkin noticed crashes in the UDP stack and provided a patch,
      but further analysis allowed us to pinpoint the root cause.
      
      Before queueing a packet into a frag list, we must drop its dst,
      as this dst has a limited (RCU-protected) lifetime.
      
      When/if a packet is finally reassembled, we use the dst of the very
      last skb, still protected by RCU and valid, as the dst of the
      reassembled packet.
      
      Use the same logic in IPv6, as there is no need to hold dst references.
      Reported-by: Tom Parkin <tparkin@katalix.com>
      Tested-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
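      The shape of the fix in ip_frag_queue(), sketched from that era's
      code rather than quoted verbatim: the fragment that completes the
      queue keeps its dst across reassembly, while every other queued
      fragment drops it:

          if (qp->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
              qp->q.meat == qp->q.len) {
                  /* Last fragment: keep its (still RCU-valid) dst so the
                   * reassembled packet can inherit it. */
                  unsigned long orefdst = skb->_skb_refdst;

                  skb->_skb_refdst = 0UL;
                  err = ip_frag_reasm(qp, prev, dev);
                  skb->_skb_refdst = orefdst;
                  return err;
          }

          /* Incomplete: the dst must not outlive the RCU section, so drop
           * it before the skb sits on the frag list. */
          skb_dst_drop(skb);
          return -EINPROGRESS;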
  7. 25 Mar 2013 · 1 commit
  8. 19 Mar 2013 · 1 commit
    • inet: limit length of fragment queue hash table bucket lists · 5a3da1fe
      Hannes Frederic Sowa committed
      This patch introduces a constant limit on the length of the fragment
      queue hash table bucket lists. The current limit of 128 is chosen
      somewhat arbitrarily and just ensures that we can fill up the
      fragment cache with empty packets up to the default
      ip_frag_high_thresh limits. It should just protect against list
      iteration eating considerable amounts of CPU time.

      If we reach the maximum length in one hash bucket, a warning is
      printed. This is implemented on the caller side of inet_frag_find
      to distinguish between the different users of inet_fragment.c.

      I dropped the out-of-memory warning in the ipv4 fragment lookup path,
      because we already get a warning from the slab allocator.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
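      A condensed sketch of the bucket-depth check in inet_frag_find(),
      simplified from that era's net/ipv4/inet_fragment.c (lock-handling
      annotations trimmed):

          #define INETFRAGS_MAXDEPTH 128

          struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
                          struct inet_frags *f, void *key, unsigned int hash)
          {
                  struct inet_frag_queue *q;
                  int depth = 0;

                  hlist_for_each_entry(q, &f->hash[hash], list) {
                          if (q->net == nf && f->match(q, key)) {
                                  atomic_inc(&q->refcnt);
                                  read_unlock(&f->lock);
                                  return q;
                          }
                          depth++;        /* entries walked in this bucket */
                  }
                  read_unlock(&f->lock);

                  /* Refuse to grow the bucket past the limit; the caller
                   * turns the error pointer into a rate-limited warning. */
                  if (depth <= INETFRAGS_MAXDEPTH)
                          return inet_frag_create(nf, f, key);
                  return ERR_PTR(-ENOBUFS);
          }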
  9. 16 Feb 2013 · 1 commit
  10. 30 Jan 2013 · 2 commits
  11. 18 Jan 2013 · 1 commit
    • net: increase fragment memory usage limits · c2a93660
      Jesper Dangaard Brouer committed
      Increase the memory usage limits for incomplete IP fragments.
      
      Arguing for new thresh high/low values:
      
       High threshold = 4 MBytes
       Low  threshold = 3 MBytes
      
      The fragmentation memory accounting code tries to account for the
      real memory usage by measuring both the size of the frag queue struct
      (inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKBs' truesize.

      We want to be able to handle/hold on to enough fragments to ensure
      good performance, without letting incomplete fragments hurt
      scalability by causing the number of inet_frag_queue entries to grow
      too much (resulting in longer searches for frag queues).
      
      For IPv4, how much memory does the largest frag consume?
      
      The maximum-size fragment is 64K, which is approx 44 fragments with
      MTU(1500)-sized packets. sizeof(struct ipq) is 200. A 1500-byte
      packet results in a truesize of 2944 (not 2048 as I first assumed):
      
        (44*2944)+200 = 129736 bytes
      
      The current default high thresh of 262144 bytes is obviously
      problematic, as only two 64K fragments can fit in the queue at the
      same time.
      
      How many 64K fragments can we fit into 4 MBytes?
      
        4*2^20/((44*2944)+200) = 32.34 fragments in queues
      
      An attacker could send a separate/distinct fake fragment packet per
      queue, causing us to allocate one inet_frag_queue per packet, and
      thus attack the hash table and its lists.
      
      How many frag queues do we need to store, and given a current hash
      size of 64, what is the average list length?
      
      Using one MTU-sized fragment per inet_frag_queue, each consuming
      (2944+200) 3144 bytes:
      
        4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length
      
      An attacker could send small fragments; the smallest packet I could
      send resulted in a truesize of 896 bytes (I'm a little surprised by
      this):
      
        4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length
      
      When increasing these numbers, we also need to follow up with
      improvements that are going to help scalability. Simply increasing
      the hash size is not enough, as the current implementation does not
      have per-hash-bucket locking.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
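      Concretely, the new per-netns defaults look like this in
      ipv4_frags_init_net() (a sketch matching the thresholds argued above):

          /* net/ipv4/ip_fragment.c (sketch) */
          net->ipv4.frags.high_thresh = 4 * 1024 * 1024;  /* 4 MBytes */
          net->ipv4.frags.low_thresh  = 3 * 1024 * 1024;  /* 3 MBytes */

      Both remain tunable at runtime via the net.ipv4.ipfrag_high_thresh
      and net.ipv4.ipfrag_low_thresh sysctls.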
  12. 11 Dec 2012 · 1 commit
  13. 19 Nov 2012 · 1 commit
  14. 20 Sep 2012 · 1 commit
  15. 27 Aug 2012 · 1 commit
  16. 27 Jul 2012 · 1 commit
  17. 21 Jul 2012 · 1 commit
  18. 28 Jun 2012 · 2 commits
    • Revert "ipv4: tcp: dont cache unconfirmed intput dst" · c10237e0
      David S. Miller committed
      This reverts commit c074da28.
      
      This change has several unwanted side effects:
      
      1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
         thus never create a real cached route.
      
      2) All TCP traffic will use DST_NOCACHE and never use the routing
         cache at all.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: tcp: dont cache unconfirmed intput dst · c074da28
      Eric Dumazet committed
      DDoS synflood attacks hit the IP route cache badly.

      On typical machines, this cache is allowed to hold up to 8 million
      dst entries, 256 bytes each, for a total of 2GB of memory.

      rt_garbage_collect() triggers and tries to clean things up.

      Eventually the route cache is disabled, but the machine is under
      fire and might OOM and crash.

      This patch exploits the new TCP early demux to set a nocache boolean
      in case the incoming TCP frame is for a socket that is not yet in
      ESTABLISHED or TIMEWAIT state.

      This 'nocache' boolean is then used, in case the dst entry is not
      found in the route cache, to create an unhashed dst entry
      (DST_NOCACHE).

      SYN-cookie ACKs use a similar mechanism (ipv4: tcp: dont cache
      output dst for syncookies), so after this patch, a machine is able
      to absorb a DDoS synflood attack without polluting its IP route
      cache.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
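      An illustrative sketch of the idea (note this commit was reverted,
      as shown above; the flag names FLOWI_FLAG_RT_NOCACHE and DST_NOCACHE
      come from that era's tree, but the plumbing here is simplified):

          /* Early demux: no ESTABLISHED/TIMEWAIT socket matched, so ask
           * the route lookup not to cache whatever dst it has to create. */
          if (!sk)
                  fl4.flowi4_flags |= FLOWI_FLAG_RT_NOCACHE;

          /* Later, in the route-lookup slow path, on a cache miss: */
          if (fl4->flowi4_flags & FLOWI_FLAG_RT_NOCACHE)
                  rt->dst.flags |= DST_NOCACHE;  /* unhashed; never enters
                                                  * the route cache */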
  19. 10 Jun 2012 · 1 commit
  20. 09 Jun 2012 · 1 commit
  21. 20 May 2012 · 1 commit
  22. 18 May 2012 · 1 commit
  23. 16 May 2012 · 1 commit
  24. 21 Apr 2012 · 2 commits
  25. 20 Apr 2012 · 1 commit
  26. 13 Mar 2012 · 1 commit
  27. 12 Mar 2012 · 1 commit
    • net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches committed
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
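      A hedged before/after illustration of this kind of conversion (the
      specific message below is made up for the example):

          /* Before: level embedded in printk, function name hard-coded. */
          printk(KERN_DEBUG "ip_expire: frag queue timed out\n");

          /* After: pr_<level> helper plus __func__. A later patch can add
           * a file-wide prefix by defining pr_fmt before any #include:
           *
           *   #define pr_fmt(fmt) "IPv4: " fmt
           */
          pr_debug("%s: frag queue timed out\n", __func__);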