Commits · e3826f1e946e7d2354943232f1457be1455a29e2 · openanolis / cloud-kernel

16 May, 2010 1 commit

net: reserve ports for applications using fixed port numbers · e3826f1e

Committed by Amerigo Wang on May 05, 2010

(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3826f1e
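
The behavior described in this commit can be illustrated with a small Python model. This is only a sketch under assumptions: the helper names `pick_ephemeral_port` and `bind_explicit`, the port range, and the reserved set are hypothetical, and the kernel's actual search over the local port range works differently.

```python
import random

def pick_ephemeral_port(low, high, in_use, reserved, rng=random):
    """Model automatic assignment (port 0): never hand out a reserved port.

    Hypothetical helper for illustration; not the kernel implementation.
    """
    candidates = [p for p in range(low, high + 1)
                  if p not in in_use and p not in reserved]
    if not candidates:
        return None  # nothing left for automatic assignment
    return rng.choice(candidates)

def bind_explicit(port, in_use):
    """Explicit allocation is unchanged: the reserved list is not consulted."""
    return port not in in_use
```

In this model an application that always binds a fixed port can still do so through `bind_explicit`, while automatic assignment never takes that port away from it.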

13 May, 2010 3 commits

netfilter: remove unnecessary returns from void function()s · 736d58e3

Committed by Joe Perches on May 13, 2010

This patch removes from net/ netfilter files
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
Signed-off-by: Joe Perches <joe@perches.com>
[Patrick: changed to keep return statements in otherwise empty function bodies]
Signed-off-by: Patrick McHardy <kaber@trash.net>

736d58e3
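
For readers less familiar with perl, the substitution in the one-liner above can be reproduced in Python. This is a sketch only: the function name `strip_trailing_returns` is hypothetical, and the pattern by itself does not implement the exceptions mentioned in the message (returns preceded by a label, and returns in otherwise empty function bodies).

```python
import re

# Same pattern as the perl substitution: whitespace, "return;", newline,
# then the closing brace of the function.
TRAILING_RETURN = re.compile(r"\n[ \t\n]+return;\n}")

def strip_trailing_returns(source: str) -> str:
    """Drop a bare 'return;' that directly precedes a closing brace."""
    return TRAILING_RETURN.sub("\n}", source)
```

A `return 0;` or any other return with a value is left alone, because the pattern only matches the bare `return;` form.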

netfilter: cleanup printk messages · 654d0fbd

Committed by Stephen Hemminger on May 13, 2010

Make sure all printk messages have a severity level.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

654d0fbd

netfilter: change NF_ASSERT to WARN_ON · af567603

Committed by Stephen Hemminger on May 13, 2010

Change netfilter asserts to standard WARN_ON. This has the
benefit of backtrace info and also causes netfilter errors
to show up on kerneloops.org.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

af567603

12 May, 2010 5 commits

netfilter: xtables: combine built-in extension structs · 4538506b

Committed by Jan Engelhardt on July 04, 2009

Prepare the arrays for use with the multiregister function. The
future layer-3 xt matches can then be easily added to it without
needing more (un)register code.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

4538506b

netfilter: xtables: change hotdrop pointer to direct modification · b4ba2611

Committed by Jan Engelhardt on July 07, 2009

Since xt_action_param is writable, let's use it. The pointer to
'bool hotdrop' always worried me (8 bytes (64-bit) to write 1 byte!).
Surprisingly, this results in a reduction in size:

   text    data     bss filename
5457066  692730  357892 vmlinux.o-prev
5456554  692730  357892 vmlinux.o
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

b4ba2611

netfilter: xtables: deconstify struct xt_action_param for matches · 62fc8051

Committed by Jan Engelhardt on July 07, 2009

In future, layer-3 matches will be an xt module of their own, and
need to set the fragoff and thoff fields. Adding more pointers would
needlessly increase memory requirements (esp. so for 64-bit, where
pointers are wider).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

62fc8051

netfilter: xtables: substitute temporary defines by final name · 4b560b44

Committed by Jan Engelhardt on July 05, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

4b560b44

netfilter: xtables: combine struct xt_match_param and xt_target_param · de74c169

Committed by Jan Engelhardt on July 05, 2009

The structures carried - besides match/target - almost the same data.
It is possible to combine them, as extensions are evaluated serially,
and so, the callers end up a little smaller.

  text  data  bss  filename
-15318   740  104  net/ipv4/netfilter/ip_tables.o
+15286   740  104  net/ipv4/netfilter/ip_tables.o
-15333   540  152  net/ipv6/netfilter/ip6_tables.o
+15269   540  152  net/ipv6/netfilter/ip6_tables.o
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

de74c169

10 May, 2010 2 commits

net: Fix FDDI and TR config checks in ipv4 arp and LLC. · f0ecde14

Committed by David S. Miller on May 10, 2010

Need to check both CONFIG_FOO and CONFIG_FOO_MODULE.
Signed-off-by: David S. Miller <davem@davemloft.net>

f0ecde14

IPv4: unresolved multicast route cleanup · bbd72543

Committed by Andreas Meissner on May 10, 2010

Fixes the expiration timer for unresolved multicast route entries.
In case new multicast routing requests come in faster than the
expiration timeout occurs (e.g. zapping through multicast TV streams), the
timer is prevented from firing on time for already existing entries.

As the single timer is reset to its default whenever a new entry is made,
the timeouts for existing unresolved entries are missed and/or not
updated. As a consequence, new requests are denied when the limit of
unresolved entries has been reached, because old entries live longer than
they are supposed to.

The solution is to reset the timer only for the first unresolved entry
in the multicast routing cache. All other timers are already set and
updated correctly within the timer function itself by now.

Signed-off-by: Andreas Meissner <andreas.meissner@sphairon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bbd72543

08 May, 2010 1 commit

ipv4: remove ip_rt_secret timer (v4) · 3ee94372

Committed by Neil Horman on May 08, 2010

A while back there was a discussion regarding the rt_secret_interval timer.
Given that we've had the ability to do emergency route cache rebuilds for a while
now, based on a statistical analysis of the various hash chain lengths in the
cache, the use of the flush timer is somewhat redundant. This patch removes the
rt_secret_interval sysctl, allowing us to rely solely on the statistical
analysis mechanism to determine the need for route cache flushes.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ee94372

07 May, 2010 1 commit

ipv4: udp: fix short packet and bad checksum logging · ccc2d97c

Committed by Bjørn Mork on May 06, 2010

commit 2783ef23 moved the initialisation of saddr and daddr after
pskb_may_pull() to avoid a potential data corruption.  Unfortunately,
this also placed it after the short packet and bad checksum error paths,
where these variables are used for logging.  The result is bogus
output like

[92238.389505] UDP: short packet: From 2.0.0.0:65535 23715/178 to 0.0.0.0:65535

Move the saddr and daddr initialisation above the error paths, while still
keeping it after the pskb_may_pull() to keep the fix from commit 2783ef23.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Cc: stable@kernel.org
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ccc2d97c

02 May, 2010 2 commits

netfilter: xtables: dissolve do_match function · ef53d702

Committed by Jan Engelhardt on July 09, 2009

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

ef53d702

netfilter: ip_tables: fix compilation when debug is enabled · b5cad0df

Committed by Jan Engelhardt on May 02, 2010

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

b5cad0df

29 April, 2010 3 commits

net: ip_queue_rcv_skb() helper · f84af32c

Committed by Eric Dumazet on April 28, 2010

When queueing a skb to a socket, we can immediately release its dst if
the target socket does not use IP_CMSG_PKTINFO.

tcp_data_queue() can drop dst too.

This lets us benefit from a hot cache line and spares the receiver, possibly
on another CPU, from dirtying this cache line itself.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f84af32c

net: speedup udp receive path · 4b0b72f7

Committed by Eric Dumazet on April 28, 2010

Since commit 95766fff ([UDP]: Add memory accounting.),
each received packet needs one extra lock_sock()/release_sock() pair.

This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDoS.

This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.

skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). The skb is orphaned inside the locked section for
proper socket memory reclaim, and finally freed outside of it.

The UDP receive path now takes the socket spinlock only once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b0b72f7

Revert "tcp: bind() fix when many ports are bound" · 8d238b25

Committed by David S. Miller on April 28, 2010

This reverts two commits:

fda48a0d
tcp: bind() fix when many ports are bound

and a follow-on fix for it:

6443bb1f
ipv6: Fix inet6_csk_bind_conflict()

It causes problems with binding listening sockets when time-wait
sockets from a previous instance still are alive.

It's too late to keep fiddling with this so late in the -rc
series, and we'll deal with it in net-next-2.6 instead.
Signed-off-by: David S. Miller <davem@davemloft.net>

8d238b25

28 April, 2010 3 commits

net: sk_add_backlog() take rmem_alloc into account · c377411f

Committed by Eric Dumazet on April 27, 2010

The current socket backlog limit is not enough to really stop DDoS attacks,
because the user thread spends a lot of time processing a full backlog each
round, and the user might spin heavily on the socket lock.

We should add backlog size and receive_queue size (aka rmem_alloc) to
pace writers, and let the user run without being slowed down too much.

Introduce a sk_rcvqueues_full() helper, to avoid taking the socket lock in
stress situations.

Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200,000 pps (instead of ~100 pps before the
patch) on an 8 core machine.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c377411f
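
The idea of counting both the backlog and the receive queue against the limit can be modeled as below. This is a minimal sketch, assuming the check compares the combined size plus the incoming packet against the receive buffer limit; the class `SockModel` and its field names are hypothetical and do not mirror the kernel's `struct sock`.

```python
from dataclasses import dataclass

@dataclass
class SockModel:
    backlog_len: int  # bytes currently sitting in the backlog
    rmem_alloc: int   # bytes already charged to the receive queue
    rcvbuf: int       # receive buffer limit in bytes

def rcvqueues_full(sk: SockModel, truesize: int) -> bool:
    """True if accepting a packet of 'truesize' bytes would exceed the limit."""
    qsize = sk.backlog_len + sk.rmem_alloc
    return qsize + truesize > sk.rcvbuf
```

Because the check only reads counters, a producer can drop a packet early without taking the socket lock, which is the point the commit message makes about stress situations.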

net: Make RFS socket operations not be inet specific. · c58dc01b

Committed by David S. Miller on April 27, 2010

Idea from Eric Dumazet.

As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

c58dc01b

TCP: avoid to send keepalive probes if receiving data · 6c37e5de

Committed by Flavio Leitner on April 26, 2010

RFC 1122 says the following:
...
  Keep-alive packets MUST only be sent when no data or
  acknowledgement packets have been received for the
  connection within an interval.
...

The acknowledgement packet is resetting the keepalive
timer but the data packet isn't. This patch fixes it by
checking the timestamp of the last received data packet
too when the keepalive timer expires.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c37e5de
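
The decision made when the keepalive timer expires can be sketched as follows. This is an illustration under assumptions: the function name `keepalive_action` and its parameters are hypothetical, and time is modeled as plain numbers rather than kernel jiffies.

```python
def keepalive_action(now, last_ack_rx, last_data_rx, keepalive_time):
    """Decide what to do when the keepalive timer fires.

    The connection counts as idle only since the most recent of the last
    received ACK and the last received data packet.
    """
    idle = now - max(last_ack_rx, last_data_rx)
    if idle >= keepalive_time:
        return ("probe", 0)
    # Traffic was seen recently: re-arm for the remaining idle time.
    return ("rearm", keepalive_time - idle)
```

With only `last_ack_rx` considered, a connection that is steadily receiving data but no pure ACKs would look idle and be probed needlessly, which is the case this commit addresses.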

26 April, 2010 3 commits

net: ipmr: add support for dumping routing tables over netlink · cb6a4e46

Committed by Patrick McHardy on April 26, 2010

The ipmr /proc interface (ip_mr_cache) can't be extended to dump routes
from any tables but the main table in a backwards compatible fashion since
the output format ends in a variable number of output interfaces.

Introduce a new netlink interface to dump multicast routes from all tables,
similar to the netlink interface for regular routes.
Signed-off-by: Patrick McHardy <kaber@trash.net>

cb6a4e46

net: rtnetlink: decouple rtnetlink address families from real address families · 25239cee

Committed by Patrick McHardy on April 26, 2010

Decouple rtnetlink address families from real address families in socket.h to
be able to add rtnetlink interfaces to code that is not a real address family
without increasing AF_MAX/NPROTO.

This will be used to add support for multicast route dumping from all tables
as the proc interface can't be extended to support anything but the main table
without breaking compatibility.

This partially undoes the patch to introduce independent families for routing
rules and converts ipmr routing rules to a new rtnetlink family. Similar to
that patch, values up to 127 are reserved for real address families, values
above that may be used arbitrarily.
Signed-off-by: Patrick McHardy <kaber@trash.net>

25239cee

net: fib_rules: mark arguments to fib_rules_register const and __net_initdata · 3d0c9c4e

Committed by Patrick McHardy on April 26, 2010

fib_rules_register() duplicates the template passed to it without modification,
so mark the argument as const. Additionally, the templates are only needed when
instantiating a new namespace, so mark them as __net_initdata, which means
they can be discarded when CONFIG_NET_NS=n.
Signed-off-by: Patrick McHardy <kaber@trash.net>

3d0c9c4e

23 April, 2010 3 commits

netfilter: nf_conntrack: extend with extra stat counter · af740b2c

Committed by Jesper Dangaard Brouer on April 23, 2010

I suspect an unfortunate series of events occurring under a DDoS
attack, in the function __nf_conntrack_find() in nf_conntrack_core.c.

Adding a stats counter to see if the search is restarted too often.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>

af740b2c

tcp: bind() fix when many ports are bound · fda48a0d

Committed by Eric Dumazet on April 21, 2010

Port autoselection done by the kernel only works when the number of bound
sockets is under a threshold (typically 30000).

When this threshold is exceeded, we must check if there is a conflict before
exiting the first loop in inet_csk_get_port().

Change inet_csk_bind_conflict() to forbid two reuse-enabled sockets to
bind on the same (address,port) tuple (with a non ANY address).

Same change for inet6_csk_bind_conflict().
Reported-by: Gaspar Chilingarov <gasparch@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

fda48a0d

tcp: fix outsegs stat for TSO segments · aa2ea058

Committed by Tom Herbert on April 22, 2010

Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter.  Without
doing this, the counter can be off by orders of magnitude from the
actual number of segments sent.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa2ea058
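
The size of the accounting error is easy to see in a toy model. This sketch assumes each skb carries a per-skb segment count, with a value below one meaning a single non-TSO segment; the function name `count_out_segs` and the sample numbers are hypothetical.

```python
def count_out_segs(gso_segs_per_skb):
    """Sum the wire segments represented by each skb.

    An skb that is not a TSO skb still counts as one segment.
    """
    return sum(max(1, segs) for segs in gso_segs_per_skb)
```

Counting skbs instead of segments would report `len(gso_segs_per_skb)`, which falls far behind the real total as soon as large TSO skbs are common.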

22 April, 2010 1 commit

netfilter: ip_tables: convert pr_devel() to pr_debug() · cecc74de

Committed by Patrick McHardy on April 22, 2010

We want to be able to use CONFIG_DYNAMIC_DEBUG in netfilter code, so switch
the few existing pr_devel() calls to pr_debug().
Signed-off-by: Patrick McHardy <kaber@trash.net>

cecc74de

21 April, 2010 2 commits

net: Fix various endianness glitches · 0eae88f3

Committed by Eric Dumazet on April 20, 2010

Sparse can help us find endianness bugs, but we need to make some
cleanups to be able to more easily spot real bugs.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0eae88f3

net: sk_sleep() helper · aa395145

Committed by Eric Dumazet on April 20, 2010

Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep won't be a field directly
available.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa395145

20 April, 2010 1 commit

netfilter: bridge-netfilter: fix refragmenting IP traffic encapsulated in PPPoE traffic · 6c79bf0f

Committed by Bart De Schuymer on April 20, 2010

The MTU for IP traffic encapsulated inside PPPoE traffic is smaller
than the MTU of the Ethernet device (1500). Connection tracking
gathers all IP packets and sometimes will refragment them in
ip_fragment(). We then need to subtract the length of the
encapsulating header from the mtu used in ip_fragment(). The check in
br_nf_dev_queue_xmit() which determines if ip_fragment() has to be
called is also updated for the PPPoE-encapsulated packets.
nf_bridge_copy_header() is also updated to make sure the PPPoE data
length field has the correct value.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>

6c79bf0f
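
The MTU arithmetic behind this fix can be written out as a short example. It is a sketch, assuming the standard 8 bytes of PPPoE session overhead (a 6-byte PPPoE header plus a 2-byte PPP protocol field); the helper names are hypothetical and do not correspond to functions in the bridge code.

```python
PPPOE_SES_HLEN = 8  # 6-byte PPPoE header + 2-byte PPP protocol field

def effective_mtu(dev_mtu: int, encap_len: int) -> int:
    """MTU left for the IP packet once the encapsulating header is added."""
    return dev_mtu - encap_len

def needs_fragmentation(ip_len: int, dev_mtu: int, encap_len: int) -> bool:
    """True if an IP packet of ip_len bytes no longer fits after encapsulation."""
    return ip_len > effective_mtu(dev_mtu, encap_len)
```

On a 1500-byte Ethernet device this leaves 1492 bytes for IP, so a full-size 1500-byte packet that would pass unencapsulated has to be fragmented when it travels inside PPPoE.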

19 April, 2010 4 commits

netfilter: xtables: remove old comments about reentrancy · 5b775eb1

Committed by Jan Engelhardt on April 19, 2010

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

5b775eb1

netfilter: xt_TEE: have cloned packet travel through Xtables too · cd58bcd9

Committed by Jan Engelhardt on April 19, 2010

Since Xtables is now reentrant/nestable, the cloned packet can also go
through Xtables and be subject to rules itself.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

cd58bcd9

netfilter: xtables: make ip_tables reentrant · f3c5c1bf

Committed by Jan Engelhardt on April 19, 2010

Currently, the table traverser stores return addresses in the ruleset
itself (struct ip6t_entry->comefrom). This has a well-known drawback:
the jumpstack is overwritten on reentry, making it necessary for
targets to return absolute verdicts. Also, the ruleset (which might
be heavy memory-wise) needs to be replicated for each CPU that can
possibly invoke ip6t_do_table.

This patch decouples the jumpstack from struct ip6t_entry and instead
puts it into xt_table_info. Not being restricted by 'comefrom'
anymore, we can set up a stack as needed. By default, there is room
allocated for two entries into the traverser.

arp_tables is not touched though, because there is just one/two
modules and further patches seek to collapse the table traverser
anyhow.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

f3c5c1bf

netfilter: xtables: inclusion of xt_TEE · e281b198

Committed by Jan Engelhardt on April 19, 2010

xt_TEE can be used to clone and reroute a packet. This can for
example be used to copy traffic at a router for logging purposes
to another dedicated machine.

References: http://www.gossamer-threads.com/lists/iptables/devel/68781
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>

e281b198

17 April, 2010 1 commit

rfs: Receive Flow Steering · fec5e652

Committed by Tom Herbert on April 16, 2010

This patch implements receive flow steering (RFS).  RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running.  RFS is an
extension of Receive Packet Steering (RPS).

The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure.  The rxhash is passed in skb's received on
the connection from netif_receive_skb.  For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.

The complication of the simple approach is that it would potentially
allow OOO packets.  If threads are thrashing around CPUs or multiple
threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets--
we consider this a non-starter.

To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.

rps_sock_flow_table is a global hash table.  Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.

rps_dev_flow_table is specific to each device queue.  Each entry
contains a CPU and a tail queue counter.  The CPU is the "current"
CPU for a matching flow.  The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.

Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length.  When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.

And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted.  When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:

- The current CPU is unset (equal to RPS_NO_CPU)
- The current CPU is offline
- The current CPU's queue head counter >= queue tail counter in the
  rps_dev_flow table.  This checks if the queue head has advanced
  beyond the last packet that was enqueued using this table entry.
  This guarantees that all packets queued using this entry have been
  dequeued, thus preserving in order delivery.
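
The switching rule above can be sketched as a small Python model. It is an illustration only: the classes `Backlog` and `DevFlow`, the functions `select_cpu`, `enqueue`, and `dequeue`, and the value chosen for `RPS_NO_CPU` are hypothetical stand-ins, and the sketch ignores hashing, table sizing, and validation of the desired CPU.

```python
from dataclasses import dataclass

RPS_NO_CPU = -1  # placeholder for "unset" in this sketch

@dataclass
class Backlog:
    head: int = 0        # queue head counter, incremented on dequeue
    length: int = 0      # packets currently queued
    online: bool = True

    @property
    def tail(self) -> int:
        # Queue tail counter = queue head count + queue length.
        return self.head + self.length

@dataclass
class DevFlow:
    cpu: int = RPS_NO_CPU  # "current" CPU for the flow
    last_qtail: int = 0    # tail counter saved at the last enqueue

def select_cpu(desired_cpu, flow, backlogs):
    """Move the flow to the desired CPU only when ordering cannot break."""
    cur = flow.cpu
    if desired_cpu != cur:
        if (cur == RPS_NO_CPU
                or not backlogs[cur].online
                or backlogs[cur].head >= flow.last_qtail):
            flow.cpu = cur = desired_cpu
    return cur

def enqueue(flow, backlogs, cpu):
    """Queue one packet on a CPU's backlog and record the tail counter."""
    backlogs[cpu].length += 1
    flow.last_qtail = backlogs[cpu].tail

def dequeue(backlogs, cpu):
    """Process one packet from a CPU's backlog."""
    backlogs[cpu].length -= 1
    backlogs[cpu].head += 1
```

In this model a flow with a packet still queued on CPU 0 stays on CPU 0 even if the application has moved to CPU 1; once that packet has been dequeued, the next lookup switches the flow.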

Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to interrupting CPU is good for locality.  2)
this allows lockless access to the table-- the CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb, we assume that this is only called from
device napi_poll which is non-reentrant.

This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.

There are two configuration parameters for RFS.  The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue.  Both are rounded to power of two.

The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).

The benefits of RFS are dependent on cache hierarchy, application
load, and other factors.  On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation.  However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.

Below are some benchmark results which show the potential benefit of
this patch.  The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp.  The RPC test is a request/response
test similar in structure to netperf RR test with 100 threads on
each host, but does more work in userspace than netperf.

e1000e on 8 core Intel
   No RFS or RPS                104K tps at 30% CPU
   No RFS (best RPS config):    290K tps at 63% CPU
   RFS                          303K tps at 61% CPU

RPC test        tps     CPU%    50/90/99% usec latency  Latency StdDev
  No RFS/RPS    103K    48%     757/900/3185            4472.35
  RPS only:     174K    73%     415/993/2468            491.66
  RFS           223K    73%     379/651/1382            315.61
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec5e652

16 April, 2010 3 commits

net: replace ipfragok with skb->local_df · 4e15ed4d

Committed by Shan Wei on April 15, 2010

As Herbert Xu said: we should be able to simply replace ipfragok
with skb->local_df. commit f88037 (sctp: Drop ipfargok in sctp_xmit function)
has dropped ipfragok and set the local_df value properly.

The patch kills the ipfragok parameter of .queue_xmit().
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e15ed4d

ip: Fix ip_dev_loopback_xmit() · e30b38c2

Committed by Eric Dumazet on April 15, 2010

Eric Paris got the following trace with a linux-next kernel

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b

While the current linux-2.6 kernel doesn't emit this warning, the bug is latent
and might cause unexpected failures.

ip_dev_loopback_xmit() runs in process context, preemption enabled, so
must call netif_rx_ni() instead of netif_rx(), to make sure that we
process pending software interrupts.

Same change for ip6_dev_loopback_xmit().
Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e30b38c2

netfilter: ipt_LOG/ip6t_LOG: use more appropriate log level as default · f0d57a54

Committed by Patrick McHardy on April 15, 2010

Use KERN_NOTICE instead of KERN_EMERG by default. This only affects
kernel internal logging (like conntrack), user-specified logging rules
contain a separate log level.
Signed-off-by: Patrick McHardy <kaber@trash.net>

f0d57a54

15 April, 2010 1 commit

ipv4: ipmr: fix NULL pointer deref during unres queue destruction · 8de53dfb

Committed by Patrick McHardy on April 15, 2010

Fix an oversight in ipmr_destroy_unres() - the net pointer is
unconditionally initialized to NULL, resulting in a NULL pointer
dereference later on.

Fix by adding a net pointer to struct mr_table and using it in
ipmr_destroy_unres().
Signed-off-by: Patrick McHardy <kaber@trash.net>

8de53dfb
