提交 · c80a5cdfc5ca6533cb893154f546370da1fdb8f0 · openeuler / raspberrypi-kernel

26 5月, 2009 1 次提交

tcp: tcp_vegas ssthresh bugfix · c80a5cdf

由 Doug Leith 提交于 5月 25, 2009

This patch fixes ssthresh accounting issues in tcp_vegas when cwnd decreases
Signed-off-by: NDoug Leith <doug.leith@nuim.ie>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c80a5cdf

22 5月, 2009 1 次提交

ipv4: Fix oops with FIB_TRIE · 3ed18d76

由 Robert Olsson 提交于 5月 21, 2009

It seems we can fix this by disabling preemption while we re-balance the 
trie. This is with the CONFIG_CLASSIC_RCU. It's been stress-tested at high 
loads continuesly taking a full BGP table up/down via iproute -batch.

Note. fib_trie is not updated for CONFIG_PREEMPT_RCU

Reported-by: Andrei Popa
Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ed18d76

21 5月, 2009 2 次提交

net: fix rtable leak in net/ipv4/route.c · 1ddbcb00

由 Eric Dumazet 提交于 5月 19, 2009

Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :

begin_of_quotation
 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
 patch has at least one critical flaw, and another problem.

 rt_intern_hash calculates rthi pointer, which is later used for new entry
 insertion. The same loop calculates cand pointer which is used to clean the
 list. If the pointers are the same, rtable leak occurs, as first the cand is
 removed then the new entry is appended to it.

 This leak leads to unregister_netdevice problem (usage count > 0).

 Another problem of the patch is that it tries to insert the entries in certain
 order, to facilitate counting of entries distinct by all but QoS parameters.
 Unfortunately, referencing an existing rtable entry moves it to list beginning,
 to speed up further lookups, so the carefully built order is destroyed.

 For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
 it will also destroy the ordering.
end_of_quotation

Problematic commit is 1080d709
(net: implement emergency route cache rebulds when gc_elasticity is exceeded)

Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.

A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.
Reported-and-analyzed-by: NAlexander V. Lukyanov <lav@yar.ru>
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ddbcb00

net: fix length computation in rt_check_expire() · cf8da764

由 Eric Dumazet 提交于 5月 19, 2009

rt_check_expire() computes average and standard deviation of chain lengths,
but not correclty reset length to 0 at beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf8da764

19 5月, 2009 2 次提交

ipv4: make default for INET_LRO consistent with help text · bc8a5397

由 Frans Pop 提交于 5月 18, 2009

Commit e81963b1 ("ipv4: Make INET_LRO a bool instead of tristate.")
changed this config from tristate to bool.  Add default so that it is
consistent with the help text.
Signed-off-by: NFrans Pop <elendil@planet.nl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc8a5397

tcp: fix MSG_PEEK race check · 77527313

由 Ilpo Järvinen 提交于 5月 10, 2009

Commit 518a09ef (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.

I tried my best to deal with urg hole madness too which happens
here:
	if (!sock_flag(sk, SOCK_URGINLINE)) {
		++*seq;
		...
by using additional offset by one but I certainly have very
little interest in testing that part.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: NFrans Pop <elendil@planet.nl>
Tested-by: NIan Zimmermann <itz@buug.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77527313

18 5月, 2009 1 次提交

ipconfig: handle case of delayed DHCP server · 2513dfb8

由 Chris Friesen 提交于 5月 17, 2009

If a DHCP server is delayed, it's possible for the client to receive the 
DHCPOFFER after it has already sent out a new DHCPDISCOVER message from 
a second interface.  The client then sends out a DHCPREQUEST from the 
second interface, but the server doesn't recognize the device and 
rejects the request.

This patch simply tracks the current device being configured and throws 
away the OFFER if it is not intended for the current device.  A more 
sophisticated approach would be to put the OFFER information into the 
struct ic_device rather than storing it globally.
Signed-off-by: NChris Friesen <cfriesen@nortel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2513dfb8

09 5月, 2009 1 次提交

ipv4: Make INET_LRO a bool instead of tristate. · e81963b1

由 David S. Miller 提交于 5月 08, 2009

This code is used as a library by several device drivers,
which select INET_LRO.

If some are modules and some are statically built into the
kernel, we get build failures if INET_LRO is modular.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e81963b1

05 5月, 2009 1 次提交

tcp: Fix tcp_prequeue() to get correct rto_min value · 0c266898

由 Satoru SATOH 提交于 5月 04, 2009

tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of
the actual value might be tuned. The following patches fix this and make
tcp_prequeue get the actual value returns from tcp_rto_min().
Signed-off-by: NSatoru SATOH <satoru.satoh@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c266898

29 4月, 2009 1 次提交

netfilter: revised locking for x_tables · 942e4a2b

由 Stephen Hemminger 提交于 4月 28, 2009

The x_tables are organized with a table structure and a per-cpu copies
of the counters and rules. On older kernels there was a reader/writer
lock per table which was a performance bottleneck. In 2.6.30-rc, this
was converted to use RCU and the counters/rules which solved the performance
problems for do_table but made replacing rules much slower because of
the necessary RCU grace period.

This version uses a per-cpu set of spinlocks and counters to allow to
table processing to proceed without the cache thrashing of a global
reader lock and keeps the same performance for table updates.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

942e4a2b

27 4月, 2009 1 次提交

ipv4: Limit size of route cache hash table · c9503e0f

由 Anton Blanchard 提交于 4月 27, 2009

Right now we have no upper limit on the size of the route cache hash table.
On a 128GB POWER6 box it ends up as 32MB:

IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)

It would be nice to cap this for memory consumption reasons, but a massive
hashtable also causes a significant spike when measuring OS jitter.

With a 32MB hashtable and 4 million entries, rt_worker_func is taking
5 ms to complete. On another system with more memory it's taking 14 ms.
Even though rt_worker_func does call cond_sched() to limit its impact,
in an HPC environment we want to keep all sources of OS jitter to a minimum.

With the patch applied we limit the number of entries to 512k which
can still be overriden by using the rt_entries boot option:

IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)

With this patch rt_worker_func now takes 0.460 ms on the same system.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Acked-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9503e0f

20 4月, 2009 1 次提交

tcp: fix mid-wq adjustment helper · 52cf3cc8

由 Ilpo Järvinen 提交于 4月 18, 2009

Just noticed while doing some new work that the recent
mid-wq adjustment logic will misbehave when FACK is not
in use (happens either due sysctl'ed off or auto-detected
reordering) because I forgot the relevant TCPCB tagbit.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52cf3cc8

17 4月, 2009 2 次提交

gro: Fix use after free in tcp_gro_receive · a0a69a01

由 Herbert Xu 提交于 4月 17, 2009

After calling skb_gro_receive skb->len can no longer be relied
on since if the skb was merged using frags, then its pages will
have been removed and the length reduced.

This caused tcp_gro_receive to prematurely end merging which
resulted in suboptimal performance with ixgbe.

The fix is to store skb->len on the stack.
Reported-by: NMark Wagner <mwagner@redhat.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0a69a01

netfilter: nf_nat: add support for persistent mappings · 98d500d6

由 Patrick McHardy 提交于 4月 16, 2009

The removal of the SAME target accidentally removed one feature that is
not available from the normal NAT targets so far, having multi-range
mappings that use the same mapping for each connection from a single
client. The current behaviour is to choose the address from the range
based on source and destination IP, which breaks when communicating
with sites having multiple addresses that require all connections to
originate from the same IP address.

Introduce a IP_NAT_RANGE_PERSISTENT option that controls whether the
destination address is taken into account for selecting addresses.

http://bugzilla.kernel.org/show_bug.cgi?id=12954Signed-off-by: NPatrick McHardy <kaber@trash.net>

98d500d6

14 4月, 2009 1 次提交

tcp: fix >2 iw selection · 86bcebaf

由 Ilpo Järvinen 提交于 4月 14, 2009

A long-standing feature in tcp_init_metrics() is such that
any of its goto reset prevents call to tcp_init_cwnd().
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86bcebaf

11 4月, 2009 1 次提交

ipv6: Fix NULL pointer dereference with time-wait sockets · 499923c7

由 Vlad Yasevich 提交于 4月 09, 2009

Commit b2f5e7cd
(ipv6: Fix conflict resolutions during ipv6 binding)
introduced a regression where time-wait sockets were
not treated correctly.  This resulted in the following:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
...
Call Trace:
[<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
[<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
[<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
[<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
[<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
[<ffffffff8056ed49>] sys_bind+0x89/0x100
[<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
[<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b
Tested-by: NBrian Haley <brian.haley@hp.com>
Tested-by: NEd Tomlinson <edt@aei.ca>
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

499923c7

03 4月, 2009 2 次提交

tcp: miscounts due to tcp_fragment pcount reset · 9eb9362e

由 Ilpo Järvinen 提交于 4月 01, 2009

It seems that trivial reset of pcount to one was not sufficient
in tcp_retransmit_skb. Multiple counters experience a positive
miscount when skb's pcount gets lowered without the necessary
adjustments (depending on skb's sacked bits which exactly), at
worst a packets_out miscount can crash at RTO if the write queue
is empty!

Triggering this requires mss change, so bidir tcp or mtu probe or
like.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: NUwe Bugla <uwe.bugla@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9eb9362e

tcp: add helper for counter tweaking due mid-wq change · 797108d1

由 Ilpo Järvinen 提交于 4月 01, 2009

We need full-scale adjustment to fix a TCP miscount in the next
patch, so just move it into a helper and call for that from the
other places.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

797108d1

02 4月, 2009 1 次提交

netfilter: use rcu_read_bh() in ipt_do_table() · fa9a86dd

由 Eric Dumazet 提交于 4月 02, 2009

Commit 78454473
(netfilter: iptables: lock free counters) forgot to disable BH
in arpt_do_table(), ipt_do_table() and  ip6t_do_table()

Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
Reported-and-bisected-by: NRoman Mindalev <r000n@r000n.net>
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NPatrick McHardy <kaber@trash.net>
Acked-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa9a86dd

01 4月, 2009 1 次提交

ipv4: remove unused parameter from tcp_recv_urg(). · 377f0a08

由 Rami Rosen 提交于 3月 31, 2009

Signed-off-by: NRami Rosen <ramirose@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

377f0a08

28 3月, 2009 2 次提交

netlabel: Label incoming TCP connections correctly in SELinux · 389fb800

由 Paul Moore 提交于 3月 27, 2009

The current NetLabel/SELinux behavior for incoming TCP connections works but
only through a series of happy coincidences that rely on the limited nature of
standard CIPSO (only able to convey MLS attributes) and the write equality
imposed by the SELinux MLS constraints. The problem is that network sockets
created as the result of an incoming TCP connection were not on-the-wire
labeled based on the security attributes of the parent socket but rather based
on the wire label of the remote peer. The issue had to do with how IP options
were managed as part of the network stack and where the LSM hooks were in
relation to the code which set the IP options on these newly created child
sockets. While NetLabel/SELinux did correctly set the socket's on-the-wire
label it was promptly cleared by the network stack and reset based on the IP
options of the remote peer.

This patch, in conjunction with a prior patch that adjusted the LSM hook
locations, works to set the correct on-the-wire label format for new incoming
connections through the security_inet_conn_request() hook. Besides the
correct behavior there are many advantages to this change, the most significant
is that all of the NetLabel socket labeling code in SELinux now lives in hooks
which can return error codes to the core stack which allows us to finally get
ride of the selinux_netlbl_inode_permission() logic which greatly simplfies
the NetLabel/SELinux glue code. In the process of developing this patch I
also ran into a small handful of AF_INET6 cleanliness issues that have been
fixed which should make the code safer and easier to extend in the future.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NCasey Schaufler <casey@schaufler-ca.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

389fb800

lsm: Relocate the IPv4 security_inet_conn_request() hooks · 284904aa

由 Paul Moore 提交于 3月 27, 2009

The current placement of the security_inet_conn_request() hooks do not allow
individual LSMs to override the IP options of the connection's request_sock.
This is a problem as both SELinux and Smack have the ability to use labeled
networking protocols which make use of IP options to carry security attributes
and the inability to set the IP options at the start of the TCP handshake is
problematic.

This patch moves the IPv4 security_inet_conn_request() hooks past the code
where the request_sock's IP options are set/reset so that the LSM can safely
manipulate the IP options as needed. This patch intentionally does not change
the related IPv6 hooks as IPv6 based labeling protocols which use IPv6 options
are not currently implemented, once they are we will have a better idea of
the correct placement for the IPv6 hooks.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJames Morris <jmorris@namei.org>

284904aa

26 3月, 2009 4 次提交

H
netfilter: nf_conntrack: calculate per-protocol nlattr size · a400c30e
由 Holger Eitzenberger 提交于 3月 25, 2009
```
Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
```
a400c30e

netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu() · ea781f19

由 Eric Dumazet 提交于 3月 25, 2009

Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP.

This permits an easy conversion from call_rcu() based hash lists to a
SLAB_DESTROY_BY_RCU one.

Avoiding call_rcu() delay at nf_conn freeing time has numerous gains.

First, it doesnt fill RCU queues (up to 10000 elements per cpu).
This reduces OOM possibility, if queued elements are not taken into account
This reduces latency problems when RCU queue size hits hilimit and triggers
emergency mode.

- It allows fast reuse of just freed elements, permitting better use of
CPU cache.

- We delete rcu_head from "struct nf_conn", shrinking size of this structure
by 8 or 16 bytes.

This patch only takes care of "struct nf_conn".
call_rcu() is still used for less critical conntrack parts, that may
be converted later if necessary.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

ea781f19

netfilter: {ip,ip6,arp}_tables: fix incorrect loop detection · 1f9352ae

由 Patrick McHardy 提交于 3月 25, 2009

Commit e1b4b9f3 ([NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case
search for loops) introduced a regression in the loop detection algorithm,
causing sporadic incorrectly detected loops.

When a chain has already been visited during the check, it is treated as
having a standard target containing a RETURN verdict directly at the
beginning in order to not check it again. The real target of the first
rule is then incorrectly treated as STANDARD target and checked not to
contain invalid verdicts.

Fix by making sure the rule does actually contain a standard target.

Based on patch by Francis Dupont <Francis_Dupont@isc.org>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

1f9352ae

netfilter: factorize ifname_compare() · b8dfe498

由 Eric Dumazet 提交于 3月 25, 2009

We use same not trivial helper function in four places. We can factorize it.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

b8dfe498

25 3月, 2009 2 次提交

ipv6: Fix conflict resolutions during ipv6 binding · b2f5e7cd

由 Vlad Yasevich 提交于 3月 24, 2009

The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal()
which at times wrongly identified intersections between addresses.
It particularly broke down under a few instances and caused erroneous
bind conflicts.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2f5e7cd

arp_tables: ifname_compare() can assume 16bit alignment · 35c7f6de

由 Eric Dumazet 提交于 3月 24, 2009

Arches without efficient unaligned access can still perform a loop
assuming 16bit alignment in ifname_compare()
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35c7f6de

24 3月, 2009 1 次提交

udp: Wrong locking code in udp seq_file infrastructure · 30842f29

由 Vitaly Mayatskikh 提交于 3月 23, 2009

Reading zero bytes from /proc/net/udp or other similar files which use
the same seq_file udp infrastructure panics kernel in that way:

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
read/1985 is trying to release lock (&table->hash[i].lock) at:
[<ffffffff81321d83>] udp_seq_stop+0x27/0x29
but there are no more locks to release!

other info that might help us debug this:
1 lock held by read/1985:
 #0:  (&p->lock){--..}, at: [<ffffffff810eefb6>] seq_read+0x38/0x348

stack backtrace:
Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
Call Trace:
 [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29
 [<ffffffff8106dab9>] print_unlock_inbalance_bug+0xd6/0xe1
 [<ffffffff8106db62>] lock_release_non_nested+0x9e/0x1c6
 [<ffffffff810ef030>] ? seq_read+0xb2/0x348
 [<ffffffff8106bdba>] ? mark_held_locks+0x68/0x86
 [<ffffffff81321d83>] ? udp_seq_stop+0x27/0x29
 [<ffffffff8106dde7>] lock_release+0x15d/0x189
 [<ffffffff8137163c>] _spin_unlock_bh+0x1e/0x34
 [<ffffffff81321d83>] udp_seq_stop+0x27/0x29
 [<ffffffff810ef239>] seq_read+0x2bb/0x348
 [<ffffffff810eef7e>] ? seq_read+0x0/0x348
 [<ffffffff8111aedd>] proc_reg_read+0x90/0xaf
 [<ffffffff810d878f>] vfs_read+0xa6/0x103
 [<ffffffff8106bfac>] ? trace_hardirqs_on_caller+0x12f/0x153
 [<ffffffff810d88a2>] sys_read+0x45/0x69
 [<ffffffff8101123a>] system_call_fastpath+0x16/0x1b
BUG: scheduling while atomic: read/1985/0xffffff00
INFO: lockdep is turned off.
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm ppdev snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event arc4 snd_s
eq ecb thinkpad_acpi snd_seq_device iwl3945 hwmon sdhci_pci snd_pcm_oss sdhci rfkill mmc_core snd_mixer_oss i2c_i801 mac80211 yenta_socket ricoh_mmc i2c_core iTCO_wdt snd_pcm iTCO_vendor_support rs
rc_nonstatic snd_timer snd lib80211 cfg80211 soundcore snd_page_alloc video parport_pc output parport e1000e [last unloaded: scsi_wait_scan]
Pid: 1985, comm: read Not tainted 2.6.29-rc8 #9
Call Trace:
 [<ffffffff8106b456>] ? __debug_show_held_locks+0x1b/0x24
 [<ffffffff81043660>] __schedule_bug+0x7e/0x83
 [<ffffffff8136ede9>] schedule+0xce/0x838
 [<ffffffff810d7972>] ? fsnotify_access+0x5f/0x67
 [<ffffffff810112d0>] ? sysret_careful+0xb/0x37
 [<ffffffff8106be9c>] ? trace_hardirqs_on_caller+0x1f/0x153
 [<ffffffff8137127b>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff810112f6>] sysret_careful+0x31/0x37
read[1985]: segfault at 7fffc479bfe8 ip 0000003e7420a180 sp 00007fffc479bfa0 error 6
Kernel panic - not syncing: Aiee, killing interrupt handler!

udp_seq_stop() tries to unlock not yet locked spinlock. The lock was lost
during splitting global udp_hash_lock to subsequent spinlocks.

Signed-off by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Acked-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30842f29

23 3月, 2009 1 次提交

tcp: Discard segments that ack data not yet sent · 96e0bf4b

由 John Dykstra 提交于 3月 22, 2009

Discard incoming packets whose ack field iincludes data not yet sent.
This is consistent with RFC 793 Section 3.9.

Change tcp_ack() to distinguish between too-small and too-large ack
field values.  Keep segments with too-large ack fields out of the fast
path, and change slow path to discard them.
Reported-by: NOliver Zheng <mailinglists+netdev@oliverzheng.com>
Signed-off-by: NJohn Dykstra <john.dykstra1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96e0bf4b

22 3月, 2009 1 次提交

net/*: use linux/kernel.h swap() · a0bffffc

由 Ilpo Järvinen 提交于 3月 21, 2009

tcp_sack_swap seems unnecessary so I pushed swap to the caller.
Also removed comment that seemed then pointless, and added include
when not already there. Compile tested.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0bffffc

19 3月, 2009 2 次提交

netns: oops in ip[6]_frag_reasm incrementing stats · 2bad35b7

由 Jorge Boncompte [DTI2] 提交于 3月 18, 2009

dev can be NULL in ip[6]_frag_reasm for skb's coming from RAW sockets.

Quagga's OSPFD sends fragmented packets on a RAW socket, when netfilter
conntrack reassembles them on the OUTPUT path you hit this code path.

You can test it with something like "hping2 -0 -d 2000 -f AA.BB.CC.DD"

With help from Jarek Poplawski.
Signed-off-by: NJorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2bad35b7

tcp: remove parameter from tcp_recv_urg(). · beedad92

由 Rami Rosen 提交于 3月 18, 2009

This patch removes an unused parameter (addr_len) from tcp_recv_urg()
method in net/ipv4/tcp.c.
Signed-off-by: NRami Rosen <ramirose@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

beedad92

16 3月, 2009 7 次提交

netfilter: auto-load ip_queue module when socket opened · 95ba434f

由 Scott James Remnant 提交于 3月 16, 2009

The ip_queue module is missing the net-pf-16-proto-3 alias that would
causae it to be auto-loaded when a socket of that type is opened.  This
patch adds the alias.
Signed-off-by: NScott James Remnant <scott@canonical.com>
Signed-off-by: NTim Gardner <tim.gardner@canonical.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

95ba434f

netfilter: conntrack: increase drop stats if sequence adjustment fails · 1db7a748

由 Pablo Neira Ayuso 提交于 3月 16, 2009

This patch increases the statistics of packets drop if the sequence
adjustment fails in ipv4_confirm().
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

1db7a748

netfilter: Kconfig spelling fixes (trivial) · 67c0d579

由 Stephen Hemminger 提交于 3月 16, 2009

Signed-off-by: NStephen Hemminger <sheminger@vyatta.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

67c0d579

netfilter: use a linked list of loggers · ca735b3a

由 Eric Leblond 提交于 3月 16, 2009

This patch modifies nf_log to use a linked list of loggers for each
protocol. This list of loggers is read and write protected with a
mutex.

This patch separates registration and binding. To be used as
logging module, a module has to register calling nf_log_register()
and to bind to a protocol it has to call nf_log_bind_pf().
This patch also converts the logging modules to the new API. For nfnetlink_log,
it simply switchs call to register functions to call to bind function and
adds a call to nf_log_register() during init. For other modules, it just
remove a const flag from the logger structure and replace it with a
__read_mostly.
Signed-off-by: NEric Leblond <eric@inl.fr>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

ca735b3a

tcp: make sure xmit goal size never becomes zero · afece1c6

由 Ilpo Järvinen 提交于 3月 14, 2009

It's not too likely to happen, would basically require crafted
packets (must hit the max guard in tcp_bound_to_half_wnd()).
It seems that nothing that bad would happen as there's tcp_mems
and congestion window that prevent runaway at some point from
hurting all too much (I'm not that sure what all those zero
sized segments we would generate do though in write queue).
Preventing it regardless is certainly the best way to go.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

afece1c6

tcp: cache result of earlier divides when mss-aligning things · 2a3a041c

由 Ilpo Järvinen 提交于 3月 14, 2009

The results is very unlikely change every so often so we
hardly need to divide again after doing that once for a
connection. Yet, if divide still becomes necessary we
detect that and do the right thing and again settle for
non-divide state. Takes the u16 space which was previously
taken by the plain xmit_size_goal.

This should take care part of the tso vs non-tso difference
we found earlier.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a3a041c

tcp: simplify tcp_current_mss · 0c54b85f

由 Ilpo Järvinen 提交于 3月 14, 2009

There's very little need for most of the callsites to get
tp->xmit_goal_size updated. That will cost us divide as is,
so slice the function in two. Also, the only users of the
tp->xmit_goal_size are directly behind tcp_current_mss(),
so there's no need to store that variable into tcp_sock
at all! The drop of xmit_goal_size currently leaves 16-bit
hole and some reorganization would again be necessary to
change that (but I'm aiming to fill that hole with u16
xmit_goal_size_segs to cache the results of the remaining
divide to get that tso on regression).

Bring xmit_goal_size parts into tcp.c
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c54b85f