提交 · f6d8bd051c391c1c0458a30b2a7abcd939329259 · _Walt / cloud-kernel

29 4月, 2011 1 次提交

inet: add RCU protection to inet->opt · f6d8bd05

由 Eric Dumazet 提交于 4月 21, 2011

We lack proper synchronization to manipulate inet->opt ip_options

Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We cant insert an rcu_head in struct ip_options since its included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6d8bd05

28 4月, 2011 2 次提交

ipv4: Remove erroneous check in igmpv3_newpack() and igmp_send_report(). · 2e97e980

由 David S. Miller 提交于 4月 26, 2011

Output route resolution never returns a route with rt_src set to zero
(which is INADDR_ANY).

Even if the flow key for the output route lookup specifies INADDR_ANY
for the source address, the output route resolution chooses a real
source address to use in the final route.

This test has existed forever in igmp_send_report() and David Stevens
simply copied over the erroneous test when implementing support for
IGMPv3.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>

2e97e980

ipv4: Sanitize and simplify ip_route_{connect,newports}() · 2d7192d6

由 David S. Miller 提交于 4月 26, 2011

These functions are used together as a unit for route resolution
during connect().  They address the chicken-and-egg problem that
exists when ports need to be allocated during connect() processing,
yet such port allocations require addressing information from the
routing code.

It's currently more heavy handed than it needs to be, and in
particular we allocate and initialize a flow object twice.

Let the callers provide the on-stack flow object.  That way we only
need to initialize it once in the ip_route_connect() call.

Later, if ip_route_newports() needs to do anything, it re-uses that
flow object as-is except for the ports which it updates before the
route re-lookup.

Also, describe why this set of facilities are needed and how it works
in a big comment.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>

2d7192d6

26 4月, 2011 1 次提交

net: provide cow_metrics() methods to blackhole dst_ops · 0972ddb2

由 Held Bernhard 提交于 4月 24, 2011

Since commit 62fa8a84 (net: Implement read-only protection and COW'ing
of metrics.) the kernel throws an oops.

[  101.620985] BUG: unable to handle kernel NULL pointer dereference at
           (null)
[  101.621050] IP: [<          (null)>]           (null)
[  101.621084] PGD 6e53c067 PUD 3dd6a067 PMD 0
[  101.621122] Oops: 0010 [#1] SMP
[  101.621153] last sysfs file: /sys/devices/virtual/ppp/ppp/uevent
[  101.621192] CPU 2
[  101.621206] Modules linked in: l2tp_ppp pppox ppp_generic slhc
l2tp_netlink l2tp_core deflate zlib_deflate twofish_x86_64
twofish_common des_generic cbc ecb sha1_generic hmac af_key
iptable_filter snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device loop
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_pcm snd_timer snd i2c_i801 iTCO_wdt psmouse soundcore snd_page_alloc
evdev uhci_hcd ehci_hcd thermal
[  101.621552]
[  101.621567] Pid: 5129, comm: openl2tpd Not tainted 2.6.39-rc4-Quad #3
Gigabyte Technology Co., Ltd. G33-DS3R/G33-DS3R
[  101.621637] RIP: 0010:[<0000000000000000>]  [<          (null)>]   (null)
[  101.621684] RSP: 0018:ffff88003ddeba60  EFLAGS: 00010202
[  101.621716] RAX: ffff88003ddb5600 RBX: ffff88003ddb5600 RCX:
0000000000000020
[  101.621758] RDX: ffffffff81a69a00 RSI: ffffffff81b7ee61 RDI:
ffff88003ddb5600
[  101.621800] RBP: ffff8800537cd900 R08: 0000000000000000 R09:
ffff88003ddb5600
[  101.621840] R10: 0000000000000005 R11: 0000000000014b38 R12:
ffff88003ddb5600
[  101.621881] R13: ffffffff81b7e480 R14: ffffffff81b7e8b8 R15:
ffff88003ddebad8
[  101.621924] FS:  00007f06e4182700(0000) GS:ffff88007fd00000(0000)
knlGS:0000000000000000
[  101.621971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  101.622005] CR2: 0000000000000000 CR3: 0000000045274000 CR4:
00000000000006e0
[  101.622046] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  101.622087] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[  101.622129] Process openl2tpd (pid: 5129, threadinfo
ffff88003ddea000, task ffff88003de9a280)
[  101.622177] Stack:
[  101.622191]  ffffffff81447efa ffff88007d3ded80 ffff88003de9a280
ffff88007d3ded80
[  101.622245]  0000000000000001 ffff88003ddebbb8 ffffffff8148d5a7
0000000000000212
[  101.622299]  ffff88003dcea000 ffff88003dcea188 ffffffff00000001
ffffffff81b7e480
[  101.622353] Call Trace:
[  101.622374]  [<ffffffff81447efa>] ? ipv4_blackhole_route+0x1ba/0x210
[  101.622415]  [<ffffffff8148d5a7>] ? xfrm_lookup+0x417/0x510
[  101.622450]  [<ffffffff8127672a>] ? extract_buf+0x9a/0x140
[  101.622485]  [<ffffffff8144c6a0>] ? __ip_flush_pending_frames+0x70/0x70
[  101.622526]  [<ffffffff8146fbbf>] ? udp_sendmsg+0x62f/0x810
[  101.622562]  [<ffffffff813f98a6>] ? sock_sendmsg+0x116/0x130
[  101.622599]  [<ffffffff8109df58>] ? find_get_page+0x18/0x90
[  101.622633]  [<ffffffff8109fd6a>] ? filemap_fault+0x12a/0x4b0
[  101.622668]  [<ffffffff813fb5c4>] ? move_addr_to_kernel+0x64/0x90
[  101.622706]  [<ffffffff81405d5a>] ? verify_iovec+0x7a/0xf0
[  101.622739]  [<ffffffff813fc772>] ? sys_sendmsg+0x292/0x420
[  101.622774]  [<ffffffff810b994a>] ? handle_pte_fault+0x8a/0x7c0
[  101.622810]  [<ffffffff810b76fe>] ? __pte_alloc+0xae/0x130
[  101.622844]  [<ffffffff810ba2f8>] ? handle_mm_fault+0x138/0x380
[  101.622880]  [<ffffffff81024af9>] ? do_page_fault+0x189/0x410
[  101.622915]  [<ffffffff813fbe03>] ? sys_getsockname+0xf3/0x110
[  101.622952]  [<ffffffff81450c4d>] ? ip_setsockopt+0x4d/0xa0
[  101.622986]  [<ffffffff813f9932>] ? sockfd_lookup_light+0x22/0x90
[  101.623024]  [<ffffffff814b61fb>] ? system_call_fastpath+0x16/0x1b
[  101.623060] Code:  Bad RIP value.
[  101.623090] RIP  [<          (null)>]           (null)
[  101.623125]  RSP <ffff88003ddeba60>
[  101.623146] CR2: 0000000000000000
[  101.650871] ---[ end trace ca3856a7d8e8dad4 ]---
[  101.651011] __sk_free: optmem leakage (160 bytes) detected.

The oops happens in dst_metrics_write_ptr()
include/net/dst.h:124: return dst->ops->cow_metrics(dst, p);

dst->ops->cow_metrics is NULL and causes the oops.

Provide cow_metrics() methods, like we did in commit 214f45c9
(net: provide default_advmss() methods to blackhole dst_ops)
Signed-off-by: NHeld Bernhard <berny156@gmx.de>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0972ddb2

23 4月, 2011 1 次提交

inet: constify ip headers and in6_addr · b71d1d42

由 Eric Dumazet 提交于 4月 22, 2011

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b71d1d42

18 4月, 2011 1 次提交

bonding, ipv4, ipv6, vlan: Handle NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS · 7c899432

由 Ben Hutchings 提交于 4月 15, 2011

It is undesirable for the bonding driver to be poking into higher
level protocols, and notifiers provide a way to avoid that.  This does
mean removing the ability to configure reptitition of gratuitous ARPs
and unsolicited NAs.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c899432

15 4月, 2011 2 次提交

ip: ip_options_compile() resilient to NULL skb route · c65353da

由 Eric Dumazet 提交于 4月 14, 2011

Scot Doyle demonstrated ip_options_compile() could be called with an skb
without an attached route, using a setup involving a bridge, netfilter,
and forged IP packets.

Let's make ip_options_compile() and ip_options_rcv_srr() a bit more
robust, instead of changing bridge/netfilter code.

With help from Hiroaki SHIMODA.
Reported-by: NScot Doyle <lkml@scotdoyle.com>
Tested-by: NScot Doyle <lkml@scotdoyle.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c65353da

ipv4: Call fib_select_default() only when actually necessary. · 21d8c49e

由 David S. Miller 提交于 4月 14, 2011

fib_select_default() is a complete NOP, and completely pointless
to invoke, when we have no more than 1 default route installed.

And this is far and away the common case.

So remember how many prefixlen==0 routes we have in the routing
table, and elide the call when we have no more than one of those.

This cuts output route creation time by 157 cycles on Niagara2+.

In order to add the new int to fib_table, we have to correct the type
of ->tb_data[] to unsigned long, otherwise the private area will be
unaligned on 64-bit systems.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>

21d8c49e

14 4月, 2011 1 次提交

Revert "tcp: disallow bind() to reuse addr/port" · 3e8c806a

由 David S. Miller 提交于 4月 13, 2011

This reverts commit c191a836.

It causes known regressions for programs that expect to be able to use
SO_REUSEADDR to shutdown a socket, then successfully rebind another
socket to the same ID.

Programs such as haproxy and amavisd expect this to work.

This should fix kernel bugzilla 32832.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e8c806a

13 4月, 2011 2 次提交

net: Do not wrap sysctl igmp_max_memberships in IP_MULTICAST · 192910a6

由 Joakim Tjernlund 提交于 4月 12, 2011

controlling igmp_max_membership is useful even when IP_MULTICAST
is off.
Quagga(an OSPF deamon) uses multicast addresses for all interfaces
using a single socket and hits igmp_max_membership limit when
there are 20 interfaces or more.
Always export sysctl igmp_max_memberships in proc, just like
igmp_max_msf
Signed-off-by: NJoakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

192910a6

inetpeer: reduce stack usage · 66944e1c

由 Eric Dumazet 提交于 4月 11, 2011

On 64bit arches, we use 752 bytes of stack when cleanup_once() is called
from inet_getpeer().

Lets share the avl stack to save ~376 bytes.

Before patch :

# objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl

0x000006c3 unlink_from_pool [inetpeer.o]:		376
0x00000721 unlink_from_pool [inetpeer.o]:		376
0x00000cb1 inet_getpeer [inetpeer.o]:			376
0x00000e6d inet_getpeer [inetpeer.o]:			376
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5320	    432	     21	   5773	   168d	net/ipv4/inetpeer.o

After patch :

objdump -d net/ipv4/inetpeer.o | scripts/checkstack.pl
0x00000c11 inet_getpeer [inetpeer.o]:			376
0x00000dcd inet_getpeer [inetpeer.o]:			376
0x00000ab9 peer_check_expire [inetpeer.o]:		328
0x00000b7f peer_check_expire [inetpeer.o]:		328
0x0004 inet_initpeers [inetpeer.o]:			112
# size net/ipv4/inetpeer.o
   text	   data	    bss	    dec	    hex	filename
   5163	    432	     21	   5616	   15f0	net/ipv4/inetpeer.o
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Scot Doyle <lkml@scotdoyle.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Reviewed-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66944e1c

11 4月, 2011 2 次提交

Disable rp_filter for IPsec packets · 990078af

由 Michael Smith 提交于 4月 07, 2011

The reverse path filter interferes with IPsec subnet-to-subnet tunnels,
especially when the link to the IPsec peer is on an interface other than
the one hosting the default route.

With dynamic routing, where the peer might be reachable through eth0
today and eth1 tomorrow, it's difficult to keep rp_filter enabled unless
fake routes to the remote subnets are configured on the interface
currently used to reach the peer.

IPsec provides a much stronger anti-spoofing policy than rp_filter, so
this patch disables the rp_filter for packets with a security path.
Signed-off-by: NMichael Smith <msmith@cbnco.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

990078af

fib_validate_source(): pass sk_buff instead of mark · 5c04c819

由 Michael Smith 提交于 4月 07, 2011

This makes sk_buff available for other use in fib_validate_source().
Signed-off-by: NMichael Smith <msmith@cbnco.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c04c819

08 4月, 2011 1 次提交

ipv4: Fix "Set rt->rt_iif more sanely on output routes." · 1b86a58f

由 OGAWA Hirofumi 提交于 4月 07, 2011

Commit 1018b5c0 ("Set rt->rt_iif more
sanely on output routes.")  breaks rt_is_{output,input}_route.

This became the cause to return "IP_PKTINFO's ->ipi_ifindex == 0".

To fix it, this does:

1) Add "int rt_route_iif;" to struct rtable

2) For input routes, always set rt_route_iif to same value as rt_iif

3) For output routes, always set rt_route_iif to zero.  Set rt_iif
   as it is done currently.

4) Change rt_is_{output,input}_route() to test rt_route_iif
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b86a58f

05 4月, 2011 1 次提交

net: Allow no-cache copy from user on transmit · c6e1a0d1

由 Tom Herbert 提交于 4月 04, 2011

This patch uses __copy_from_user_nocache on transmit to bypass data
cache for a performance improvement.  skb_add_data_nocache and
skb_copy_to_page_nocache can be called by sendmsg functions to use
this feature, initial support is in tcp_sendmsg.  This functionality is
configurable per device using ethtool.

Presumably, this feature would only be useful when the driver does
not touch the data.  The feature is turned on by default if a device
indicates that it does some form of checksum offload; it is off by
default for devices that do no checksum offload or indicate no checksum
is necessary.  For the former case copy-checksum is probably done
anyway, in the latter case the device is likely loopback in which case
the no cache copy is probably not beneficial.

This patch was tested using 200 instances of netperf TCP_RR with
1400 byte request and one byte reply.  Platform is 16 core AMD x86.

No-cache copy disabled:
   672703 tps, 97.13% utilization
   50/90/99% latency:244.31 484.205 1028.41

No-cache copy enabled:
   702113 tps, 96.16% utilization,
   50/90/99% latency 238.56 467.56 956.955

Using 14000 byte request and response sizes demonstrate the
effects more dramatically:

No-cache copy disabled:
   79571 tps, 34.34 %utlization
   50/90/95% latency 1584.46 2319.59 5001.76

No-cache copy enabled:
   83856 tps, 34.81% utilization
   50/90/95% latency 2508.42 2622.62 2735.88

Note especially the effect on latency tail (95th percentile).

This seems to provide a nice performance improvement and is
consistent in the tests I ran.  Presumably, this would provide
the greatest benfits in the presence of an application workload
stressing the cache and a lot of transmit data happening.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6e1a0d1

04 4月, 2011 3 次提交

netfilter: get rid of atomic ops in fast path · 7f5c6d4f

由 Eric Dumazet 提交于 4月 04, 2011

We currently use a percpu spinlock to 'protect' rule bytes/packets
counters, after various attempts to use RCU instead.

Lately we added a seqlock so that get_counters() can run without
blocking BH or 'writers'. But we really only need the seqcount in it.

Spinlock itself is only locked by the current/owner cpu, so we can
remove it completely.

This cleanups api, using correct 'writer' vs 'reader' semantic.

At replace time, the get_counters() call makes sure all cpus are done
using the old table.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

7f5c6d4f

netfilter: af_info: add 'strict' parameter to limit lookup to .oif · 0fae2e77

由 Florian Westphal 提交于 4月 04, 2011

ipv6 fib lookup can set RT6_LOOKUP_F_IFACE flag to restrict search
to an interface, but this flag cannot be set via struct flowi.

Also, it cannot be set via ip6_route_output: this function uses the
passed sock struct to determine if this flag is required
(by testing for nonzero sk_bound_dev_if).

Work around this by passing in an artificial struct sk in case
'strict' argument is true.

This is required to replace the rt6_lookup call in xt_addrtype.c with
nf_afinfo->route().
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

0fae2e77

netfilter: af_info: add network namespace parameter to route hook · 31ad3dd6

由 Florian Westphal 提交于 4月 04, 2011

This is required to eventually replace the rt6_lookup call in
xt_addrtype.c with nf_afinfo->route().
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

31ad3dd6

02 4月, 2011 1 次提交

tcp: len check is unnecessarily devastating, change to WARN_ON · 2fceec13

由 Ilpo Järvinen 提交于 4月 01, 2011

All callers are prepared for alloc failures anyway, so this error
can safely be boomeranged to the callers domain without super
bad consequences. ...At worst the connection might go into a state
where each RTO tries to (unsuccessfully) re-fragment with such
a mis-sized value and eventually dies.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fceec13

31 3月, 2011 8 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

D
ipv4: Use flowi4_init_output() in udp_sendmsg() · c0951cbc
由 David S. Miller 提交于 3月 31, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
c0951cbc
D
ipv4: Use flowi4_init_output() in cookie_v4_check() · 1bba6ffe
由 David S. Miller 提交于 3月 31, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
1bba6ffe
D
ipv4: Use flowi4_init_output() in raw_sendmsg() · ef164ae3
由 David S. Miller 提交于 3月 31, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
ef164ae3
D
ipv4: Use flowi4_init_output() in ip_send_reply() · 538de0e0
由 David S. Miller 提交于 3月 31, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
538de0e0
D
ipv4: Use flowi4_init_output() in inet_connection_sock.c · e79d9bc7
由 David S. Miller 提交于 3月 31, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
e79d9bc7

fib: add __rcu annotations · 0a5c0475

由 Eric Dumazet 提交于 3月 31, 2011

Add __rcu annotations and lockdep checks.

Add const qualifiers

node_parent() and node_parent_rcu() can use
rcu_dereference_index_check()
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a5c0475

fib: add rtnl locking in ip_fib_net_exit · e2666f84

由 Eric Dumazet 提交于 3月 30, 2011

Daniel J Blueman reported a lockdep splat in trie_firstleaf(), caused by
RTNL being not locked before a call to fib_table_flush()
Reported-by: NDaniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2666f84

30 3月, 2011 1 次提交

net: gre: provide multicast mappings for ipv4 and ipv6 · 93ca3bb5

由 Timo Teräs 提交于 3月 28, 2011

My commit 6d55cb91 (gre: fix hard header destination
address checking) broke multicast.

The reason is that ip_gre used to get ipgre_header() calls with
zero destination if we have NOARP or multicast destination. Instead
the actual target was decided at ipgre_tunnel_xmit() time based on
per-protocol dissection.

Instead of allowing the "abuse" of ->header() calls with invalid
destination, this creates multicast mappings for ip_gre. This also
fixes "ip neigh show nud noarp" to display the proper multicast
mappings used by the gre device.
Reported-by: NDoug Kehn <rdkehn@yahoo.com>
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Acked-by: NDoug Kehn <rdkehn@yahoo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93ca3bb5

29 3月, 2011 1 次提交
- D
  ipv4: Don't ip_rt_put() an error pointer in RAW sockets. · 4910ac6c
  由 David S. Miller 提交于 3月 28, 2011
```
Reported-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  4910ac6c
28 3月, 2011 1 次提交

ipv4: Fix IP timestamp option (IPOPT_TS_PRESPEC) handling in ip_options_echo() · 8628bd8a

由 Jan Luebbe 提交于 3月 24, 2011

The current handling of echoed IP timestamp options with prespecified
addresses is rather broken since the 2.2.x kernels. As far as i understand
it, it should behave like when originating packets.

Currently it will only timestamp the next free slot if:
 - there is space for *two* timestamps
 - some random data from the echoed packet taken as an IP is *not* a local IP

This first is caused by an off-by-one error. 'soffset' points to the next
free slot and so we only need to have 'soffset + 7 <= optlen'.

The second bug is using sptr as the start of the option, when it really is
set to 'skb_network_header(skb)'. I just use dptr instead which points to
the timestamp option.

Finally it would only timestamp for non-local IPs, which we shouldn't do.
So instead we exclude all unicast destinations, similar to what we do in
ip_options_compile().
Signed-off-by: NJan Luebbe <jluebbe@debian.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8628bd8a

26 3月, 2011 1 次提交

ipv4: do not ignore route errors · 1fbc7843

由 Julian Anastasov 提交于 3月 25, 2011

	The "ipv4: Inline fib_semantic_match into check_leaf"
change forgets to return the route errors. check_leaf should
return the same results as fib_table_lookup.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fbc7843

25 3月, 2011 3 次提交

ipv4: Fix nexthop caching wrt. scoping. · 37e826c5

由 David S. Miller 提交于 3月 24, 2011

Move the scope value out of the fib alias entries and into fib_info,
so that we always use the correct scope when recomputing the nexthop
cached source address.
Reported-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37e826c5

ipv4: Invalidate nexthop cache nh_saddr more correctly. · 436c3b66

由 David S. Miller 提交于 3月 24, 2011

Any operation that:

1) Brings up an interface
2) Adds an IP address to an interface
3) Deletes an IP address from an interface

can potentially invalidate the nh_saddr value, requiring
it to be recomputed.

Perform the recomputation lazily using a generation ID.
Reported-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

436c3b66

ipv4: fix fib metrics · fcd13f42

由 Eric Dumazet 提交于 3月 24, 2011

Alessandro Suardi reported that we could not change route metrics :

ip ro change default .... advmss 1400

This regression came with commit 9c150e82 (Allocate fib metrics
dynamically). fib_metrics is no longer an array, but a pointer to an
array.
Reported-by: NAlessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NAlessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fcd13f42

24 3月, 2011 2 次提交

ipv4: fix ip_rt_update_pmtu() · eb49a973

由 Eric Dumazet 提交于 3月 23, 2011

commit 2c8cec5c (Cache learned PMTU information in inetpeer) added
an extra inet_putpeer() call in ip_rt_update_pmtu().

This results in various problems, since we can free one inetpeer, while
it is still in use.

Ref: http://www.spinics.net/lists/netdev/msg159121.htmlReported-by: NAlexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb49a973

ipv4: Fallback to FIB local table in __ip_dev_find(). · 406b6f97

由 David S. Miller 提交于 3月 22, 2011

In commit 9435eb1c
("ipv4: Implement __ip_dev_find using new interface address hash.")
we reimplemented __ip_dev_find() so that it doesn't have to
do a full FIB table lookup.

Instead, it consults a hash table of addresses configured to
interfaces.

This works identically to the old code in all except one case,
and that is for loopback subnets.

The old code would match the loopback device for any IP address
that falls within a subnet configured to the loopback device.

Handle this corner case by doing the FIB lookup.

We could implement this via inet_addr_onlink() but:

1) Someone could configure many addresses to loopback and
   inet_addr_onlink() is a simple list traversal.

2) We know the old code works.
Reported-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

406b6f97

23 3月, 2011 2 次提交

D
tcp: Make undo_ssthresh arg to tcp_undo_cwr() a bool. · f6152737
由 David S. Miller 提交于 3月 22, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
f6152737

tcp: avoid cwnd moderation in undo · 67d4120a

由 Yuchung Cheng 提交于 3月 14, 2011

In the current undo logic, cwnd is moderated after it was restored
to the value prior entering fast-recovery. It was moderated first
in tcp_try_undo_recovery then again in tcp_complete_cwr.

Since the undo indicates recovery was false, these moderations
are not necessary. If the undo is triggered when most of the
outstanding data have been acknowledged, the (restored) cwnd is
falsely pulled down to a small value.

This patch removes these cwnd moderations if cwnd is undone
  a) during fast-recovery
	b) by receiving DSACKs past fast-recovery
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67d4120a

22 3月, 2011 2 次提交

ipv4: optimize route adding on secondary promotion · 04024b93

由 Julian Anastasov 提交于 3月 19, 2011

Optimize the calling of fib_add_ifaddr for all
secondary addresses after the promoted one to start from
their place, not from the new place of the promoted
secondary. It will save some CPU cycles because we
are sure the promoted secondary was first for the subnet
and all next secondaries do not change their place.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04024b93

ipv4: remove the routes on secondary promotion · 2d230e2b

由 Julian Anastasov 提交于 3月 19, 2011

The secondary address promotion relies on fib_sync_down_addr
to remove all routes created for the secondary addresses when
the old primary address is deleted. It does not happen for cases
when the primary address is also in another subnet. Fix that
by deleting local and broadcast routes for all secondaries while
they are on device list and by faking that all addresses from
this subnet are to be deleted. It relies on fib_del_ifaddr being
able to ignore the IPs from the concerned subnet while checking
for duplication.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d230e2b

_Walt / cloud-kernel 与 Fork 源项目一致

_Walt / cloud-kernel
与 Fork 源项目一致