提交 · 04746ff1289f75af26af279eb4b0b3e231677ee4 · openanolis / cloud-kernel

14 9月, 2010 2 次提交

ipv4: enable getsockopt() for IP_NODEFRAG · a89b4763

由 Michael Kerrisk 提交于 9月 10, 2010

While integrating your man-pages patch for IP_NODEFRAG, I noticed
that this option is settable by setsockopt(), but not gettable by
getsockopt(). I suppose this is not intended. The (untested,
trivial) patch below adds getsockopt() support.
Signed-off-by: NMichael kerrisk <mtk.manpages@gmail.com>
Acked-by: NJiri Olsa <jolsa@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a89b4763

ipv4: force_igmp_version ignored when a IGMPv3 query received · 79981563

由 Bob Arendt 提交于 9月 13, 2010

After all these years, it turns out that the
    /proc/sys/net/ipv4/conf/*/force_igmp_version
parameter isn't fully implemented.

*Symptom*:
When set force_igmp_version to a value of 2, the kernel should only perform
multicast IGMPv2 operations (IETF rfc2236).  An host-initiated Join message
will be sent as a IGMPv2 Join message.  But if a IGMPv3 query message is
received, the host responds with a IGMPv3 join message.  Per rfc3376 and
rfc2236, a IGMPv2 host should treat a IGMPv3 query as a IGMPv2 query and
respond with an IGMPv2 Join message.

*Consequences*:
This is an issue when a IGMPv3 capable switch is the querier and will only
issue IGMPv3 queries (which double as IGMPv2 querys) and there's an
intermediate switch that is only IGMPv2 capable.  The intermediate switch
processes the initial v2 Join, but fails to recognize the IGMPv3 Join responses
to the Query, resulting in a dropped connection when the intermediate v2-only
switch times it out.

*Identifying issue in the kernel source*:
The issue is in this section of code (in net/ipv4/igmp.c), which is called when
an IGMP query is received  (from mainline 2.6.36-rc3 gitweb):
 ...
A IGMPv3 query has a length >= 12 and no sources.  This routine will exit after
line 880, setting the general query timer (random timeout between 0 and query
response time).  This calls igmp_gq_timer_expire():
...
.. which only sends a v3 response.  So if a v3 query is received, the kernel
always sends a v3 response.

IGMP queries happen once every 60 sec (per vlan), so the traffic is low.  A
IGMPv3 query *is* a strict superset of a IGMPv2 query, so this patch properly
short circuit's the v3 behaviour.

One issue is that this does not address force_igmp_version=1.  Then again, I've
never seen any IGMPv1 multicast equipment in the wild.  However there is a lot
of v2-only equipment. If it's necessary to support the IGMPv1 case as well:

837         if (len == 8 || IGMP_V2_SEEN(in_dev) || IGMP_V1_SEEN(in_dev)) {
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79981563

09 9月, 2010 3 次提交

udp: add rehash on connect() · 719f8358

由 Eric Dumazet 提交于 9月 08, 2010

commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
added a secondary hash on UDP, hashed on (local addr, local port).

Problem is that following sequence :

fd = socket(...)
connect(fd, &remote, ...)

not only selects remote end point (address and port), but also sets
local address, while UDP stack stored in secondary hash table the socket
while its local address was INADDR_ANY (or ipv6 equivalent)

Sequence is :
 - autobind() : choose a random local port, insert socket in hash tables
              [while local address is INADDR_ANY]
 - connect() : set remote address and port, change local address to IP
              given by a route lookup.

When an incoming UDP frame comes, if more than 10 sockets are found in
primary hash table, we switch to secondary table, and fail to find
socket because its local address changed.

One solution to this problem is to rehash datagram socket if needed.

We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.

This rehashing only takes care of secondary hash table, since primary
hash (based on local port only) is not changed.
Reported-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NKrzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

719f8358

net: blackhole route should always be recalculated · ae2688d5

由 Jianzhao Wang 提交于 9月 08, 2010

Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error
triggered by IKE for example), hence this kind of route is always
temporary and so we should check if a better route exists for next
packets.
Bug has been introduced by commit d11a4dc1.
Signed-off-by: NJianzhao Wang <jianzhao.wang@6wind.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae2688d5

ipv4: Suppress lockdep-RCU false positive in FIB trie (3) · f6b085b6

由 Jarek Poplawski 提交于 9月 07, 2010

Hi,
Here is one more of these warnings and a patch below:

Sep  5 23:52:33 del kernel: [46044.244833] ===================================================
Sep  5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ]
Sep  5 23:52:33 del kernel: [46044.277000] ---------------------------------------------------
Sep  5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!
Sep  5 23:52:33 del kernel: [46044.293627]
Sep  5 23:52:33 del kernel: [46044.293632] other info that might help us debug this:
Sep  5 23:52:33 del kernel: [46044.293634]
Sep  5 23:52:33 del kernel: [46044.325333]
Sep  5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0
Sep  5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717:
Sep  5 23:52:33 del kernel: [46044.357548]  #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20
Sep  5 23:52:33 del kernel: [46044.367647]
Sep  5 23:52:33 del kernel: [46044.367652] stack backtrace:
Sep  5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3
Sep  5 23:52:33 del kernel: [46044.398764] Call Trace:
Sep  5 23:52:33 del kernel: [46044.409596]  [<c12f9aba>] ? printk+0x18/0x1e
Sep  5 23:52:33 del kernel: [46044.420761]  [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
Sep  5 23:52:33 del kernel: [46044.432229]  [<c12b7235>] trie_firstleaf+0x65/0x70
Sep  5 23:52:33 del kernel: [46044.443941]  [<c12b74d4>] fib_table_flush+0x14/0x170
Sep  5 23:52:33 del kernel: [46044.455823]  [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0
Sep  5 23:52:33 del kernel: [46044.467995]  [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40
Sep  5 23:52:33 del kernel: [46044.480404]  [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180
Sep  5 23:52:33 del kernel: [46044.493025]  [<c12b069d>] fib_flush+0x2d/0x60
Sep  5 23:52:33 del kernel: [46044.505796]  [<c12b06f5>] fib_disable_ip+0x25/0x50
Sep  5 23:52:33 del kernel: [46044.518772]  [<c12b10d3>] fib_netdev_event+0x73/0xd0
Sep  5 23:52:33 del kernel: [46044.531918]  [<c1048dfd>] notifier_call_chain+0x2d/0x70
Sep  5 23:52:33 del kernel: [46044.545358]  [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20
Sep  5 23:52:33 del kernel: [46044.559092]  [<c124f687>] call_netdevice_notifiers+0x27/0x60
Sep  5 23:52:33 del kernel: [46044.573037]  [<c124faec>] __dev_notify_flags+0x5c/0x80
Sep  5 23:52:33 del kernel: [46044.586489]  [<c124fb47>] dev_change_flags+0x37/0x60
Sep  5 23:52:33 del kernel: [46044.599394]  [<c12a8a8d>] devinet_ioctl+0x54d/0x630
Sep  5 23:52:33 del kernel: [46044.612277]  [<c12aabb7>] inet_ioctl+0x97/0xc0
Sep  5 23:52:34 del kernel: [46044.625208]  [<c123f6af>] sock_ioctl+0x6f/0x270
Sep  5 23:52:34 del kernel: [46044.638046]  [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0
Sep  5 23:52:34 del kernel: [46044.650968]  [<c123f640>] ? sock_ioctl+0x0/0x270
Sep  5 23:52:34 del kernel: [46044.663865]  [<c10c3188>] vfs_ioctl+0x28/0xa0
Sep  5 23:52:34 del kernel: [46044.676556]  [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0
Sep  5 23:52:34 del kernel: [46044.688989]  [<c1048676>] ? up_read+0x16/0x30
Sep  5 23:52:34 del kernel: [46044.701411]  [<c1021376>] ? do_page_fault+0x1d6/0x3a0
Sep  5 23:52:34 del kernel: [46044.714223]  [<c10b6588>] ? fget_light+0xf8/0x2f0
Sep  5 23:52:34 del kernel: [46044.726601]  [<c1241f98>] ? sys_socketcall+0x208/0x2c0
Sep  5 23:52:34 del kernel: [46044.739140]  [<c10c3eb3>] sys_ioctl+0x63/0x70
Sep  5 23:52:34 del kernel: [46044.751967]  [<c12fca3d>] syscall_call+0x7/0xb
Sep  5 23:52:34 del kernel: [46044.764734]  [<c12f0000>] ? cookie_v6_check+0x3d0/0x630

-------------->

This patch fixes the warning:
 ===================================================
 [ INFO: suspicious rcu_dereference_check() usage. ]
 ---------------------------------------------------
 net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection!

 other info that might help us debug this:

 rcu_scheduler_active = 1, debug_locks = 0
 1 lock held by pppd/1717:
  #0:  (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20

 stack backtrace:
 Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3
 Call Trace:
  [<c12f9aba>] ? printk+0x18/0x1e
  [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0
  [<c12b7235>] trie_firstleaf+0x65/0x70
  [<c12b74d4>] fib_table_flush+0x14/0x170
  ...

Allow trie_firstleaf() to be called either under rcu_read_lock()
protection or with RTNL held. The same annotation is added to
node_parent_rcu() to prevent a similar warning a bit later.

Followup of commits 634a4b20 and 4eaa0e3c.
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6b085b6

08 9月, 2010 1 次提交

ipv4: Fix reverse path filtering with multipath routing. · 6f86b325

由 David S. Miller 提交于 9月 06, 2010

Actually iterate over the next-hops to make sure we have
a device match.  Otherwise RP filtering is always elided
when the route matched has multiple next-hops.
Reported-by: NIgor M Podlesny <for.poige@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f86b325

02 9月, 2010 1 次提交

ipv4: minor fix about RPF in help of Kconfig · 750e9fad

由 Nicolas Dichtel 提交于 8月 31, 2010

Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

750e9fad

28 8月, 2010 1 次提交

net/ipv4: Eliminate kstrdup memory leak · c34186ed

由 Julia Lawall 提交于 8月 27, 2010

The string clone is only used as a temporary copy of the argument val
within the while loop, and so it should be freed before leaving the
function.  The call to strsep, however, modifies clone, so a pointer to the
front of the string is kept in saved_clone, to make it possible to free it.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression x;
expression E;
identifier l;
statement S;
@@

*x= \(kasprintf\|kstrdup\)(...);
...
if (x == NULL) S
... when != kfree(x)
    when != E = x
if (...) {
  <... when != kfree(x)
* goto l;
  ...>
* return ...;
}
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c34186ed

26 8月, 2010 2 次提交

tcp: select(writefds) don't hang up when a peer close connection · d84ba638

由 KOSAKI Motohiro 提交于 8月 24, 2010

This issue come from ruby language community. Below test program
hang up when only run on Linux.

	% uname -mrsv
	Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686
	% ruby -rsocket -ve '
	BasicSocket.do_not_reverse_lookup = true
	serv = TCPServer.open("127.0.0.1", 0)
	s1 = TCPSocket.open("127.0.0.1", serv.addr[1])
	s2 = serv.accept
	s2.close
	s1.write("a") rescue p $!
	s1.write("a") rescue p $!
	Thread.new {
	  s1.write("a")
	}.join'
	ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux]
	#<Errno::EPIPE: Broken pipe>
	[Hang Here]

FreeBSD, Solaris, Mac doesn't. because Ruby's write() method call
select() internally. and tcp_poll has a bug.

SUS defined 'ready for writing' of select() as following.

|  A descriptor shall be considered ready for writing when a call to an output
|  function with O_NONBLOCK clear would not block, whether or not the function
|  would transfer data successfully.

That said, EPIPE situation is clearly one of 'ready for writing'.

We don't have read-side issue because tcp_poll() already has read side
shutdown care.

|        if (sk->sk_shutdown & RCV_SHUTDOWN)
|                mask |= POLLIN | POLLRDNORM | POLLRDHUP;

So, Let's insert same logic in write side.

- reference url
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065
  http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d84ba638

tcp: fix three tcp sysctls tuning · c5ed63d6

由 Eric Dumazet 提交于 8月 25, 2010

As discovered by Anton Blanchard, current code to autotune 
tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and
sysctl_max_syn_backlog makes little sense.

The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB
machine in Anton's case.

(tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket))
is much bigger if spinlock debugging is on. Its wrong to select bigger
limits in this case (where kernel structures are also bigger)

bhash_size max is 65536, and we get this value even for small machines. 

A better ground is to use size of ehash table, this also makes code
shorter and more obvious.

Based on a patch from Anton, and another from David.
Reported-and-tested-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5ed63d6

25 8月, 2010 1 次提交

tcp: Combat per-cpu skew in orphan tests. · ad1af0fe

由 David S. Miller 提交于 8月 25, 2010

As reported by Anton Blanchard when we use
percpu_counter_read_positive() to make our orphan socket limit checks,
the check can be off by up to num_cpus_online() * batch (which is 32
by default) which on a 128 cpu machine can be as large as the default
orphan limit itself.

Fix this by doing the full expensive sum check if the optimized check
triggers.
Reported-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>

ad1af0fe

24 8月, 2010 1 次提交

netfilter: fix CONFIG_COMPAT support · cca77b7c

由 Florian Westphal 提交于 8月 23, 2010

commit f3c5c1bf
(netfilter: xtables: make ip_tables reentrant) forgot to
also compute the jumpstack size in the compat handlers.

Result is that "iptables -I INPUT -j userchain" turns into -j DROP.

Reported by Sebastian Roesner on #netfilter, closes
http://bugzilla.netfilter.org/show_bug.cgi?id=669.

Note: arptables change is compile-tested only.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NMikael Pettersson <mikpe@it.uu.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cca77b7c

18 8月, 2010 1 次提交

netfilter: {ip,ip6,arp}_tables: avoid lockdep false positive · 001389b9

由 Eric Dumazet 提交于 8月 16, 2010

After commit 24b36f01 (netfilter: {ip,ip6,arp}_tables: dont block
bottom half more than necessary), lockdep can raise a warning
because we attempt to lock a spinlock with BH enabled, while
the same lock is usually locked by another cpu in a softirq context.

Disable again BH to avoid these lockdep warnings.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Diagnosed-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

001389b9

08 8月, 2010 1 次提交

tcp: no md5sig option size check bug · ba78e2dd

由 Dmitry Popov 提交于 8月 07, 2010

tcp_parse_md5sig_option doesn't check md5sig option (TCPOPT_MD5SIG)
length, but tcp_v[46]_inbound_md5_hash assume that it's at least 16
bytes long.
Signed-off-by: NDmitry Popov <dp@highloadlab.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba78e2dd

03 8月, 2010 2 次提交

ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice · c893b806

由 Changli Gao 提交于 7月 31, 2010

6c79bf0f subtracts PPPOE_SES_HLEN from mtu at
the front of ip_fragment(). So the later subtraction should be removed. The
MTU of 802.1q is also 1500, so MTU should not be changed.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NBart De Schuymer <bdschuym@pandora.bo>
----
 net/ipv4/ip_output.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
Signed-off-by: NBart De Schuymer <bdschuym@pandora.bo>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c893b806

net: Add getsockopt support for TCP thin-streams · 3c0fef0b

由 Josh Hunt 提交于 7月 30, 2010

Initial TCP thin-stream commit did not add getsockopt support for the new
socket options: TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK. This adds support
for them.
Signed-off-by: NJosh Hunt <johunt@akamai.com>
Tested-by: NAndreas Petlund <apetlund@simula.no>
Acked-by: NAndreas Petlund <apetlund@simula.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c0fef0b

02 8月, 2010 4 次提交

netfilter: nf_nat: don't check if the tuple is unique when there isn't any other choice · 2452a99d

由 Changli Gao 提交于 8月 02, 2010

The tuple got from unique_tuple() doesn't need to be really unique, so the
check for the unique tuple isn't necessary, when there isn't any other
choice. Eliminating the unnecessary nf_nat_used_tuple() can save some CPU
cycles too.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

2452a99d

netfilter: nf_nat: make unique_tuple return void · f43dc98b

由 Changli Gao 提交于 8月 02, 2010

The only user of unique_tuple() get_unique_tuple() doesn't care about the
return value of unique_tuple(), so make unique_tuple() return void (nothing).
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

f43dc98b

netfilter: nf_nat: use local variable hdrlen · 794dbc1d

由 Changli Gao 提交于 8月 02, 2010

Use local variable hdrlen instead of ip_hdrlen(skb).
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

794dbc1d

netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary · 24b36f01

由 Eric Dumazet 提交于 8月 02, 2010

We currently disable BH for the whole duration of get_counters()

On machines with a lot of cpus and large tables, this might be too long.

We can disable preemption during the whole function, and disable BH only
while fetching counters for the current cpu.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

24b36f01

31 7月, 2010 1 次提交

tcp: cookie transactions setsockopt memory leak · a3bdb549

由 Dmitry Popov 提交于 7月 29, 2010

There is a bug in do_tcp_setsockopt(net/ipv4/tcp.c),
TCP_COOKIE_TRANSACTIONS case.
In some cases (when tp->cookie_values == NULL) new tcp_cookie_values
structure can be allocated (at cvp), but not bound to
tp->cookie_values. So a memory leak occurs.
Signed-off-by: NDmitry Popov <dp@highloadlab.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3bdb549

23 7月, 2010 4 次提交

netfilter: iptables: use skb->len for accounting · 7df0884c

由 Changli Gao 提交于 7月 23, 2010

Use skb->len for accounting as xt_quota does.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

7df0884c

netfilter: arptables: use arp_hdr_len() · f667009e

由 Changli Gao 提交于 7月 23, 2010

use arp_hdr_len().
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

f667009e

netfilter: nf_nat_core: merge the same lines · c36952e5

由 Changli Gao 提交于 7月 23, 2010

proto->unique_tuple() will be called finally, if the previous calls fail. This
patch checks the false condition of (range->flags &IP_NAT_RANGE_PROTO_RANDOM)
instead to avoid duplicate line of code: proto->unique_tuple().
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

c36952e5

net: RTA_MARK addition · 963bfeee

由 Eric Dumazet 提交于 7月 20, 2010

Add a new rt attribute, RTA_MARK, and use it in
rt_fill_info()/inet_rtm_getroute() to support following commands :

ip route get 192.168.20.110 mark NUMBER
ip route get 192.168.20.108 from 192.168.20.110 iif eth1 mark NUMBER
ip route list cache [192.168.20.110] mark NUMBER
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

963bfeee

22 7月, 2010 1 次提交

net: remove last uses of __attribute__((packed)) · 3f30fc15

由 Gustavo F. Padovan 提交于 7月 21, 2010

Network code uses the __packed macro instead of __attribute__((packed)).
Signed-off-by: NGustavo F. Padovan <padovan@profusion.mobi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f30fc15

20 7月, 2010 1 次提交

tcp: fix crash in tcp_xmit_retransmit_queue · 45e77d31

由 Ilpo Järvinen 提交于 7月 19, 2010

It can happen that there are no packets in queue while calling
tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
NULL and that gets deref'ed to get sacked into a local var.

There is no work to do if no packets are outstanding so we just
exit early.

This oops was introduced by 08ebd172 (tcp: remove tp->lost_out
guard to make joining diff nicer).
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: NLennart Schulte <lennart.schulte@nets.rwth-aachen.de>
Tested-by: NLennart Schulte <lennart.schulte@nets.rwth-aachen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45e77d31

16 7月, 2010 1 次提交

ipmr: Don't leak memory if fib lookup fails. · e40dbc51

由 Ben Greear 提交于 7月 15, 2010

This was detected using two mcast router tables.  The
pimreg for the second interface did not have a specific
mrule, so packets received by it were handled by the
default table, which had nothing configured.

This caused the ipmr_fib_lookup to fail, causing
the memory leak.
Signed-off-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e40dbc51

15 7月, 2010 1 次提交

rfs: call sock_rps_record_flow() in tcp_splice_read() · 3a047bf8

由 Changli Gao 提交于 7月 12, 2010

rfs: call sock_rps_record_flow() in tcp_splice_read()

call sock_rps_record_flow() in tcp_splice_read(), so the applications using
splice(2) or sendfile(2) can utilize RFS.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
----
 net/ipv4/tcp.c |    1 +
 1 file changed, 1 insertion(+)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a047bf8

13 7月, 2010 2 次提交

inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() and inet_sendpage() · 7ba42910

由 Changli Gao 提交于 7月 10, 2010

a new boolean flag no_autobind is added to structure proto to avoid the autobind
calls when the protocol is TCP. Then sock_rps_record_flow() is called int the
TCP's sendmsg() and sendpage() pathes.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
----
 include/net/inet_common.h |    4 ++++
 include/net/sock.h        |    1 +
 include/net/tcp.h         |    8 ++++----
 net/ipv4/af_inet.c        |   15 +++++++++------
 net/ipv4/tcp.c            |   11 +++++------
 net/ipv4/tcp_ipv4.c       |    3 +++
 net/ipv6/af_inet6.c       |    8 ++++----
 net/ipv6/tcp_ipv6.c       |    3 +++
 8 files changed, 33 insertions(+), 20 deletions(-)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ba42910

net/ipv4: EXPORT_SYMBOL cleanups · 4bc2f18b

由 Eric Dumazet 提交于 7月 09, 2010

CodingStyle cleanups

EXPORT_SYMBOL should immediately follow the symbol declaration.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4bc2f18b

09 7月, 2010 1 次提交

gre: propagate ipv6 transport class · dd4ba83d

由 Stephen Hemminger 提交于 7月 08, 2010

This patch makes IPV6 over IPv4 GRE tunnel propagate the transport
class field from the underlying IPV6 header to the IPV4 Type Of Service
field. Without the patch, all IPV6 packets in tunnel look the same to QoS.

This assumes that IPV6 transport class is exactly the same
as IPv4 TOS. Not sure if that is always the case?  Maybe need
to mask off some bits.

The mask and shift to get tclass is copied from ipv6/datagram.c
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd4ba83d

08 7月, 2010 1 次提交

net/ipv4/ip_output.c: Removal of unused variable in ip_fragment() · 49085bd7

由 George Kadianakis 提交于 7月 06, 2010

Removal of unused integer variable in ip_fragment().
Signed-off-by: NGeorge Kadianakis <desnacked@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

49085bd7

06 7月, 2010 1 次提交

ipv4: use skb_dst_copy() in ip_copy_metadata() · fe76cda3

由 Eric Dumazet 提交于 7月 01, 2010

Avoid touching dst refcount in ip_fragment().
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe76cda3

05 7月, 2010 3 次提交

netfilter: ipt_REJECT: avoid touching dst ref · b13b7125

由 Eric Dumazet 提交于 7月 05, 2010

We can avoid a pair of atomic ops in ipt_REJECT send_reset()
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

b13b7125

netfilter: ipt_REJECT: postpone the checksum calculation. · 98b0e84a

由 Changli Gao 提交于 7月 05, 2010

postpone the checksum calculation, then if the output NIC supports checksum
offloading, we can utlize it. And though the output NIC doesn't support
checksum offloading, but we'll mangle this packet, this can free us from
updating the checksum, as the checksum calculation occurs later.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>

98b0e84a

xfrm: fix xfrm by MARK logic · 44b451f1

由 Peter Kosyh 提交于 7月 02, 2010

While using xfrm by MARK feature in
2.6.34 - 2.6.35 kernels, the mark
is always cleared in flowi structure via memset in
_decode_session4 (net/ipv4/xfrm4_policy.c), so
the policy lookup fails.
IPv6 code is affected by this bug too.
Signed-off-by: NPeter Kosyh <p.kosyh@gmail.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44b451f1

01 7月, 2010 2 次提交

fragment: add fast path for in-order fragments · d6bebca9

由 Changli Gao 提交于 6月 29, 2010

add fast path for in-order fragments

As the fragments are sent in order in most of OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
----
 include/net/inet_frag.h |    1 +
 net/ipv4/ip_fragment.c  |   12 ++++++++++++
 net/ipv6/reassembly.c   |   11 +++++++++++
 3 files changed, 24 insertions(+)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6bebca9

snmp: 64bit ipstats_mib for all arches · 4ce3c183

由 Eric Dumazet 提交于 6月 30, 2010

/proc/net/snmp and /proc/net/netstat expose SNMP counters.

Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.

This means user program parsing these files must already be prepared to
deal with 64bit values, regardless of user program being 32 or 64 bit.

This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.

# netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ce3c183

29 6月, 2010 1 次提交

tcp: tso_fragment() might avoid GFP_ATOMIC · c4ead4c5

由 Eric Dumazet 提交于 6月 24, 2010

We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC
allocations sometimes.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4ead4c5

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功