提交 · ec2c9935688fbd5eaa7c975e3e21562c3da77363 · openeuler / raspberrypi-kernel

06 2月, 2014 4 次提交

netfilter: nf_tables: fix potential oops when dumping sets · ec2c9935

由 Patrick McHardy 提交于 2月 05, 2014

Commit c9c8e485 (netfilter: nf_tables: dump sets in all existing families)
changed nft_ctx_init_from_setattr() to only look up the address family if it
is not NFPROTO_UNSPEC. However if it is NFPROTO_UNSPEC and a table attribute
is given, nftables_afinfo_lookup() will dereference the NULL afi pointer.

Fix by checking for non-NULL afi and also move a check added by that commit
to the proper position.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

ec2c9935

netfilter: nf_tables: fix overrun in nf_tables_set_alloc_name() · 53b70287

由 Patrick McHardy 提交于 2月 05, 2014

The map that is used to allocate anonymous sets is indeed
BITS_PER_BYTE * PAGE_SIZE long.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

53b70287

netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt · e53376be

由 Pablo Neira Ayuso 提交于 2月 03, 2014

With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.

Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().

A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.

CPU1                                    CPU2
____nf_conntrack_find()
                                        nf_ct_put()
                                         destroy_conntrack()
                                        ...
                                        init_conntrack
                                         __nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
                                         if (!l4proto->new(ct, skb, dataoff, timeouts))
                                          nf_conntrack_free(ct); (use = 2 !!!)
                                        ...
                                        __nf_conntrack_alloc (set use = 1)
 if (!nf_ct_key_equal(h, tuple, zone))
  nf_ct_put(ct); (use = 0)
   destroy_conntrack()
                                        /* continue to work with CT */

After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():

<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G         C ---------------    2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>]  [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255]  [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255]  [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255]  [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255]  [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255]  [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255]  [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255]  [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255]  [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255]  [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255]  [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255]  [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255]  [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255]  [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255]  [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255]  [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255]  [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255]  [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5

I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Vagin <avagin@parallels.com>
Cc: Florian Westphal <fw@strlen.de>
Reported-by: NAndrew Vagin <avagin@parallels.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Acked-by: NAndrew Vagin <avagin@parallels.com>

e53376be

netfilter: nf_nat_h323: fix crash in nf_ct_unlink_expect_report() · 829d9315

由 Alexey Dobriyan 提交于 2月 03, 2014

Similar bug fixed in SIP module in 3f509c68 ("netfilter: nf_nat_sip: fix
incorrect handling of EBUSY for RTCP expectation").

BUG: unable to handle kernel paging request at 00100104
IP: [<f8214f07>] nf_ct_unlink_expect_report+0x57/0xf0 [nf_conntrack]
...
Call Trace:
  [<c0244bd8>] ? del_timer+0x48/0x70
  [<f8215687>] nf_ct_remove_expectations+0x47/0x60 [nf_conntrack]
  [<f8211c99>] nf_ct_delete_from_lists+0x59/0x90 [nf_conntrack]
  [<f8212e5e>] death_by_timeout+0x14e/0x1c0 [nf_conntrack]
  [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
  [<c024442d>] call_timer_fn+0x1d/0x80
  [<c024461e>] run_timer_softirq+0x18e/0x1a0
  [<f8212d10>] ? nf_conntrack_set_hashsize+0x190/0x190 [nf_conntrack]
  [<c023e6f3>] __do_softirq+0xa3/0x170
  [<c023e650>] ? __local_bh_enable+0x70/0x70
  <IRQ>
  [<c023e587>] ? irq_exit+0x67/0xa0
  [<c0202af6>] ? do_IRQ+0x46/0xb0
  [<c027ad05>] ? clockevents_notify+0x35/0x110
  [<c066ac6c>] ? common_interrupt+0x2c/0x40
  [<c056e3c1>] ? cpuidle_enter_state+0x41/0xf0
  [<c056e6fb>] ? cpuidle_idle_call+0x8b/0x100
  [<c02085f8>] ? arch_cpu_idle+0x8/0x30
  [<c027314b>] ? cpu_idle_loop+0x4b/0x140
  [<c0273258>] ? cpu_startup_entry+0x18/0x20
  [<c066056d>] ? rest_init+0x5d/0x70
  [<c0813ac8>] ? start_kernel+0x2ec/0x2f2
  [<c081364f>] ? repair_env_string+0x5b/0x5b
  [<c0813269>] ? i386_start_kernel+0x33/0x35
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

829d9315

05 2月, 2014 3 次提交

netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get · c6825c09

由 Andrey Vagin 提交于 1月 29, 2014

Lets look at destroy_conntrack:

hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);

net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by rcu, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but in this moment a few readers
still can use the conntrack. Then this conntrack is released and another
thread creates conntrack with the same address and the equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested to check the confirm bit too. I think it's
right.

task 1			task 2			task 3
			nf_conntrack_find_get
			 ____nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
						__nf_conntrack_alloc
						 kmem_cache_alloc
						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
			 if (nf_ct_is_dying(ct))
			 if (!nf_ct_tuple_equal()

I'm not sure, that I have ever seen this race condition in a real life.
Currently we are investigating a bug, which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently,
we don't have any other explanation for this.

<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
<4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
<4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
<4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
<4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
<4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
<4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
<4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
<4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
<4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
<4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
<4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: NAndrey Vagin <avagin@openvz.org>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c6825c09

netfilter: nf_tables: fix oops when deleting a chain with references · 3dd7279f

由 Patrick McHardy 提交于 1月 25, 2014

The following commands trigger an oops:

 # nft -i
 nft> add table filter
 nft> add chain filter input { type filter hook input priority 0; }
 nft> add chain filter test
 nft> add rule filter input jump test
 nft> delete chain filter test

We need to check the chain use counter before allowing destruction since
we might have references from sets or jump rules.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69341Reported-by: NMatthew Ife <deleriux1@gmail.com>
Tested-by: NMatthew Ife <deleriux1@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

3dd7279f

netfilter: nft_ct: fix unconditional dump of 'dir' attr · 2a53bfb3

由 Arturo Borrero 提交于 1月 17, 2014

We want to make sure that the information that we get from the kernel can
be reinjected without troubles. The kernel shouldn't return an attribute
that is not required, or even prohibited.

Dumping unconditionally NFTA_CT_DIRECTION could lead an application in
userspace to interpret that the attribute was originally set, while it
was not.
Signed-off-by: NArturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2a53bfb3

04 2月, 2014 1 次提交

ipvs: fix AF assignment in ip_vs_conn_new() · 2a971354

由 Michal Kubecek 提交于 1月 30, 2014

If a fwmark is passed to ip_vs_conn_new(), it is passed in
vaddr, not daddr. Therefore we should set AF to AF_UNSPEC in
vaddr assignment (like we do in ip_vs_ct_in_get()), otherwise we
may copy only first 4 bytes of an IPv6 address into cp->daddr.
Signed-off-by: NBogdano Arendartchuk <barendartchuk@suse.com>
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NSimon Horman <horms@verge.net.au>

2a971354

28 1月, 2014 14 次提交

net: Document promote_secondaries · d922e1cb

由 Martin Schwenke 提交于 1月 28, 2014

From 038a821667f62c496f2bbae27081b1b612122a97 Mon Sep 17 00:00:00 2001
From: Martin Schwenke <martin@meltin.net>
Date: Tue, 28 Jan 2014 15:16:49 +1100
Subject: [PATCH] net: Document promote_secondaries

This option was added a long time ago...

  commit 8f937c60
  Author: Harald Welte <laforge@gnumonks.org>
  Date:   Sun May 29 20:23:46 2005 -0700

    [IPV4]: Primary and secondary addresses
Signed-off-by: NMartin Schwenke <martin@meltin.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d922e1cb

net: gre: use icmp_hdr() to get inner ip header · c0c0c50f

由 Duan Jiong 提交于 1月 28, 2014

When dealing with icmp messages, the skb->data points the
ip header that triggered the sending of the icmp message.

In gre_cisco_err(), the parse_gre_header() is called, and the
iptunnel_pull_header() is called to pull the skb at the end of
the parse_gre_header(), so the skb->data doesn't point the
inner ip header.

Unfortunately, the ipgre_err still needs those ip addresses in
inner ip header to look up tunnel by ip_tunnel_lookup().

So just use icmp_hdr() to get inner ip header instead of skb->data.
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0c0c50f

i40e: Add missing braces to i40e_dcb_need_reconfig() · 3d9667a9

由 Dave Jones 提交于 1月 27, 2014

Indentation mismatch spotted with Coverity.
Introduced in 4e3b35b0 ("i40e: add DCB and DCBNL support")
Signed-off-by: NDave Jones <davej@fedoraproject.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d9667a9

xen-netfront: fix resource leak in netfront · cefe0078

由 Annie Li 提交于 1月 28, 2014

This patch removes grant transfer releasing code from netfront, and uses
gnttab_end_foreign_access to end grant access since
gnttab_end_foreign_access_ref may fail when the grant entry is
currently used for reading or writing.

* clean up grant transfer code kept from old netfront(2.6.18) which grants
pages for access/map and transfer. But grant transfer is deprecated in current
netfront, so remove corresponding release code for transfer.

* fix resource leak, release grant access (through gnttab_end_foreign_access)
and skb for tx/rx path, use get_page to ensure page is released when grant
access is completed successfully.

Xen-blkfront/xen-tpmfront/xen-pcifront also have similar issue, but patches
for them will be created separately.

V6: Correct subject line and commit message.

V5: Remove unecessary change in xennet_end_access.

V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in
single patch.

V3: Changes as suggestion from David Vrabel, ensure pages are not freed untill
grant acess is ended.

V2: Improve patch comments.
Signed-off-by: NAnnie Li <annie.li@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cefe0078

net: 6lowpan: fixup for code movement · ce60e0c4

由 Stephen Rothwell 提交于 1月 07, 2014

Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce60e0c4

hyperv: Add support for physically discontinuous receive buffer · b679ef73

由 Haiyang Zhang 提交于 1月 27, 2014

This will allow us to use bigger receive buffer, and prevent allocation failure
due to fragmented memory.
Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: NK. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b679ef73

sky2: initialize napi before registering device · 731073b9

由 Stanislaw Gruszka 提交于 1月 25, 2014

There is race condition when call netif_napi_add() after
register_netdevice(), as ->open() can be called without napi initialized
and trigger BUG_ON() on napi_enable(), like on below messages:

[    9.699863] sky2: driver version 1.30
[    9.699960] sky2 0000:02:00.0: Yukon-2 EC Ultra chip revision 2
[    9.700020] sky2 0000:02:00.0: irq 45 for MSI/MSI-X
[    9.700498] ------------[ cut here ]------------
[    9.703391] kernel BUG at include/linux/netdevice.h:501!
[    9.703391] invalid opcode: 0000 [#1] PREEMPT SMP
<snip>
[    9.830018] Call Trace:
[    9.830018]  [<fa996169>] sky2_open+0x309/0x360 [sky2]
[    9.830018]  [<c1007210>] ? via_no_dac+0x40/0x40
[    9.830018]  [<c1007210>] ? via_no_dac+0x40/0x40
[    9.830018]  [<c135ed4b>] __dev_open+0x9b/0x120
[    9.830018]  [<c1431cbe>] ? _raw_spin_unlock_bh+0x1e/0x20
[    9.830018]  [<c135efd9>] __dev_change_flags+0x89/0x150
[    9.830018]  [<c135f148>] dev_change_flags+0x18/0x50
[    9.830018]  [<c13bb8e0>] devinet_ioctl+0x5d0/0x6e0
[    9.830018]  [<c13bcced>] inet_ioctl+0x6d/0xa0

To fix the problem patch changes the order of initialization.

Bug report:
https://bugzilla.kernel.org/show_bug.cgi?id=67151

Reported-and-tested-by: ebrahim.azarisooreh@gmail.com
Signed-off-by: NStanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

731073b9

net: Fix memory leak if TPROXY used with TCP early demux · a452ce34

由 Holger Eitzenberger 提交于 1月 27, 2014

I see a memory leak when using a transparent HTTP proxy using TPROXY
together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):

unreferenced object 0xffff88008cba4a40 (size 1696):
  comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
  hex dump (first 32 bytes):
    0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
    02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
    [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
    [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
    [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
    [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
    [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
    [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
    [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
    [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
    [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
    [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
    [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
    [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
    [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
    [<ffffffff8127c949>] net_rx_action+0xa7/0x200
    [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157

But there are many more, resulting in the machine going OOM after some
days.

From looking at the TPROXY code, and with help from Florian, I see
that the memory leak is introduced in tcp_v4_early_demux():

  void tcp_v4_early_demux(struct sk_buff *skb)
  {
    /* ... */

    iph = ip_hdr(skb);
    th = tcp_hdr(skb);

    if (th->doff < sizeof(struct tcphdr) / 4)
        return;

    sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                       iph->saddr, th->source,
                       iph->daddr, ntohs(th->dest),
                       skb->skb_iif);
    if (sk) {
        skb->sk = sk;

where the socket is assigned unconditionally to skb->sk, also bumping
the refcnt on it.  This is problematic, because in our case the skb
has already a socket assigned in the TPROXY target.  This then results
in the leak I see.

The very same issue seems to be with IPv6, but haven't tested.
Reviewed-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a452ce34

Merge branch 'bonding' · 66dd1c07

由 David S. Miller 提交于 1月 27, 2014

Veaceslav Falico says:

====================
bonding: fix locking in bond_ab_arp_prob

After the latest patches, on every call of bond_ab_arp_probe() without an
active slave I see the following warning:

[    7.912314] RTNL: assertion failed at net/core/dev.c (4494)
...
[    7.922495]  [<ffffffff817acc6f>] dump_stack+0x51/0x72
[    7.923714]  [<ffffffff8168795e>] netdev_master_upper_dev_get+0x6e/0x70
[    7.924940]  [<ffffffff816a2a66>] rtnl_link_fill+0x116/0x260
[    7.926143]  [<ffffffff817acc6f>] ? dump_stack+0x51/0x72
[    7.927333]  [<ffffffff816a350c>] rtnl_fill_ifinfo+0x95c/0xb90
[    7.928529]  [<ffffffff8167af2b>] ? __kmalloc_reserve+0x3b/0xa0
[    7.929681]  [<ffffffff8167bfcf>] ? __alloc_skb+0x9f/0x1e0
[    7.930827]  [<ffffffff816a3b64>] rtmsg_ifinfo+0x84/0x100
[    7.931960]  [<ffffffffa00bca07>] bond_ab_arp_probe+0x1a7/0x370 [bonding]
[    7.933133]  [<ffffffffa00bcd78>] bond_activebackup_arp_mon+0x1a8/0x2f0 [bonding]
...

It happens because in bond_ab_arp_probe() we change the flags of a slave
without holding the RTNL lock.

To fix this - remove the useless curr_active_lock, RCUify it and lock RTNL
while changing the slave's flags. Also, remove bond_ab_arp_probe() from
under any locks in bond_ab_arp_mon().
====================
Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66dd1c07

bonding: restructure locking of bond_ab_arp_probe() · f2ebd477

由 Veaceslav Falico 提交于 1月 27, 2014

Currently we're calling it from under RCU context, however we're using some
functions that require rtnl to be held.

Fix this by restructuring the locking - don't call it under any locks,
aquire rcu_read_lock() if we're sending _only_ (i.e. we have the active
slave present), and use rtnl locking otherwise - if we need to modify
(in)active flags of a slave.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2ebd477

bonding: RCUify bond_ab_arp_probe · 98b90f26

由 Veaceslav Falico 提交于 1月 27, 2014

Currently bond_ab_arp_probe() is always called under rcu_read_lock(),
however to work with curr_active_slave we're still holding the
curr_slave_lock.

To remove that curr_slave_lock - rcu_dereference the bond's
curr_active_slave and use it further - so that we're sure the slave won't
go away, and we don't care if it will change in the meanwhile.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

98b90f26

AF_PACKET: Add documentation for queue mapping fanout mode · bb9fbe2d

由 Neil Horman 提交于 1月 27, 2014

Recently I added a new AF_PACKET fanout operation mode in commit
2d36097d, but I forgot to document it.  Add PACKET_FANOUT_QM as an available mode
in the af_packet documentation.  Applies to net-next.
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Daniel Borkmann <dborkman@redhat.com>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb9fbe2d

bnx2x: More Shutdown revisions · 5f6db130

由 Yuval Mintz 提交于 1月 27, 2014

Submission d9aee591 "bnx2x: Don't release PCI bars on shutdown" separated
the PCI remove and shutdown flows, but pci_disable_device() is still
being called on both.
As a result, a dev_WARN_ONCE will be hit during shutdown for every bnx2x
VF probed on a hypervisor (as its shutdown callback will be called and later
pci_disable_sriov() will call its remove callback).

This calls the pci_disable_device() only on the remove flow.
Signed-off-by: NYuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: NAriel Elior <ariele@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f6db130

net: ipv4: Use PTR_ERR_OR_ZERO · 27d79f3b

由 Sachin Kamat 提交于 1月 27, 2014

PTR_RET is deprecated. Use PTR_ERR_OR_ZERO instead. While at it
also include missing err.h header.
Signed-off-by: NSachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27d79f3b

27 1月, 2014 7 次提交

net: stmmac: Log MAC address only once · c88460b7

由 Hans de Goede 提交于 1月 26, 2014

Logging the MAC address on every if-up, is not really useful, and annoying when
there is no cable inserted and NetworkManager tries the ifup every 50 seconds.

Also change the log level from warning to info, as that is what it is.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c88460b7

net: stmmac: Silence PTP init errors on hw without PTP · 7509edd6

由 Hans de Goede 提交于 1月 26, 2014

Logging a PTP error on hw which simply does not support PTP is not very
useful. Moreover this message gets logged on every if-up, and if there is
no cable inserted NetworkManager will re-try the ifup every 50 seconds.
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7509edd6

net/apne: Remove unused variable ei_local · b565c9f7

由 Geert Uytterhoeven 提交于 1月 26, 2014

drivers/net/ethernet/8390/apne.c: In function ‘apne_probe1’:
drivers/net/ethernet/8390/apne.c:215: warning: unused variable ‘ei_local’

Introduced by commit c45f812f ("8390 :
Replace ei_debug with msg_enable/NETIF_MSG_* feature"), which added the
variable without using it.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b565c9f7

net: add and use skb_gso_transport_seglen() · de960aa9

由 Florian Westphal 提交于 1月 26, 2014

This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to
skbuff core so it may be reused by upcoming ip forwarding path patch.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de960aa9

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 77d143de

由 Linus Torvalds 提交于 1月 26, 2014

Pull UML changes from Richard Weinberger:
 "This time only various cleanups and housekeeping patches"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: hostfs: make functions static
  um: Include generic barrier.h
  um: Removed unused attributes from thread_struct

77d143de

Merge tag 'mmc-updates-for-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · ccc039d6

由 Linus Torvalds 提交于 1月 26, 2014

Pull MMC updates from Chris Ball:
 "MMC highlights for 3.14:

  Core:
   - Avoid get_cd() on cards marked nonremovable

  Drivers:
   - arasan: New driver for controllers found in e.g. Xilinx Zynq SoC
   - dwmmc: Support Hisilicon K3 SoC controllers
   - esdhc-imx: Support for HS200 mode, DDR modes on MX6, runtime PM
   - sdhci-pci: Support O2Micro/BayHubTech controllers used in laptops
     like Lenovo ThinkPad W540, Dell Latitude E5440, Dell Latitude E6540
   - tegra: Support Tegra124 SoCs"

* tag 'mmc-updates-for-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (55 commits)
  mmc: sdhci-pci: Fix possibility of chip->fixes being null
  mmc: sdhci-pci: Fix BYT sd card getting stuck in runtime suspend
  mmc: sdhci: Allow for long command timeouts
  mmc: sdio: add a quirk for broken SDIO_CCCR_INTx polling
  mmc: sdhci: fix lockdep error in tuning routine
  mmc: dw_mmc: k3: remove clk_table
  mmc: dw_mmc: fix dw_mci_get_cd
  mmc: dw_mmc: fix sparse non static symbol warning
  mmc: sdhci-esdhc-imx: fix warning during module remove function
  mmc: sdhci-esdhc-imx: fix access hardirq-unsafe lock in atomic context
  mmc: core: sd: implement proper support for sd3.0 au sizes
  mmc: atmel-mci: add vmmc-supply support
  mmc: sdhci-pci: add broken HS200 quirk for Intel Merrifield
  mmc: sdhci: add quirk for broken HS200 support
  mmc: arasan: Add driver for Arasan SDHCI
  mmc: dw_mmc: add dw_mmc-k3 for k3 platform
  mmc: dw_mmc: use slot-gpio to handle cd pin
  mmc: sdhci-pci: add support of O2Micro/BayHubTech SD hosts
  mmc: sdhci-pci: break out definitions to header file
  mmc: tmio: fixup compile error
  ...

Conflicts:
	MAINTAINERS

ccc039d6

Merge tag 'for-3.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs · 1c294838

由 Linus Torvalds 提交于 1月 26, 2014

Pull 9p changes from Eric Van Hensbergen:
 "Included are a new cache model for support of mmap, and several
  cleanups across the filesystem and networking portions of the code"

* tag 'for-3.14-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: update documentation
  9P: introduction of a new cache=mmap model.
  net/9p: remove virtio default hack and set appropriate bits instead
  9p: remove useless 'name' variable and assignment
  9p: fix return value in case in v9fs_fid_xattr_set()
  9p: remove useless variable and assignment
  9p: remove useless assignment
  9p: remove unused 'super_block' struct pointer
  9p: remove never used return variable
  9p: remove unused 'p9_fid' struct pointer
  9p: remove unused 'p9_client' struct pointer

1c294838

26 1月, 2014 11 次提交

um: hostfs: make functions static · 9e443bc3

由 James Hogan 提交于 11月 14, 2013

The hostfs_*() callback functions are all only used within
hostfs_kern.c, so make them static.
Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: NRichard Weinberger <richard@nod.at>

9e443bc3

um: Include generic barrier.h · 9af2452a

由 Richard Weinberger 提交于 1月 15, 2014

...to get smp_store_release().
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NRichard Weinberger <richard@nod.at>

9af2452a

R
um: Removed unused attributes from thread_struct · 61aad98a
由 Richard Weinberger 提交于 9月 13, 2013
```
temp_stack and mm_count have no users and can be killed.
Signed-off-by: NRichard Weinberger <richard@nod.at>
```
61aad98a

Merge branch 'ipmi' (ipmi patches from Corey Minyard) · b2e448ec

由 Linus Torvalds 提交于 1月 25, 2014

Merge ipmi fixes from Corey Minyard:
 "Just some collected fixes for 3.14.  Nothing huge"

* emailed patches from Corey Minyard <minyard@acm.org>:
  ipmi: Cleanup error return
  ipmi: fix timeout calculation when bmc is disconnected
  ipmi: use USEC_PER_SEC instead of 1000000 for more meaningful
  ipmi: remove deprecated IRQF_DISABLED

b2e448ec

ipmi: Cleanup error return · d02b3709

由 Corey Minyard 提交于 1月 24, 2014

Return proper errors for a lot of IPMI failure cases.  Also call
pci_disable_device when IPMI PCI devices are removed.
Signed-off-by: NCorey Minyard <cminyard@mvista.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d02b3709

ipmi: fix timeout calculation when bmc is disconnected · e21404dc

由 Xie XiuQi 提交于 1月 24, 2014

Loading ipmi_si module while bmc is disconnected, we found the timeout
is longer than 5 secs.  Actually it takes about 3 mins and 20
secs.(HZ=250)

error message as below:
  Dec 12 19:08:59 linux kernel: IPMI BT: timeout in RD_WAIT [ ] 1 retries left
  Dec 12 19:08:59 linux kernel: BT: write 4 bytes seq=0x01 03 18 00 01
  [...]
  Dec 12 19:12:19 linux kernel: IPMI BT: timeout in RD_WAIT [ ]
  Dec 12 19:12:19 linux kernel: failed 2 retries, sending error response
  Dec 12 19:12:19 linux kernel: IPMI: BT reset (takes 5 secs)
  Dec 12 19:12:19 linux kernel: IPMI BT: flag reset [ ]

Function wait_for_msg_done() use schedule_timeout_uninterruptible(1) to
sleep 1 tick, so we should subtract jiffies_to_usecs(1) instead of 100
usecs from timeout.
Reported-by: NHu Shiyuan <hushiyuan@huawei.com>
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NCorey Minyard <cminyard@mvista.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e21404dc

ipmi: use USEC_PER_SEC instead of 1000000 for more meaningful · ccb3368c

由 Xie XiuQi 提交于 1月 24, 2014

Use USEC_PER_SEC instead of 1000000, that making the later bugfix
more clearly.
Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NCorey Minyard <cminyard@mvista.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ccb3368c

ipmi: remove deprecated IRQF_DISABLED · aa5b2bab

由 Michael Opdenacker 提交于 1月 24, 2014

This patch proposes to remove the use of the IRQF_DISABLED flag

It's a NOOP since 2.6.35 and it will be removed one day.
Signed-off-by: NMichael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: NCorey Minyard <cminyard@mvista.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa5b2bab

Merge tag 'spi-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 2d2e7d19

由 Linus Torvalds 提交于 1月 25, 2014

Pull spi updates from Mark Brown:
 "A respun version of the merges for the pull request previously sent
  with a few additional fixes.  The last two merges were fixed up by
  hand since the branches have moved on and currently have the prior
  merge in them.

  Quite a busy release for the SPI subsystem, mostly in cleanups big and
  small scattered through the stack rather than anything else:

   - New driver for the Broadcom BC63xx HSSPI controller
   - Fix duplicate device registration for ACPI
   - Conversion of s3c64xx to DMAEngine (this pulls in platform and DMA
     changes upon which the transiton depends)
   - Some small optimisations to reduce the amount of time we hold locks
     in the datapath, eliminate some redundant checks and the size of a
     spi_transfer
   - Lots of fixes, cleanups and general enhancements to drivers,
     especially the rspi and Atmel drivers"

* tag 'spi-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (112 commits)
  spi: core: Fix transfer failure when master->transfer_one returns positive value
  spi: Correct set_cs() documentation
  spi: Clarify transfer_one() w.r.t. spi_finalize_current_transfer()
  spi: Spelling s/finised/finished/
  spi: sc18is602: Convert to use bits_per_word_mask
  spi: Remove duplicate code to set default bits_per_word setting
  spi/pxa2xx: fix compilation warning when !CONFIG_PM_SLEEP
  spi: clps711x: Add MODULE_ALIAS to support module auto-loading
  spi: rspi: Add missing clk_disable() calls in error and cleanup paths
  spi: rspi: Spelling s/transmition/transmission/
  spi: rspi: Add support for specifying CPHA/CPOL
  spi/pxa2xx: initialize DMA channels to -1 to prevent inadvertent match
  spi: rspi: Add more QSPI register documentation
  spi: rspi: Add more RSPI register documentation
  spi: rspi: Remove dependency on DMAE for SHMOBILE
  spi/s3c64xx: Correct indentation
  spi: sh: Use spi_sh_clear_bit() instead of open-coded
  spi: bitbang: Grammar s/make to make/to make/
  spi: sh-hspi: Spelling s/recive/receive/
  spi: core: Improve tx/rx_nbits check comments
  ...

2d2e7d19

Merge tag 'regulator-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 15333539

由 Linus Torvalds 提交于 1月 25, 2014

Pull regulator updates from Mark Brown:
 "A respin of the merges in the previous pull request with one extra
  fix.

  A quiet release for the regulator API, quite a large number of small
  improvements all over but other than the addition of new drivers for
  the AS3722 and MAX14577 there is nothing of substantial non-local
  impact"

* tag 'regulator-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (47 commits)
  regulator: pfuze100-regulator: Improve dev_info() message
  regulator: pfuze100-regulator: Fix some checkpatch complaints
  regulator: twl: Fix checkpatch issue
  regulator: core: Fix checkpatch issue
  regulator: anatop-regulator: Remove unneeded memset()
  regulator: s5m8767: Update LDO index in s5m8767-regulator.txt
  regulator: as3722: set enable time for SD0/1/6
  regulator: as3722: detect SD0 low-voltage mode
  regulator: tps62360: Fix up a pointer-integer size mismatch warning
  regulator: anatop-regulator: Remove unneeded kstrdup()
  regulator: act8865: Fix build error when !OF
  regulator: act8865: register all regulators regardless of how many are used
  regulator: wm831x-dcdc: Remove unneeded 'err' label
  regulator: anatop-regulator: Add MODULE_ALIAS()
  regulator: act8865: fix incorrect devm_kzalloc for act8865
  regulator: act8865: Remove set_suspend_[en|dis]able implementation
  regulator: act8865: Remove unneeded regulator_unregister() calls
  regulator: s2mps11: Clean up redundant code
  regulator: tps65910: Simplify setting enable_mask for regulators
  regulator: act8865: add device tree binding doc
  ...

15333539

Merge tag 'regmap-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · bb1b6490

由 Linus Torvalds 提交于 1月 25, 2014

Pull regmap updates from Mark Brown:
 "Nothing terribly exciting with regmap this release, mainly a few small
  extensions to allow more devices to be supported:

   - Allow the bulk I/O APIs to be used with no-bus regmaps
   - Support interrupt controllers with zero ack base
   - Warning and spelling fixes"

* tag 'regmap-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: fix a couple of typos
  regmap: Allow regmap_bulk_write() to work for "no-bus" regmaps
  regmap: Allow regmap_bulk_read() to work for "no-bus" regmaps
  regmap: irq: Allow using zero value for ack_base
  regmap: Fix 'ret' would return an uninitialized value

bb1b6490