提交 · f76a47c83247b453f25629618056a6d2c1e39103 · openeuler / Kernel

17 3月, 2010 40 次提交

J
netfilter: xt_NFQUEUE: consolidate v4/v6 targets into one · f76a47c8
由 Jan Engelhardt 提交于 6月 05, 2009
```
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
```
f76a47c8
J
netfilter: xt_CT: par->family is an nfproto · 076f7839
由 Jan Engelhardt 提交于 3月 11, 2010
```
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
```
076f7839
D
e1000e: Fix build with CONFIG_PM disabled. · e50208a0
由 David S. Miller 提交于 3月 16, 2010
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
e50208a0

drivers/net/e100.c: Use pr_<level> and netif_<level> · fa05e1ad

由 Joe Perches 提交于 3月 16, 2010

Convert DPRINTK, commonly used for debugging, to netif_<level>
Remove #define PFX
Use #define pr_fmt
Consistently use no periods for non-sentence logging messages
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa05e1ad

NET: Support clause 45 MDIO commands at the MDIO bus level · abf35df2

由 Jason Gunthorpe 提交于 3月 09, 2010

IEEE 802.3ae clause 45 specifies a somewhat modified MDIO protocol
for use by 10GIGE phys. The main change is a 21 bit address split into
a 5 bit device ID and a 16 bit register offset. The definition is designed
so that normal and extended devices can run on the same MDIO bus.

Extend mdio-bitbang to do the new protocol. At the MDIO bus level the
protocol is requested by or'ing MII_ADDR_C45 into the register offset.

Make phy_read/phy_write/etc pass a full 32 bit register offset.

This does not attempt to make the phy layer support C45 style PHYs, just
to provide the MDIO bus support.

Tested against a Broadcom 10GE phy with ID 0x206034, and several
Broadcom 10/100/1000 Phys in normal mode.
Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abf35df2

e1000e / PCI / PM: Add basic runtime PM support (rev. 4) · 23606cf5

由 Rafael J. Wysocki 提交于 3月 14, 2010

Use the PCI runtime power management framework to add basic PCI
runtime PM support to the e1000e driver.  Namely, make the driver
suspend the device when the link is off and set it up for generating
a wakeup event after the link has been detected again.  [This
feature is disabled until the user space enables it with the help of
the /sys/devices/.../power/contol device attribute.]

Based on a patch from Matthew Garrett.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23606cf5

r8169 / PCI / PM: Add simplified runtime PM support (rev. 3) · e1759441

由 Rafael J. Wysocki 提交于 3月 14, 2010

Use the PCI runtime power management framework to add basic PCI
runtime PM support to the r8169 driver.  Namely, make the driver
suspend the device when the link is not present and set it up for
generating a wakeup event after the link has been detected again.
[This feature is disabled until the user space enables it with the
help of the /sys/devices/.../power/contol device attribute.]
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1759441

net: convert multiple drivers to use netdev_for_each_mc_addr, part7 · ff6e2163

由 Jiri Pirko 提交于 3月 01, 2010

In mlx4, using char * to store mc address in private structure instead.
Signed-off-by: NJiri Pirko <jpirko@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff6e2163

drivers/net/ks*: Use netdev_<level>, netif_<level> and pr_<level> · 0dc7d2b3

由 Joe Perches 提交于 2月 27, 2010

I'm not sure this is correct.

It changes logging macros from:
	dev_<level>(&ks->spidev->dev,
to
	netdev_<level>(ks->netdev,

Comments?

Use netdev_<level>
Use netif_<level>
Use pr_<level>
Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
Add missing line to message in ks8851_remove
Change kmalloc/memset(,0) to kzalloc
Remove ks_<level> macros
Consolidation code into set_media_state
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dc7d2b3

tipc: Allow retransmission of cloned buffers · ca509101

由 Neil Horman 提交于 3月 15, 2010

Forward port commit
fc477e160af086f6e30c3d4fdf5f5c000d29beb5
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Origional commit message:

Allow retransmission of cloned buffers

This patch fixes an issue with TIPC's message retransmission logic
that prevented retransmission of clone sk_buffs.  Originally intended
as a means of avoiding wasted work in retransmitting messages that
were still on the driver's outbound queue, it also prevented TIPC
from retransmitting messages through other means -- such as the
secondary bearer of the broadcast link, or another interface in a
set of bonded interfaces.  This fix removes existing checks for
cloned sk_buffs that prevented such retransmission.
Origionally-Signed-off-by: NAllan Stephens <allan.stephens@windriver.com>
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca509101

tipc: Increase frequency of load distribution over broadcast link · 1a624832

由 Neil Horman 提交于 3月 15, 2010

Forward port commit 29eb572941501c40ac6e62dbc5043bf9ee76ee56
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Origional commit message:
Increase frequency of load distribution over broadcast link

This patch enhances the behavior of TIPC's broadcast link so that it
alternates between redundant bearers (if available) after every
message sent, rather than after every 10 messages.  This change helps
to speed up delivery of retransmitted messages by ensuring that
they are not sent repeatedly over a bearer that is no longer working,
but not yet recognized as failed.

Tested by myself in the latest net-2.6 tree using the tipc sanity test suite
Origionally-signed-off-by: NAllan Stephens <allan.stephens@windriver.com>
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>

bcast.c |   35 ++++++++++++++---------------------
1 file changed, 14 insertions(+), 21 deletions(-)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a624832

net: core: add IFLA_STATS64 support · 10708f37

由 Jan Engelhardt 提交于 3月 11, 2010

`ip -s link` shows interface counters truncated to 32 bit. This is
because interface statistics are transported only in 32-bit quantity
to userspace. This commit adds a new IFLA_STATS64 attribute that
exports them in full 64 bit.

References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.htmlSigned-off-by: NJan Engelhardt <jengelh@medozas.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10708f37

J
net: tcp: make veno selectable as default congestion module · 6ce1a6df
由 Jan Engelhardt 提交于 3月 11, 2010
```
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
6ce1a6df
J
net: tcp: make hybla selectable as default congestion module · dd2acaa7
由 Jan Engelhardt 提交于 3月 11, 2010
```
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
dd2acaa7

net: remove rcu locking from fib_rules_event() · 2fb3573d

由 Eric Dumazet 提交于 3月 09, 2010

We hold RTNL at this point and dont use RCU variants of list traversals,
we dont need rcu_read_lock()/rcu_read_unlock()
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fb3573d

bridge: per-cpu packet statistics (v3) · 14bb4789

由 stephen hemminger 提交于 3月 02, 2010

The shared packet statistics are a potential source of slow down
on bridged traffic. Convert to per-cpu array, but only keep those
statistics which change per-packet.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

14bb4789

rps: Receive Packet Steering · 0a9627f2

由 Tom Herbert 提交于 3月 16, 2010

This patch implements software receive side packet steering (RPS).  RPS
distributes the load of received packet processing across multiple CPUs.

Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load.  This substantially limits pps that can be achieved on a single
queue NIC and provides no scaling with multiple cores.

This solution queues packets early on in the receive path on the backlog queues
of other CPUs.   This allows protocol processing (e.g. IP and TCP) to be
performed on packets in parallel.   For each device (or each receive queue in
a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
process packets. A CPU is selected on a per packet basis by hashing contents
of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
into the CPU mask.  The IPI mechanism is used to raise networking receive
softirqs between CPUs.  This effectively emulates in software what a multi-queue
NIC can provide, but is generic requiring no device support.

Many devices now provide a hash over the 4-tuple on a per packet basis
(e.g. the Toeplitz hash).  This patch allow drivers to set the HW reported hash
in an skb field, and that value in turn is used to index into the RPS maps.
Using the HW generated hash can avoid cache misses on the packet when
steering it to a remote CPU.

The CPU mask is set on a per device and per queue basis in the sysfs variable
/sys/class/net/<device>/queues/rx-<n>/rps_cpus.  This is a set of canonical
bit maps for receive queues in the device (numbered by <n>).  If a device
does not support multi-queue, a single variable is used for the device (rx-0).

Generally, we have found this technique increases pps capabilities of a single
queue device with good CPU utilization.  Optimal settings for the CPU mask
seem to depend on architectures and cache hierarcy.  Below are some results
running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
Results show cumulative transaction rate and system CPU utilization.

e1000e on 8 core Intel
   Without RPS: 108K tps at 33% CPU
   With RPS:    311K tps at 64% CPU

forcedeth on 16 core AMD
   Without RPS: 156K tps at 15% CPU
   With RPS:    404K tps at 49% CPU

bnx2x on 16 core AMD
   Without RPS  567K tps at 61% CPU (4 HW RX queues)
   Without RPS  738K tps at 96% CPU (8 HW RX queues)
   With RPS:    854K tps at 76% CPU (4 HW RX queues)

Caveats:
- The benefits of this patch are dependent on architecture and cache hierarchy.
Tuning the masks to get best performance is probably necessary.
- This patch adds overhead in the path for processing a single packet.  In
a lightly loaded server this overhead may eliminate the advantages of
increased parallelism, and possibly cause some relative performance degradation.
We have found that masks that are cache aware (share same caches with
the interrupting CPU) mitigate much of this.
- The RPS masks can be changed dynamically, however whenever the mask is changed
this introduces the possibility of generating out of order packets.  It's
probably best not change the masks too frequently.
Signed-off-by: NTom Herbert <therbert@google.com>

 include/linux/netdevice.h |   32 ++++-
 include/linux/skbuff.h    |    3 +
 net/core/dev.c            |  335 +++++++++++++++++++++++++++++++++++++--------
 net/core/net-sysfs.c      |  225 ++++++++++++++++++++++++++++++-
 net/core/skbuff.c         |    2 +
 5 files changed, 538 insertions(+), 59 deletions(-)
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a9627f2

RDS: Enable per-cpu workqueue threads · 768bbedf

由 Tina Yang 提交于 3月 11, 2010

Create per-cpu workqueue threads instead of a single
krdsd thread. This is a step towards better scalability.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

768bbedf

RDS: Do not call set_page_dirty() with irqs off · 561c7df6

由 Andy Grover 提交于 3月 11, 2010

set_page_dirty() unconditionally re-enables interrupts, so
if we call it with irqs off, they will be on after the call,
and that's bad. This patch moves the call after we've re-enabled
interrupts in send_drop_to(), so it's safe.

Also, add BUG_ONs to let us know if we ever do call set_page_dirty
with interrupts off.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

561c7df6

RDS: Properly unmap when getting a remote access error · 450d06c0

由 Sherman Pun 提交于 3月 11, 2010

If the RDMA op has aborted with a remote access error,
in addition to what we already do (tell userspace it has
completed with an error) also unmap it and put() the rm.

Otherwise, hangs may occur on arches that track maps and
will not exit without proper cleanup.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

450d06c0

RDS: only put sockets that have seen congestion on the poll_waitq · b98ba52f

由 Andy Grover 提交于 3月 11, 2010

rds_poll_waitq's listeners will be awoken if we receive a congestion
notification. Bad performance may result because *all* polled sockets
contend for this single lock. However, it should not be necessary to
wake pollers when a congestion update arrives if they have never
experienced congestion, and not putting these on the waitq will
hopefully greatly reduce contention.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b98ba52f

RDS: Fix locking in rds_send_drop_to() · 550a8002

由 Tina Yang 提交于 3月 11, 2010

It seems rds_send_drop_to() called
__rds_rdma_send_complete(rs, rm, RDS_RDMA_CANCELED)
with only rds_sock lock, but not rds_message lock. It raced with
other threads that is attempting to modify the rds_message as well,
such as from within rds_rdma_send_complete().
Signed-off-by: NTina Yang <tina.yang@oracle.com>
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

550a8002

RDS: Turn down alarming reconnect messages · 97069788

由 Andy Grover 提交于 3月 11, 2010

RDS's error messages when a connection goes down are a little
extreme. A connection may go down, and it will be re-established,
and everything is fine. This patch links these messages through
rdsdebug(), instead of to printk directly.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

97069788

RDS: Workaround for in-use MRs on close causing crash · 571c02fa

由 Andy Grover 提交于 3月 11, 2010

if a machine is shut down without closing sockets properly, and
freeing all MRs, then a BUG_ON will bring it down. This patch
changes these to WARN_ONs -- leaking MRs is not fatal (although
not ideal, and there is more work to do here for a proper fix.)
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

571c02fa

RDS: Fix send locking issue · 048c15e6

由 Tina Yang 提交于 3月 11, 2010

Fix a deadlock between rds_rdma_send_complete() and
rds_send_remove_from_sock() when rds socket lock and
rds message lock are acquired out-of-order.
Signed-off-by: NTina Yang <Tina.Yang@oracle.com>
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

048c15e6

RDS: Fix congestion issues for loopback · 2e7b3b99

由 Andy Grover 提交于 3月 11, 2010

We have two kinds of loopback: software (via loop transport)
and hardware (via IB). sw is used for 127.0.0.1, and doesn't
support rdma ops. hw is used for sends to local device IPs,
and supports rdma. Both are used in different cases.

For both of these, when there is a congestion map update, we
want to call rds_cong_map_updated() but not actually send
anything -- since loopback local and foreign congestion maps
point to the same spot, they're already in sync.

The old code never called sw loop's xmit_cong_map(),so
rds_cong_map_updated() wasn't being called for it. sw loop
ports would not work right with the congestion monitor.

Fixing that meant that hw loopback now would send congestion maps
to itself. This is also undesirable (racy), so we check for this
case in the ib-specific xmit code.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e7b3b99

RDS/TCP: Wait to wake thread when write space available · 8e82376e

由 Andy Grover 提交于 3月 11, 2010

Instead of waking the send thread whenever any send space is available,
wait until it is at least half empty. This is modeled on how
sock_def_write_space() does it, and may help to minimize context
switches.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e82376e

RDS: update copy_to_user state in tcp transport · b075cfdb

由 Andy Grover 提交于 3月 11, 2010

Other transports use rds_page_copy_user, which updates our
s_copy_to_user counter. TCP doesn't, so it needs to explicity
call rds_stats_add().
Reported-by: NRichard Frank <richard.frank@oracle.com>
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b075cfdb

RDS: sendmsg() should check sndtimeo, not rcvtimeo · 1123fd73

由 Andy Grover 提交于 3月 11, 2010

Most likely cut n paste error - sendmsg() was checking sock_rcvtimeo.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1123fd73

RDS: Do not BUG() on error returned from ib_post_send · 735f61e6

由 Andy Grover 提交于 3月 11, 2010

BUGging on a runtime error code should be avoided. This
patch also eliminates all other BUG()s that have no real
reason to exist.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

735f61e6

bridge: Make first arg to deliver_clone const. · 87faf3cc

由 David S. Miller 提交于 3月 16, 2010

Otherwise we get a warning from the call in br_forward().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87faf3cc

bridge br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only without IGMP snooping. · 32dec5dd

由 YOSHIFUJI Hideaki / 吉藤英明提交于 3月 15, 2010

Without CONFIG_BRIDGE_IGMP_SNOOPING,
BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately
initialized, so we can see garbage.

A clear option to fix this is to set it even without that
config, but we cannot optimize out the branch.

Let's introduce a macro that returns value of mrouters_only
and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32dec5dd

route: Fix caught BUG_ON during rt_secret_rebuild_oneshot() · 858a18a6

由 Vitaliy Gusev 提交于 3月 16, 2010

route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()

Call rt_secret_rebuild can cause BUG_ON(timer_pending(&net->ipv4.rt_secret_timer)) in
add_timer as there is not any synchronization for call rt_secret_rebuild_oneshot()
for the same net namespace.

Also this issue affects to rt_secret_reschedule().

Thus use mod_timer enstead.
Signed-off-by: NVitaliy Gusev <vgusev@openvz.org>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

858a18a6

Y
bridge br_multicast: Fix skb leakage in error path. · 8440853b
由 YOSHIFUJI Hideaki / 吉藤英明提交于 3月 15, 2010
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
8440853b
Y
bridge br_multicast: Fix handling of Max Response Code in IGMPv3 message. · 0ba8c9ec
由 YOSHIFUJI Hideaki / 吉藤英明提交于 3月 15, 2010
```
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
0ba8c9ec

NET: netpoll, fix potential NULL ptr dereference · 21edbb22

由 Jiri Slaby 提交于 3月 16, 2010

Stanse found that one error path in netpoll_setup dereferences npinfo
even though it is NULL. Avoid that by adding new label and go to that
instead.
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Cc: Daniel Borkmann <danborkmann@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: chavey@google.com
Acked-by: NMatt Mackall <mpm@selenic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21edbb22

tipc: fix lockdep warning on address assignment · a2f46ee1

由 Neil Horman 提交于 3月 16, 2010

So in the forward porting of various tipc packages, I was constantly
getting this lockdep warning everytime I used tipc-config to set a network
address for the protocol:

[ INFO: possible circular locking dependency detected ]
2.6.33 #1
tipc-config/1326 is trying to acquire lock:
(ref_table_lock){+.-...}, at: [<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]

but task is already holding lock:
(&(&entry->lock)->rlock#2){+.-...}, at: [<ffffffffa03150d5>] tipc_ref_lock+0x43/0x63 [tipc]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&entry->lock)->rlock#2){+.-...}:
[<ffffffff8107b508>] __lock_acquire+0xb67/0xd0f
[<ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<ffffffff8145471e>] _raw_spin_lock_bh+0x3b/0x6e
[<ffffffffa03152b1>] tipc_ref_acquire+0xe8/0x11b [tipc]
[<ffffffffa031433f>] tipc_createport_raw+0x78/0x1b9 [tipc]
[<ffffffffa031450b>] tipc_createport+0x8b/0x125 [tipc]
[<ffffffffa030f221>] tipc_subscr_start+0xce/0x126 [tipc]
[<ffffffffa0308fb2>] process_signal_queue+0x47/0x7d [tipc]
[<ffffffff81053e0c>] tasklet_action+0x8c/0xf4
[<ffffffff81054bd8>] __do_softirq+0xf8/0x1cd
[<ffffffff8100aadc>] call_softirq+0x1c/0x30
[<ffffffff810549f4>] _local_bh_enable_ip+0xb8/0xd7
[<ffffffff81054a21>] local_bh_enable_ip+0xe/0x10
[<ffffffff81454d31>] _raw_spin_unlock_bh+0x34/0x39
[<ffffffffa0308eb8>] spin_unlock_bh.clone.0+0x15/0x17 [tipc]
[<ffffffffa0308f47>] tipc_k_signal+0x8d/0xb1 [tipc]
[<ffffffffa0308dd9>] tipc_core_start+0x8a/0xad [tipc]
[<ffffffffa01b1087>] 0xffffffffa01b1087
[<ffffffff8100207d>] do_one_initcall+0x72/0x18a
[<ffffffff810872fb>] sys_init_module+0xd8/0x23a
[<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

-> #0 (ref_table_lock){+.-...}:
[<ffffffff8107b3b2>] __lock_acquire+0xa11/0xd0f
[<ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<ffffffff81454836>] _raw_write_lock_bh+0x3b/0x6e
[<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]
[<ffffffffa03141ee>] tipc_deleteport+0x40/0x119 [tipc]
[<ffffffffa0316e35>] release+0xeb/0x137 [tipc]
[<ffffffff8139dbf4>] sock_release+0x1f/0x6f
[<ffffffff8139dc6b>] sock_close+0x27/0x2b
[<ffffffff811116f6>] __fput+0x12a/0x1df
[<ffffffff811117c5>] fput+0x1a/0x1c
[<ffffffff8110e49b>] filp_close+0x68/0x72
[<ffffffff8110e552>] sys_close+0xad/0xe7
[<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

Finally decided I should fix this.  Its a straightforward inversion,
tipc_ref_acquire takes two locks in this order:
ref_table_lock
entry->lock

while tipc_deleteport takes them in this order:
entry->lock (via tipc_port_lock())
ref_table_lock (via tipc_ref_discard())

when the same entry is referenced, we get the above warning.  The fix is equally
straightforward.  Theres no real relation between the entry->lock and the
ref_table_lock (they just are needed at the same time), so move the entry->lock
aquisition in tipc_ref_acquire down, after we unlock ref_table_lock (this is
safe since the ref_table_lock guards changes to the reference table, and we've
already claimed a slot there.  I've tested the below fix and confirmed that it
clears up the lockdep issue
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
CC: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2f46ee1

l2tp: Fix UDP socket reference count bugs in the pppol2tp driver · c3259c8a

由 James Chapman 提交于 3月 16, 2010

This patch fixes UDP socket refcnt bugs in the pppol2tp driver.

A bug can cause a kernel stack trace when a tunnel socket is closed.

A way to reproduce the issue is to prepare the UDP socket for L2TP (by
opening a tunnel pppol2tp socket) and then close it before any L2TP
sessions are added to it. The sequence is

Create UDP socket
Create tunnel pppol2tp socket to prepare UDP socket for L2TP
  pppol2tp_connect: session_id=0, peer_session_id=0
L2TP SCCRP control frame received (tunnel_id==0)
  pppol2tp_recv_core: sock_hold()
  pppol2tp_recv_core: sock_put
L2TP ZLB control frame received (tunnel_id=nnn)
  pppol2tp_recv_core: sock_hold()
  pppol2tp_recv_core: sock_put
Close tunnel management socket
  pppol2tp_release: session_id=0, peer_session_id=0
Close UDP socket
  udp_lib_close: BUG

The addition of sock_hold() in pppol2tp_connect() solves the problem.

For data frames, two sock_put() calls were added to plug a refcnt leak
per received data frame. The ref that is grabbed at the top of
pppol2tp_recv_core() must always be released, but this wasn't done for
accepted data frames or data frames discarded because of bad UDP
checksums. This leak meant that any UDP socket that had passed L2TP
data traffic (i.e. L2TP data frames, not just L2TP control frames)
using pppol2tp would not be released by the kernel.

WARNING: at include/net/sock.h:435 udp_lib_unhash+0x117/0x120()
Pid: 1086, comm: openl2tpd Not tainted 2.6.33-rc1 #8
Call Trace:
 [<c119e9b7>] ? udp_lib_unhash+0x117/0x120
 [<c101b871>] ? warn_slowpath_common+0x71/0xd0
 [<c119e9b7>] ? udp_lib_unhash+0x117/0x120
 [<c101b8e3>] ? warn_slowpath_null+0x13/0x20
 [<c119e9b7>] ? udp_lib_unhash+0x117/0x120
 [<c11598a7>] ? sk_common_release+0x17/0x90
 [<c11a5e33>] ? inet_release+0x33/0x60
 [<c11577b0>] ? sock_release+0x10/0x60
 [<c115780f>] ? sock_close+0xf/0x30
 [<c106e542>] ? __fput+0x52/0x150
 [<c106b68e>] ? filp_close+0x3e/0x70
 [<c101d2e2>] ? put_files_struct+0x62/0xb0
 [<c101eaf7>] ? do_exit+0x5e7/0x650
 [<c1081623>] ? mntput_no_expire+0x13/0x70
 [<c106b68e>] ? filp_close+0x3e/0x70
 [<c101eb8a>] ? do_group_exit+0x2a/0x70
 [<c101ebe1>] ? sys_exit_group+0x11/0x20
 [<c10029b0>] ? sysenter_do_call+0x12/0x26
Signed-off-by: NJames Chapman <jchapman@katalix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3259c8a

smsc95xx: wait for PHY to complete reset during init · db443c44

由 Steve Glendinning 提交于 3月 16, 2010

This patch ensures the PHY correctly completes its reset before
setting register values.
Signed-off-by: NSteve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db443c44

l2tp: Fix oops in pppol2tp_xmit · 3feec909

由 James Chapman 提交于 3月 16, 2010

When transmitting L2TP frames, we derive the outgoing interface's UDP
checksum hardware assist capabilities from the tunnel dst dev. This
can sometimes be NULL, especially when routing protocols are used and
routing changes occur. This patch just checks for NULL dst or dev
pointers when checking for netdev hardware assist features.

BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<f89d074c>] pppol2tp_xmit+0x341/0x4da [pppol2tp]
*pde = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/class/net/lo/operstate
Modules linked in: pppol2tp pppox ppp_generic slhc ipv6 dummy loop snd_hda_codec_atihdmi snd_hda_intel snd_hda_codec snd_pcm snd_timer snd soundcore snd_page_alloc evdev psmouse serio_raw processor button i2c_piix4 i2c_core ati_agp agpgart pcspkr ext3 jbd mbcache sd_mod ide_pci_generic atiixp ide_core ahci ata_generic floppy ehci_hcd ohci_hcd libata e1000e scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted (2.6.32.8 #1)
EIP: 0060:[<f89d074c>] EFLAGS: 00010297 CPU: 3
EIP is at pppol2tp_xmit+0x341/0x4da [pppol2tp]
EAX: 00000000 EBX: f64d1680 ECX: 000005b9 EDX: 00000000
ESI: f6b91850 EDI: f64d16ac EBP: f6a0c4c0 ESP: f70a9cac
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=f70a8000 task=f70a31c0 task.ti=f70a8000)
Stack:
 000005a9 000005b9 f734c400 f66652c0 f7352e00 f67dc800 00000000 f6b91800
<0> 000005a3 f70ef6c4 f67dcda9 000005a3 f89b192e 00000246 000005a3 f64d1680
<0> f63633e0 f6363320 f64d1680 f65a7320 f65a7364 f65856c0 f64d1680 f679f02f
Call Trace:
 [<f89b192e>] ? ppp_push+0x459/0x50e [ppp_generic]
 [<f89b217f>] ? ppp_xmit_process+0x3b6/0x430 [ppp_generic]
 [<f89b2306>] ? ppp_start_xmit+0x10d/0x120 [ppp_generic]
 [<c11c15cb>] ? dev_hard_start_xmit+0x21f/0x2b2
 [<c11d0947>] ? sch_direct_xmit+0x48/0x10e
 [<c11c19a0>] ? dev_queue_xmit+0x263/0x3a6
 [<c11e2a9f>] ? ip_finish_output+0x1f7/0x221
 [<c11df682>] ? ip_forward_finish+0x2e/0x30
 [<c11de645>] ? ip_rcv_finish+0x295/0x2a9
 [<c11c0b19>] ? netif_receive_skb+0x3e9/0x404
 [<f814b791>] ? e1000_clean_rx_irq+0x253/0x2fc [e1000e]
 [<f814cb7a>] ? e1000_clean+0x63/0x1fc [e1000e]
 [<c1047eff>] ? sched_clock_local+0x15/0x11b
 [<c11c1095>] ? net_rx_action+0x96/0x195
 [<c1035750>] ? __do_softirq+0xaa/0x151
 [<c1035828>] ? do_softirq+0x31/0x3c
 [<c10358fe>] ? irq_exit+0x26/0x58
 [<c1004b21>] ? do_IRQ+0x78/0x89
 [<c1003729>] ? common_interrupt+0x29/0x30
 [<c101ac28>] ? native_safe_halt+0x2/0x3
 [<c1008c54>] ? default_idle+0x55/0x75
 [<c1009045>] ? c1e_idle+0xd2/0xd5
 [<c100233c>] ? cpu_idle+0x46/0x62
Code: 8d 45 08 f0 ff 45 08 89 6b 08 c7 43 68 7e fb 9c f8 8a 45 24 83 e0 0c 3c 04 75 09 80 63 64 f3 e9 b4 00 00 00 8b 43 18 8b 4c 24 04 <8b> 40 0c 8d 79 11 f6 40 44 0e 8a 43 64 75 51 6a 00 8b 4c 24 08
EIP: [<f89d074c>] pppol2tp_xmit+0x341/0x4da [pppol2tp] SS:ESP 0068:f70a9cac
CR2: 000000000000000c
Signed-off-by: NJames Chapman <jchapman@katalix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3feec909

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功