Commits · 555353cfa1aee293de445bfa6de43276138ddd82 · openanolis / cloud-kernel

09 Jul, 2008 4 commits

netdev: The ingress_lock member is no longer needed. · 555353cf

Authored by David S. Miller on Jul 08, 2008

Every qdisc is associated with a queue, and in the case of ingress
qdiscs that will now be netdev->rx_queue so using that queue's lock is
the thing to do.
Signed-off-by: David S. Miller <davem@davemloft.net>

555353cf

netdev: Move queue_lock into struct netdev_queue. · dc2b4847

Authored by David S. Miller on Jul 08, 2008

The lock is now an attribute of the device queue.

One thing to notice is that "suspicious" places
emerge which will need specific training about
multiple queue handling.  They are so marked with
explicit "netdev->rx_queue" and "netdev->tx_queue"
references.
Signed-off-by: David S. Miller <davem@davemloft.net>

dc2b4847

pkt_sched: Remove 'dev' member of struct Qdisc. · 5ce2d488

Authored by David S. Miller on Jul 08, 2008

It can be obtained via the netdev_queue. So create a helper routine,
qdisc_dev(), to make the transformations nicer looking.

Now qdisc_alloc() no longer needs a net_device pointer argument.
Signed-off-by: David S. Miller <davem@davemloft.net>

5ce2d488
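
The net_device --> netdev_queue --> Qdisc hierarchy that this group of commits works toward can be pictured with a small user-space sketch. The struct layouts are heavily simplified stand-ins, not the kernel definitions. Only the names `net_device`, `netdev_queue`, `Qdisc`, `rx_queue`, `tx_queue` and `qdisc_dev()` come from the commit messages; the fields `dev`, `dev_queue` and `name`, and the helper `netdev_init_queues()`, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct net_device;

/* Simplified stand-in: a queue only holds a back-pointer to its device. */
struct netdev_queue {
    struct net_device *dev;          /* assumed field name */
};

struct net_device {
    const char *name;                /* assumed field, for illustration */
    struct netdev_queue rx_queue;
    struct netdev_queue tx_queue;
};

struct Qdisc {
    struct netdev_queue *dev_queue;  /* assumed field name */
};

/* Reach the device through the queue instead of a 'dev' member in Qdisc. */
static inline struct net_device *qdisc_dev(const struct Qdisc *q)
{
    return q->dev_queue->dev;
}

/* Hypothetical helper: point both queues back at their owning device. */
static inline void netdev_init_queues(struct net_device *dev)
{
    assert(dev != NULL);
    dev->rx_queue.dev = dev;
    dev->tx_queue.dev = dev;
}
```

With this shape, a qdisc attached to either queue resolves to the same device, which is why a separate 'dev' member in the qdisc becomes redundant.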

netdev: Create netdev_queue abstraction. · bb949fbd

Authored by David S. Miller on Jul 08, 2008

A netdev_queue is an entity managed by a qdisc.

Currently there is one RX and one TX queue, and a netdev_queue merely
contains a backpointer to the net_device.

The Qdisc struct is augmented with a netdev_queue pointer as well.

Eventually the 'dev' Qdisc member will go away and we will have the
resulting hierarchy:

	net_device --> netdev_queue --> Qdisc

Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue
pointer argument.
Signed-off-by: David S. Miller <davem@davemloft.net>

bb949fbd

28 Jun, 2008 1 commit

pkt_sched: ERR_PTR() usually encodes a negative errno, not positive. · 01e123d7

Authored by WANG Cong on Jun 27, 2008

Note, in the following patch, 'err' is initialized as:

int err = -ENOBUFS;
Signed-off-by: WANG Cong <wcong@critical-links.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01e123d7
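
The sign matters because the error-pointer convention only recognizes values in the top 4095 addresses as errors. The following is a user-space sketch of that convention, written for illustration rather than copied from the kernel headers; the helper `err_ptr_selfcheck()` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a (negative) errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers in the last MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical self-check showing why the sign of the errno matters. */
static inline void err_ptr_selfcheck(void)
{
    assert(IS_ERR(ERR_PTR(-ENOBUFS)));   /* negative errno: detected */
    assert(!IS_ERR(ERR_PTR(ENOBUFS)));   /* positive errno: looks valid */
}
```

A positive errno produces a small pointer value that passes the IS_ERR() check as if it were a valid address, so the caller would go on to dereference it.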

03 May, 2008 1 commit

net: Add a WARN_ON_ONCE() to the transmit timeout function · b4192bbd

Authored by Arjan van de Ven on May 02, 2008

WARN_ON_ONCE() gives a stack trace including the full module list.
Having this in the kernel dump for the timeout case in the
generic netdev watchdog will help us see quicker which driver
is involved. It also allows us to collect statistics
and patterns in terms of which drivers have this event occurring.

Suggested by Andrew Morton
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4192bbd

29 Mar, 2008 1 commit

[NET]: Add preemption point in qdisc_run · 2ba2506c

Authored by Herbert Xu on Mar 28, 2008

The qdisc_run loop is currently unbounded and runs entirely in a
softirq.  This is bad as it may create an unbounded softirq run.

This patch fixes this by calling need_resched and breaking out if
necessary.

It also adds a break out if the jiffies value changes since that would
indicate we've been transmitting for too long which starves other
softirqs.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ba2506c
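
The two break-out conditions described in this commit can be sketched in user space as follows. Everything here is a stand-in: `fake_jiffies`, `fake_need_resched`, `pending`, `restart_once()` and `run_queue()` are hypothetical names, and the "clock ticks after every fourth packet" rule exists only to make the example deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel state; all hypothetical, for illustration only. */
static unsigned long fake_jiffies;   /* plays the role of jiffies */
static bool fake_need_resched;       /* plays the role of need_resched() */
static int pending;                  /* packets still queued */

/* Send one packet; return non-zero while more work remains. */
static int restart_once(void)
{
    if (pending == 0)
        return 0;
    pending--;
    if (pending % 4 == 0)            /* pretend a timer tick happened */
        fake_jiffies++;
    return pending > 0;
}

/* Bounded run loop: returns the number of packets sent in this run. */
static inline int run_queue(void)
{
    unsigned long start = fake_jiffies;
    int sent = 0;

    assert(pending >= 0);
    for (;;) {
        int before = pending;
        int more = restart_once();

        sent += before - pending;
        if (!more)
            break;                   /* queue drained */
        /* Preemption point: stop if someone else needs the CPU or
         * the tick counter moved; the kernel would reschedule the
         * qdisc here so the remaining packets are sent later. */
        if (fake_need_resched || fake_jiffies != start)
            break;
    }
    return sent;
}
```

Each call does a bounded amount of work and leaves the rest for a later run, which is the property the commit is after.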

29 Jan, 2008 5 commits

[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API · 1e90474c

Authored by Patrick McHardy on Jan 22, 2008

Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow separate conversion of classifiers and actions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e90474c

[NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol · 62e3ba1b

Authored by Patrick McHardy on Jan 22, 2008

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

62e3ba1b

[NET]: Add some acquires/releases sparse annotations. · 9a429c49

Authored by Eric Dumazet on Jan 01, 2008

Add __acquires() and __releases() annotations to suppress some sparse
warnings.

example of warnings :

net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong
count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' -
unexpected unlock
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a429c49

[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. · 20fea08b

Authored by Eric Dumazet on Nov 14, 2007

Qdisc_class_ops are const, and Qdisc_ops are mostly read.

Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20fea08b

[NET]: Convert init_timer into setup_timer · b24b8a24

Authored by Pavel Emelyanov on Jan 23, 2008

A great deal of code in the kernel initializes timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b24b8a24
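
The shape of this conversion can be shown with a user-space mock. The `struct timer_list` below is a minimal stand-in carrying only the fields the commit message mentions plus `expires`; the mock `init_timer()` and `setup_timer()` bodies, and the names `watchdog()`, `open_coded()` and `converted()`, are assumptions for illustration and not the kernel implementation.

```c
#include <assert.h>

/* Minimal stand-in for the timer structure of that era. */
struct timer_list {
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
};

/* Mock: the real init_timer() sets up internal list state instead. */
static inline void init_timer(struct timer_list *t)
{
    t->expires = 0;
}

/* Mock of the helper: initialize and fill in callback and argument. */
static inline void setup_timer(struct timer_list *t,
                               void (*fn)(unsigned long),
                               unsigned long data)
{
    t->function = fn;
    t->data = data;
    init_timer(t);
}

/* Hypothetical timer callback: bumps the counter passed via 'data'. */
static void watchdog(unsigned long data)
{
    (*(int *)data)++;
}

/* Before the conversion: three separate statements at each call site. */
static inline void open_coded(struct timer_list *t, int *counter)
{
    init_timer(t);
    t->function = watchdog;
    t->data = (unsigned long)counter;
}

/* After the conversion: one call carrying the same information. */
static inline void converted(struct timer_list *t, int *counter)
{
    assert(counter != 0);
    setup_timer(t, watchdog, (unsigned long)counter);
}
```

Both variants leave the timer in the same state; the saving is two lines per call site, which is where the reported reduction comes from.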

14 Nov, 2007 1 commit

[PKT_SCHED]: Check subqueue status before calling hard_start_xmit · 5f1a485d

Authored by Peter P Waskiewicz Jr on Nov 13, 2007

The only qdiscs that check subqueue state before dequeue'ing are PRIO
and RR. The other qdiscs, including the default pfifo_fast qdisc,
will allow traffic bound for subqueue 0 through to hard_start_xmit.
The check for netif_queue_stopped() is done above in pkt_sched.h, so
it is unnecessary for qdisc_restart(). However, if the underlying
driver is multiqueue capable, and only sets queue states on subqueues,
this will allow packets to enter the driver when it's currently unable
to process packets, resulting in expensive requeues and driver
entries. This patch re-adds the check for the subqueue status before
calling hard_start_xmit, so we can try and avoid the driver entry when
the queues are stopped.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f1a485d

19 Oct, 2007 1 commit

[NET]: Fix possible dev_deactivate race condition · ce0e32e6

Authored by Herbert Xu on Oct 18, 2007

The function dev_deactivate is supposed to only return when
all outstanding transmissions have completed.  Unfortunately
it is possible for store operations in the driver's transmit
function to only become visible after dev_deactivate returns.

This patch fixes this by taking the queue lock after we see
the end of the queue run.  This ensures that all effects of
any previous transmit calls are visible.

If however we detect that there is another queue run occurring,
then we'll warn about it because this should never happen as
we have pointed dev->qdisc to noop_qdisc within the same queue
lock earlier in the function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce0e32e6

18 Oct, 2007 1 commit

[NET]: fix carrier-on bug? · bfaae0f0

Authored by Jeff Garzik on Oct 17, 2007

While looking at a net driver with the following construct,

	if (!netif_carrier_ok(dev))
		netif_carrier_on(dev);

it struck me that the netif_carrier_ok() check was redundant, since
netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway.  This is
the same reason why netif_queue_stopped() need not be called prior to
netif_wake_queue().

This is true, but there is however an unwanted side effect from assuming
that netif_carrier_on() can be called multiple times:  it touches the
watchdog, regardless of pre-existing carrier state.

The fix:  move watchdog-up inside the bit-cleared code path.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfaae0f0
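
The side effect and its fix can be modeled with a small user-space sketch. `struct fake_dev`, `test_and_clear()`, `carrier_on_before()` and `carrier_on_after()` are hypothetical names; the boolean stands in for the __LINK_STATE_NOCARRIER bit and the counter stands in for the watchdog-up call.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_dev {
    bool no_carrier;       /* plays the role of __LINK_STATE_NOCARRIER */
    int watchdog_kicks;    /* counts how often the watchdog is touched */
};

/* Non-atomic stand-in for a test-and-clear bit operation. */
static inline bool test_and_clear(bool *bit)
{
    bool old = *bit;

    *bit = false;
    return old;
}

/* Old shape: the watchdog is touched on every call. */
static inline void carrier_on_before(struct fake_dev *d)
{
    test_and_clear(&d->no_carrier);
    d->watchdog_kicks++;
}

/* Fixed shape: the watchdog is touched only on a real transition. */
static inline void carrier_on_after(struct fake_dev *d)
{
    assert(d != 0);
    if (test_and_clear(&d->no_carrier))
        d->watchdog_kicks++;
}
```

Calling the fixed variant repeatedly on a device whose carrier is already up leaves the watchdog alone, so drivers can drop the redundant netif_carrier_ok() check without the side effect.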

11 Oct, 2007 2 commits

[NET_SCHED]: explicitly hold dev tx lock · 8236632f

Authored by Jamal Hadi Salim on Sep 25, 2007

For N CPUs, with full throttle traffic on all N CPUs, funneling traffic
to the same ethernet device, the device's queue lock is contended by all
N CPUs constantly. The TX lock is only contended by a max of 2 CPUs.
In the current mode of operation, after all the work of entering the
dequeue region, we may end up aborting the path if we are unable to get
the tx lock and go back to contend for the queue lock. As N goes up,
this gets worse.

The changes in this patch result in a small increase in performance
with a 4CPU (2xdual-core) with no irq binding. Both e1000 and tg3
showed similar behavior.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

8236632f

[NET]: Make NAPI polling independent of struct net_device objects. · bea3348e

Authored by Stephen Hemminger on Oct 03, 2007

Several devices have multiple independent RX queues per net
device, and some have a single interrupt doorbell for several
queues.

In either case, it's easier to support layouts like that if the
structure representing the poll is independent from the net
device itself.

The signature of the ->poll() call back goes from:

	int foo_poll(struct net_device *dev, int *budget)

to

	int foo_poll(struct napi_struct *napi, int budget)

The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract).  The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.

The napi_struct is to be embedded in the device driver private data
structures.

Furthermore, it is the driver's responsibility to disable all NAPI
instances in its ->stop() device close handler.  Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.

With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

[ Ported to current tree and all drivers converted.  Integrated
  Stephen's follow-on kerneldoc additions, and restored poll_list
  handling to the old style to fix mutual exclusion issues.  -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bea3348e
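
A poll routine in the new style can be sketched in user space like this. The `napi_struct` here is an empty stand-in for the kernel structure, and `struct foo_priv` with its `rx_pending` counter is hypothetical driver state; only the `foo_poll()` signature is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel structure; the real one has more fields. */
struct napi_struct {
    int weight;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver private data with the napi_struct embedded in it. */
struct foo_priv {
    int rx_pending;              /* packets waiting in the RX ring */
    struct napi_struct napi;
};

/* New-style poll: takes the napi_struct and a budget by value, and
 * returns how many packets it processed. */
static inline int foo_poll(struct napi_struct *napi, int budget)
{
    struct foo_priv *priv = container_of(napi, struct foo_priv, napi);
    int work_done = 0;

    assert(budget >= 0);
    while (work_done < budget && priv->rx_pending > 0) {
        priv->rx_pending--;      /* stand-in for receiving one packet */
        work_done++;
    }
    /* A real driver would end polling and re-enable interrupts here
     * when work_done < budget. */
    return work_done;
}
```

The callee only reports the work it did; quota accounting stays with the caller, as the commit message describes.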

11 Jul, 2007 5 commits

[NET_SCHED]: Remove unnecessary includes · 0ba48053

Authored by Patrick McHardy on Jul 02, 2007

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ba48053

[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option · 876d48aa

Authored by Patrick McHardy on Jul 02, 2007

The generic estimator is always built in anyway and all the config option
does is prevent including a minimal amount of code for setting it up.
Additionally the option is already automatically selected for most cases.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

876d48aa

[NET]: qdisc_restart - couple of optimizations. · e50c41b5

Authored by Krishna Kumar on Jun 24, 2007

Changes :

- netif_queue_stopped need not be called inside qdisc_restart as
  it has been called already in qdisc_run() before the first skb
  is sent, and in __qdisc_run() after each intermediate skb is
  sent (note : we are the only sender, so the queue cannot get
  stopped while the tx lock was got in the ~LLTX case).

- BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1
  meant more packets are available, and __qdisc_run used to loop
  when qdisc_restart() returned -1. During those days, it was
  necessary to make sure that qlen is never less than zero, since
  __qdisc_run would get into an infinite loop if no packets are on
  the queue and this bug in qdisc was there (and worse - no more
  skbs could ever get queue'd as we hold the queue lock too). With
  Herbert's recent change to return values, this check is not
  required.  Hopefully Herbert can validate this change. If at all
  this is required, it should be added to skb_dequeue (in failure
  case), and not to qdisc_qlen.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e50c41b5

[NET]: qdisc_restart - readability changes plus one bug fix. · 6c1361a6

Authored by Krishna Kumar on Jun 24, 2007

New changes :

- Incorporated Peter Waskiewicz's comments.
- Re-added back one warning message (on driver returning wrong value).

Previous changes :

- Converted to use switch/case code which looks neater.

- "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless
  check should be removed, since driver will return NETDEV_TX_LOCKED only
  if lockless is true and driver has to do the locking. In the original
  code as well as the latest code, this code can result in a bug where
  if LLTX is not set for a driver (lockless == 0) but the driver is written
  wrongly to do a trylock (despite LLTX not being set), the driver returns
  LOCKED. But since lockless is zero, the packet is requeue'd instead of
  calling collision code which will issue warning and free up the skb.
  Instead this skb will be retried with this driver next time, and the same
  result will ensue. Removing this check will catch these driver bugs instead
  of hiding the problem. I am keeping this change to readability section
  since :
  	a. it is confusing to check two things as it is; and
  	b. it is difficult to keep this check in the changed 'switch' code.

- Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is
  the work being done and easier to understand) and do_dev_requeue to
  dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to
  dev_handle_collision (handle_dev_cpu_collision is a small routine with only
  one caller, so there is no need to have two separate routines, which also
  results in getting rid of two macros), etc.

- Removed an XXX comment as it should never fail (I suspect this was related
  to batch skb WIP, Jamal ?). Converted some functions to original coding
  style of having the return values and the function name on same line, eg
  prio2list.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c1361a6

[NET_SCHED]: Cleanup readability of qdisc restart · c716a81a

Authored by Jamal Hadi Salim on Jun 10, 2007

Over the years this code has gotten hairier, resulting in many long
discussions over long summer days and patches that get it wrong.
This patch helps tame that code so normal people will understand it.

Thanks to Thomas Graf, Peter J. Waskiewicz Jr, and Patrick McHardy
for their valuable reviews.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

c716a81a

04 Jun, 2007 1 commit

[NET]: Make net watchdog timers 1 sec jiffy aligned. · 60468d5b

Authored by Venkatesh Pallipadi on May 31, 2007

Use round_jiffies for the net dev watchdog timer.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60468d5b

25 May, 2007 1 commit

[NET_SCHED]: Fix qdisc_restart return value when dequeue is empty · 36247f54

Authored by Herbert Xu on May 23, 2007

My previous patch that changed the return value of qdisc_restart
incorrectly made the case where dequeue returns empty continue
processing packets.

This patch is based on diagnosis and fix by Patrick McHardy.
Reported-and-debugged-by: Anant Nitya <kernel@prachanda.info>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

36247f54

11 May, 2007 4 commits

[NET_SCHED]: Avoid requeue warning on dev_deactivate · 41a23b07

Authored by Herbert Xu on May 10, 2007

When we relinquish queue_lock in qdisc_restart and then retake it for
requeueing, we might race against dev_deactivate and end up requeueing
onto noop_qdisc.  This causes a warning to be printed.

This patch fixes this by checking this before we requeue.  As an added
bonus, we can remove the same check in __qdisc_run which was added to
prevent dev->gso_skb from being requeued when we're shutting down.

Even though we've had to add a new conditional in its place, it's better
because it only happens on requeues rather than every single time that
qdisc_run is called.

For this to work we also need to move the clearing of gso_skb up in
dev_deactivate as now qdisc_restart can occur even after we wait for
__LINK_STATE_QDISC_RUNNING to clear (but it won't do anything as long
as the queue and gso_skb are already clear).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

41a23b07

[NET_SCHED]: Reread dev->qdisc for NETDEV_TX_OK · cce1fa36

Authored by Herbert Xu on May 10, 2007

Now that we return the queue length after NETDEV_TX_OK we better
make sure that we have the right queue.  Otherwise we can cause a
stall after a really quick dev_deactivate/dev_activate.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

cce1fa36

[NET_SCHED]: Rationalise return value of qdisc_restart · d90df3ad

Authored by Herbert Xu on May 10, 2007

The current return value scheme and associated comment was invented
back in the 20th century when we still had that tbusy flag.  Things
have changed quite a bit since then (even Tony Blair is moving on
now, not to mention the new French president).

All we need to indicate now is whether the caller should continue
processing the queue.  Therefore it's sufficient if we return 0 if
we want to stop and non-zero otherwise.

This is based on a patch by Krishna Kumar.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d90df3ad
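
The simplified contract (return 0 to stop, non-zero to keep going) can be sketched in user space as below. `struct fake_queue`, `fake_restart()` and `fake_run()` are hypothetical names, and the `driver_busy` flag is an assumed stand-in for any condition under which the packet would have to be requeued.

```c
#include <assert.h>

/* Hypothetical queue model, for illustration only. */
struct fake_queue {
    int qlen;          /* packets waiting */
    int driver_busy;   /* non-zero: the driver cannot take a packet now */
    int sent;          /* packets handed to the driver so far */
};

/* Returns 0 if the caller should stop, non-zero if it should continue. */
static inline int fake_restart(struct fake_queue *q)
{
    if (q->qlen == 0)
        return 0;              /* nothing to dequeue: stop */
    if (q->driver_busy)
        return 0;              /* packet would be requeued: stop */
    q->qlen--;
    q->sent++;
    return q->qlen;            /* remaining length: non-zero means go on */
}

/* The caller needs no knowledge of why processing stopped. */
static inline void fake_run(struct fake_queue *q)
{
    assert(q != 0);
    while (fake_restart(q))
        ;
}
```

The empty-queue case returning 0 is the behavior that the follow-up fix 36247f54, listed earlier on this page, restores.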

[NET]: Fix dev->qdisc race for NETDEV_TX_LOCKED case · 5830725f

Authored by Thomas Graf on May 10, 2007

When transmit fails with NETDEV_TX_LOCKED the skb is requeued
to dev->qdisc again. The dev->qdisc pointer is protected by
the queue lock which needs to be dropped when attempting to
transmit and acquired again before requeueing. The problem is
that qdisc_restart() fetches the dev->qdisc pointer once and
stores it in the `q' variable which is invalidated when
dropping the queue_lock, therefore the variable needs to be
refreshed before requeueing.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

5830725f

26 Apr, 2007 2 commits

[NET_SCHED]: ingress: switch back to using ingress_lock · fd44de7c

Authored by Patrick McHardy on Apr 16, 2007

Switch ingress queueing back to use ingress_lock. qdisc_lock_tree now locks
both the ingress and egress qdiscs on the device. All changes to data that
might be used on both ingress and egress need to be protected by using
qdisc_lock_tree instead of manually taking dev->queue_lock. Additionally
the qdisc stats_lock needs to be initialized to ingress_lock for ingress
qdiscs.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd44de7c

[NET_SCHED]: Eliminate qdisc_tree_lock · 0463d4ae

Authored by Patrick McHardy on Apr 16, 2007

Since we're now holding the rtnl during the entire dump operation, we
can remove qdisc_tree_lock, whose only purpose is to protect dump
callbacks from concurrent changes to the qdisc tree.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

0463d4ae

11 Feb, 2007 1 commit

[NET] SCHED: Fix whitespace errors. · 10297b99

Authored by YOSHIFUJI Hideaki on Feb 09, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

10297b99

09 Feb, 2007 1 commit

[NET]: user of the jiffies rounding code: Networking · f5a6e01c

Authored by Arjan van de Ven on Feb 05, 2007

This patch introduces users of the round_jiffies() function in the
networking code.

These timers all were of the "about once a second" or "about once
every X seconds" variety and several showed up in the "what wakes the
cpu up" profiles that the tickless patches provide.  Some timers are
highly dynamic based on network load; but even on low activity systems
they still show up so the rounding is done only in cases of low
activity, allowing higher frequency timers in the high activity case.

The various hardware watchdogs are an obvious case; they run every 2
seconds but aren't otherwise specific of exactly when they need to
run.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5a6e01c

03 Dec, 2006 2 commits

[PKT_SCHED]: Remove unused exports. · 5f68e4c0

Authored by Adrian Bunk on Nov 30, 2006

This patch removes the following unused EXPORT_SYMBOL's:
- sch_api.c: qdisc_lookup
- sch_generic.c: __netdev_watchdog_up
- sch_generic.c: noop_qdisc_ops
- sch_generic.c: qdisc_alloc
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f68e4c0

[NET_SCHED]: Set parent classid in default qdiscs · 9f9afec4

Authored by Patrick McHardy on Nov 29, 2006

Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f9afec4

29 Sep, 2006 1 commit

[NET_SCHED]: Fix fallout from dev->qdisc RCU change · 85670cc1

Authored by Patrick McHardy on Sep 27, 2006

The move of qdisc destruction to a rcu callback broke locking in the
entire qdisc layer by invalidating previously valid assumptions about
the context in which changes to the qdisc tree occur.

The two assumptions were:

- since changes only happen in process context, read_lock doesn't need
  bottom half protection. Now invalid since destruction of inner qdiscs,
  classifiers, actions and estimators happens in the RCU callback unless
  they're manually deleted, resulting in dead-locks when read_lock in
  process context is interrupted by write_lock_bh in bottom half context.

- since changes only happen under the RTNL, no additional locking is
  necessary for data not used during packet processing (f.e. u32_list).
  Again, since destruction now happens in the RCU callback, this assumption
  is not valid anymore, causing races while using this data, which can
  result in corruption or use-after-free.

Instead of "fixing" this by disabling bottom halves everywhere and adding
new locks/refcounting, this patch makes these assumptions valid again by
moving destruction back to process context. Since only the dev->qdisc
pointer is protected by RCU, but ->enqueue and the qdisc tree are still
protected by dev->qdisc_lock, destruction of the tree can be performed
immediately and only the final free needs to happen in the rcu callback
to make sure dev_queue_xmit doesn't access already freed memory.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

85670cc1

18 Sep, 2006 1 commit

[NET]: Drop tx lock in dev_watchdog_up · d7811e62

Authored by Herbert Xu on Sep 18, 2006

Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM.

On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote:
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> -------------------------------------------------------
> swapper/0 is trying to acquire lock:
>  (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290
> 
> but task is already holding lock:
>  (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290
> 
> which lock already depends on the new lock.

This turns out to be a genuine bug.  The queue lock and xmit lock are
intentionally taken out of order.  Two things are supposed to prevent
dead-locks from occurring:

1) When we hold the queue_lock we're supposed to only do try_lock on the
tx_lock.

2) We always drop the queue_lock after taking the tx_lock and before doing
anything else.

> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&dev->_xmit_lock){-+..}:
>        [<c012e7b6>] lock_acquire+0x76/0xa0
>        [<c0336241>] _spin_lock_bh+0x31/0x40
>        [<c02d25a9>] dev_activate+0x69/0x120

This path obviously breaks assumption 1) and therefore can lead to ABBA
dead-locks.

I've looked at the history and there seems to be no reason for the lock
to be held at all in dev_watchdog_up.  The lock appeared in day one and
even there it was unnecessary.  In fact, people added __dev_watchdog_up
precisely in order to get around the tx lock there.

The function dev_watchdog_up is already serialised by rtnl_lock since
its only caller dev_activate is always called under it.

So here is a simple patch to remove the tx lock from dev_watchdog_up.
In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and
replace it with dev_watchdog_up.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d7811e62

22 Jul, 2006 1 commit

[NET]: Conversions from kmalloc+memset to k(z|c)alloc. · 0da974f4

Authored by Panagiotis Issaris on Jul 21, 2006

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0da974f4

01 Jul, 2006 1 commit

Remove obsolete #include <linux/config.h> · 6ab3d562

Authored by Jörn Engel on Jun 30, 2006

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

6ab3d562

23 Jun, 2006 2 commits

[NET]: Add generic segmentation offload · f6a78bfc

Authored by Herbert Xu on Jun 22, 2006

This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.

The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6a78bfc

[NET]: Prevent transmission after dev_deactivate · d4828d85

Authored by Herbert Xu on Jun 22, 2006

The dev_deactivate function has bit-rotted since the introduction of
lockless drivers. In particular, the spin_unlock_wait call at the end
has no effect on the xmit routine of lockless drivers.

With a little bit of work, we can make it much more useful by providing
the guarantee that when it returns, no more calls to the xmit routine
of the underlying driver will be made.

The idea is simple. There are two entry points into the xmit routine.
The first comes from dev_queue_xmit. That one is easily stopped by
using synchronize_rcu. This works because we set the qdisc to noop_qdisc
before the synchronize_rcu call. That in turn causes all subsequent
packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call
also ensures all outstanding calls leave their critical section.

The other entry point is from qdisc_run. Since we now have a bit that
indicates whether it's running, all we have to do is to wait until the
bit is off.

I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is
useless because netif_wake_queue can cause it to be set again. It is
also harmless because we've disarmed qdisc_run.

I've also removed the spin_unlock_wait on xmit_lock because its only
purpose of making sure that all outstanding xmit_lock holders have
exited is also given by dev_watchdog_down.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4828d85

openanolis / cloud-kernel · synced successfully about 1 year ago