Commits · 3072367300aa8c779e3a14ee8e89de079e90f3ad · openanolis / cloud-kernel

19 Jul, 2008 1 commit

pkt_sched: Manage qdisc list inside of root qdisc. · 30723673

Committed by David S. Miller on Jul 18, 2008

Idea is from Patrick McHardy.

Instead of managing the list of qdiscs on the device level, manage it
in the root qdisc of a netdev_queue.  This solves all kinds of
visibility issues during qdisc destruction.

The way to iterate over all qdiscs of a netdev_queue is to visit
the netdev_queue->qdisc, and then traverse its list.

The only special case is to ignore builtin qdiscs at the root when
dumping or doing a qdisc_lookup().  That was not needed previously
because builtin qdiscs were not added to the device's qdisc_list.
Signed-off-by: David S. Miller <davem@davemloft.net>
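
The lookup described above (visit netdev_queue->qdisc, then walk the list it owns, ignoring a builtin root) can be sketched in userspace C. This is only an illustration under stated assumptions: every `toy_` name, the flag value, and the singly linked `next` chain are hypothetical stand-ins, not the kernel's actual structures or list API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag marking builtin (noop-style) qdiscs. */
#define TOY_QDISC_BUILTIN 0x1u

/* Hypothetical, simplified stand-in for struct Qdisc. */
struct toy_qdisc {
    uint32_t handle;
    unsigned int flags;
    struct toy_qdisc *next;   /* qdiscs chained off the root; NULL ends the list */
};

/* Hypothetical, simplified stand-in for struct netdev_queue. */
struct toy_netdev_queue {
    struct toy_qdisc *qdisc;  /* root qdisc of this queue */
};

/* Find a qdisc by handle: check the root, then walk the list it owns. */
struct toy_qdisc *toy_qdisc_lookup(struct toy_netdev_queue *q, uint32_t handle)
{
    struct toy_qdisc *root = q->qdisc;
    struct toy_qdisc *cur;

    if (root == NULL)
        return NULL;
    /* Builtin roots were never on the old per-device list, so keep hiding them. */
    if (root->flags & TOY_QDISC_BUILTIN)
        return NULL;
    if (root->handle == handle)
        return root;
    for (cur = root->next; cur != NULL; cur = cur->next)
        if (cur->handle == handle)
            return cur;
    return NULL;
}
```

Because the list hangs off the root, tearing down the root removes every qdisc from view in one step, which is the visibility benefit the message refers to.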

18 Jul, 2008 13 commits

pkt_sched: Make default qdisc nonshared-multiqueue safe. · a0c80b80

Committed by David S. Miller on Jul 17, 2008

Instead of 'pfifo_fast' we have just plain 'fifo_fast'.
No priority queues, just a straight FIFO.

This is necessary in order to legally have a separate
qdisc per queue in multi-TX-queue setups, and thus get
full parallelization.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Kill netdev_queue lock. · 83874000

Committed by David S. Miller on Jul 17, 2008

We can simply use the qdisc->q.lock for all of the
qdisc tree synchronization.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Kill qdisc_lock_tree and qdisc_unlock_tree. · c7e4f3bb

Committed by David S. Miller on Jul 16, 2008

No longer used.
Signed-off-by: David S. Miller <davem@davemloft.net>

netdevice: Move qdisc_list back into net_device proper. · ead81cc5

Committed by David S. Miller on Jul 17, 2008

And give it its own lock.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Use per-queue locking in shutdown_scheduler_queue. · 17715e62

Committed by David S. Miller on Jul 16, 2008

This eliminates another qdisc_lock_tree user.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Perform bulk of qdisc destruction in RCU. · 8a34c5dc

Committed by David S. Miller on Jul 17, 2008

This allows less strict control of access to the qdisc attached to a
netdev_queue.  It is even allowed to enqueue into a qdisc which is
in the process of being destroyed.  The RCU handler will toss out
those packets.

We will need this to handle sharing of a qdisc amongst multiple
TX queues.  In such a setup the lock has to be shared, so will
be inside of the qdisc itself.  At which point the netdev_queue
lock cannot be used to hard synchronize access to the ->qdisc
pointer.

One operation we have to keep inside of qdisc_destroy() is the list
deletion.  It is the only piece of state visible after the RCU quiesce
period, so we have to undo it early and under the appropriate locking.

The operations in the RCU handler do not need any locking because the
qdisc tree is no longer visible to anything at that point.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: dev_init_scheduler() does not need to lock qdisc tree. · 16361127

Committed by David S. Miller on Jul 16, 2008

We are registering the device, there is no way anyone can get
at this object's qdiscs yet in any meaningful way.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Schedule qdiscs instead of netdev_queue. · 37437bb2

Committed by David S. Miller on Jul 16, 2008

When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.

Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.

Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Add and use qdisc_root() and qdisc_root_lock(). · 7698b4fc

Committed by David S. Miller on Jul 16, 2008

When code wants to lock the qdisc tree state, the logical
operation it's doing is locking the top-level qdisc that
sits at the root of the netdev_queue.

Add qdisc_root_lock() to represent this and convert the
easiest cases.

In order for this to work out in all cases, we have to
hook up the noop_qdisc to a dummy netdev_queue.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Make QDISC_RUNNING a qdisc state. · e2627c8c

Committed by David S. Miller on Jul 16, 2008

Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Move gso_skb into Qdisc. · d3b753db

Committed by David S. Miller on Jul 15, 2008

We liberate any dangling gso_skb during qdisc destruction.

It really only matters for the root qdisc.  But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Use queue aware tests throughout. · fd2ea0a7

Committed by David S. Miller on Jul 17, 2008

This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.

Non-multiqueue drivers need no changes.  The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.

Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.

pktgen and netpoll required a little bit more surgery than the others.

In particular the pktgen changes, whilst functional, could be largely
improved.  The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless.  The thing to do is probably to
invoke fill_packet() earlier.

The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by the SKB queue mapping.

Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.

Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.

With IGB changes from Jeff Kirsher.
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Allocate multiple queues for TX. · e8a0464c

Committed by David S. Miller on Jul 17, 2008

alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.
Signed-off-by: David S. Miller <davem@davemloft.net>
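
To make the "all accesses are vectored through one accessor" idea concrete, here is a small userspace sketch. It is loosely modeled on the interfaces named above, but the `toy_` structures, the callback signature, and the `stopped` field are assumptions made for illustration and do not reproduce the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct netdev_queue. */
struct toy_netdev_queue {
    int stopped;
};

/* Hypothetical, cut-down stand-in for struct net_device. */
struct toy_net_device {
    unsigned int num_tx_queues;
    struct toy_netdev_queue *tx;   /* array of num_tx_queues entries */
};

/* Single accessor through which every TX queue lookup is routed. */
struct toy_netdev_queue *toy_get_tx_queue(struct toy_net_device *dev,
                                          unsigned int index)
{
    return &dev->tx[index];
}

/* Apply fn to each TX queue in index order. */
void toy_for_each_tx_queue(struct toy_net_device *dev,
                           void (*fn)(struct toy_netdev_queue *txq, void *arg),
                           void *arg)
{
    unsigned int i;

    for (i = 0; i < dev->num_tx_queues; i++)
        fn(toy_get_tx_queue(dev, i), arg);
}

/* Example callback: mark a queue stopped and count how many were visited. */
void toy_stop_queue(struct toy_netdev_queue *txq, void *arg)
{
    txq->stopped = 1;
    (*(unsigned int *)arg)++;
}
```

With a single accessor, a call such as `toy_get_tx_queue(dev, 0)` stands out in a grep, which is how the message proposes to find code that still assumes one queue.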

09 Jul, 2008 11 commits

netdev: Move atomic queue state bits into netdev_queue. · 79d16385

Committed by David S. Miller on Jul 08, 2008

Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue. · c773e847

Committed by David S. Miller on Jul 08, 2008

Accesses are mostly structured such that when there are multiple TX
queues the code transformations will be a little bit simpler.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Make qdisc_run take a netdev_queue. · eb6aafe3

Committed by David S. Miller on Jul 08, 2008

This allows us to use this calling convention all the way down into
qdisc_restart().
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Make netif_schedule() routines work with netdev_queue objects. · 86d804e1

Committed by David S. Miller on Jul 08, 2008

Only plain netif_schedule() remains taking a net_device, mostly as a
compatibility item while we transition the rest of these interfaces.

Everything else calls netif_schedule_queue() or __netif_schedule(),
both of which take a netdev_queue pointer.
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Move gso_skb into netdev_queue. · 970565bb

Committed by David S. Miller on Jul 08, 2008

Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Kill stats_lock member of struct Qdisc. · 68dfb427

Committed by David S. Miller on Jul 08, 2008

It is always equal to qdisc->dev_queue->lock
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Move rest of qdisc state into struct netdev_queue · b0e1e646

Committed by David S. Miller on Jul 08, 2008

Now qdisc, qdisc_sleeping, and qdisc_list also live there.
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: The ingress_lock member is no longer needed. · 555353cf

Committed by David S. Miller on Jul 08, 2008

Every qdisc is associated with a queue, and in the case of ingress
qdiscs that will now be netdev->rx_queue so using that queue's lock is
the thing to do.
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Move queue_lock into struct netdev_queue. · dc2b4847

Committed by David S. Miller on Jul 08, 2008

The lock is now an attribute of the device queue.

One thing to notice is that "suspicious" places
emerge which will need specific training about
multiple queue handling.  They are so marked with
explicit "netdev->rx_queue" and "netdev->tx_queue"
references.
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: Remove 'dev' member of struct Qdisc. · 5ce2d488

Committed by David S. Miller on Jul 08, 2008

It can be obtained via the netdev_queue. So create a helper routine,
qdisc_dev(), to make the transformations nicer looking.

qdisc_alloc() now no longer needs a net_device pointer argument.
Signed-off-by: David S. Miller <davem@davemloft.net>
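
A helper of the kind described here amounts to one pointer hop through the queue. The sketch below uses hypothetical `toy_` structures with only the fields needed for the illustration; the member names `dev_queue` and `dev` are assumptions chosen to mirror the prose, not a copy of the kernel headers.

```c
#include <stddef.h>

/* Hypothetical minimal stand-ins for the three objects in the hierarchy. */
struct toy_net_device {
    const char *name;
};

struct toy_netdev_queue {
    struct toy_net_device *dev;         /* backpointer to the owning device */
};

struct toy_qdisc {
    struct toy_netdev_queue *dev_queue; /* queue this qdisc is attached to */
};

/* Recover the device from a qdisc without storing a 'dev' member in it. */
struct toy_net_device *toy_qdisc_dev(const struct toy_qdisc *q)
{
    return q->dev_queue->dev;
}
```

Call sites that used to read a `dev` member directly would call the helper instead, so the member can be dropped without touching their logic.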

netdev: Create netdev_queue abstraction. · bb949fbd

Committed by David S. Miller on Jul 08, 2008

A netdev_queue is an entity managed by a qdisc.

Currently there is one RX and one TX queue, and a netdev_queue merely
contains a backpointer to the net_device.

The Qdisc struct is augmented with a netdev_queue pointer as well.

Eventually the 'dev' Qdisc member will go away and we will have the
resulting hierarchy:

	net_device --> netdev_queue --> Qdisc

Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue
pointer argument.
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Jun, 2008 1 commit

pkt_sched: ERR_PTR() usually encodes a negative errno, not positive. · 01e123d7

Committed by WANG Cong on Jun 27, 2008

Note, in the following patch, 'err' is initialized as:

	int err = -ENOBUFS;

Signed-off-by: WANG Cong <wcong@critical-links.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
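
Why the sign matters can be shown with a userspace imitation of the error-pointer idiom. This is a sketch, not the kernel macros: the `toy_` helpers and `TOY_MAX_ERRNO` are hypothetical names, and the integer-to-pointer casts rely on the usual flat-address-space behavior of common platforms.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed size of the reserved error range at the top of the address space. */
#define TOY_MAX_ERRNO 4095

/* Encode an errno value in a pointer; callers are expected to pass a negative value. */
void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the integer that was encoded in the pointer. */
long toy_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only when the pointer value lies in the last TOY_MAX_ERRNO addresses. */
int toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}
```

A negative errno lands in the reserved range and is detected, while a positive one becomes a small nonzero address that the check treats as a valid pointer, so the caller would go on to dereference it.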

03 May, 2008 1 commit

net: Add a WARN_ON_ONCE() to the transmit timeout function · b4192bbd

Committed by Arjan van de Ven on May 02, 2008

WARN_ON_ONCE() gives a stack trace including the full module list.
Having this in the kernel dump for the timeout case in the
generic netdev watchdog will help us see quicker which driver
is involved. It also allows us to collect statistics
and patterns in terms of which drivers have this event occurring.

Suggested by Andrew Morton
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Mar, 2008 1 commit

[NET]: Add preemption point in qdisc_run · 2ba2506c

Committed by Herbert Xu on Mar 28, 2008

The qdisc_run loop is currently unbounded and runs entirely in a
softirq.  This is bad as it may create an unbounded softirq run.

This patch fixes this by calling need_resched and breaking out if
necessary.

It also adds a break out if the jiffies value changes since that would
indicate we've been transmitting for too long which starves other
softirqs.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
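
The two break-out conditions can be illustrated with a userspace simulation. Everything here is an assumption made for the sketch: `struct toy_env` fakes the backlog, the tick counter, and the reschedule flag, and the `toy_` functions only imitate the shape of the loop described above.

```c
#include <stdbool.h>

/* Hypothetical environment standing in for kernel state. */
struct toy_env {
    int backlog;             /* packets still queued */
    unsigned long jiffies;   /* simulated tick counter */
    bool resched_pending;    /* stand-in for need_resched() */
    int ticks_every;         /* advance jiffies after this many packets (0 = never) */
};

/* Transmit one packet; returns true while more work remains. */
bool toy_restart(struct toy_env *env, int *sent)
{
    if (env->backlog == 0)
        return false;
    env->backlog--;
    (*sent)++;
    if (env->ticks_every > 0 && *sent % env->ticks_every == 0)
        env->jiffies++;
    return env->backlog > 0;
}

/* Drain the queue, but give up early instead of monopolizing the CPU. */
int toy_qdisc_run(struct toy_env *env)
{
    unsigned long start = env->jiffies;
    int sent = 0;

    while (toy_restart(env, &sent)) {
        /* Break out if someone else needs the CPU or a tick has elapsed. */
        if (env->resched_pending || env->jiffies != start)
            break;
    }
    return sent;
}
```

Comparing the tick counter against its value at entry bounds each run to roughly one tick of work without needing a separate packet budget.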

29 Jan, 2008 5 commits

[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API · 1e90474c

Committed by Patrick McHardy on Jan 22, 2008

Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow separate conversion of classifiers and actions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol · 62e3ba1b

Committed by Patrick McHardy on Jan 22, 2008

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Add some acquires/releases sparse annotations. · 9a429c49

Committed by Eric Dumazet on Jan 01, 2008

Add __acquires() and __releases() annotations to suppress some sparse
warnings.

Example warnings:

net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. · 20fea08b

Committed by Eric Dumazet on Nov 14, 2007

Qdisc_class_ops are const, and Qdisc_ops are mostly read.

Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Convert init_timer into setup_timer · b24b8a24

Committed by Pavel Emelyanov on Jan 23, 2008

A lot of code in the kernel initializes timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
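
The shape of the conversion can be sketched with a toy timer type. The `toy_` names, the two-field timer, and the global used by the callback are hypothetical and exist only for illustration; the point is that the helper folds three statements into one call.

```c
#include <stddef.h>

/* Hypothetical two-field stand-in for a timer object. */
struct toy_timer {
    void (*function)(unsigned long);
    unsigned long data;
};

/* Old style, step 1: initialize, leaving function and data for the caller to fill in. */
void toy_init_timer(struct toy_timer *t)
{
    t->function = NULL;
    t->data = 0;
}

/* New style: initialize and assign callback plus argument in one call. */
void toy_setup_timer(struct toy_timer *t,
                     void (*function)(unsigned long),
                     unsigned long data)
{
    toy_init_timer(t);
    t->function = function;
    t->data = data;
}

/* Example callback that records the argument it was fired with. */
unsigned long toy_last_fired;

void toy_watchdog(unsigned long data)
{
    toy_last_fired = data;
}

/* Simulate expiry by invoking the stored callback. */
void toy_fire(struct toy_timer *t)
{
    if (t->function != NULL)
        t->function(t->data);
}
```

Each converted call site replaces an init call followed by two assignments with a single `toy_setup_timer(&t, toy_watchdog, arg)` style call, which is where the net line reduction comes from.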

14 Nov, 2007 1 commit

[PKT_SCHED]: Check subqueue status before calling hard_start_xmit · 5f1a485d

Committed by Peter P Waskiewicz Jr on Nov 13, 2007

The only qdiscs that check subqueue state before dequeue'ing are PRIO
and RR. The other qdiscs, including the default pfifo_fast qdisc,
will allow traffic bound for subqueue 0 through to hard_start_xmit.
The check for netif_queue_stopped() is done above in pkt_sched.h, so
it is unnecessary for qdisc_restart(). However, if the underlying
driver is multiqueue capable, and only sets queue states on subqueues,
this will allow packets to enter the driver when it's currently unable
to process packets, resulting in expensive requeues and driver
entries. This patch re-adds the check for the subqueue status before
calling hard_start_xmit, so we can try and avoid the driver entry when
the queues are stopped.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Oct, 2007 1 commit

[NET]: Fix possible dev_deactivate race condition · ce0e32e6

Committed by Herbert Xu on Oct 18, 2007

The function dev_deactivate is supposed to only return when
all outstanding transmissions have completed.  Unfortunately
it is possible for store operations in the driver's transmit
function to only become visible after dev_deactivate returns.

This patch fixes this by taking the queue lock after we see
the end of the queue run.  This ensures that all effects of
any previous transmit calls are visible.

If however we detect that there is another queue run occurring,
then we'll warn about it because this should never happen as
we have pointed dev->qdisc to noop_qdisc within the same queue
lock earlier in the function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Oct, 2007 1 commit

[NET]: fix carrier-on bug? · bfaae0f0

Committed by Jeff Garzik on Oct 17, 2007

While looking at a net driver with the following construct,

	if (!netif_carrier_ok(dev))
		netif_carrier_on(dev);

it struck me that the netif_carrier_ok() check was redundant, since
netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway.  This is
the same reason why netif_queue_stopped() need not be called prior to
netif_wake_queue().

This is true, but there is however an unwanted side effect from assuming
that netif_carrier_on() can be called multiple times:  it touches the
watchdog, regardless of pre-existing carrier state.

The fix:  move watchdog-up inside the bit-cleared code path.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
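
The fix, moving the watchdog kick inside the bit-cleared path, can be sketched with C11 atomics. The `toy_` names, the flag value, and the counter that stands in for "touching the watchdog" are assumptions for illustration and not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit standing in for __LINK_STATE_NOCARRIER. */
#define TOY_NOCARRIER 0x1u

/* Hypothetical device with a state word and a counter for watchdog kicks. */
struct toy_dev {
    atomic_uint state;
    int watchdog_kicks;
};

/* Start with no carrier, as a freshly brought-up link would. */
void toy_dev_init(struct toy_dev *dev)
{
    atomic_init(&dev->state, TOY_NOCARRIER);
    dev->watchdog_kicks = 0;
}

/* Atomically clear the bit and report whether it was previously set. */
bool toy_test_and_clear(atomic_uint *word, unsigned int bit)
{
    return (atomic_fetch_and(word, ~bit) & bit) != 0;
}

/* Safe to call repeatedly: side effects happen only on a real off-to-on transition. */
void toy_carrier_on(struct toy_dev *dev)
{
    if (toy_test_and_clear(&dev->state, TOY_NOCARRIER))
        dev->watchdog_kicks++;   /* stand-in for bringing the watchdog up */
}
```

With the kick inside the conditional, drivers can drop the explicit carrier check and call the function unconditionally without disturbing the watchdog.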

11 Oct, 2007 2 commits

[NET_SCHED]: explicit hold dev tx lock · 8236632f

Committed by Jamal Hadi Salim on Sep 25, 2007

For N CPUs, with full-throttle traffic on all N CPUs funneling traffic
to the same ethernet device, the device's queue lock is contended by all
N CPUs constantly. The TX lock is only contended by a max of 2 CPUs.
In the current mode of operation, after all the work of entering the
dequeue region, we may end up aborting the path if we are unable to get
the TX lock and go back to contend for the queue lock. As N goes up,
this gets worse.

The changes in this patch result in a small increase in performance
with a 4-CPU (2x dual-core) system with no IRQ binding. Both e1000 and tg3
showed similar behavior.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Make NAPI polling independent of struct net_device objects. · bea3348e

Committed by Stephen Hemminger on Oct 03, 2007

Several devices have multiple independent RX queues per net
device, and some have a single interrupt doorbell for several
queues.

In either case, it's easier to support layouts like that if the
structure representing the poll is independent from the net
device itself.

The signature of the ->poll() call back goes from:

	int foo_poll(struct net_device *dev, int *budget)

to

	int foo_poll(struct napi_struct *napi, int budget)

The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract).  The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.

The napi_struct is to be embedded in the device driver private data
structures.

Furthermore, it is the driver's responsibility to disable all NAPI
instances in its ->stop() device close handler.  Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.

With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

[ Ported to current tree and all drivers converted.  Integrated
  Stephen's follow-on kerneldoc additions, and restored poll_list
  handling to the old style to fix mutual exclusion issues.  -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
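
A new-style poll routine, as described above, receives the poll structure and a budget by value and simply returns how much work it did. The userspace sketch below shows that shape under stated assumptions: the `toy_` types, the `rx_pending` counter, and the `irq_enabled` flag are hypothetical, and real drivers use the kernel's own helpers rather than this hand-written container macro.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the poll structure. */
struct toy_napi {
    int weight;
};

/* Hypothetical driver private data with the poll structure embedded in it. */
struct toy_priv {
    int rx_pending;          /* packets waiting in the simulated RX ring */
    int irq_enabled;         /* 1 when the simulated RX interrupt is unmasked */
    struct toy_napi napi;
};

/* New-style poll: process up to 'budget' packets and return the count. */
int toy_poll(struct toy_napi *napi, int budget)
{
    struct toy_priv *priv = toy_container_of(napi, struct toy_priv, napi);
    int work_done = 0;

    while (work_done < budget && priv->rx_pending > 0) {
        priv->rx_pending--;
        work_done++;
    }
    /* Ring drained before the budget ran out: stop polling, unmask the interrupt. */
    if (work_done < budget)
        priv->irq_enabled = 1;
    return work_done;
}
```

Because the poll structure is embedded in the driver's private data, a driver with several RX queues can embed one instance per queue and recover the right context in each call.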

11 Jul, 2007 2 commits

[NET_SCHED]: Remove unnecessary includes · 0ba48053

Committed by Patrick McHardy on Jul 02, 2007

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option · 876d48aa

Committed by Patrick McHardy on Jul 02, 2007

The generic estimator is always built in anyway and all the config option
does is prevent including a minimal amount of code for setting it up.
Additionally the option is already automatically selected for most cases.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

openanolis / cloud-kernel synced successfully about 1 year ago
