提交 · b25bd2515ea32cf5ddd5fd5a2a93b8c9dd875e4f · openeuler / Kernel

14 9月, 2014 29 次提交

virtio_ring: unify direct/indirect code paths. · b25bd251

由 Rusty Russell 提交于 9月 11, 2014

virtqueue_add() populates the virtqueue descriptor table from the sgs
given.  If it uses an indirect descriptor table, then it puts a single
descriptor in the descriptor table pointing to the kmalloc'ed indirect
table where the sg is populated.

Previously vring_add_indirect() did the allocation and the simple
linear layout.  We replace that with alloc_indirect() which allocates
the indirect table then chains it like the normal descriptor table so
we can reuse the core logic.

This slows down pktgen by less than 1/2 a percent (which uses direct
descriptors), as well as vring_bench, but it's far neater.

vring_bench before:
	1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
vring_bench after:
	1125610268-1183528965(1.14172e+09+/-8e+06)ns

pktgen before:
   787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0

pktgen after:
   779988-790404(786391+/-2.5e+03)pps 361-366(364.35+/-1.3)Mb/sec (361914432-366747456(3.64885e+08+/-1.2e+06)bps) errors: 0

Now, if we make force indirect descriptors by turning off any_header_sg
in virtio_net.c:

pktgen before:
  713773-721062(718374+/-2.1e+03)pps 331-334(332.95+/-0.92)Mb/sec (331190672-334572768(3.33325e+08+/-9.6e+05)bps) errors: 0
pktgen after:
  710542-719195(714898+/-2.4e+03)pps 329-333(331.15+/-1.1)Mb/sec (329691488-333706480(3.31713e+08+/-1.1e+06)bps) errors: 0
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b25bd251

virtio_ring: assume sgs are always well-formed. · eeebf9b1

由 Rusty Russell 提交于 9月 11, 2014

We used to have several callers which just used arrays.  They're
gone, so we can use sg_next() everywhere, simplifying the code.

On my laptop, this slowed down vring_bench by 15%:

vring_bench before:
	936153354-967745359(9.44739e+08+/-6.1e+06)ns
vring_bench after:
	1061485790-1104800648(1.08254e+09+/-6.6e+06)ns

However, a more realistic test using pktgen on a AMD FX(tm)-8320 saw
a few percent improvement:

pktgen before:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0

pktgen after:
   787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eeebf9b1

virtio_net: pass well-formed sgs to virtqueue_add_*() · a5835440

由 Rusty Russell 提交于 9月 11, 2014

This is the only driver which doesn't hand virtqueue_add_inbuf and
virtqueue_add_outbuf a well-formed, well-terminated sg.  Fix it,
so we can make virtio_add_* simpler.

pktgen results:
	modprobe pktgen
	echo 'add_device eth0' > /proc/net/pktgen/kpktgend_0
	echo nowait 1 > /proc/net/pktgen/eth0
	echo count 1000000 > /proc/net/pktgen/eth0
	echo clone_skb 100000 > /proc/net/pktgen/eth0
	echo dst_mac 4e:14:25:a9:30:ac > /proc/net/pktgen/eth0
	echo dst 192.168.1.2 > /proc/net/pktgen/eth0
	for i in `seq 20`; do echo start > /proc/net/pktgen/pgctrl; tail -n1 /proc/net/pktgen/eth0; done

Before:
  746547-793084(786421+/-9.6e+03)pps 346-367(364.4+/-4.4)Mb/sec (346397808-367990976(3.649e+08+/-4.5e+06)bps) errors: 0

After:
  767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5835440

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 0fe13151

由 David S. Miller 提交于 9月 13, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-09-12

This series contains updates to e1000, ixgbe and ixgbevf.

Mark provide two fixes to reduce compile warnings produce by ixgbe
and ixgbevf.

Alex provides two patches for ixgbe, first removes the receive buffer
allocation at the end of the ixgbe_clean_rx_irq().  The reason for
removing this is to avoid the extra latency introduced by the MMIO write.
Second patch addresses several issues in the current ixgbe implementation
of busy poll sockets.  It was possible for frames to be delivered out of
order if they were held in GRO, so address this by flushing the GRO
buffers before releasing the q_vector back to the idle state.  Also, we
were having to take a spinlock on changing the state to and from idle,
so to resolve this, replaced the state value with an atomic and use
atomic_cmpxchg to change the value from idle, and a simple atomic set
to restore it back to idle after we have acquired it.  This allows us
to only use a locked operation on acquiring the vector without a need
for a locked operation to release it.

Florian Westphal provides several patches for e1000 which does some
cleanup and updating of the driver.  Moved e1000_tbi_adjust_stats()
so that he could make the function static.  Added a helper function
to deal with the tbi workaround that was located in 2 different
Rx clean functions.  Added a e1000_rx_buffer struct for use on receive
since the transmit and receive have different requirements.  Updates
e1000 to use napi_gro_frags API.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0fe13151

Merge branch 'sched_rcu' · 54996b52

由 David S. Miller 提交于 9月 13, 2014

John Fastabend says:

====================
net/sched rcu classifiers and tcf

This series converts the tcf_proto usage to RCU.

This requires updating each classifier individually to handle the
new copy/update requirement and also to update the core list
traversals. This makes the assumption that updates to the tables
are infrequent in comparison to the packet per second being
classified. On a 10Gbps running near line rate we can easily
produce 12+ million packets per second so IMO this is a reasonable
assumption. The updates are serialized by RTNL.

I have done some basic testing on this series and do not see any
immediate splats or issues. The patch series has been running
on my dev systems for a month or so now and I've not seen any
issues. Although my configurations are not overly complicated.

My test cases at this point cover all the filters with a
tight loop to add/remove filters. Some basic estimator tests
where I add an estimator to the qdisc and verify the statistics
accurate using pktgen. And finally I have a small script to
exercise the 'tc actions' interface. Feel free to send me more
tests off list and I can run them.

This is prep work to drop the qdisc lock with the first
target being the ingress qdisc. To be done is making the
tc actions RCU safe and statistics per cpu. These patches
are in the works.

Comments:
  - Checkpatch is still giving errors on some >80 char lines I know
    about this. IMO the way to fix this is to restructure the sched
    code to avoid being so heavily indented. But doing this here
    bloats the patchset and anyways there are already lots of >80
    chars in these files. I would prefer to keep the patches as is
    but let me know if others think I should fix these and I will.
    A follow up patch set could restructure the code and fix this
    throughout the code blocks.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54996b52

net: sched: rcu'ify cls_bpf · 1f947bf1

由 John Fastabend 提交于 9月 12, 2014

This patch makes the cls_bpf classifier RCU safe. The tcf_lock
was being used to protect a list of cls_bpf_prog now this list
is RCU safe and updates occur with rcu_replace.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f947bf1

net: sched: rcu'ify cls_rsvp · b929d86d

由 John Fastabend 提交于 9月 12, 2014

Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b929d86d

net: sched: make cls_u32 lockless · 1ce87720

由 John Fastabend 提交于 9月 12, 2014

Make cls_u32 classifier safe to run without holding lock. This patch
converts statistics that are kept in read section u32_classify into
per cpu counters.

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can hit the qdisc layer
correctly. For ingress qdisc's a loopback cable was used.

for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;

        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ce87720

net: sched: make cls_u32 per cpu · 459d5f62

由 John Fastabend 提交于 9月 12, 2014

This uses per cpu counters in cls_u32 in preparation
to convert over to rcu.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

459d5f62

net: sched: RCU cls_tcindex · 331b7292

由 John Fastabend 提交于 9月 12, 2014

Make cls_tcindex RCU safe.

This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check
caller either holds the rcu read lock or RTNL. This is needed to
handle the case where tcindex_lookup() is being called in both cases.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

331b7292

net: sched: RCU cls_route · 1109c005

由 John Fastabend 提交于 9月 12, 2014

RCUify the route classifier. For now however spinlock's are used to
protect fastmap cache.

The issue here is the fastmap may be read by one CPU while the
cache is being updated by another. An array of pointers could be
one possible solution.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1109c005

net: sched: fw use RCU · e35a8ee5

由 John Fastabend 提交于 9月 12, 2014

RCU'ify fw classifier.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e35a8ee5

net: sched: cls_flow use RCU · 70da9f0b

由 John Fastabend 提交于 9月 12, 2014

Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70da9f0b

net: sched: cls_cgroup use RCU · 952313bd

由 John Fastabend 提交于 9月 12, 2014

Make cgroup classifier safe for RCU.

Also drops the calls in the classify routine that were doing a
rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't held
entering this routine we have issues with deleting the classifier
chain so remove the unnecessary rcu_read_lock()/rcu_read_unlock()
pair noting all paths AFAIK hold rcu_read_lock.

If there is a case where classify is called without the rcu read lock
then an rcu splat will occur and we can correct it.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

952313bd

net: sched: cls_basic use RCU · 9888faef

由 John Fastabend 提交于 9月 12, 2014

Enable basic classifier for RCU.

Dereferencing tp->root may look a bit strange here but it is needed
by my accounting because it is allocated at init time and needs to
be kfree'd at destroy time. However because it may be referenced in
the classify() path we must wait an RCU grace period before free'ing
it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern
is used in all the classifiers.

Also the hgenerator can be incremented without concern because it
is always incremented under RTNL.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9888faef

net: rcu-ify tcf_proto · 25d8c0d5

由 John Fastabend 提交于 9月 12, 2014

rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25d8c0d5

net: qdisc: use rcu prefix and silence sparse warnings · 46e5da40

由 John Fastabend 提交于 9月 12, 2014

Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46e5da40

net: sched: rcu'ify cls_bpf · d355ab09

由 John Fastabend 提交于 9月 12, 2014

This patch makes the cls_bpf classifier RCU safe. The tcf_lock
was being used to protect a list of cls_bpf_prog now this list
is RCU safe and updates occur with rcu_replace.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d355ab09

net: sched: rcu'ify cls_rsvp · 8b21e230

由 John Fastabend 提交于 9月 12, 2014

Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b21e230

net: sched: make cls_u32 lockless · 8f787cd1

由 John Fastabend 提交于 9月 12, 2014

Make cls_u32 classifier safe to run without holding lock. This patch
converts statistics that are kept in read section u32_classify into
per cpu counters.

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can hit the qdisc layer
correctly. For ingress qdisc's a loopback cable was used.

for i in {1..100}; do
        q=`echo $i%8|bc`;
        echo -n "u32 tos: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
                  action skbedit queue_mapping $q;
        sleep 1;
        tc filter del dev p3p2 prio $i;

        echo -n "u32 tos hash table: iteration $i on queue $q";
        tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
        tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
                ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
        sleep 2;
        tc filter del dev p3p2 prio $i
        sleep 1;
done
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f787cd1

net: sched: make cls_u32 per cpu · f4f64050

由 John Fastabend 提交于 9月 12, 2014

This uses per cpu counters in cls_u32 in preparation
to convert over to rcu.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4f64050

net: sched: RCU cls_tcindex · 8332904a

由 John Fastabend 提交于 9月 12, 2014

Make cls_tcindex RCU safe.

This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check
caller either holds the rcu read lock or RTNL. This is needed to
handle the case where tcindex_lookup() is being called in both cases.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8332904a

net: sched: RCU cls_route · cc91210c

由 John Fastabend 提交于 9月 12, 2014

RCUify the route classifier. For now however spinlock's are used to
protect fastmap cache.

The issue here is the fastmap may be read by one CPU while the
cache is being updated by another. An array of pointers could be
one possible solution.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc91210c

net: sched: fw use RCU · 1f31fea5

由 John Fastabend 提交于 9月 12, 2014

RCU'ify fw classifier.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f31fea5

net: sched: cls_flow use RCU · ad7a97ae

由 John Fastabend 提交于 9月 12, 2014

Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad7a97ae

net: sched: cls_cgroup use RCU · c7953ef2

由 John Fastabend 提交于 9月 12, 2014

Make cgroup classifier safe for RCU.

Also drops the calls in the classify routine that were doing a
rcu_read_lock()/rcu_read_unlock(). If the rcu_read_lock() isn't held
entering this routine we have issues with deleting the classifier
chain so remove the unnecessary rcu_read_lock()/rcu_read_unlock()
pair noting all paths AFAIK hold rcu_read_lock.

If there is a case where classify is called without the rcu read lock
then an rcu splat will occur and we can correct it.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7953ef2

net: sched: cls_basic use RCU · c8b9affe

由 John Fastabend 提交于 9月 12, 2014

Enable basic classifier for RCU.

Dereferencing tp->root may look a bit strange here but it is needed
by my accounting because it is allocated at init time and needs to
be kfree'd at destroy time. However because it may be referenced in
the classify() path we must wait an RCU grace period before free'ing
it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern
is used in all the classifiers.

Also the hgenerator can be incremented without concern because it
is always incremented under RTNL.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8b9affe

net: rcu-ify tcf_proto · 80a735f7

由 John Fastabend 提交于 9月 12, 2014

rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80a735f7

net: qdisc: use rcu prefix and silence sparse warnings · b26b0d1e

由 John Fastabend 提交于 9月 12, 2014

Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b26b0d1e

13 9月, 2014 8 次提交

sunvnet: Avoid sending superfluous LDC messages. · d1015645

由 Sowmini Varadhan 提交于 9月 11, 2014

When sending out a burst of packets across multiple descriptors,
it is sufficient to send one LDC "start" trigger for
the first descriptor, so do not send an LDC "start" for every
pass through vnet_start_xmit. Similarly, it is sufficient to send
one "DRING_STOPPED" trigger for the last dring (and if that
fails, hold off and send the trigger later).

Optimizations to the number of LDC messages helps avoid
filling up the LDC channel with superfluous LDC messages
that risk triggering flow-control on the channel,
and also boosts performance.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: NRaghuram Kothakota <raghuram.kothakota@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1015645

net: axienet: remove unnecessary ether_setup after alloc_etherdev · c706471b

由 Subbaraya Sundeep Bhatta 提交于 9月 11, 2014

calling ether_setup is redundant since alloc_etherdev calls
it.
Signed-off-by: NSubbaraya Sundeep Bhatta <sbhatta@xilinx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c706471b

ethernet: amd: use pr_info_once() · e9c3f99f

由 Varka Bhadram 提交于 9月 11, 2014

It will use pr_info_one() to print the version info of the
driver in probe function only once. No need to use the static
variable here.
Signed-off-by: NVarka Bhadram <varkab@cdac.in>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e9c3f99f

udp: Fix inverted NAPI_GRO_CB(skb)->flush test · 2d8f7e2c

由 Scott Wood 提交于 9月 10, 2014

Commit 2abb7cdc ("udp: Add support for doing checksum unnecessary
conversion") caused napi_gro_cb structs with the "flush" field zero to
take the "udp_gro_receive" path rather than the "set flush to 1" path
that they would previously take.  As a result I saw booting from an NFS
root hang shortly after starting userspace, with "server not
responding" messages.

This change to the handling of "flush == 0" packets appears to be
incidental to the goal of adding new code in the case where
skb_gro_checksum_validate_zero_check() returns zero.  Based on that and
the fact that it breaks things, I'm assuming that it is unintentional.

Fixes: 2abb7cdc ("udp: Add support for doing checksum unnecessary conversion")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: NScott Wood <scottwood@freescale.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d8f7e2c

Merge branch 'sock_queue_err_skb' · c5306726

由 David S. Miller 提交于 9月 12, 2014

Alexander Duyck says:

====================
Address reference counting issues with sock_queue_err_skb

After looking over the code for skb_clone_sk after some comments made by
Eric Dumazet I have come to the conclusion that skb_clone_sk is taking the
correct approach in how to handle the sk_refcnt when creating a buffer that
is eventually meant to be returned to the socket via the sock_queue_err_skb
function.

However upon review of other callers I found what I believe to be a
possible reference count issue in the path for handling "wifi ack" packets.
To address this I have applied the same logic that is currently in place so
that the sk_refcnt will be forced to stay at least 1, or we will not
provide an skb to return in the sk_error_queue.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5306726

mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path · bf7fa551

由 Alexander Duyck 提交于 9月 10, 2014

There is a possible issue with the use, or lack thereof of sk_refcnt and
sk_wmem_alloc in the wifi ack status functionality.

Specifically if a socket were to request acknowledgements, and the socket
were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
to reach 0 it would be possible to have sock_queue_err_skb orphan the last
buffer, resulting in __sk_free being called on the socket. After this the
buffer is enqueued on sk_error_queue, however the queue has already been
flushed resulting in at least a memory leak, if not a data corruption.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Acked-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf7fa551

skb: Add documentation for skb_clone_sk · cab41c47

由 Alexander Duyck 提交于 9月 10, 2014

This change adds some documentation to the call skb_clone_sk. This is
meant to help clarify the purpose of the function for other developers.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cab41c47

Revert "ipv4: Clarify in docs that accept_local requires rp_filter." · 72b126a4

由 Sébastien Barré 提交于 9月 10, 2014

This reverts commit c801e3cc ("ipv4: Clarify in docs that accept_local requires rp_filter.").
It is not needed anymore since commit 1dced6a8 ("ipv4: Restore accept_local behaviour in fib_validate_source()").
Suggested-by: NJulian Anastasov <ja@ssi.bg>
Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NSébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72b126a4

12 9月, 2014 3 次提交

e1000: switch to napi_gro_frags api · de591c78

由 Florian Westphal 提交于 9月 03, 2014

napi_gro_frags allows skb re-use in case GRO can merge payload pages
into an skb on the GRO lists.

netperf TCP_STREAM, kvm-e1000 emulation, mtu 9k:
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
old: 87380  16384  16384    30.00  8985.78
new: 87380  16384  16384    30.00  9907.05
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

de591c78

e1000: convert to build_skb · 13809609

由 Florian Westphal 提交于 9月 03, 2014

Instead of preallocating Rx skbs, allocate them right before sending
inbound packet up the stack.

e1000-kvm, mtu1500, netperf TCP_STREAM:
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
old: 87380  16384  16384    60.00    4532.40
new: 87380  16384  16384    60.00    4599.05
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

13809609

e1000: rename struct e1000_buffer to e1000_tx_buffer · 580f321d

由 Florian Westphal 提交于 9月 03, 2014

and remove *page, its only used for Rx.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

580f321d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功