Commits · 67147ba99aeb48f2863e03b68e090088a34c1b5d · openanolis / cloud-kernel

May 27, 2009 · 4 commits

gro: Localise offset/headlen in skb_gro_offset · 67147ba9

Committed by Herbert Xu on May 26, 2009

This patch stores the offset/headlen in local variables as they're
used repeatedly in skb_gro_offset.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: Inline skb_gro_header and cache frag0 virtual address · 78a478d0

Committed by Herbert Xu on May 26, 2009

The function skb_gro_header is called four times per packet which
quickly adds up at 10Gb/s.  This patch inlines it to allow better
optimisations.

Some architectures perform multiplication for page_address, which
is done by each skb_gro_header invocation.  This patch caches that
value in skb->cb to avoid the unnecessary multiplications.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro: Open-code frags copy in skb_gro_receive · 42da6994

Committed by Herbert Xu on May 26, 2009

gcc does a poor job at generating code for the memcpy of the frags
array in skb_gro_receive, which is the primary purpose of that
function when merging frags.  In particular, it can't utilise the
alignment information of the source and destination.  This patch
open-codes the copy so we process words instead of bytes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
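
Below is a minimal userspace sketch of the open-coded copy. It assumes a simplified fragment descriptor. struct frag_desc, MAX_FRAGS and copy_frags() are hypothetical stand-ins for the kernel's fragment array and its bound, and this is not the code from the patch. Copying whole, typed elements keeps their alignment visible to the compiler, which is the information the commit says gcc could not exploit in the memcpy.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 18  /* hypothetical bound, standing in for MAX_SKB_FRAGS */

/* Hypothetical fragment descriptor: page pointer, offset and size. */
struct frag_desc {
	void *page;
	unsigned int page_offset;
	unsigned int size;
};

/* Open-coded copy of n descriptors. Assigning whole structs lets the
 * compiler move aligned, word-sized fields rather than bytes of unknown
 * alignment. */
static void copy_frags(struct frag_desc *dst, const struct frag_desc *src,
		       size_t n)
{
	size_t i;

	assert(n <= MAX_FRAGS);
	for (i = 0; i < n; i++)
		dst[i] = src[i];
}
```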

net: Remove bogus reference to BUS_ID_SIZE in sysfs code. · 2b0cc7f7

Committed by David S. Miller on May 26, 2009

BUS_ID_SIZE is really no more, and device names are dynamically
allocated and thus can be any necessary size.

So remove the BUG check here making sure BUS_ID_SIZE is at least
as large as IFNAMSIZ.
Signed-off-by: David S. Miller <davem@davemloft.net>

May 26, 2009 · 2 commits

net: txq_trans_update() helper · 08baf561

Committed by Eric Dumazet on May 25, 2009

We would like to get rid of the netdev->trans_start = jiffies; assignment that almost all net
drivers have to do in their start_xmit() function, and use txq->trans_start
instead.

This can be done generically in core network, as suggested by David.

Some devices (particularly loopback) don't need a trans_start update, because
they don't have a transmit watchdog. We could add a new device flag, or rely
on the fact that txq->trans_start can be updated if txq->xmit_lock_owner is
different from -1. Use a helper function to hide our choice.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
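
The helper described above can be sketched in userspace as follows. struct netdev_queue_sketch and the global jiffies are hypothetical stand-ins for the kernel's struct netdev_queue and its tick counter. The test on xmit_lock_owner follows the choice the commit message describes.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's tick counter. */
static unsigned long jiffies;

/* Hypothetical subset of a tx queue. */
struct netdev_queue_sketch {
	int xmit_lock_owner;        /* CPU holding the xmit lock, or -1 if lockless */
	unsigned long trans_start;  /* last transmit time, read by the tx watchdog */
};

/* Update trans_start only for queues driven under the xmit lock. Lockless
 * devices such as loopback keep xmit_lock_owner at -1, have no tx watchdog,
 * and therefore skip the store. */
static inline void txq_trans_update(struct netdev_queue_sketch *txq)
{
	if (txq->xmit_lock_owner != -1)
		txq->trans_start = jiffies;
}
```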

pkt_sched: gen_estimator: Fix signed integers right-shifts. · a1dcb662

Committed by Jarek Poplawski on May 25, 2009

Right-shifts of negative signed integers are implementation-defined, and therefore unportable.

With feedback from: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
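
A typical way to hit this is an exponentially weighted moving average step such as avg += (sample - avg) >> log. The difference goes negative whenever sample < avg. One portable rewrite shifts each non-negative term separately. It is sketched below with a hypothetical ewma_update() on unsigned values, and it is not necessarily the form the patch uses. The result can differ from the exact formula by rounding.

```c
#include <assert.h>
#include <stdint.h>

/* EWMA step that never right-shifts a negative value: both shifted
 * operands are unsigned, and avg >= (avg >> log) keeps the result
 * non-negative. */
static uint64_t ewma_update(uint64_t avg, uint64_t sample, unsigned int log)
{
	return avg + (sample >> log) - (avg >> log);
}
```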

May 25, 2009 · 3 commits

net: remove COMPAT_NET_DEV_OPS · e3804cbe

Committed by Alexander Beregalov on May 25, 2009

All drivers have already been converted to the new net_device_ops API,
and nobody uses the old API anymore.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skbuff: Copy csum instead of csum_start/csum_offset · 9bcb97ca

Committed by Herbert Xu on May 22, 2009

Hi:

skbuff: Copy csum instead of csum_start/csum_offset

It's easier to copy the u32 csum instead of its two u16
constituents.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
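
Here is a sketch of why one copy suffices. It assumes, as in struct sk_buff, that the 32-bit csum shares a union with the two 16-bit fields csum_start and csum_offset. struct csum_state and copy_csum() are hypothetical names. The anonymous struct and union members need C11.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical checksum state: one 32-bit word overlaying two 16-bit
 * fields (C11 anonymous struct/union). */
struct csum_state {
	union {
		uint32_t csum;
		struct {
			uint16_t csum_start;
			uint16_t csum_offset;
		};
	};
};

/* One 32-bit copy moves csum_start and csum_offset together. */
static void copy_csum(struct csum_state *dst, const struct csum_state *src)
{
	dst->csum = src->csum;
}
```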

skbuff: Move new code into __copy_skb_header · 82c49a35

Committed by Herbert Xu on May 22, 2009

Hi:

skbuff: Move new __skb_clone code into __copy_skb_header

It seems that people just keep on adding stuff to __skb_clone
instead of __copy_skb_header.  This is wrong as it means your brand-new
attributes won't always get copied as you intended.

This patch moves them to the right place, and adds a comment to
prevent this from happening again.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>
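
The rule this commit enforces can be sketched with a hypothetical struct pkt. Every duplication path routes attribute copying through one helper, so a clone or copy path cannot forget an attribute added there. In this sketch, copy_header() plays the role of __copy_skb_header and clone_pkt() the role of __skb_clone. The names and fields are illustrative only.

```c
#include <assert.h>

/* Hypothetical packet; the real struct sk_buff has many more fields. */
struct pkt {
	unsigned int len;
	unsigned char *data;
	int cloned;
	/* per-packet attributes that every duplicate must inherit */
	unsigned int mark;
	unsigned int priority;
};

/* The one place that copies attributes. Every duplication path (clone,
 * full copy, ...) calls it, so a field added here reaches all of them. */
static void copy_header(struct pkt *new, const struct pkt *old)
{
	new->mark = old->mark;
	new->priority = old->priority;
}

/* Clone: share the data buffer, then inherit attributes via copy_header().
 * Copying a new attribute here instead would silently skip full copies. */
static void clone_pkt(struct pkt *n, struct pkt *old)
{
	n->len = old->len;
	n->data = old->data;
	n->cloned = old->cloned = 1;
	copy_header(n, old);
}
```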

May 22, 2009 · 5 commits

net: Fix arg to trace_napi_poll() in netpoll. · 7d18f114

Committed by David S. Miller on May 21, 2009

Reported by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>

dropmon: add ability to detect when hardware drops rx packets · 4ea7e386

Committed by Neil Horman on May 21, 2009

Patch to add the ability to detect drops in hardware interfaces via dropwatch.
Adds a tracepoint to net_rx_action to signal every time a napi instance is
polled.  The dropmon code then periodically checks to see if the rx_frames
counter has changed, and if so, adds a drop notification to the netlink
protocol, using the reserved all-0's vector to indicate the drop location was in
hardware, rather than somewhere in the code.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/net_dropmon.h |    8 ++
 include/trace/napi.h        |   11 +++
 net/core/dev.c              |    5 +
 net/core/drop_monitor.c     |  124 ++++++++++++++++++++++++++++++++++++++++++--
 net/core/net-traces.c       |    4 +
 net/core/netpoll.c          |    2
 6 files changed, 149 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
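
The periodic check described above amounts to comparing a counter with the value seen at the previous check. In this minimal sketch, struct hw_drop_watch and hw_counter_changed() are hypothetical. The alert itself, a netlink message with the all-zero location, is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state for the monitor: the counter named in the
 * commit text and the value seen at the last check. */
struct hw_drop_watch {
	unsigned long rx_frames;
	unsigned long last_seen;
};

/* Run on each check (e.g. when the NAPI poll tracepoint fires). If the
 * counter moved, store the increase in *delta, remember the new value and
 * return true; the caller would then queue an alert whose location is the
 * all-zero "hardware" marker. */
static bool hw_counter_changed(struct hw_drop_watch *w, unsigned long *delta)
{
	unsigned long now = w->rx_frames;

	if (now == w->last_seen)
		return false;
	*delta = now - w->last_seen;
	w->last_seen = now;
	return true;
}
```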

netns: simplify net_ns_init · ca0f3112

Committed by Stephen Hemminger on May 21, 2009

The net_ns_init code can be simplified. No need to save error code
if it is only going to panic if it is set 4 lines later.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns: remove leftover debugging message · 1f7a2bb4

Committed by Stephen Hemminger on May 21, 2009

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: do not access flows[] beyond its length · 5b5f792a

Committed by Florian Westphal on May 21, 2009

Typo: pkt_dev->nflows is for stats only; the number of concurrent
flows is stored in cflows.
Reported-By: Vladimir Ivashchenko <hazard@francoudi.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
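
Here is a sketch of the bounds rule. It assumes a hypothetical pktgen-like device state in which cflows never exceeds the length of flows[], while nflows is free to grow. Round-robin selection must therefore wrap on cflows. MAX_CFLOWS, struct pkt_dev_sketch and next_flow() are illustrative names, not pktgen's code.

```c
#include <assert.h>

#define MAX_CFLOWS 4  /* hypothetical length of flows[] */

struct flow_state {
	unsigned int count;
};

/* Hypothetical subset of a pktgen-like device. */
struct pkt_dev_sketch {
	struct flow_state flows[MAX_CFLOWS];
	unsigned int cflows;  /* concurrent flows in use, <= MAX_CFLOWS */
	unsigned int nflows;  /* statistics only: may exceed MAX_CFLOWS */
	unsigned int curfl;   /* index of the current flow */
};

/* Advance to the next flow round-robin. Wrapping on nflows instead of
 * cflows would index past the end of flows[]. */
static struct flow_state *next_flow(struct pkt_dev_sketch *pd)
{
	assert(pd->cflows > 0 && pd->cflows <= MAX_CFLOWS);
	pd->curfl = (pd->curfl + 1) % pd->cflows;
	pd->flows[pd->curfl].count++;
	return &pd->flows[pd->curfl];
}
```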

May 21, 2009 · 1 commit

net: Remove unused parameter from fill method in fib_rules_ops. · 04af8cf6

Committed by Rami Rosen on May 20, 2009

The netlink message header (struct nlmsghdr) is an unused parameter in
the fill method of the fib_rules_ops struct.  This patch removes this
parameter from this method and fixes the places where this method is
called.

(include/net/fib_rules.h)
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

May 19, 2009 · 5 commits

net: release dst entry in dev_hard_start_xmit() · 93f154b5

Committed by Eric Dumazet on May 18, 2009

One point of contention in high network loads is the dst_release() performed
when a transmitted skb is freed. This is because NIC tx completion calls
dev_kfree_skb() long after the original call to dev_queue_xmit(skb).

The CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% busy handling softirqs for a network device,
since dst_clone() is done by other CPUs, causing cache line ping-pong.

It seems the right place to release dst is in dev_hard_start_xmit(), for most
devices except virtual ones and a few other exceptions.

David Miller suggested defining a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefully unset in devices
which don't want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag:

- loopback device, because it calls netif_rx() and, quoting Patrick:
    "ip_route_input() doesn't accept loopback addresses, so loopback packets
     already need to have a dst_entry attached."
- appletalk/ipddp.c: needs skb->dst in its xmit function

- All devices that call dev_queue_xmit() again from their xmit function
(as some classifiers need skb->dst): bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
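
Below is a userspace sketch of the flag-driven release. It uses hypothetical types and an XMIT_DST_RELEASE constant that stands in for the new device flag. The refcount is a plain int here, whereas the kernel uses an atomic operation.

```c
#include <assert.h>
#include <stddef.h>

#define XMIT_DST_RELEASE 0x1U  /* hypothetical stand-in for the new device flag */

struct dst_sketch { int refcnt; };
struct skb_sketch { struct dst_sketch *dst; };
struct dev_sketch { unsigned int priv_flags; };

/* Drop one route reference (the kernel does this atomically). */
static void dst_put(struct dst_sketch *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Device setup: the flag is on by default. */
static void setup_dev(struct dev_sketch *dev)
{
	dev->priv_flags = XMIT_DST_RELEASE;
}

/* Transmit: release the route now, while its cache line is likely still hot,
 * unless the device cleared the flag because it needs skb->dst later. */
static void hard_start_xmit(struct dev_sketch *dev, struct skb_sketch *skb)
{
	if (dev->priv_flags & XMIT_DST_RELEASE) {
		dst_put(skb->dst);
		skb->dst = NULL;
	}
	/* ... the driver's ndo_start_xmit() would be called here ... */
}
```

Setting the flag in the common setup path keeps the default fast for ordinary NICs. Devices that need skb->dst opt out explicitly.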

net-sysfs: Use rtnl_trylock in sysfs methods. · 336ca57c

Committed by Eric W. Biederman on May 13, 2009

The earlier patch to fix the deadlock between a network device going
away and writing to sysfs attributes was incomplete.
- It did not set signal_pending so we would leak ERESTARTSYS to user space.
- It used ERESTARTSYS which only restarts if sigaction configures it to.
- It did not cover store and show for ifalias.

So fix all of these up and use the new helper restart_syscall so we get
the details correct on what it takes.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix skb_seq_read returning wrong offset/length for page frag data · 995b3379

Committed by Thomas Chenault on May 18, 2009

When called with a consumed value that is less than skb_headlen(skb)
bytes into a page frag, skb_seq_read() incorrectly returns an
offset/length relative to skb->data. Ensure that data which should come
from a page frag does.
Signed-off-by: Thomas Chenault <thomas_chenault@dell.com>
Tested-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
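
The invariant that the fix restores can be sketched as a lookup. It maps a byte offset either to the linear area or to a page fragment, and returns an offset relative to that region rather than to skb->data. struct seq_layout and locate() are hypothetical. Real skbs may also carry a frag_list, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: headlen bytes in the linear area, then nr_frags page
 * fragments with the given sizes (frag_list is not modelled). */
struct seq_layout {
	size_t headlen;
	size_t nr_frags;
	const size_t *frag_size;
};

/* Find the region holding byte `consumed`. Returns -1 for the linear area,
 * the fragment index for a page frag, or -2 past the end; *off receives the
 * offset inside that region (relative to the frag, never to skb->data). */
static int locate(const struct seq_layout *l, size_t consumed, size_t *off)
{
	size_t i, start;

	if (consumed < l->headlen) {
		*off = consumed;
		return -1;
	}
	start = l->headlen;
	for (i = 0; i < l->nr_frags; i++) {
		if (consumed < start + l->frag_size[i]) {
			*off = consumed - start;
			return (int)i;
		}
		start += l->frag_size[i];
	}
	return -2;
}
```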

pkt_sched: gen_estimator: use 64 bit intermediate counters for bps · 511e11e3

Committed by Eric Dumazet on May 18, 2009

gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theoretical limit of 34360 Mbit/s
(2^32 bytes per second).

Using 64-bit intermediate avbps/brate counters allows us to reach
this theoretical limit.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
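
To see where 32-bit intermediates break, assume for illustration that the rate is kept in fixed point with 5 fractional bits. FRAC_BITS and the helpers below are hypothetical. Under that assumption, the shifted value wraps in 32 bits above 2^27 bytes per second (about 1.07 Gbit/s). A 64-bit intermediate stays exact for any u32 rate.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 5  /* hypothetical: rate stored as bytes/s * 2^5 */

/* 32-bit intermediate: wraps once bytes/s exceeds 2^27. */
static uint32_t scaled_rate32(uint32_t bytes_per_sec)
{
	return bytes_per_sec << FRAC_BITS;
}

/* 64-bit intermediate: exact for every 32-bit bytes/s value. */
static uint64_t scaled_rate64(uint32_t bytes_per_sec)
{
	return (uint64_t)bytes_per_sec << FRAC_BITS;
}

/* Convert back to the value reported through the u32 API. */
static uint32_t api_rate(uint64_t scaled)
{
	return (uint32_t)(scaled >> FRAC_BITS);
}
```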

net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue · 7004bf25

Committed by Eric Dumazet on May 18, 2009

offsetof(struct net_device, features)=0x44
offsetof(struct net_device, stats.tx_packets)=0x54
offsetof(struct net_device, stats.tx_bytes)=0x5c
offsetof(struct net_device, stats.tx_dropped)=0x6c

Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
tx path can slow down SMP operations, since they dirty a cache line
that should stay shared (dev->features is needed in rx and tx paths).

We could move the stats field away in net_device, but it won't help that much
(two cache lines dirtied in the tx path, when we can do with only one).

A better solution is to add tx_packets/tx_bytes/tx_dropped to struct
netdev_queue, because this structure is already touched in the tx path and
counter updates will then be free (no increase in size).
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
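
Here is a sketch of the layout idea, using hypothetical names. The counters sit in the per-queue structure that the tx path already writes, and a slower path sums them when totals are needed. This commit does not show how or when the kernel folds them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-queue state: counters next to fields the tx path
 * already dirties, so updating them costs no extra cache line. */
struct txq_sketch {
	int xmit_lock_owner;
	unsigned long trans_start;
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned long tx_dropped;
};

struct tx_totals {
	unsigned long packets, bytes, dropped;
};

/* Fast path: account one transmitted frame on its own queue only. */
static void txq_account(struct txq_sketch *q, unsigned int len)
{
	q->tx_packets++;
	q->tx_bytes += len;
}

/* Slow path (e.g. a statistics read): fold all queues into totals. */
static struct tx_totals txq_sum(const struct txq_sketch *q, size_t n)
{
	struct tx_totals t = {0, 0, 0};
	size_t i;

	for (i = 0; i < n; i++) {
		t.packets += q[i].tx_packets;
		t.bytes += q[i].tx_bytes;
		t.dropped += q[i].tx_dropped;
	}
	return t;
}
```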

May 18, 2009 · 3 commits

tcp: tcp_prequeue() can use keyed wakeups · 9dc20c5f

Committed by John Dykstra on May 12, 2009

When TCP frees up write buffer space, avoid waking up tasks that have
done a poll() or select() on the same socket specifying read-side
events.

This is an extension of a read-side patch by Eric Dumazet.
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: don't dereference NULL dev from np · 5e392739

Committed by Pavel Emelyanov on May 11, 2009

It looks like the dev in netpoll_poll can be NULL; at least it's
checked at the beginning of the function. Thus the dev->netdev_ops dereference
looks dangerous.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: remove an unused parameter from configure method of fib_rules_ops. · 8b3521ee

Committed by Rami Rosen on May 11, 2009

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

May 10, 2009 · 1 commit

net: check retval of dev_addr_init() · ab9c73cc

Committed by Jiri Pirko on May 8, 2009

Add the missing check of the dev_addr_init() return value in alloc_netdev_mq().
Signed-off-by: Jiri Pirko <jpirko@redhat.com>

 net/core/dev.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
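
The kind of fix described here is ordinary goto-based error unwinding in C. In this sketch, struct dev_obj, addr_init() and alloc_dev() are hypothetical stand-ins for the device, dev_addr_init() and alloc_netdev_mq(). The simulate_failure argument exists only to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device with two separately allocated parts. */
struct dev_obj {
	void *queues;
	void *addr_list;
};

/* Stand-in for dev_addr_init(): 0 on success, negative errno on failure. */
static int addr_init(struct dev_obj *d, int simulate_failure)
{
	d->addr_list = simulate_failure ? NULL : malloc(32);
	return d->addr_list ? 0 : -ENOMEM;
}

/* Stand-in for alloc_netdev_mq(): each failure undoes only what was
 * already set up, in reverse order. */
static struct dev_obj *alloc_dev(int simulate_failure)
{
	struct dev_obj *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->queues = malloc(64);
	if (!d->queues)
		goto free_dev;
	if (addr_init(d, simulate_failure))  /* the return value is now checked */
		goto free_queues;
	return d;

free_queues:
	free(d->queues);
free_dev:
	free(d);
	return NULL;
}
```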

May 9, 2009 · 1 commit

Network Drop Monitor: Fix skb_kill_datagram · 61de71c6

Committed by John Dykstra on May 8, 2009

Commit ead2ceb0 ("Network Drop Monitor:
Adding kfree_skb_clean for non-drops and modifying end-of-line points
for skbs") established new conventions for identifying dropped packets.

Align skb_kill_datagram() with these conventions so that packets that
get dropped just before the copy to userspace are properly tracked.
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

May 7, 2009 · 1 commit

net: update skb_recycle_check() for hardware timestamping changes · b8050075

Committed by Lennert Buytenhek on May 6, 2009

Commit ac45f602 ("net: infrastructure
for hardware time stamping") added two skb initialization actions to
__alloc_skb(), which need to be added to skb_recycle_check() as well.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

May 6, 2009 · 1 commit

net: introduce a list of device addresses dev_addr_list (v6) · f001fde5

Committed by Jiri Pirko on May 5, 2009

v5 -> v6 (current):
-removed so far unused static functions
-corrected dev_addr_del_multiple to call del instead of add

v4 -> v5:
-added device address type (suggested by davem)
-removed refcounting (better to have simpler code than save potentially a few
 bytes)

v3 -> v4:
-changed kzalloc to kmalloc in __hw_addr_add_ii()
-ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()

v2 -> v3:
-removed unnecessary rcu read locking
-moved dev_addr_flush() calling to ensure no null dereference of dev_addr

v1 -> v2:
-added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
-removed unnecessary rcu_read locking in dev_addr_init
-use compare_ether_addr_64bits instead of compare_ether_addr
-use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
-use call_rcu instead of rcu_synchronize
-moved is_etherdev_addr into __KERNEL__ ifdef

This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with the device address list. The list is a replacement
for the original dev_addr field, needed because in some situations the net device
has to carry several device addresses. To be backward compatible,
dev_addr is made to point to the first member of the list, so original drivers
see no difference.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
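
Below is a simplified sketch of the compatibility trick. The addresses live in a list, and the legacy dev_addr pointer aliases the first entry. struct hw_addr, struct net_dev, addr_list_add() and addr_list_flush() are hypothetical simplifications. The kernel version uses list_head, RCU and address types, all of which are omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ADDR_LEN 6  /* Ethernet MAC address length */

/* Hypothetical list node holding one hardware address. */
struct hw_addr {
	struct hw_addr *next;
	unsigned char addr[ADDR_LEN];
};

struct net_dev {
	struct hw_addr *dev_addrs;  /* all addresses, primary first */
	unsigned char *dev_addr;    /* legacy field: aliases the first entry */
};

/* Append an address and keep dev_addr pointing at the primary entry, so
 * code that only knows dev->dev_addr keeps working unchanged. */
static int addr_list_add(struct net_dev *dev, const unsigned char *addr)
{
	struct hw_addr **pp = &dev->dev_addrs;
	struct hw_addr *ha = malloc(sizeof(*ha));

	if (!ha)
		return -1;
	memcpy(ha->addr, addr, ADDR_LEN);
	ha->next = NULL;
	while (*pp)
		pp = &(*pp)->next;
	*pp = ha;
	dev->dev_addr = dev->dev_addrs->addr;
	return 0;
}

/* Free every entry and clear the legacy pointer. */
static void addr_list_flush(struct net_dev *dev)
{
	struct hw_addr *ha = dev->dev_addrs;

	while (ha) {
		struct hw_addr *next = ha->next;

		free(ha);
		ha = next;
	}
	dev->dev_addrs = NULL;
	dev->dev_addr = NULL;
}
```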

May 5, 2009 · 2 commits

netns 2/2: extract net_create() · 088eb2d9

Committed by Alexey Dobriyan on May 4, 2009

net_create() will be used by C/R to create fresh netns on restart.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns 1/2: don't get/put old netns on CLONE_NEWNET · 4a84822c

Committed by Alexey Dobriyan on May 4, 2009

copy_net_ns() doesn't copy anything, it creates fresh netns, so
get/put of old netns isn't needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

May 4, 2009 · 1 commit

net: Avoid modulus in skb_tx_hash() for forwarding case. · 513de11b

Committed by David S. Miller on May 3, 2009

Based almost entirely upon a patch by Eric Dumazet.

The common case is to have num_tx_queues <= num_rx_queues
and even if num_tx_queues is larger it will not be significantly
larger.

Therefore, a subtraction loop is always going to be faster than
modulus.
Signed-off-by: David S. Miller <davem@davemloft.net>
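
Here is a sketch of the subtraction loop. It assumes real_num_tx_queues is non-zero, and rxq_to_txq() is a hypothetical name. The loop returns the same value as rx_queue % num_tx. When the two queue counts are close, it runs zero times or only a few times.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a recorded rx queue index into [0, num_tx) without a division.
 * Same result as rx_queue % num_tx; cheap when the counts are close. */
static uint16_t rxq_to_txq(uint16_t rx_queue, uint16_t num_tx)
{
	uint16_t q = rx_queue;

	assert(num_tx > 0);
	while (q >= num_tx)
		q -= num_tx;
	return q;
}
```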

May 2, 2009 · 1 commit

net: Fix skb_tx_hash() for forwarding workloads. · ec581f6a

Committed by Eric Dumazet on May 1, 2009

When skb_rx_queue_recorded() is true, we don't want to use jhash distribution,
as the device driver told us exactly which queue was selected at RX time.
jhash makes a statistical shuffle, but this won't work with 8 static inputs.

Later improvements would be to compute reciprocal value of real_num_tx_queues
to avoid a divide here. But this computation should be done once,
when real_num_tx_queues is set. This needs a separate patch, and a new
field in struct net_device.
Reported-by: Andrew Dickinson <andrew@whydna.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
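
The commit leaves the reciprocal idea for a later patch. A related but different divide-free technique is shown here. It maps a 32-bit hash onto [0, n) by taking the high 32 bits of hash * n. scale_hash() is a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit hash roughly uniformly onto [0, n) using the high half of
 * a 64-bit product instead of hash % n. */
static uint16_t scale_hash(uint32_t hash, uint16_t n)
{
	return (uint16_t)(((uint64_t)hash * n) >> 32);
}
```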

April 30, 2009 · 1 commit

net: Fix oops when splicing skbs from a frag_list. · 7a67e56f

Committed by Jarek Poplawski on April 30, 2009

Lennert Buytenhek wrote:
> Since 4fb66994 ("net: Optimize memory
> usage when splicing from sockets.") I'm seeing this oops (e.g. in
> 2.6.30-rc3) when splicing from a TCP socket to /dev/null on a driver
> (mv643xx_eth) that uses LRO in the skb mode (lro_receive_skb) rather
> than the frag mode:

My patch incorrectly assumed skb->sk was always valid, but for
"frag_listed" skbs we can only use skb->sk of their parent.
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Debugged-by: Lennert Buytenhek <buytenh@wantstofly.org>
Tested-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

April 28, 2009 · 1 commit

net: Avoid extra wakeups of threads blocked in wait_for_packet() · bf368e4e

Committed by Eric Dumazet on April 28, 2009

In 2.6.25 we added UDP mem accounting.

This unfortunately added a penalty when a frame is transmitted, since
at TX completion time we have to call sock_wfree() to perform the necessary
memory accounting. This calls sock_def_write_space() and ultimately the
scheduler if any thread is waiting on the socket.
Thread(s) waiting for an incoming frame were scheduled, then had to sleep
again as the event was meaningless.

(All threads waiting on a socket use the same sk_sleep anchor.)

This adds a lot of extra wakeups and increases latencies, as noted
by Christoph Lameter, and slows down the softirq handler.

Reference: http://marc.info/?l=linux-netdev&m=124060437012283&w=2

Fortunately, Davide Libenzi recently added the concept of keyed wakeups
to the kernel, particularly for sockets (see commit 37e5540b,
"epoll keyed wakeups: make sockets use keyed wakeups").

Davide's goal was to optimize epoll, but this new wakeup infrastructure
can help non-epoll users as well, if they care to set up an appropriate
handler.

This patch introduces a new DEFINE_WAIT_FUNC() helper and uses it
in wait_for_packet(), so that only relevant events can wake up a thread
blocked in this function.

Trace of function calls from bnx2 TX completion bnx2_poll_work():
__kfree_skb()
 skb_release_head_state()
  sock_wfree()
   sock_def_write_space()
    __wake_up_sync_key()
     __wake_up_common()
      receiver_wake_function() : Stops here since thread is waiting for an INPUT
Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
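
The filtering step at the end of that trace can be modelled in userspace as follows. struct waiter, receiver_wake() and wake_up_key() are hypothetical simplifications of the wait-queue machinery. The waker passes the event mask as a key, and a waiter blocked for input ignores write-space events. A zero key is treated as "unknown", so it still wakes everyone.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical waiter on a shared wait queue. */
struct waiter {
	short wanted;  /* events this sleeper cares about, e.g. POLLIN */
	bool woken;
};

/* Keyed wake function: the key is the event mask the waker reports.
 * Non-zero keys that share no bit with what the waiter wants are ignored,
 * so a reader is not scheduled just to go back to sleep. */
static int receiver_wake(struct waiter *w, short key)
{
	if (key && !(key & w->wanted))
		return 0;
	w->woken = true;
	return 1;
}

/* Offer one wakeup with the given key to every waiter; count who woke. */
static int wake_up_key(struct waiter *ws, int n, short key)
{
	int i, woke = 0;

	for (i = 0; i < n; i++)
		woke += receiver_wake(&ws[i], key);
	return woke;
}
```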

April 27, 2009 · 2 commits

gro: Fix handling of headers that extend over the tail · edbd9e30

Committed by Herbert Xu on April 27, 2009

The skb_gro_* code fails to handle the case where a header starts
in the linear area but ends in the frags area.  Since the goal
of skb_gro_* is to optimise the case of completely non-linear
packets, we can simply bail out if we have anything in the linear
area.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

drop_monitor: Update netlink protocol to include netlink attribute header in alert message · 683703a2

Committed by Neil Horman on April 27, 2009

When I initially implemented this protocol, I disregarded the use of netlink
attribute headers, thinking for my purposes they weren't needed. I've come to
find out that, as I'm starting to work with sending down messages with
associated data (like config messages), the kernel code spits out warnings about
trailing data in a netlink skb that doesn't have an associated header on it. As
such, I'm going to start including attribute headers in my netlink transaction,
and so for completeness, I should likely include them on messages bound from the
kernel to user space. This patch adds that header to the kernel, and bumps the
protocol version accordingly.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
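
This sketch shows what an attribute header adds on the wire: a 16-bit length covering header plus payload, a 16-bit type, and padding to a 4-byte boundary. struct nla_hdr, NLA_ALIGN4, NLA_HDR_LEN and put_attr() are local names. They mirror struct nlattr, NLA_ALIGN and NLA_HDRLEN from <linux/netlink.h>, so the example builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the netlink attribute header (struct nlattr):
 * total length including the header, then the attribute type. */
struct nla_hdr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define NLA_ALIGN4(len) (((len) + 3) & ~(size_t)3)
#define NLA_HDR_LEN NLA_ALIGN4(sizeof(struct nla_hdr))

/* Append one attribute (header, payload, zero padding to 4 bytes) at buf.
 * Returns the bytes consumed, or 0 if it does not fit in room. */
static size_t put_attr(unsigned char *buf, size_t room, uint16_t type,
		       const void *data, uint16_t len)
{
	struct nla_hdr hdr;
	size_t total = NLA_ALIGN4(NLA_HDR_LEN + len);

	if (total > room)
		return 0;
	hdr.nla_len = (uint16_t)(NLA_HDR_LEN + len);
	hdr.nla_type = type;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + NLA_HDR_LEN, data, len);
	memset(buf + NLA_HDR_LEN + len, 0, total - NLA_HDR_LEN - len);
	return total;
}
```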

April 21, 2009 · 2 commits

tun: fix tun_chr_aio_write so that aio works · 6f26c9a7

Committed by Michael S. Tsirkin on April 20, 2009

aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct
iovec * and modifies the iovec. As a result, attempts to use io_submit
to send packets to a tun device fail with weird errors such as EINVAL.

Since tun is the only user of skb_copy_datagram_from_iovec, we can
fix this simply by changing the latter so that it does not
touch the iovec passed to it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb_copy_datagram_const_iovec() · 0a1ec07a

Committed by Michael S. Tsirkin on April 20, 2009

There's an skb_copy_datagram_iovec() to copy out of a paged skb,
but it modifies the iovec, and does not support starting
at an offset in the destination. We want both in tun.c, so let's
add the function.

It's a carbon copy of skb_copy_datagram_iovec() with enough changes to
be annoying.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
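
The essential difference can be sketched with the POSIX struct iovec. Walk the segments with a local cursor and a starting offset instead of advancing iov_base/iov_len in place. The caller's const array is then unchanged afterwards. copy_to_const_iovec() is a hypothetical userspace analogue, not the kernel function.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/* Copy len bytes from src into the scatter list iov, starting offset bytes
 * into the destination, without modifying the iovec array. Returns 0, or
 * -1 if the iovec is too short to hold the data. */
static int copy_to_const_iovec(const struct iovec *iov, size_t iovcnt,
			       size_t offset, const void *src, size_t len)
{
	const unsigned char *from = src;
	size_t i;

	for (i = 0; i < iovcnt && len > 0; i++) {
		size_t seglen = iov[i].iov_len;
		size_t n;

		if (offset >= seglen) {  /* skip segments before the offset */
			offset -= seglen;
			continue;
		}
		n = seglen - offset;
		if (n > len)
			n = len;
		memcpy((unsigned char *)iov[i].iov_base + offset, from, n);
		from += n;
		len -= n;
		offset = 0;
	}
	return len ? -1 : 0;
}
```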

April 20, 2009 · 3 commits

net: Fix GRO for multiple page fragments · 5db8765a

Committed by Ben Hutchings on April 16, 2009

This loop over fragments in napi_fraginfo_skb() was "interesting".
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix "compatibility" typos · eb39c57f

Committed by Marcin Slusarz on April 19, 2009

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sch_netem: Fix an inconsistency in ingress netem timestamps. · 8caf1539

Committed by Jarek Poplawski on April 17, 2009

Alex Sidorenko reported:

"while experimenting with 'netem' we have found some strange behaviour. It
seemed that ingress delay as measured by 'ping' command shows up on some
hosts but not on others.

After some investigation I have found that the problem is that skbuff->tstamp
field value depends on whether there are any packet sniffers enabled. That
is:

- if any ptype_all handler is registered, the tstamp field is as expected
- if there are no ptype_all handlers, the tstamp field does not show the delay"

This patch prevents an unnecessary update of tstamp in dev_queue_xmit_nit()
on the ingress path (with act_mirred) by adding a check; the overhead on
the fast path is minimal, since the check only runs when sniffers etc. are active.

Since netem at ingress seems to logically emulate a network before a host,
tstamp is zeroed to trigger the update and pretend delays are from the
outside.
Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
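
This is a hypothetical model of the timestamp logic. Stamping is lazy, meaning only packets with a zero stamp receive one. Clearing the stamp when ingress netem releases a delayed packet therefore makes the next observer record the post-delay time. struct tpkt and both helpers are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet with a nanosecond timestamp; 0 means "not stamped". */
struct tpkt {
	int64_t tstamp;
};

/* Lazy stamping: only stamp a packet that carries no timestamp yet. */
static void stamp_if_unset(struct tpkt *p, int64_t now)
{
	if (p->tstamp == 0)
		p->tstamp = now;
}

/* Ingress netem: when the delayed packet is released, clear its stamp so
 * the next observer records the post-delay time, as if the delay had
 * happened on the network before the host. */
static void netem_ingress_release(struct tpkt *p)
{
	p->tstamp = 0;
}
```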

openanolis / cloud-kernel · synced successfully about 1 year ago
