提交 · 67147ba99aeb48f2863e03b68e090088a34c1b5d · openanolis / cloud-kernel

27 5月, 2009 1 次提交

gro: Inline skb_gro_header and cache frag0 virtual address · 78a478d0

由 Herbert Xu 提交于 5月 26, 2009

The function skb_gro_header is called four times per packet which
quickly adds up at 10Gb/s.  This patch inlines it to allow better
optimisations.

Some architectures perform multiplication for page_address, which
is done by each skb_gro_header invocation.  This patch caches that
value in skb->cb to avoid the unnecessary multiplications.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78a478d0

26 5月, 2009 1 次提交

net: txq_trans_update() helper · 08baf561

由 Eric Dumazet 提交于 5月 25, 2009

We would like to get rid of netdev->trans_start = jiffies; that about all net
drivers have to use in their start_xmit() function, and use txq->trans_start
instead.

This can be done generically in core network, as suggested by David.

Some devices, (particularly loopback) dont need trans_start update, because
they dont have transmit watchdog. We could add a new device flag, or rely
on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
different than -1. Use a helper function to hide our choice.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08baf561

25 5月, 2009 1 次提交

net: remove COMPAT_NET_DEV_OPS · e3804cbe

由 Alexander Beregalov 提交于 5月 25, 2009

All drivers are already converted to new net_device_ops API
and nobody uses old API anymore.
Signed-off-by: NAlexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3804cbe

22 5月, 2009 1 次提交

dropmon: add ability to detect when hardware dropsrxpackets · 4ea7e386

由 Neil Horman 提交于 5月 21, 2009

Patch to add the ability to detect drops in hardware interfaces via dropwatch.
Adds a tracepoint to net_rx_action to signal everytime a napi instance is
polled.  The dropmon code then periodically checks to see if the rx_frames
counter has changed, and if so, adds a drop notification to the netlink
protocol, using the reserved all-0's vector to indicate the drop location was in
hardware, rather than somewhere in the code.
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>

 include/linux/net_dropmon.h |    8 ++
 include/trace/napi.h        |   11 +++
 net/core/dev.c              |    5 +
 net/core/drop_monitor.c     |  124 ++++++++++++++++++++++++++++++++++++++++++--
 net/core/net-traces.c       |    4 +
 net/core/netpoll.c          |    2
 6 files changed, 149 insertions(+), 5 deletions(-)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ea7e386

19 5月, 2009 2 次提交

net: release dst entry in dev_hard_start_xmit() · 93f154b5

由 Eric Dumazet 提交于 5月 18, 2009

One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).

CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.

It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.

David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag is :

- loopback device, because it calls netif_rx() and quoting Patrick :
    "ip_route_input() doesn't accept loopback addresses, so loopback packets
     already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function

- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93f154b5

net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue · 7004bf25

由 Eric Dumazet 提交于 5月 18, 2009

offsetof(struct net_device, features)=0x44
offsetof(struct net_device, stats.tx_packets)=0x54
offsetof(struct net_device, stats.tx_bytes)=0x5c
offsetof(struct net_device, stats.tx_dropped)=0x6c

Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
tx path can slow down SMP operations, since they dirty a cache line
that should stay shared (dev->features is needed in rx and tx paths)

We could move away stats field in net_device but it wont help that much.
(Two cache lines dirtied in tx path, we can do one only)

Better solution is to add tx_packets/tx_bytes/tx_dropped in struct
netdev_queue because this structure is already touched in tx path and
counters updates will then be free (no increase in size)
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7004bf25

10 5月, 2009 1 次提交

net: check retval of dev_addr_init() · ab9c73cc

由 Jiri Pirko 提交于 5月 08, 2009

Add missed checking of dev_addr_init return value in alloc_netdev_mq.
Signed-off-by: NJiri Pirko <jpirko@redhat.com>

 net/core/dev.c |   15 ++++++++++++---
 1 files changed, 12 insertions(+), 3 deletions(-)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab9c73cc

06 5月, 2009 1 次提交

net: introduce a list of device addresses dev_addr_list (v6) · f001fde5

由 Jiri Pirko 提交于 5月 05, 2009

v5 -> v6 (current):
-removed so far unused static functions
-corrected dev_addr_del_multiple to call del instead of add

v4 -> v5:
-added device address type (suggested by davem)
-removed refcounting (better to have simplier code then safe potentially few
 bytes)

v3 -> v4:
-changed kzalloc to kmalloc in __hw_addr_add_ii()
-ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()

v2 -> v3:
-removed unnecessary rcu read locking
-moved dev_addr_flush() calling to ensure no null dereference of dev_addr

v1 -> v2:
-added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
-removed unnecessary rcu_read locking in dev_addr_init
-use compare_ether_addr_64bits instead of compare_ether_addr
-use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
-use call_rcu instead of rcu_synchronize
-moved is_etherdev_addr into __KERNEL__ ifdef

This patch introduces a new list in struct net_device and brings a set of
functions to handle the work with device address list. The list is a replacement
for the original dev_addr field and because in some situations there is need to
carry several device addresses with the net device. To be backward compatible,
dev_addr is made to point to the first member of the list so original drivers
sees no difference.
Signed-off-by: NJiri Pirko <jpirko@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f001fde5

04 5月, 2009 1 次提交

net: Avoid modulus in skb_tx_hash() for forwarding case. · 513de11b

由 David S. Miller 提交于 5月 03, 2009

Based almost entirely upon a patch by Eric Dumazet.

The common case is to have num-tx-queues <= num_rx_queues
and even if num_tx_queues is larger it will not be significantly
larger.

Therefore, a subtraction loop is always going to be faster than
modulus.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

513de11b

02 5月, 2009 1 次提交

net: Fix skb_tx_hash() for forwarding workloads. · ec581f6a

由 Eric Dumazet 提交于 5月 01, 2009

When skb_rx_queue_recorded() is true, we dont want to use jash distribution
as the device driver exactly told us which queue was selected at RX time.
jhash makes a statistical shuffle, but this wont work with 8 static inputs.

Later improvements would be to compute reciprocal value of real_num_tx_queues
to avoid a divide here. But this computation should be done once,
when real_num_tx_queues is set. This needs a separate patch, and a new
field in struct net_device.
Reported-by: NAndrew Dickinson <andrew@whydna.net>
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec581f6a

27 4月, 2009 1 次提交

gro: Fix handling of headers that extend over the tail · edbd9e30

由 Herbert Xu 提交于 4月 27, 2009

The skb_gro_* code fails to handle the case where a header starts
in the linear area but ends in the frags area.  Since the goal
of skb_gro_* is to optimise the case of completely non-linear
packets, we can simply bail out if we have anything in the linear
area.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

edbd9e30

20 4月, 2009 3 次提交

net: Fix GRO for multiple page fragments · 5db8765a

由 Ben Hutchings 提交于 4月 16, 2009

This loop over fragments in napi_fraginfo_skb() was "interesting".
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5db8765a

net: fix "compatibility" typos · eb39c57f

由 Marcin Slusarz 提交于 4月 19, 2009

Signed-off-by: NMarcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb39c57f

net: sch_netem: Fix an inconsistency in ingress netem timestamps. · 8caf1539

由 Jarek Poplawski 提交于 4月 17, 2009

Alex Sidorenko reported:

"while experimenting with 'netem' we have found some strange behaviour. It
seemed that ingress delay as measured by 'ping' command shows up on some
hosts but not on others.

After some investigation I have found that the problem is that skbuff->tstamp
field value depends on whether there are any packet sniffers enabled. That
is:

- if any ptype_all handler is registered, the tstamp field is as expected
- if there are no ptype_all handlers, the tstamp field does not show the delay"

This patch prevents unnecessary update of tstamp in dev_queue_xmit_nit()
on ingress path (with act_mirred) adding a check, so minimal overhead on
the fast path, but only when sniffers etc. are active.

Since netem at ingress seems to logically emulate a network before a host,
tstamp is zeroed to trigger the update and pretend delays are from the
outside.
Reported-by: NAlex Sidorenko <alexandre.sidorenko@hp.com>
Tested-by: NAlex Sidorenko <alexandre.sidorenko@hp.com>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8caf1539

16 4月, 2009 1 次提交

gro: New frags interface to avoid copying shinfo · 76620aaf

由 Herbert Xu 提交于 4月 16, 2009

It turns out that copying a 16-byte area at ~800k times a second
can be really expensive :) This patch redesigns the frags GRO
interface to avoid copying that area twice.

The two disciples of the frags interface have been converted.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76620aaf

15 4月, 2009 1 次提交

gro: Restore correct value to gso_size · fc59f9a3

由 Herbert Xu 提交于 4月 14, 2009

Since everybody has been focusing on baremetal GRO performance
no one noticed when I added a bug that zapped gso_size for all
GRO packets.  This only gets picked up when you forward the skb
out of an interface.

Thanks to Mark Wagner for noticing this bug when testing kvm.
Reported-by: NMark Wagner <mwagner@redhat.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc59f9a3

11 4月, 2009 1 次提交

net: netif_device_attach/detach should start/stop all queues · d543103a

由 Alexander Duyck 提交于 4月 08, 2009

Currently netif_device_attach/detach are only stopping one queue. They
should be starting and stopping all the queues on a given device.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d543103a

02 4月, 2009 1 次提交

net: allow multiple dev per napi with GRO · f2bde732

由 Stephen Hemminger 提交于 4月 01, 2009

GRO assumes that there is a one-to-one relationship between NAPI
structure and network device. Some devices like sky2 share multiple
devices on a single interrupt so only have one NAPI handler. Rather than
split GRO from NAPI, just have GRO assume if device changes that
it is a different flow.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2bde732

27 3月, 2009 1 次提交

GRO: Disable GRO on legacy netif_rx path · 8f1ead2d

由 Herbert Xu 提交于 3月 26, 2009

When I fixed the GRO crash in the legacy receive path I used
napi_complete to replace __napi_complete.  Unfortunately they're
not the same when NETPOLL is enabled, which may result in us
not calling __napi_complete at all.

What's more, we really do need to keep the __napi_complete call
within the IRQ-off section since in theory an IRQ can occur in
between and fill up the backlog to the maximum, causing us to
lock up.

Since we can't seem to find a fix that works properly right now,
this patch reverts all the GRO support from the netif_rx path.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f1ead2d

22 3月, 2009 2 次提交

net: remove useless prefetch() call · ed734a97

由 Eric Dumazet 提交于 3月 21, 2009

There is no gain using prefetch() in dev_hard_start_xmit(), since
we already had to read ops->ndo_select_queue pointer in dev_pick_tx(),
and both pointers are probably located in the same cache line.

This prefetch call slows down fast path because of a stall in address
computation.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed734a97

skb: expose and constify hash primitives · 9247744e

由 Stephen Hemminger 提交于 3月 21, 2009

Some minor changes to queue hashing:
 1. Use const on accessor functions
 2. Export skb_tx_hash for use in drivers (see ixgbe)
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9247744e

19 3月, 2009 1 次提交

net: kfree(napi->skb) => kfree_skb · e4a389a9

由 Roel Kluin 提交于 3月 18, 2009

struct sk_buff pointers should be freed with kfree_skb.
Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4a389a9

18 3月, 2009 1 次提交

gro: Fix legacy path napi_complete crash · 303c6a02

由 Herbert Xu 提交于 3月 17, 2009

On the legacy netif_rx path, I incorrectly tried to optimise
the napi_complete call by using __napi_complete before we reenable
IRQs.  This simply doesn't work since we need to flush the held
GRO packets first.

This patch fixes it by doing the obvious thing of reenabling
IRQs first and then calling napi_complete.
Reported-by: NFrank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

303c6a02

17 3月, 2009 1 次提交

GRO: Move netpoll checks to correct location · d1c76af9

由 Herbert Xu 提交于 3月 16, 2009

As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place.  As it
stands we may pass freed skbs to netpoll_receive_skb.

This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll.  This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry.  I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.

This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out).  This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1c76af9

14 3月, 2009 1 次提交

[SCSI] net: add NETIF_F_FCOE_CRC to can_checksum_protocol · 1c8dbcf6

由 Yi Zou 提交于 2月 27, 2009

Add FC CRC offload check for ETH_P_FCOE.
Signed-off-by: NYi Zou <yi.zou@intel.com>
Acked-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>

1c8dbcf6

05 3月, 2009 2 次提交

vlan: Fix vlan-in-vlan crashes. · 9d40bbda

由 David S. Miller 提交于 3月 04, 2009

As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.

Add a netdev_resync_ops() and call it from the vlan code.

Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.

With help from Patrick McHardy.
Tested-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d40bbda

D
net: Fix missing dev->neigh_setup in register_netdevice(). · 54acd0ef
由 David S. Miller 提交于 3月 04, 2009
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
54acd0ef

03 3月, 2009 1 次提交

netns: Remove net_alive · 17edde52

由 Eric W. Biederman 提交于 2月 22, 2009

It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.
Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

17edde52

01 3月, 2009 1 次提交

netpoll: Add drop checks to all entry points · 4ead4431

由 Herbert Xu 提交于 3月 01, 2009

The netpoll entry checks are required to ensure that we don't
receive normal packets when invoked via netpoll.  Unfortunately
it only ever worked for the netif_receive_skb/netif_rx entry
points.  The VLAN (and subsequently GRO) entry point didn't
have the check and therefore can trigger all sorts of weird
problems.

This patch adds the netpoll check to all entry points.

I'm still uneasy with receiving at all under netpoll (which
apparently is only used by the out-of-tree kdump code).  The
reason is it is perfectly legal to receive all data including
headers into highmem if netpoll is off, but if you try to do
that with netpoll on and someone gets a printk in an IRQ handler                                             
you're going to get a nice BUG_ON.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ead4431

23 2月, 2009 1 次提交

netns: Remove net_alive · ce16c533

由 Eric W. Biederman 提交于 2月 22, 2009

It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.
Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce16c533

21 2月, 2009 1 次提交

net: kernel panic in dev_hard_start_xmit: remove faulty software TX time stamping · cd4d8fda

由 Patrick Ohly 提交于 2月 21, 2009

The current implementation of the TX software time stamping fallback is
faulty because it accesses the skb after ndo_start_xmit() returns
successfully. This patch removes the fallback, which fixes kernel panics
seen during stress tests. Hardware time stamping is not affected by this
removal.
Signed-off-by: NPatrick Ohly <patrick.ohly@intel.com>
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd4d8fda

19 2月, 2009 1 次提交

net: Optimize skb_tx_hash() by eliminating a comparison · e88721f8

由 Krishna Kumar 提交于 2月 18, 2009

Optimize skb_tx_hash() by eliminating a comparison that executes for
every packet. skb_tx_hashrnd initialization is moved to a later part of
the startup sequence, namely after the "random" driver is initialized.

Rebooted the system three times and verified that the code generates
different random numbers each time.
Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e88721f8

16 2月, 2009 2 次提交

net: pass new SIOCSHWTSTAMP through to device drivers · d24fff22

由 Patrick Ohly 提交于 2月 12, 2009

Signed-off-by: NPatrick Ohly <patrick.ohly@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d24fff22

net: infrastructure for hardware time stamping · ac45f602

由 Patrick Ohly 提交于 2月 12, 2009

The additional per-packet information (16 bytes for time stamps, 1
byte for flags) is stored for all packets in the skb_shared_info
struct. This implementation detail is hidden from users of that
information via skb_* accessor functions. A separate struct resp.
union is used for the additional information so that it can be
stored/copied easily outside of skb_shared_info.

Compared to previous implementations (reusing the tstamp field
depending on the context, optional additional structures) this
is the simplest solution. It does not extend sk_buff itself.

TX time stamping is implemented in software if the device driver
doesn't support hardware time stamping.

The new semantic for hardware/software time stamping around
ndo_start_xmit() is based on two assumptions about existing
network device drivers which don't support hardware time
stamping and know nothing about it:
 - they leave the new skb_shared_tx unmodified
 - the keep the connection to the originating socket in skb->sk
   alive, i.e., don't call skb_orphan()

Given that skb_shared_tx is new, the first assumption is safe.
The second is only true for some drivers. As a result, software
TX time stamping currently works with the bnx2 driver, but not
with the unmodified igb driver (the two drivers this patch series
was tested with).
Signed-off-by: NPatrick Ohly <patrick.ohly@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac45f602

09 2月, 2009 2 次提交

gro: Optimise Ethernet header comparison · aa4b9f53

由 Herbert Xu 提交于 2月 08, 2009

This patch optimises the Ethernet header comparison to use 2-byte
and 4-byte xors instead of memcmp.  In order to facilitate this,
the actual comparison is now carried out by the callers of the
shared dev_gro_receive function.

This has a significant impact when receiving 1500B packets through
10GbE.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa4b9f53

gro: Remember number of held packets instead of counting every time · 4ae5544f

由 Herbert Xu 提交于 2月 08, 2009

This patch prepares for the move of the same_flow checks out of
dev_gro_receive.  As such we need to remember the number of held
packets since doing a loop just to count them every time is silly.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4ae5544f

07 2月, 2009 1 次提交

net_dma: call dmaengine_get only if NET_DMA enabled · b4bd07c2

由 David S. Miller 提交于 2月 06, 2009

Based upon a patch from Atsushi Nemoto <anemo@mba.ocn.ne.jp>

--------------------
The commit 649274d9 ("net_dma:
acquire/release dma channels on ifup/ifdown") added unconditional call
of dmaengine_get() to net_dma.  The API should be called only if
NET_DMA was enabled.
--------------------
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NDan Williams <dan.j.williams@intel.com>

b4bd07c2

06 2月, 2009 1 次提交

gro: Fix frag_list merging on imprecisely split packets · 56035022

由 Herbert Xu 提交于 2月 05, 2009

The previous fix ad0f9904 (gro:
Fix handling of imprecisely split packets) only fixed the case
of frags merging, frag_list merging in the same circumstances
were still broken.

In particular, the packet headers end up in the data stream.

This patch fixes this plus another issue where an imprecisely
split packet header may be read incorrectly (this is mostly
harmless since it'll simply cause the packet to not match and
be rejected for GRO).

Thanks to Emil Tantilov and Jeff Kirsher for helping to track
this down.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56035022

05 2月, 2009 1 次提交

net: Partially allow skb destructors to be used on receive path · 9a279bcb

由 Herbert Xu 提交于 2月 04, 2009

As it currently stands, skb destructors are forbidden on the
receive path because the protocol end-points will overwrite
any existing destructor with their own.

This is the reason why we have to call skb_orphan in the loopback
driver before we reinject the packet back into the stack, thus
creating a period during which loopback traffic isn't charged
to any socket.

With virtualisation, we have a similar problem in that traffic
is reinjected into the stack without being associated with any
socket entity, thus providing no natural congestion push-back
for those poor folks still stuck with UDP.

Now had we been consistent in telling them that UDP simply has
no congestion feedback, I could just fob them off.  Unfortunately,
we appear to have gone to some length in catering for this on
the standard UDP path, with skb/socket accounting so that has
created a very unhealthy dependency.

Alas habits are difficult to break out of, so we may just have
to allow skb destructors on the receive path.

It turns out that making skb destructors useable on the receive path
isn't as easy as it seems.  For instance, simply adding skb_orphan
to skb_set_owner_r isn't enough.  This is because we assume all
over the IP stack that skb->sk is an IP socket if present.

The new transparent proxy code goes one step further and assumes
that skb->sk is the receiving socket if present.

Now all of this can be dealt with by adding simple checks such
as only treating skb->sk as an IP socket if skb->sk->sk_family
matches.  However, it turns out that for bridging at least we
don't need to do all of this work.

This is of interest because most virtualisation setups use bridging
so we don't actually go through the IP stack on the host (with
the exception of our old nemesis the bridge netfilter, but that's
easily taken care of).

So this patch simply adds skb_orphan to the point just before we
enter the IP stack, but after we've gone through the bridge on the
receive path.  It also adds an skb_orphan to the one place in
netfilter that touches skb->sk/skb->destructor, that is, tproxy.

One word of caution, because of the internal code structure, anyone
wishing to deploy this must use skb_set_owner_w as opposed to
skb_set_owner_r since many functions that create a new skb from
an existing one will invoke skb_set_owner_w on the new skb.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a279bcb

01 2月, 2009 1 次提交

gro: Fix handling of imprecisely split packets · ad0f9904

由 Herbert Xu 提交于 2月 01, 2009

The commit 89a1b249edcf9be884e71f92df84d48355c576aa (gro: Avoid
copying headers of unmerged packets) only worked for packets
which are either completely linear, completely non-linear, or
packets which exactly split at the boundary between headers and
payload.

Anything else would cause bits in the header to go missing if
the packet is held by GRO.

This may have broken drivers such as ixgbe.

This patch fixes the places that assumed or only worked with
the above cases.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad0f9904

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功