Commits · b82f08ea162edeee6c2c70c6c4321bea4763fa35 · bug2833 / cloud-kernel

04 Jun, 2009 1 commit

Authored by Herbert Xu on Jun 03, 2009

When I added socket accounting to tun I inadvertently introduced
spurious wake-up events that kill qemu performance.  The problem
occurs when qemu polls on the tun fd for read, and then transmits
packets.  For each packet transmitted, we will wake up qemu even
if it only cares about read events.

Now this affects all sockets, but it is only a new problem for
tun.  So this patch tries to fix it for tun first and we can then
look at the problem in general.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c722c625

10 May, 2009 1 commit

tun: add tun_flags, owner, group attributes in sysfs · 980c9e8c

Authored by David Woodhouse on May 09, 2009

This patch adds three attribute files in /sys/class/net/$dev/ for tun
devices; allowing userspace to obtain the information which TUNGETIFF
offers, and more, but without having to attach to the device in question
(which may not be possible if it's in use).

It also fixes a bug which has been present in the TUNGETIFF ioctl since
its inception, where it would never set IFF_TUN or IFF_TAP according to
the device type. (Look carefully at the code which I remove from
tun_get_iff() and how the new tun_flags() helper is subtly different).
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Apr, 2009 1 commit

tun: add IFF_TUN_EXCL flag to avoid opening a persistent device. · f85ba780

Authored by David Woodhouse on Apr 27, 2009

When creating certain types of VPN, NetworkManager will first attempt
to find an available tun device by iterating through 'vpn%d' until it
finds one that isn't already busy. Then it'll set that to be persistent
and owned by the otherwise unprivileged user that the VPN dæmon itself
runs as.

There's a race condition here -- during the period where the vpn%d
device is created and we're waiting for the VPN dæmon to actually
connect and use it, if we try to create _another_ device we could end up
re-using the same one -- because trying to open it again doesn't get
-EBUSY as it would while it's _actually_ busy.

To solve this, we add an IFF_TUN_EXCL flag which causes tun_set_iff() to
fail if it would be opening an existing persistent tun device -- so that
we can make sure we're getting an entirely _new_ device.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
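
As a rough userspace illustration of the flag described in this commit, the sketch below asks for an exclusive device by name. It assumes IFF_TUN_EXCL is exported through <linux/if_tun.h>; the helper names prepare_excl_ifreq() and open_new_tun() are hypothetical, and the errno returned on failure is not stated in the commit message, so the caller should treat any failure as "try the next name".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: fill an ifreq that requests a tun device by name
 * and refuses to attach to an already existing persistent device. */
void prepare_excl_ifreq(struct ifreq *ifr, const char *name)
{
	assert(ifr != NULL && name != NULL);
	memset(ifr, 0, sizeof(*ifr));
	/* ifr_flags is a short, so the cast is explicit */
	ifr->ifr_flags = (short)(IFF_TUN | IFF_NO_PI | IFF_TUN_EXCL);
	strncpy(ifr->ifr_name, name, IFNAMSIZ - 1);
}

/* Hypothetical helper: returns an fd attached to a newly created device,
 * or -1 with errno set if the open or the TUNSETIFF ioctl failed. */
int open_new_tun(const char *name, struct ifreq *ifr)
{
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;
	prepare_excl_ifreq(ifr, name);
	if (ioctl(fd, TUNSETIFF, ifr) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

A caller iterating over vpn0, vpn1, ... could call open_new_tun() for each name and stop at the first success, which avoids silently re-using a device that another setup is still waiting on.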

21 Apr, 2009 2 commits

tun: fix tun_chr_aio_write so that aio works · 6f26c9a7

Authored by Michael S. Tsirkin on Apr 20, 2009

aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct
iovec * and modifies the iovec. As a result, attempts to use io_submit
to send packets to a tun device fail with weird errors such as EINVAL.

Since tun is the only user of skb_copy_datagram_from_iovec, we can
fix this simply by changing the latter so that it does not
touch the iovec passed to it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
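
The idea of copying out of an iovec without touching it can be sketched in userspace C as follows. This is not the kernel's skb_copy_datagram_from_iovec; the function name copy_from_const_iovec() and its signature are assumptions made for illustration. Progress is tracked in local variables, so the caller's array stays intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy len bytes into dst, starting offset bytes into the iovec array.
 * The iovec entries are only read, never modified.
 * Returns 0 on success, -1 if the iovec holds fewer than offset + len bytes. */
int copy_from_const_iovec(void *dst, const struct iovec *iov, int iovcnt,
			  size_t offset, size_t len)
{
	unsigned char *out = dst;

	assert(dst != NULL && iov != NULL);
	for (int i = 0; i < iovcnt && len > 0; i++) {
		size_t seg = iov[i].iov_len;
		size_t n;

		if (offset >= seg) {	/* skip segments before the offset */
			offset -= seg;
			continue;
		}
		n = seg - offset;
		if (n > len)
			n = len;
		memcpy(out, (const unsigned char *)iov[i].iov_base + offset, n);
		out += n;
		len -= n;
		offset = 0;		/* later segments are read from their start */
	}
	return len == 0 ? 0 : -1;
}
```

Because the function takes an explicit offset instead of advancing iov_base and iov_len, a caller that received a const struct iovec * never needs to cast the qualifier away.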

tun: fix tun_chr_aio_read so that aio works · 43b39dcd

Authored by Michael S. Tsirkin on Apr 20, 2009

aio_read gets const struct iovec * but tun_chr_aio_read casts this to struct
iovec * and modifies the iovec. As a result, attempts to use io_submit
to get packets from a tun device fail with weird errors such as EINVAL.

Fix by using the new skb_copy_datagram_const_iovec.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Apr, 2009 2 commits

tun: Fix sk_sleep races when attaching/detaching · c40af84a

Authored by Herbert Xu on Apr 19, 2009

As the sk_sleep wait queue actually lives in tfile, which may be
detached from the tun device, bad things will happen when we use
sk_sleep after detaching.

Since the tun device is the persistent data structure here (when
requested by the user), it makes much more sense to have the wait
queue live there.  There is no reason to have it in tfile at all
since the only time we can wait is if we have a tun attached.
In fact we already have a wait queue in tun_struct, so we might
as well use it.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Only free a netdev when all tun descriptors are closed · 9c3fea6a

Authored by Herbert Xu on Apr 18, 2009

The commit c70f1829 ("tun: Fix races between tun_net_close and
free_netdev") fixed a race where an asynchronous deletion of a tun
device can hose a poll(2) on a tun fd attached to that device.

However, this came at the cost of moving the tun wait queue into
the tun file data structure.  The problem with this is that it
imposes restrictions on when and where the tun device can access
the wait queue since the tun file may change at any time due to
detaching and reattaching.

In particular, now that we need to use the wait queue on the
receive path it becomes difficult to properly synchronise this
with the detachment of the tun device.

This patch solves the original race in a different way.  Since
the race is only because the underlying memory gets freed, we
can prevent it simply by ensuring that we don't do that until
all tun descriptors ever attached to the device (even if they
have since been detached, because they may still be sitting in poll)
have been closed.

This is done by reference counting the attached tun file
descriptors.  The refcount in tun->sk has been reappropriated
for this purpose since it was already being used for that, albeit
from the opposite angle.

Note that we no longer zero tfile->tun since tun_get will return
NULL anyway after the refcount on tfile hits zero.  Instead it
represents whether this file has ever been attached to a device.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Apr, 2009 1 commit

tun: Fix crash with non-GSO users · 0eca93bc

Authored by Herbert Xu on Apr 14, 2009

When I made the tun driver use non-linear packets as the preferred
option, it broke non-GSO users because they would end up allocating
a completely non-linear packet, which triggers a crash when we call
eth_type_trans.

This patch reverts non-GSO users to using linear packets and adds
a check to ensure that GSO users can't cause crashes in the same
way.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Feb, 2009 1 commit

tun: Fix merge error · ab46d779

Authored by Herbert Xu on Feb 14, 2009

When forward-porting the tun accounting patch I managed to break
the send path completely by dropping the tun_get call.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Feb, 2009 1 commit

tun: Fix unicast filter overflow · cfbf84fc

Authored by Alex Williamson on Feb 08, 2009

Tap devices can make use of a small MAC filter set via the
TUNSETTXFILTER ioctl.  The filter has a set of exact matches
plus a hash for imperfect filtering of additional multicast
addresses.  The current code is unbalanced, adding unicast
addresses to the multicast hash, but only checking the hash
against multicast addresses.  This results in the filter
dropping unicast addresses that overflow the exact filter.
The fix is simply to disable the filter by leaving count set
to zero if we find non-multicast addresses after the exact
match table is filled.
Signed-off-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
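
The filter behaviour described in this commit can be modelled with the small sketch below. It is a simplified stand-in, not the driver code: the table size EXACT_MAX, the struct layout, the toy hash in hash_bit(), and the function names are all assumptions. What it does show is the fix itself: a unicast address that overflows the exact table leaves count at zero, which disables filtering instead of dropping that address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXACT_MAX 4	/* illustrative table size (assumption) */

struct mac_filter {
	unsigned int count;		/* 0 means the filter is disabled */
	uint8_t exact[EXACT_MAX][6];	/* exact-match table */
	uint64_t mcast_hash;		/* imperfect multicast hash */
};

static bool is_multicast(const uint8_t *mac) { return (mac[0] & 1) != 0; }

/* Toy hash, a stand-in for whatever the driver really uses. */
static unsigned int hash_bit(const uint8_t *mac) { return mac[5] & 63; }

/* addrs holds n consecutive 6-byte addresses. Returns the number of
 * exact entries installed, or 0 if the filter was left disabled. */
unsigned int filter_update(struct mac_filter *f, const uint8_t *addrs,
			   unsigned int n)
{
	unsigned int i, exact = 0;

	assert(f != NULL);
	memset(f, 0, sizeof(*f));
	for (i = 0; i < n && exact < EXACT_MAX; i++)
		memcpy(f->exact[exact++], addrs + 6 * i, 6);
	for (; i < n; i++) {
		if (!is_multicast(addrs + 6 * i))
			return 0;	/* unicast overflow: count stays 0 */
		f->mcast_hash |= 1ULL << hash_bit(addrs + 6 * i);
	}
	f->count = exact;
	return exact;
}

/* Returns true if a frame with destination dst should be delivered. */
bool filter_accept(const struct mac_filter *f, const uint8_t *dst)
{
	if (f->count == 0)
		return true;		/* filter disabled: accept everything */
	for (unsigned int i = 0; i < f->count; i++)
		if (memcmp(f->exact[i], dst, 6) == 0)
			return true;
	if (is_multicast(dst))
		return ((f->mcast_hash >> hash_bit(dst)) & 1) != 0;
	return false;
}
```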

06 Feb, 2009 1 commit

tun: Limit amount of queued packets per device · 33dccbb0

Authored by Herbert Xu on Feb 05, 2009

Unlike a normal socket path, the tuntap device send path does
not have any accounting.  This means that the user-space sender
may be able to pin down arbitrary amounts of kernel memory by
continuing to send data to an end-point that is congested.

Even when this isn't an issue because of limited queueing at
most end points, this can also be a problem because its only
response to congestion is packet loss.  That is, when those
local queues at the end-point fill up, the tuntap device will
start wasting system time because it will continue to send
data there which simply gets dropped straight away.

Of course one could argue that everybody should do congestion
control end-to-end; unfortunately there are people in this world
still hooked on UDP, and they don't appear to be going away
anywhere fast.  In fact, we've always helped them by performing
accounting in our UDP code, the sole purpose of which is to
provide congestion feedback other than through packet loss.

This patch attempts to apply the same bandaid to the tuntap device.
It creates a pseudo-socket object which is used to account our
packets just as a normal socket does for UDP.  Of course things
are a little complex because we're actually reinjecting traffic
back into the stack rather than out of the stack.

The stack complexities however should have been resolved by preceding
patches.  So this one can simply start using skb_set_owner_w.

For now the accounting is essentially disabled by default for
backwards compatibility.  In particular, we set the cap to INT_MAX.
This is so that existing applications don't get confused by the
sudden arrival of EAGAIN errors.

In future we may wish (or be forced to) do this by default.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Feb, 2009 1 commit

tun: Check supplemental groups in TUN/TAP driver. · 1bded710

Authored by Michael Tokarev on Feb 02, 2009

Michael Tokarev wrote:
[]
> 2, and this is the main one: How about supplementary groups?
>
> Here I have a valid usage case: a group of testers running various
> versions of windows using KVM (kernel virtual machine), 1 at a time,
> to test some software.  kvm is set up to use bridge with a tap device
> (there should be a way to connect to the machine).  Anyone on that group
> has to be able to start/stop the virtual machines.
>
> My first attempt - pretty obvious when I saw -g option of tunctl - is
> to add group ownership for the tun device and add a supplementary group
> to each user (their primary group should be different).  But that fails,
> since kernel only checks for egid, not any other group ids.
>
> What's the reasoning to not allow supplementary groups and to only check
> for egid?
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Feb, 2009 1 commit

net: replace uses of __constant_{endian} · 09640e63

Authored by Harvey Harrison on Feb 01, 2009

Base versions handle constant folding now.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Jan, 2009 10 commits

tun: Implement ip link del tunXXX · f019a7a5

Authored by Eric W. Biederman on Jan 21, 2009

This greatly simplifies testing to verify I have fixed the problems
with a tun device disappearing when the tun file descriptor is still
held open.

Further, it allows removal of the network namespace operations for the
tun driver, reducing the network namespace handling in the driver to the
minimum, i.e. when we are creating a tun device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: There is no longer any need to deny changing network namespaces · aec191aa

Authored by Eric W. Biederman on Jan 20, 2009

With the awkward case between free_netdev and dev_chr_close fixed
there is no longer any need to limit tun and tap devices to the
network namespace they were created in.  So remove the
NETIF_F_NETNS_LOCAL flag on the network device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Fix races between tun_net_close and free_netdev. · c70f1829

Authored by Eric W. Biederman on Jan 20, 2009

The tun code does not cope gracefully if the network device goes away before
the tun file descriptor is closed. It looks like we can trigger this with
rmmod, and moving tun devices between network namespaces will allow this
to be triggered when network namespaces exit.

To fix this I introduce an intermediate data structure tun_file which
holds a count of users and a pointer to the struct tun_struct. tun_get
increments that reference count if it is greater than 0. tun_put decrements
that reference count and detaches from the network device if the count is 0.

While we have a file attached to the network device I hold a reference
to the network device keeping it from going away completely.

When a network device is unregistered I decrement the count of the
attached tun_file and if that was the last user I detach the tun_file,
and all processes on read_wait are woken up to ensure they do not
sleep indefinitely. As some of those sleeps happen with the count on
the tun device elevated, waking up the read waiters ensures that
tun_file will be detached in a timely manner.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
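
The "increment only if greater than 0" rule for tun_get, and the "detach when the count reaches 0" rule for tun_put, can be sketched with C11 atomics. This is a userspace model, not the kernel implementation: the struct and function names (tun_file_sketch, sketch_get, sketch_put) are hypothetical, and the kernel uses its own atomic primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct tun_dev { int id; };	/* stand-in for struct tun_struct */

struct tun_file_sketch {
	atomic_int count;	/* number of users; 0 means detached */
	struct tun_dev *tun;
};

/* Take a reference only if the count is still above zero.
 * Returns the device on success, NULL if the file is already detached. */
struct tun_dev *sketch_get(struct tun_file_sketch *tf)
{
	int old;

	assert(tf != NULL);
	old = atomic_load(&tf->count);
	while (old > 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&tf->count, &old, old + 1))
			return tf->tun;
	}
	return NULL;
}

/* Drop a reference. Returns true when this was the last one, in which
 * case the caller is responsible for detaching from the device. */
bool sketch_put(struct tun_file_sketch *tf)
{
	assert(tf != NULL);
	return atomic_fetch_sub(&tf->count, 1) == 1;
}
```

The compare-and-swap loop matters: a plain increment could resurrect a count that has already dropped to zero, handing out a pointer to a device that is in the middle of being torn down.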

tun: Move read_wait into tun_file · b2430de3

Authored by Eric W. Biederman on Jan 20, 2009

The poll interface requires that the waitqueue exist while the struct
file is open.  In the rare case when a tun device disappears before
the tun file closes we fail to provide this property, so move
read_wait.

This is safe now that tun_net_xmit is atomic with tun_detach.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Make tun_net_xmit atomic wrt tun_attach && tun_detach · 38231b7a

Authored by Eric W. Biederman on Jan 20, 2009

Currently this small race allows for a packet to be received when we
detach from a tun device and still be enqueued.  Not especially
important but not what the code is trying to do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Grab the netns in open. · 36b50bab

Authored by Eric W. Biederman on Jan 20, 2009

Grabbing namespaces in open, and putting them in close always seems to
be the cleanest approach with the fewest surprises.

So now that we have tun_file, and therefore someplace to put the network
namespace, let's grab the network namespace on file open and put it on
file close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Introduce tun_file · 631ab46b

Authored by Eric W. Biederman on Jan 20, 2009

Currently the tun code suffers from only having a single word of
data that exists for the entire life of the tun file descriptor.

This results in peculiar holding of references to the network namespace
as well as races between free_netdev and tun_chr_close.

Fix this by introducing tun_file which will hold the per file state.
For the moment it still holds just a single word so the differences
are all logic changes with no changes in semantics.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Use POLLERR not EBADF in tun_chr_poll · eac9e902

Authored by Eric W. Biederman on Jan 20, 2009

EBADF is meaningless in the context of a poll mask so use POLLERR
instead.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
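
To make the distinction concrete: a poll handler returns a bitmask of event flags, so a negative errno such as -EBADF would be misread as a set of unrelated bits. The sketch below is a simplified userspace model with a hypothetical chr_state struct and sketch_poll_mask() function; it is not the body of tun_chr_poll.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state for the purposes of this sketch. */
struct chr_state {
	bool attached;	/* is the file attached to a device? */
	bool readable;	/* is a packet queued for reading? */
	bool writable;	/* is there room to send? */
};

/* Build a poll event mask. An unattached file reports POLLERR,
 * which is a mask bit, rather than a negative errno value. */
unsigned int sketch_poll_mask(const struct chr_state *s)
{
	unsigned int mask = 0;

	assert(s != NULL);
	if (!s->attached)
		return POLLERR;
	if (s->readable)
		mask |= POLLIN;
	if (s->writable)
		mask |= POLLOUT;
	return mask;
}
```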

tun: Fix races in tun_set_iff · a7385ba2

Authored by Eric W. Biederman on Jan 20, 2009

It is possible for two different tasks with access to the same file
descriptor to call tun_set_iff on it at the same time and race to
attach to a tap device.  Prevent this by placing all of the logic to
attach to a file descriptor in one function and testing the file
descriptor to be certain it is not already attached to another tun
device.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Remove unnecessary tun_get_by_name · 74a3e5a7

Authored by Eric W. Biederman on Jan 20, 2009

Currently the tun driver keeps a private list of tun devices for what
appears to be a small gain in performance when reconnecting a file
descriptor to an existing tun or tap device.  So simplify the code by
removing it.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Jan, 2009 1 commit

tun: Eliminate sparse signedness warning · 745417e2

Authored by Gerrit Renker on Jan 04, 2009

register_pernet_gen_device() expects 'int*', found via sparse.

 CHECK   drivers/net/tun.c
 drivers/net/tun.c:1245:36: warning: incorrect type in argument 1 (different signedness)
 drivers/net/tun.c:1245:36:    expected int *id
 drivers/net/tun.c:1245:36:    got unsigned int static [toplevel] *<noident>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Dec, 2008 1 commit

tun: Fix SIOCSIFHWADDR error. · 7a0a9608

Authored by Kusanagi Kouichi on Dec 29, 2008

Set proper operations.
Signed-off-by: Kusanagi Kouichi <slash@ma.neweb.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Nov, 2008 1 commit

netdev: add more functions to netdevice ops · 00829823

Authored by Stephen Hemminger on Nov 20, 2008

This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.

Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Nov, 2008 1 commit

tun: convert to net_device_ops · 758e43b7

Authored by Stephen Hemminger on Nov 19, 2008

Convert the TUN/TAP tunnel driver to net_device_ops.
Split the ops in two, and retain compatibility.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Nov, 2008 2 commits

CRED: Wrap current->cred and a few other accessors · 86a264ab

Authored by David Howells on Nov 14, 2008

Wrap current->cred and a few other accessors to hide their actual
implementation.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>

CRED: Wrap task credential accesses in the network device drivers · ee9785ad

Authored by David Howells on Nov 14, 2008

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: netdev@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>

04 Nov, 2008 1 commit

drivers/net: Kill now superfluous ->last_rx stores. · babcda74

Authored by David S. Miller on Nov 03, 2008

The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Drivers need not do it any more.

Some cases had to be skipped over because the drivers
were making use of the ->last_rx value themselves.
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Nov, 2008 1 commit

saner FASYNC handling on file close · 233e70f4

Authored by Al Viro on Oct 31, 2008

As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set.  And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 Oct, 2008 1 commit

net: convert print_mac to %pM · e174961c

Authored by Johannes Berg on Oct 27, 2008

This converts pretty much everything from print_mac to %pM. There were
a few things that had conflicts which I have just dropped for
now, no harm done.

I've built an allyesconfig with this and looked at the files
that weren't built very carefully, but it's a huge patch.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Aug, 2008 2 commits

tun: fallback if skb_alloc() fails on big packets · f42157cb

Authored by Rusty Russell on Aug 15, 2008

skb_alloc produces linear packets (using kmalloc()).  That can fail,
so we should fall back to making paged skbs.

My original version of this patch always allocated paged skbs for big
packets.  But that made performance drop from 8.4 seconds to 8.8
seconds on 1G lguest->Host TCP xmit.  So now we only do that as a
fallback.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: TUNGETIFF interface to query name and flags · e3b99556

Authored by Mark McLoughlin on Aug 15, 2008

Add a TUNGETIFF interface so that userspace can query a
tun/tap descriptor for its name and flags.

This is needed because it is common for one app to create
a tap interface, exec another app and pass it the file
descriptor for the interface. Without TUNGETIFF the spawned
app has no way of detecting whether the interface has e.g.
IFF_VNET_HDR set.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Jul, 2008 1 commit

net: tun.c fix cast · c0e5a8c2

Authored by Harvey Harrison on Jul 16, 2008

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

15 Jul, 2008 1 commit

tun: Fix/rewrite packet filtering logic · f271b2cc

Authored by Max Krasnyansky on Jul 14, 2008

Please see the following thread to get some context on this
	http://marc.info/?l=linux-netdev&m=121564433018903&w=2

Basically the issue is that current multi-cast filtering stuff in
the TUN/TAP driver is seriously broken.
Original patch went in without proper review and ACK. It was broken and
confusing to start with and subsequent patches broke it completely.
To give you an idea of what's broken here are some of the issues:

- Very confusing comments throughout the code that imply that the
character device is a network interface in its own right, and that packets
are passed between the two nics. Which is completely wrong.

- Wrong set of ioctls is used for setting up filters. They look like
shortcuts for manipulating state of the tun/tap network interface but
in reality manipulate the state of the TX filter.

- ioctls that were originally used for setting address of the TX filter
got "fixed" and now set the address of the network interface itself. Which
made filter totally useless.

- Filtering is done too late. Instead of filtering early on, to avoid
unnecessary wakeups, filtering is done in the read() call.

The list goes on and on :)

So the patch cleans all that up. It introduces simple and clean interface for
setting up TX filters (TUNSETTXFILTER + tun_filter spec) and does filtering
before enqueuing the packets.

TX filtering is useful in the scenarios where TAP is part of a bridge, in
which case it gets all broadcast, multicast and potentially other packets when
the bridge is learning. So, for example, an Ethernet tunnelling app may want to
set up TX filters to avoid tunnelling multicast traffic. QEMU and other
hypervisors can push RX filtering that is currently done in the guest into the
host context, therefore saving wakeups and unnecessary data transfer.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Jul, 2008 1 commit

tun: Persistent devices can get stuck in xoff state · e35259a9

Authored by Max Krasnyansky on Jul 10, 2008

The scenario goes like this. App stops reading from tun/tap.
TX queue gets full and driver does netif_stop_queue().
App closes fd and TX queue gets flushed as part of the cleanup.
Next time the app opens tun/tap and starts reading from it but
the xoff state is not cleared. We're stuck.
Normally xoff state is cleared when netdev is brought up. But
in the case of persistent devices this happens only during
initial setup.

The fix is trivial. If device is already up when an app opens
it we clear xoff state and that gets things moving again.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Jul, 2008 3 commits

tun: Allow GSO using virtio_net_hdr · f43798c2

Authored by Rusty Russell on Jul 03, 2008

Add an IFF_VNET_HDR flag.  This uses the same ABI as virtio_net
(ie. prepending struct virtio_net_hdr to packets) to indicate GSO and
checksum information.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: TUNSETFEATURES to set gso features. · 5228ddc9

Authored by Rusty Russell on Jul 03, 2008

ethtool is useful for setting (some) device fields, but it's
root-only.  Finer feature control is available through a tun-specific
ioctl.

(Includes Mark McLoughlin <markmc@redhat.com>'s fix to hold rtnl sem).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Interface to query tun/tap features. · 07240fd0

Authored by Rusty Russell on Jul 03, 2008

The problem with introducing checksum offload and gso to tun is they
need to set dev->features to enable GSO and/or checksumming, which is
supposed to be done before register_netdevice(), ie. as part of
TUNSETIFF.

Unfortunately, TUNSETIFF has always just ignored flags it doesn't
understand, so there's no good way of detecting whether the kernel
supports new IFF_ flags.

This patch implements a TUNGETFEATURES ioctl which returns all the valid IFF
flags.  It could be extended later to include other features.

Here's an example program which uses it:

#include <linux/if_tun.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <err.h>
#include <stdio.h>

static struct {
	unsigned int flag;
	const char *name;
} known_flags[] = {
	{ IFF_TUN, "TUN" },
	{ IFF_TAP, "TAP" },
	{ IFF_NO_PI, "NO_PI" },
	{ IFF_ONE_QUEUE, "ONE_QUEUE" },
};

int main()
{
	unsigned int features, i;

	int netfd = open("/dev/net/tun", O_RDWR);
	if (netfd < 0)
		err(1, "Opening /dev/net/tun");

	if (ioctl(netfd, TUNGETFEATURES, &features) != 0) {
		printf("Kernel does not support TUNGETFEATURES, guessing\n");
		features = (IFF_TUN|IFF_TAP|IFF_NO_PI|IFF_ONE_QUEUE);
	}
	printf("Available features are: ");
	for (i = 0; i < sizeof(known_flags)/sizeof(known_flags[0]); i++) {
		if (features & known_flags[i].flag) {
			features &= ~known_flags[i].flag;
			printf("%s ", known_flags[i].name);
		}
	}
	if (features)
		printf("(UNKNOWN %#x)", features);
	printf("\n");
	return 0;
}
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Max Krasnyansky <maxk@qualcomm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
