- 31 August 2017, 1 commit

By Eric Dumazet:
xennet_start_xmit() might copy an skb with an inappropriate layout into a fresh one. The old skb is then freed, and at that point it is not a drop but a consume; the new skb will itself be either consumed or dropped later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
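A minimal sketch of the distinction, assuming the bounce-copy shape described above (dev_kfree_skb_any() reports a drop to drop monitors, dev_consume_skb_any() a successful hand-off):

    struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);

    if (nskb) {
        /* before: dev_kfree_skb_any(skb);  -- counted as a drop */
        dev_consume_skb_any(skb); /* data lives on in nskb, so this
                                   * is a consume, not a drop */
        skb = nskb;
    }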

- 12 May 2017, 1 commit

By Vitaly Kuznetsov:
Unavoidable crashes were discovered in netfront_resume() and netback_changed() after a previous failure in talk_to_netback() (e.g. when we fail to read the MAC from xenstore). The failure path in talk_to_netback() unregisters and frees the netdev, but we don't reset drvdata and we try accessing it after resume. Fix the bug by removing the whole xen device completely with device_unregister(); this guarantees we won't get any calls into netfront after a failure.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 11 February 2017, 1 commit

By Boris Ostrovsky:
rx_refill_timer should be deleted as soon as we disconnect from the backend, since otherwise it is possible for the timer to go off before we get to xennet_destroy_queues(). If this happens we may dereference queue->rx.sring, which is set to NULL in xennet_disconnect_backend().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: stable@vger.kernel.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
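A hedged sketch of the ordering, with the queue and timer field names taken from the commit text (the surrounding teardown is elided):

    static void xennet_disconnect_backend(struct netfront_info *info)
    {
        unsigned int i;

        for (i = 0; i < info->netdev->real_num_tx_queues; i++) {
            struct netfront_queue *queue = &info->queues[i];

            /* Kill the timer before the ring goes away; del_timer_sync()
             * also waits for a concurrently running handler, so the refill
             * callback can never observe rx.sring == NULL. */
            del_timer_sync(&queue->rx_refill_timer);

            /* ... unbind irqs, free rings, queue->rx.sring = NULL ... */
        }
    }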

- 10 February 2017, 2 commits

By Ross Lagerwall:
This fixes a crash when running out of grant refs when creating many queues across many netdevs.

* If creating queues fails (i.e. there are no grant refs available), call xenbus_dev_fatal() to ensure that the xenbus device is set to the closed state.
* If no queues are created, don't call xennet_disconnect_backend as netdev->real_num_tx_queues will not have been set correctly.
* If setup_netfront() fails, ensure that all the queues created are cleaned up, not just those that have been set up.
* If any queues were set up and an error occurs, call xennet_destroy_queues() to clean up the napi context.
* If any fatal error occurs, unregister and destroy the netdev to avoid leaving around a half-set-up network device.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

By Vineeth Remanan Pillai:
Commit 90c311b0 ("xen-netfront: Fix Rx stall during network stress and OOM") caused the refill timer to be triggered on almost every invocation of xennet_alloc_rx_buffers for certain workloads. This reworks the fix by reverting to the old behaviour while also taking skb allocation failure into consideration: the refill timer is now triggered on insufficient requests or on skb allocation failure.

Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
Fixes: 90c311b0 ("xen-netfront: Fix Rx stall during network stress and OOM")
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
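A sketch of the reworked retry condition, assuming names suggested by the description (skb_alloc_failed is an illustrative flag set when an allocation above failed):

    /* Re-arm the refill timer either when too few request slots could
     * be pushed or when an skb allocation failed earlier in this call. */
    if (req_prod - queue->rx.rsp_cons < NET_RX_SLOTS_MIN ||
        unlikely(skb_alloc_failed)) {
        mod_timer(&queue->rx_refill_timer, jiffies + (HZ / 10));
        return;
    }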

- 31 January 2017, 1 commit

By Eric Dumazet:
napi_complete_done() allows opting in to gro_flush_timeout, added back in linux-3.19 by commit 3b47d303 ("net: gro: add a per device gro flush timer"). This allows more efficient GRO aggregation without sacrificing latency.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
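A minimal sketch of the change in the driver's poll routine (the body is elided; work_done counts Rx responses processed):

    static int xennet_poll(struct napi_struct *napi, int budget)
    {
        int work_done = 0;

        /* ... process up to budget Rx responses, counting work_done ... */

        if (work_done < budget)
            /* was: napi_complete(napi) -- reporting work_done lets the
             * core honour a configured gro_flush_timeout */
            napi_complete_done(napi, work_done);

        return work_done;
    }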

- 30 January 2017, 1 commit

By Juergen Gross:
Today the default number of tx/rx queues for an interface is the number of vcpus in the system. As each queue pair reserves 512 grant pages, this default consumes a ridiculous number of grants for large guests. Limit the default queue count to 8. The value can still be modified via a module parameter if required.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
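A hedged sketch of the cap; the parameter and macro names follow the driver's conventions but are assumptions here:

    static unsigned int xennet_max_queues;
    module_param_named(max_queues, xennet_max_queues, uint, 0644);
    MODULE_PARM_DESC(max_queues,
                     "Maximum number of queues per virtual interface");

    #define XENNET_MAX_QUEUES_DEFAULT 8

    static int __init netif_init(void)
    {
        /* Only apply the capped default when the user gave no value. */
        if (xennet_max_queues == 0)
            xennet_max_queues = min_t(unsigned int,
                                      XENNET_MAX_QUEUES_DEFAULT,
                                      num_online_cpus());
        /* ... */
        return 0;
    }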

- 21 January 2017, 1 commit

By Vineeth Remanan Pillai:
During an OOM scenario, request slots cannot be created because skb allocation fails. The netback therefore cannot pass in packets, and netfront wrongly assumes that there is no more work to be done, so it disables polling. This causes Rx to stall. The issue is with the retry logic, which schedules the timer only if the created slots are fewer than NET_RX_SLOTS_MIN. The count of new request slots to be pushed is calculated as the difference between the new req_prod and rsp_cons, which can be more than the actual number of new slots if there are unconsumed responses. The fix is to calculate the count of newly created slots as the difference between the new req_prod and the old req_prod.

Signed-off-by: Vineeth Remanan Pillai <vineethp@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
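A sketch of the corrected comparison in xennet_alloc_rx_buffers(), assuming the standard Xen ring field names:

    RING_IDX req_prod = queue->rx.req_prod_pvt;

    /* ... push new Rx requests, advancing req_prod ... */

    /* Count only the slots created in this call: new private req_prod
     * minus the previously published one. Comparing against rsp_cons
     * (the old code) also counted unconsumed responses. */
    if (req_prod - queue->rx.sring->req_prod < NET_RX_SLOTS_MIN)
        mod_timer(&queue->rx_refill_timer, jiffies + (HZ / 10));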

- 09 January 2017, 1 commit

By Stephen Hemminger:
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
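A sketch of the resulting hook signature as it would look in netfront (the summation body is elided):

    static void xennet_get_stats64(struct net_device *dev,
                                   struct rtnl_link_stats64 *tot)
    {
        /* ... sum the per-queue / per-cpu counters into *tot;
         * nothing is returned any more ... */
    }

    static const struct net_device_ops xennet_netdev_ops = {
        .ndo_get_stats64 = xennet_get_stats64,
        /* ... */
    };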

- 07 November 2016, 1 commit

By Juergen Gross:
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible. This requires changing the type of some reads from int to unsigned, but these cases were wrong before: negative values are not allowed for the modified cases.

Cc: netdev@vger.kernel.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
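A sketch of the simplification at one plausible call site (xenbus_read_unsigned() returns the given default when the node is absent or unparsable):

    static int read_split_feature(struct xenbus_device *dev)
    {
        unsigned int feature_split_evtchn;

        /* before:
         *      int v;
         *      xenbus_scanf(XBT_NIL, dev->otherend,
         *                   "feature-split-event-channels", "%u", &v);
         */
        feature_split_evtchn = xenbus_read_unsigned(dev->otherend,
                                "feature-split-event-channels", 0);
        return feature_split_evtchn ? 1 : 0;
    }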

- 03 November 2016, 1 commit

By Dongli Zhang:
IS_ERR_VALUE() in commit 87557efc ("xen-netfront: do not cast grant table reference to signed short") would not return true for an error code unless we first cast ref to type int.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
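A sketch of the check after the fix: grant_ref_t is uint32_t, so a negative error code stored in it must be widened via int before IS_ERR_VALUE() can recognise it:

    grant_ref_t ref = gnttab_claim_grant_reference(&queue->gref_tx_head);

    /* sign-extend through int, then widen to unsigned long, so that
     * -ENOSPC and friends land back in the IS_ERR_VALUE() range */
    BUG_ON(IS_ERR_VALUE((unsigned long)(int)ref));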

- 01 November 2016, 1 commit

By Dongli Zhang:
While a grant reference is of type uint32_t, xen-netfront erroneously casts it to signed short in BUG_ON(). This could lead to a xen domU panic during boot-up or migration when it is attached with lots of paravirtual devices.

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 21 October 2016, 1 commit

By Jarod Wilson:
hyperv_net:
- set min/max_mtu, per Haiyang, after rndis_filter_device_add

virtio_net:
- set min/max_mtu
- remove virtnet_change_mtu

vmxnet3:
- set min/max_mtu

xen-netback:
- min_mtu = 0, max_mtu = 65517

xen-netfront:
- min_mtu = 0, max_mtu = 65535

unisys/visor:
- clean up defines a little so as not to clash with the network core or add redundant definitions

CC: netdev@vger.kernel.org
CC: virtualization@lists.linux-foundation.org
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: David Kershner <david.kershner@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 20 September 2016, 1 commit

By Vitaly Kuznetsov:
Small packet loss is reported on complex multi-host network configurations including tunnels, NAT, etc. My investigation led me to the following check in netback which drops packets:

    if (unlikely(txreq.size < ETH_HLEN)) {
        netdev_err(queue->vif->dev,
                   "Bad packet size: %d\n", txreq.size);
        xenvif_tx_err(queue, &txreq, extra_count, idx);
        break;
    }

This check itself is legitimate. SKBs consist of a linear part (which has to hold the ethernet header) and (optionally) a number of frags. Netfront transmits the head of the linear part up to the page boundary as the first request, and all the rest becomes frags, so when we're reconstructing the SKB in netback we can't distinguish between original frags and the 'tail' of the linear part. The first request needs to be at least ETH_HLEN in size. So in case we have an SKB with its linear part starting too close to the page boundary, the packet is lost.

I see two ways to fix the issue:
- Change the 'wire' protocol between netfront and netback to start keeping the original SKB structure. We'd have to add a flag indicating that a particular request is part of the original linear part and not a frag, and we'd need to know the length of the linear part to pre-allocate memory.
- Avoid transmitting SKBs with linear parts starting too close to the page boundary. That seems preferable short-term and shouldn't bring significant performance degradation, as such packets are rare. That's what this patch is trying to achieve with skb_copy().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
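A hedged sketch of the bounce in xennet_start_xmit(), under the assumption that the error label and field handling match the surrounding driver code:

    struct page *page = virt_to_page(skb->data);
    unsigned int offset = offset_in_page(skb->data);

    /* If fewer than ETH_HLEN bytes of the linear area fit before the
     * page boundary, bounce the whole skb into a fresh one whose head
     * leaves room for a full ethernet header in the first slot. */
    if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
        struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);

        if (!nskb)
            goto drop;                /* assumed error label */
        dev_consume_skb_any(skb);     /* see the 31 August 2017 entry */
        skb = nskb;
        page = virt_to_page(skb->data);
        offset = offset_in_page(skb->data);
    }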

- 29 January 2016, 1 commit

By Malcolm Crossley:
Trying to batch Tx response events results in poor performance because it delays freeing the transmitted skbs. Instead use the standard RING_FINAL_CHECK_FOR_RESPONSES() macro to be notified once the next Tx response is placed on the ring.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
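A sketch of the resulting completion-loop shape (function body simplified; the FINAL variant re-arms sring->rsp_event so the backend raises an event for the very next response):

    static void xennet_tx_buf_gc_sketch(struct netfront_queue *queue)
    {
        RING_IDX cons, prod;
        int more_to_do;

        do {
            prod = queue->tx.sring->rsp_prod;
            rmb();  /* make sure we see responses up to prod */

            for (cons = queue->tx.rsp_cons; cons != prod; cons++) {
                /* ... release the grant ref and free the skb for
                 * RING_GET_RESPONSE(&queue->tx, cons) ... */
            }
            queue->tx.rsp_cons = prod;

            RING_FINAL_CHECK_FOR_RESPONSES(&queue->tx, more_to_do);
        } while (more_to_do);
    }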

- 23 October 2015, 1 commit

By Julien Grall:
The PV network protocol uses 4KB page granularity. The goal of this patch is to allow a Linux guest using 64KB page granularity to use a network device on unmodified Xen. It's only necessary to adapt the ring size and break skb data into small 4KB chunks; the rest of the code relies on the grant table code. Note that we allocate a Linux page for each rx skb but only the first 4KB is used. We may improve the memory usage by extending the size of the rx skb.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

- 21 October 2015, 1 commit

By Joe Jin:
Sometimes xennet_create_queues() may fail to create all of the requested queues; update num_queues to the number actually created to avoid a NULL pointer dereference.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 21 September 2015, 1 commit

By Chas Williams:
If netfront connects with two (or more) queues and then reconnects with only one queue, it fails to delete or rewrite the multi-queue-num-queues key, and netback will try to use the wrong number of queues. Always write the num-queues field if the backend has multi-queue support.

Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 11 September 2015, 1 commit

By Wei Liu:
Originally that parameter was always reset to num_online_cpus during module initialisation, which rendered it useless. The fix is to only set max_queues to num_online_cpus when the user has not provided a value.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 09 September 2015, 1 commit

By Julien Grall:
Based on include/xen/mm.h [1], Linux mistakenly uses MFN when GFN is meant; I suspect this is because the first support for Xen was for PV. This resulted in some misimplemented helpers on ARM and confused developers about the expected behaviour. For instance, with pfn_to_mfn we expect to get an MFN based on the name, yet if we look at the implementation on x86, it returns a GFN.

For clarity and to avoid new confusion, replace any reference to mfn with gfn in any helpers used by PV drivers. The x86 code will still keep some references to pfn_to_mfn, which may be used by all kinds of guests. No changes have been made in the hypercall fields, even though they may be invalid, in order to stay the same as the definitions in the xen repo.

Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a name too close to the KVM function gfn_to_page. Take also the opportunity to simplify simple constructions such as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean-up will come in follow-up patches.

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
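A sketch of the rename at a typical call site (a simplified helper for illustration):

    static unsigned long frame_for_grant(struct page *page)
    {
        /* before: return pfn_to_mfn(page_to_pfn(page)); */
        return xen_page_to_gfn(page);   /* the guest frame number, which
                                         * is what PV drivers want */
    }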

- 29 August 2015, 1 commit

By Chas Williams:
If an interface isn't running, napi_synchronize() will hang forever.

[ 392.248403] rmmod R running task 0 359 343 0x00000000
[ 392.257671] ffff88003760fc88 ffff880037193b40 ffff880037193160 ffff88003760fc88
[ 392.267644] ffff880037610000 ffff88003760fcd8 0000000100014c22 ffffffff81f75c40
[ 392.277524] 0000000000bc7010 ffff88003760fca8 ffffffff81796927 ffffffff81f75c40
[ 392.287323] Call Trace:
[ 392.291599] [<ffffffff81796927>] schedule+0x37/0x90
[ 392.298553] [<ffffffff8179985b>] schedule_timeout+0x14b/0x280
[ 392.306421] [<ffffffff810f91b9>] ? irq_free_descs+0x69/0x80
[ 392.314006] [<ffffffff811084d0>] ? internal_add_timer+0xb0/0xb0
[ 392.322125] [<ffffffff81109d07>] msleep+0x37/0x50
[ 392.329037] [<ffffffffa00ec79a>] xennet_disconnect_backend.isra.24+0xda/0x390 [xen_netfront]
[ 392.339658] [<ffffffffa00ecadc>] xennet_remove+0x2c/0x80 [xen_netfront]
[ 392.348516] [<ffffffff81481c69>] xenbus_dev_remove+0x59/0xc0
[ 392.356257] [<ffffffff814e7217>] __device_release_driver+0x87/0x120
[ 392.364645] [<ffffffff814e7cf8>] driver_detach+0xb8/0xc0
[ 392.371989] [<ffffffff814e6e69>] bus_remove_driver+0x59/0xe0
[ 392.379883] [<ffffffff814e84f0>] driver_unregister+0x30/0x70
[ 392.387495] [<ffffffff814814b2>] xenbus_unregister_driver+0x12/0x20
[ 392.395908] [<ffffffffa00ed89b>] netif_exit+0x10/0x775 [xen_netfront]
[ 392.404877] [<ffffffff81124e08>] SyS_delete_module+0x1d8/0x230
[ 392.412804] [<ffffffff8179a8ee>] system_call_fastpath+0x12/0x71

Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
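A sketch of the likely guard in the teardown path (per the trace, xennet_disconnect_backend() is where the wait happens; the netif_running() check is the assumed shape of the fix):

    /* Only wait for in-flight NAPI polls when the interface is up;
     * on a never-started interface the poll never completes and
     * napi_synchronize() would msleep() forever. */
    if (netif_running(info->netdev))
        napi_synchronize(&queue->napi);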

- 24 August 2015, 1 commit

By Chas Williams:
If you simply load and unload the module without starting the interfaces, the queues are never created and you get a bad pointer dereference.

Signed-off-by: Chas Williams <3chas3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 29 June 2015, 1 commit

By Li, Liang Z:
The function netif_set_real_num_tx_queues() will return -EINVAL if its second parameter is < 1, so calling this function with the second parameter set to 0 is meaningless.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 22 June 2015, 1 commit

By Julien Grall:
rx->status is an int16_t; print it using %d rather than %u in order to get a meaningful value when the field is negative. Also use %u rather than %x for rx->offset.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

- 17 June 2015, 1 commit

By Julien Grall:
Using xen/page.h will be necessary later for using common xen page helpers. As xen/page.h already includes asm/xen/page.h, always include xen/page.h directly.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

- 01 June 2015, 1 commit

By Vaishali Thakkar:
Use the timer API function setup_timer instead of structure field assignments to initialize a timer. A simplified version of the Coccinelle semantic patch that performs this transformation is as follows:

    @change@
    expression e, func, da;
    @@

    -init_timer(&e);
    +setup_timer(&e, func, da);
    -e.data = da;
    -e.function = func;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
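The same transformation applied to netfront's refill timer would look like this in C (a sketch; setup_timer() was the era's API, later superseded by timer_setup()):

    /* before:
     *      init_timer(&queue->rx_refill_timer);
     *      queue->rx_refill_timer.data = (unsigned long)queue;
     *      queue->rx_refill_timer.function = rx_refill_timeout;
     */
    setup_timer(&queue->rx_refill_timer, rx_refill_timeout,
                (unsigned long)queue);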

- 28 May 2015, 1 commit

By David Vrabel:
xennet_remove() freed the queues before freeing the netdevice, which results in a use-after-free when free_netdev() tries to delete the napi instances that have already been freed. Fix this by fully destroying the queues (which includes deleting the napi instances) before freeing the netdevice.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 18 April 2015, 1 commit

By Johannes Berg:
In commit 04ffcb25 ("net: Add ndo_gso_check") Tom originally added the 'dev' argument to be able to call ndo_gso_check(). Then later, when generalizing this in commit 5f35227e ("net: Generalize ndo_gso_check to ndo_features_check"), Jesse removed the call to ndo_gso_check() in netif_needs_gso() by calling the new ndo_features_check() in a different place. This made the 'dev' argument unused. Remove the unused argument and go back to the code as before.

Cc: Tom Herbert <therbert@google.com>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 15 April 2015, 1 commit

By Wei Liu:
Originally, Xen PV drivers only use a single-page ring to pass along information. This might limit the throughput between frontend and backend. This patch extends the Xenbus driver to support multi-page rings, which in general should improve throughput if the ring is the bottleneck. Changes to the various frontends/backends to adapt to the new interface are also included.

Affected Xen drivers:
* blkfront/back
* netfront/back
* pcifront/back
* scsifront/back
* vtpmfront

The interface is documented, as before, in xenbus_client.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>

- 03 April 2015, 1 commit

By Jonathan Davies:
xen-netfront limits transmitted skbs to at most 44 segments. However, GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448 bytes each. This slight reduction in packet size means a slight loss in efficiency.

Since c/s 9ecd1a75, xen-netfront sets gso_max_size to XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER, where XEN_NETIF_MAX_TX_SIZE is 65535 bytes. The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s 6c09fa09) in determining when to split an skb into two is sk->sk_gso_max_size - 1 - MAX_TCP_HEADER. So the maximum permitted size of an skb is calculated to be (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER. Intuitively, this looks like the wrong formula -- we don't need two TCP headers. Instead, there is no need to deviate from the default gso_max_size of 65536, as this already accommodates the size of the header.

Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments of 1448 bytes each), as observed via tcpdump. This patch makes netfront send skbs of up to 65160 bytes (45 segments of 1448 bytes each).

Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER, as it relates to the size of the whole packet, including the header.

Fixes: 9ecd1a75 ("xen-netfront: reduce gso_max_size to account for max TCP header")
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 05 February 2015, 1 commit

By Takashi Iwai:
Instead of manual calls of device_create_file() and device_remove_file(), assign the static attribute groups to the netdev groups array. This simplifies the code and avoids possible races.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 14 January 2015, 3 commits

By David Vrabel:
Eliminate all the duplicate code for making Tx requests by consolidating it into a single xennet_make_one_txreq() function. xennet_make_one_txreq() and xennet_make_txreqs() work with pages and offsets, so it will be easier to make netfront handle highmem frags in the future.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

By David Vrabel:
A function to count the number of slots an skb needs is more useful than one that counts the slots needed for only the frags.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

By David Vrabel:
In netfront the Rx and Tx paths are independent and use different locks. The Tx lock is held with hard irqs disabled, but the Rx lock is held with only BH disabled. Since both sides use the same stats lock, a deadlock may occur.

[ INFO: possible irq lock inversion dependency detected ]
3.16.2 #16 Not tainted
---------------------------------------------------------
swapper/0/0 just changed the state of lock:
 (&(&queue->tx_lock)->rlock){-.....}, at: [<c03adec8>] xennet_tx_interrupt+0x14/0x34
but this lock took another, HARDIRQ-unsafe lock in the past:
 (&stat->syncp.seq#2){+.-...}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&stat->syncp.seq#2);
                               local_irq_disable();
                               lock(&(&queue->tx_lock)->rlock);
                               lock(&stat->syncp.seq#2);
  <Interrupt>
    lock(&(&queue->tx_lock)->rlock);

Using separate locks for the Rx and Tx stats fixes this deadlock.

Reported-by: Dmitry Piotrovsky <piotrovskydmitry@gmail.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
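A sketch of the split: give each direction its own percpu counters and syncp, so the Tx path (hard-irq context) and Rx path (BH context) never contend on one seqcount. Field names are assumed:

    struct netfront_stats {
        u64                     packets;
        u64                     bytes;
        struct u64_stats_sync   syncp;
    };

    struct netfront_info_sketch {
        struct netfront_stats __percpu *rx_stats;
        struct netfront_stats __percpu *tx_stats;
        /* ... */
    };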

- 13 January 2015, 1 commit

By Vincenzo Maffione:
This patch removes some unused arrays from the netfront private data structures. These arrays were used in "flip" receive mode.

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 17 December 2014, 1 commit

By David Vrabel:
After d75b1ade ("net: less interrupt masking in NAPI") the napi instance is removed from the per-cpu list prior to calling n->poll(), and is only requeued if all of the budget was used. This inadvertently broke netfront because netfront does not use NAPI correctly.

If netfront had not used all of its budget, it would do a final check for any Rx responses and avoid calling napi_complete() if there were more responses. It would still return under budget, so it would never be rescheduled. The final check would also not re-enable the Rx interrupt.

Additionally, xenvif_poll() would also call napi_complete() /after/ enabling the interrupt. This resulted in a race between the napi_complete() and the napi_schedule() in the interrupt handler. The use of local_irq_save/restore() avoided the race iff the handler is running on the same CPU, but not if it was running on a different CPU.

Fix both of these by always calling napi_complete() if the budget was not all used, and then calling napi_schedule() if the final check says there's more work.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
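A sketch of the corrected end-of-poll sequence in xennet_poll(): complete first, then re-schedule if the final ring check finds more responses:

    if (work_done < budget) {
        int more_to_do = 0;

        napi_complete(napi);

        /* re-arms rx.sring->rsp_event and reports whether more
         * responses arrived in the meantime */
        RING_FINAL_CHECK_FOR_RESPONSES(&queue->rx, more_to_do);
        if (more_to_do)
            napi_schedule(napi);
    }

    return work_done;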

- 10 December 2014, 1 commit

By David Vrabel:
Commit 97a6d1bb ("xen-netfront: Fix handling packets on compound pages with skb_linearize") attempted to fix a problem where an skb that would have required too many slots would be dropped, causing TCP connections to stall. However, it filled in the first slot using the original buffer and not the new one, and so would use the wrong offset and grant access to the wrong page.

Netback would notice the malformed request and stop all traffic on the VIF, reporting:

    vif vif-3-0 vif3.0: txreq.offset: 85e, size: 4002, end: 6144
    vif vif-3-0 vif3.0: fatal error; disabling device

Reported-by: Anthony Wright <anthony@overnetdata.com>
Tested-by: Anthony Wright <anthony@overnetdata.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 03 December 2014, 1 commit

By Seth Forshee:
These BUGs can be erroneously triggered by frags which refer to tail pages within a compound page. The data in these pages may overrun the hardware page while still being contained within the compound page, but since compound_order() evaluates to 0 for tail pages the assertion fails. The code already iterates through subsequent pages correctly in this scenario, so the BUGs are unnecessary and can be removed.

Fixes: f36c3747 ("xen/netfront: handle compound page fragments on transmit")
Cc: <stable@vger.kernel.org> # 3.7+
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 27 October 2014, 1 commit

By David Vrabel:
A full Rx ring only requires 1 MiB of memory. This is not enough memory for it to be useful to dynamically scale the number of Rx requests in the ring based on traffic rates, because:

a) Even the full 1 MiB is a tiny fraction of a typical modern Linux VM (for example, the AWS micro instance still has 1 GiB of memory).
b) Netfront would have used up to 1 MiB already even with moderate data rates (there was no adjustment of the target based on memory pressure).
c) Small VMs are typically going to have one VCPU and hence only one queue.

Keeping the ring full of Rx requests handles bursty traffic better than trying to converge on an optimal number of requests to keep filled. On a 4 core host, an iperf -P 64 -t 60 run from dom0 to a 4 VCPU guest improved from 5.1 Gbit/s to 5.6 Gbit/s. Gains with more bursty traffic are expected to be higher.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

- 16 October 2014, 1 commit

By Tom Herbert:
Add ndo_gso_check, which a device can define to indicate whether it is capable of doing GSO on a packet. This function would be called from the stack to determine whether software GSO needs to be done. A driver should populate this function if it advertises GSO types for which there are combinations that it wouldn't be able to handle. For instance, a device that performs UDP tunneling might only implement support for transparent Ethernet bridging types of inner packets, or might have limitations on the lengths of inner headers.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
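A sketch of the hook as introduced (it was later generalised into ndo_features_check, per the 18 April 2015 entry above; my_gso_check and my_netdev_ops are illustrative names):

    static bool my_gso_check(struct sk_buff *skb, struct net_device *dev)
    {
        /* return false if this skb's GSO layout exceeds what the
         * hardware can handle, e.g. an inner header nested deeper
         * than the device can parse */
        return true;
    }

    static const struct net_device_ops my_netdev_ops = {
        .ndo_gso_check = my_gso_check,
        /* ... */
    };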