Commits · 375f67df2811aafbb68f5d4f3bd27396023b36dd · openeuler / Kernel

10 Mar, 2014 · 1 commit

vlan: slight optimization for vlan_do_receive() · 375f67df

Committed by dingtianhong on Mar 07, 2014

Following Joe's suggestion, it may be faster to add an unlikely() to
the test for PACKET_OTHERHOST, so I added it to see whether performance
improves. The difference is small and negligible, but it is rare for any
lower device to set the skb type to PACKET_OTHERHOST, so I think it makes
sense to add unlikely() to the test.

Cc: Joe Perches <joe@perches.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

375f67df
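As a rough illustration of the branch hint discussed in this commit, the following userspace sketch defines stand-ins for the kernel's `likely()`/`unlikely()` macros using the GCC/Clang builtin `__builtin_expect`. The helper `is_otherhost()` is hypothetical and is not the body of `vlan_do_receive()`; the constant mirrors the value of `PACKET_OTHERHOST` in `include/uapi/linux/if_packet.h`.

```c
#include <stdbool.h>

/* Userspace stand-ins for the kernel's branch hints (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Mirrors PACKET_OTHERHOST in include/uapi/linux/if_packet.h. */
#define PKT_OTHERHOST 3

/* Hypothetical helper: reports whether a frame was addressed to another host. */
bool is_otherhost(unsigned int pkt_type)
{
	/* The hint only influences code layout; the result is unchanged. */
	if (unlikely(pkt_type == PKT_OTHERHOST))
		return true;
	return false;
}
```

The hint tells the compiler which side of the branch to place on the fall-through path, which is why the commit expects at most a very small gain.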

09 Mar, 2014 · 2 commits

pkt_sched: fq: do not hold qdisc lock while allocating memory · 2d8d40af

Committed by Eric Dumazet on Mar 06, 2014

Resizing fq hash table allocates memory while holding qdisc spinlock,
with BH disabled.

This is definitely not good, as allocation might sleep.

We can drop the lock and take it only when needed; we hold RTNL, so no other
changes can happen at the same time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller <davem@davemloft.net>

2d8d40af
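The pattern described in this commit (allocate without the lock, then take the lock only to publish the result) can be sketched in userspace as follows. This is a minimal sketch, not the fq code: `struct table` and `table_resize()` are hypothetical, a pthread mutex stands in for the qdisc spinlock, rehashing of existing entries is omitted, and the caller is assumed to be the only resizer (the role RTNL plays in the commit).

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical table; the mutex stands in for the qdisc spinlock. */
struct table {
	pthread_mutex_t lock;
	void **buckets;
	size_t nbuckets;
};

/* Returns 0 on success, -1 if the allocation fails. */
int table_resize(struct table *t, size_t new_size)
{
	void **new_buckets, **old_buckets;

	/* Allocate with the lock released: this step may block. */
	new_buckets = calloc(new_size, sizeof(*new_buckets));
	if (!new_buckets)
		return -1;

	/* Hold the lock only for the short pointer swap. */
	pthread_mutex_lock(&t->lock);
	old_buckets = t->buckets;
	t->buckets = new_buckets;
	t->nbuckets = new_size;
	pthread_mutex_unlock(&t->lock);

	/* Free the old array after the lock has been dropped. */
	free(old_buckets);
	return 0;
}
```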

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · d85ea93f

Committed by David S. Miller on Mar 08, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to e1000e, ixgbevf and igb.

The majority of this series contains fixes and cleanups to e1000e.
Most notable are:

Todd provides a fix to PTP in e1000e which adds a lock in e1000e_phc_adjfreq
to prevent concurrent changes to TIMINCA and SYSTIMH/L.  He then provides an
igb fix to use ARRAY_SIZE for array size calculation.

David provides the remaining e1000e patches, which contain:
 - cleanup of pointer references that are no longer used
 - fix an issue on systems with Management Engine enabled with the
   ethernet cable unplugged
 - fix an issue on 82579 where enabling EEE LPI sooner than one second
   after link up causes link issues on some switches
 - refactor the power management flows to prevent the suspend path from
   being executed twice when hibernating
 - refactor the runtime power management to stop it interfering with the
   functionality of Energy Efficient Ethernet when enabled and to stop
   the device from repeatedly flipping between suspend and resume with the
   interface administratively down
 - enable the feature PHY Ultra Low Power Mode which is a power saving
   feature that reduces the power consumption of the PHY when a cable is
   not connected
 - fix the ethtool offline tests for 82579 parts
 - fix SHRA register access for 82579 parts which was introduced by
   previous commit c3a0dce3 "e1000e: fix overrun of PHY RAR array"

Florian provides a fix for ixgbevf where skb->pkt_type was being checked
like a bitmask, but it is not a bitmask.

Fix an issue reported by Stephen Hemminger where there was a warning
about code defined but never used if IGB_HWMON is not defined.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d85ea93f

08 Mar, 2014 · 37 commits

igb: fix warning if !CONFIG_IGB_HWMON · 9b143d11

Committed by Jeff Kirsher on Mar 06, 2014

Fix warning about code defined but never used if IGB_HWMON is not defined.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

9b143d11

igb: fix array size calculation · 72b36727

Committed by Todd Fujinaka on Mar 04, 2014

Use ARRAY_SIZE for array size calculation.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

72b36727
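For readers unfamiliar with the macro, here is a simplified userspace version of `ARRAY_SIZE`. The kernel's definition additionally rejects pointers at compile time, which this sketch does not; the `thresholds` array and `sum_thresholds()` are hypothetical and unrelated to igb.

```c
#include <stddef.h>

/* Simplified form; the kernel version also checks that arr is a true array. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical data, used only to exercise the macro. */
static const int thresholds[] = { 10, 20, 30, 40, 50 };

int sum_thresholds(void)
{
	int sum = 0;

	/* The bound follows the array definition, so adding entries needs no edit here. */
	for (size_t i = 0; i < ARRAY_SIZE(thresholds); i++)
		sum += thresholds[i];
	return sum;
}
```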

ixgbevf: fix skb->pkt_type checks · bd9d5592

Committed by Florian Fainelli on Feb 28, 2014

skb->pkt_type is not a bitmask, but contains only one value at a time from
the range defined in include/uapi/linux/if_packet.h.

Checking it as if it were a bitmask of values would also cause
PACKET_OTHERHOST, PACKET_LOOPBACK and PACKET_FASTROUTE to be matched by
this check, since at least one of their lower 2 bits is also set. Although
this does not fix a real bug, the check is still potentially confusing.

This bogus check was introduced in commit 815cccbf ("ixgbe: add setlink,
getlink support to ixgbe and ixgbevf").
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

bd9d5592
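The difference between a bitmask test and an equality test on `pkt_type` can be seen in a small userspace sketch. The enum values below are copied from `include/uapi/linux/if_packet.h` as of this commit's era; the two helper functions are hypothetical and only show the general shape of the check, not the ixgbevf code itself.

```c
#include <stdbool.h>

/* Values as defined in include/uapi/linux/if_packet.h at the time. */
enum {
	PACKET_HOST      = 0,
	PACKET_BROADCAST = 1,
	PACKET_MULTICAST = 2,
	PACKET_OTHERHOST = 3,
	PACKET_OUTGOING  = 4,
	PACKET_LOOPBACK  = 5,
	PACKET_FASTROUTE = 6,
};

/* Bitmask-style test: also matches OTHERHOST (3), LOOPBACK (5), FASTROUTE (6). */
bool is_bcast_or_mcast_bitmask(unsigned int pkt_type)
{
	return (pkt_type & (PACKET_BROADCAST | PACKET_MULTICAST)) != 0;
}

/* Equality test: matches only the two intended values. */
bool is_bcast_or_mcast(unsigned int pkt_type)
{
	return pkt_type == PACKET_BROADCAST || pkt_type == PACKET_MULTICAST;
}
```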

e1000e: Fix SHRA register access for 82579 · 96dee024

Committed by David Ertman on Mar 05, 2014

Previous commit c3a0dce3 fixed an overrun for the RAR on i218 devices.
This commit also attempted to homogenize the RAR/SHRA access for all parts
accessed by the e1000e driver. This change introduced an error for
assigning MAC addresses to guest OS's for 82579 devices.

Only RAR[0] is accessible to the driver for 82579 parts, and additional
addresses must be placed into the SHRA[L|H] registers. The rar_entry_count
was changed in the previous commit to an inaccurate value that accounted
for all RAR and SHRA registers, not just the ones usable by the driver.

This patch fixes the count to the correct value and adjusts the
e1000_rar_set_pch2lan() function to use the correct index.

Cc: John Greene <jogreene@redhat.com>
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

96dee024

e1000e: Fix ethtool offline tests for 82579 parts · ad40064e

Committed by David Ertman on Mar 05, 2014

Changes to the rar_entry_count value require a change to the indexing
used to access the SHRA[H|L] registers when testing them with
'ethtool -t <iface> offline'.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ad40064e

e1000e: Fix not generating an error on invalid load parameter · 5bb73176

Committed by David Ertman on Feb 26, 2014

Valid values for InterruptThrottleRate are 10-100000, or one of
0, 1, 3, 4.  '2' is not valid.  This is a legacy of branching
from the e1000 driver code on which e1000e was based.

Prior to this patch, if the e1000e driver was loaded with a forced
invalid InterruptThrottleRate of '2', then no throttle rate would be
set and no error message generated.

Now, a message will be generated that an invalid value was used and the
value for InterruptThrottleRate will be set to the default value.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

5bb73176
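The validation rule stated in this commit message can be written down as a small userspace sketch. The function names are hypothetical, and the fallback value `ITR_DEFAULT` of 3 is an assumption made for illustration; the commit message only says "the default value" without naming it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed default for illustration; the commit does not state the value. */
#define ITR_DEFAULT 3

/* Valid per the commit message: 10-100000, or one of 0, 1, 3, 4. */
bool itr_is_valid(unsigned int itr)
{
	if (itr >= 10 && itr <= 100000)
		return true;
	return itr == 0 || itr == 1 || itr == 3 || itr == 4;
}

/* Hypothetical helper: warn and fall back instead of silently ignoring. */
unsigned int itr_sanitize(unsigned int itr)
{
	if (itr_is_valid(itr))
		return itr;
	fprintf(stderr, "Invalid InterruptThrottleRate %u, using default %d\n",
		itr, ITR_DEFAULT);
	return ITR_DEFAULT;
}
```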

e1000e: Feature Enable PHY Ultra Low Power Mode (ULP) · 74f350ee

Committed by David Ertman on Feb 22, 2014

ULP is a power saving feature that reduces the power consumption of the
PHY when a cable is not connected.

ULP is gated on the following conditions:
1) The hardware must support ULP.  Currently this is only I218
   devices from Intel
2) ULP is initiated by the driver, so, no driver results in no ULP.
3) ULP's implementation utilizes Runtime Power Management to toggle its
   execution.  ULP is enabled/disabled based on the state of Runtime PM.
4) ULP is not active when wake-on-unicast, multicast or broadcast is active
   as these features are mutually-exclusive.

Since the PHY is in an unavailable state while ULP is active, any access
of the PHY registers will fail.  This is resolved by utilizing kernel
calls that cause the device to exit Runtime PM (e.g. pm_runtime_get_sync)
and then, after PHY access is complete, allow the device to resume
Runtime PM (e.g. pm_runtime_put_sync).

Under certain conditions, toggling the LANPHYPC is necessary to disable
ULP mode.  Break out existing code to toggle LANPHYPC to a new function
to avoid code duplication.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

74f350ee

e1000e Refactor of Runtime Power Management · 63eb48f1

Committed by David Ertman on Feb 14, 2014

Fix issues with:
 - RuntimePM causing the device to repeatedly flip between suspend and
   resume with the interface administratively downed.
 - Having RuntimePM enabled interfering with the functionality of Energy
   Efficient Ethernet.

Add checks to disallow functions that should not be executed if the
device is currently runtime suspended.

Make the runtime_idle callback use the same deterministic behavior as the
igb driver.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

63eb48f1

e1000e: Refactor PM flows · 28002099

Committed by David Ertman on Feb 14, 2014

Refactor the system power management flows to prevent the suspend path from
being executed twice when hibernating since both the freeze and
poweroff callbacks were set to e1000_suspend() via SET_SYSTEM_SLEEP_PM_OPS.
There are HW workarounds that are performed during this flow and calling
them twice was causing erroneous behavior.

Re-arrange the code to take advantage of common code paths and explicitly
set the individual dev_pm_ops callbacks for suspend, resume, freeze,
thaw, poweroff and restore.

Add a boolean parameter (reset) to the e1000e_down function to allow
for cases when the HW should not be reset when downed during a PM event.

Now that all suspend/shutdown paths result in a call to __e1000_shutdown()
that checks Wake on LAN status, remove the redundant check for WoL in
e1000_power_down_phy().
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

28002099

e1000e: Add missing branding strings in ich8lan.c · 3b70d4f8

Committed by David Ertman on Feb 05, 2014

Add branding strings for recently released and soon-to-be-released
hardware configurations that are supported by e1000e.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

3b70d4f8

e1000e: Cleanup - Update GPL header and Copyright · e78b80b1

Committed by David Ertman on Feb 04, 2014

This patch updates the GPL header by removing the portion that
refers to the Free Software Foundation address.

Change the copyright date to 2014.

Reformat the header comments to conform to kernel networking coding norms.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e78b80b1

e1000e: Fix 82579 sets LPI too early. · a03206ed

Committed by David Ertman on Jan 24, 2014

Enabling EEE LPI sooner than one second after link up on 82579 causes link
issues with some switches.

Remove EEE enablement for 82579 parts from the link initialization flow to
avoid initializing too early.  EEE initialization for 82579 will be done
in e1000e_update_phy_task.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce W Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

a03206ed

e1000e: Resolve issues with Management Engine (ME) briefly blocking PHY resets · f7235ef6

Committed by David Ertman on Jan 23, 2014

On an ME-enabled system with the cable out, the driver init flow would
generate an erroneous message indicating that resets were being blocked
by an active ME session.  The cause was ME clearing the semaphore bit to
block further PHY resets for up to 50 msec during power-on/cycle.  After
this interval, ME would re-set the bit and allow PHY resets.

To resolve this, change the flow of e1000e_phy_hw_reset_generic() to
utilize a delay and retry method.  Poll the FWSM register to minimize
any extra time added to the flow.  If the delay times out at 100 msec
(checked in 10 msec increments), then return the value E1000_BLK_PHY_RESET,
as this is the accurate state of the PHY.  Attempting to alter just the
call to e1000e_phy_hw_reset_generic() in e1000_init_phy_workarounds_pchlan()
just caused the problem to move further down the flow.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Acked-by: Bruce W. Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

f7235ef6
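The delay-and-retry flow described above (poll every 10 msec, give up at 100 msec) might look roughly like the following userspace sketch. Everything here is hypothetical: the callback types, `wait_for_reset_unblocked()`, the `fake_*` helpers used for demonstration, and the return value `-1`, which merely stands in for E1000_BLK_PHY_RESET. The real driver reads the FWSM register instead of calling a callback.

```c
#include <stdbool.h>

#define POLL_STEP_MS   10   /* polling increment from the commit message */
#define POLL_LIMIT_MS 100   /* overall timeout from the commit message */

typedef bool (*blocked_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int ms);

/* Returns 0 once resets are allowed, -1 if still blocked after the timeout. */
int wait_for_reset_unblocked(blocked_fn is_blocked, void *ctx, sleep_fn sleep_ms)
{
	unsigned int waited = 0;

	while (is_blocked(ctx)) {
		if (waited >= POLL_LIMIT_MS)
			return -1;
		sleep_ms(POLL_STEP_MS);
		waited += POLL_STEP_MS;
	}
	return 0;
}

/* Demo helper: reports "blocked" for the first *ctx polls, then unblocked. */
bool fake_blocked(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return true;
	}
	return false;
}

/* Demo helper: no real sleeping, so the sketch runs instantly. */
void fake_sleep(unsigned int ms)
{
	(void)ms;
}
```

Polling in small steps keeps the added latency close to the actual blocking interval rather than always paying the full timeout.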

e1000e: Cleanup unnecessary references · b485dbae

Committed by David Ertman on Jan 22, 2014

Clean up some pointer references that are no longer necessary.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

b485dbae

e1000e: PTP lock in e1000e_phc_adjfreq · 6c2ed39c

Committed by Todd Fujinaka on Jan 18, 2014

Add lock in e1000e_phc_adjfreq to prevent concurrent changes to TIMINCA
and SYSTIMH/L.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

6c2ed39c

6lowpan: reassembly: fix return of init function · 37147652

Committed by Alexander Aring on Mar 07, 2014

This patch adds a missing return after fragmentation init. Otherwise we
register a sysctl interface and deregister it afterwards, which makes no
sense.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37147652
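The class of bug fixed here, a success path that falls through into the error-unwind label, can be modelled with a short userspace sketch. `struct frag_state` and `frag_init()` are hypothetical stand-ins and do not reproduce the 6lowpan code; the boolean flags simply record what has been registered.

```c
#include <stdbool.h>

/* Hypothetical record of what the init function has registered. */
struct frag_state {
	bool sysctl_registered;
	bool pernet_registered;
};

/* pernet_fails simulates a failure of the second registration step. */
int frag_init(struct frag_state *s, bool pernet_fails)
{
	int ret = 0;

	s->sysctl_registered = true;

	if (pernet_fails) {
		ret = -1;
		goto err_pernet;
	}
	s->pernet_registered = true;

	/* Without this return, success would fall through to the unwind below. */
	return ret;

err_pernet:
	s->sysctl_registered = false;
	return ret;
}
```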

Merge tag 'linux-can-next-for-3.15-20140307' of git://gitorious.org/linux-can/linux-can-next · d03e9d07

Committed by David S. Miller on Mar 07, 2014

Marc Kleine-Budde says:

====================
pull-request: can-next 2014-02-12

This is a pull request of twelve patches for net-next/master.

Alexander Shiyan contributes two patches for the mcp251x, one making
the driver quieter and the other improving compile-time coverage by
removing the #ifdef CONFIG_PM_SLEEP. Then come two patches for
the flexcan driver by me, one removing the #ifdef CONFIG_PM_SLEEP, too,
the other making use of platform_get_device_id(). Another patch by
me converts the janz-ican3 driver to use netdev_<level>(). The
remaining 7 patches are by Oliver Hartkopp; they add CAN FD support to
the netlink configuration interface.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d03e9d07

Merge branch 'r8152' · a5d5ff57

Committed by David S. Miller on Mar 07, 2014

Hayes Wang says:

====================
r8152: tx/rx improvement

 - Select the suitable spin lock for each function.
 - Add an additional check to reduce spin lock use.
 - Raise the priority of tx to avoid being interrupted by rx.
 - Support rx checksum, large send, and IPv6 hw checksum.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a5d5ff57

r8152: support IPv6 · 6128d1bb

Committed by hayeswang on Mar 07, 2014

Support hw IPv6 checksum for TCP and UDP packets.

Note that the hw limits the range of the transport offset. Besides,
the TCP Pseudo Header used by the hw for IPv6 TSO is based on the
Microsoft document, which excludes the packet length.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6128d1bb

r8152: support TSO · 60c89071

Committed by hayeswang on Mar 07, 2014

Support scatter gather and TSO.

Adjust the tx checksum function and set the max gso size to fix the
size of the tx aggregation buffer.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60c89071

r8152: support rx checksum · 565cab0a

Committed by hayeswang on Mar 07, 2014

Support hw rx checksum for TCP and UDP packets.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

565cab0a

r8152: calculate the dropped packets for rx · 5e2f7485

Committed by hayeswang on Mar 07, 2014

Continue processing the remaining rx packets even if allocation of the
skb fails, so that the number of dropped packets is counted correctly.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e2f7485

r8152: up the priority of the transmission · 0c3121fc

Committed by hayeswang on Mar 07, 2014

Move tx_bottom() from delayed_work to a tasklet. This keeps rx
and tx balanced. If the device is in runtime suspend when a tx
packet arrives, wake up the device before transmitting.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c3121fc

r8152: check tx agg list before spin lock · 21949ab7

Committed by hayeswang on Mar 07, 2014

Check the tx agg list before taking the spin lock to avoid locking every
time.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21949ab7

r8152: replace spin_lock_irqsave and spin_unlock_irqrestore · 2685d410

Committed by hayeswang on Mar 07, 2014

Use spin_lock and spin_unlock in interrupt context.

The ndo_start_xmit would not be called in interrupt context, so
replace the related spin_lock_irqsave and spin_unlock_irqrestore
with spin_lock_bh and spin_unlock_bh.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2685d410

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 91bd66e4

Committed by David S. Miller on Mar 07, 2014

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and i40evf.

Most notable are:
Joseph completes the implementation of the ethtool ntuple rule
management interface by adding the get, update and delete interface
reset.

Akeem provides a fix to prevent a possible overflow caused by multiplying
number and size in a kzalloc call, by using kcalloc instead.

Jesse provides an implementation for skb_set_hash() and adds the L4 type
return when we know it is an L4 hash.  He also adds a counter to
statistics for Tx timeouts to help users.  Lastly he provides a change
to stay away from the cache line where the done bit may be getting
written back for the transmit ring since the hardware may be writing the
whole cache line for a partial update.

Shannon cleans up code comments.

Anjali removes a firmware workaround for newer firmware since the number
of MSIx vectors is being reported correctly.

v2:
 -  dropped patch 01 of the series based on feedback from the author
    Joe Perches and Shannon Nelson.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

91bd66e4

Merge tag 'rxrpc-devel-20140304' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 38940042

Committed by David S. Miller on Mar 07, 2014

David Howells says:

====================
net-next: AF_RXRPC fixes and development

Here are some AF_RXRPC fixes:

 (1) Fix to remove incorrect checksum calculation made during recvmsg().  It's
     unnecessary to try to do this there since we check the checksum before
     reading the RxRPC header from the packet.

 (2) Fix to prevent the sending of an ABORT packet in response to another
     ABORT packet and inducing a storm.

 (3) Fix UDP MTU calculation from parsing ICMP_FRAG_NEEDED packets where we
     don't handle the ICMP packet not specifying an MTU size.

And development patches:

 (4) Add sysctls for configuring RxRPC parameters, specifically various delays
     pertaining to ACK generation, the time before we resend a packet for
     which we don't receive an ACK, the maximum time a call is permitted to
     live and the amount of time transport, connection and dead call
     information is cached.

 (5) Improve ACK packet production by adjusting the handling of ACK_REQUESTED
     packets, ignoring the MORE_PACKETS flag, delaying the production of
     otherwise immediate ACK_IDLE packets and delaying all ACK_IDLE production
     (barring the call termination) to half a second.

 (6) Add more sysctl parameters to expose the Rx window size, the maximum
     packet size that we're willing to receive and the number of jumbo rxrpc
     packets we're willing to handle in a single UDP packet.

 (7) Request ACKs on alternate DATA packets so that the other side doesn't
     wait till we fill up the Tx window.

 (8) Use a RCU hash table to look up the rxrpc_call for an incoming packet
     rather than stepping through a hierarchy involving several spinlocks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

38940042

Merge branch 'xen-netback-next' · 4caeccb4

Committed by David S. Miller on Mar 07, 2014

Zoltan Kiss says:

====================
xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore hasn't progressed very well.
This patch series uses SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower AMD
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch).
Based on my investigations the packet only gets copied if it is delivered to
the Dom0 IP stack through deliver_skb, which is due to this [2] patch. This affects
DomU->Dom0 IP traffic and when Dom0 does routing/NAT for the guest. That's a bit
unfortunate, but luckily it doesn't cause a major regression for this use case.
In the future we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
  us avoid TLB flushing
- use something other than ballooned pages
- fix grant map to use page->index properly
I've tried to break it down into smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Use skb->cb to store pending_idx
2: Some refactoring
3: Change RX path for mapped SKB fragments (moved here to keep bisectability,
review it after #4)
4: Introduce TX grant mapping
5: Remove old TX grant copy definitions and fix indentations
6: Add stat counters for zerocopy
7: Handle guests with too many frags
8: Timeout packets in RX path
9: Aggregate TX unmap operations
v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling for the important use case when an older
guest sends lots of slots. Instead of delayed copy we now time out packets on
the RX path, based on the assumption that otherwise packets should get stuck
anywhere else. Finally some unmap batching to avoid too much TLB flushing.

v3: Apart from fixing a few things mentioned in responses, the important change
is the use of the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.

v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic changed also.

v5: Only minor fixes based on Wei's comments

v6: Important bugfixes for xenvif_poll exit path and zerocopy callback, see
first 2 patches. Also rework of handling packets with too many slots, and
reorder the series a bit.

v7: Small fixes in comments/log messages/error paths, and merging the frag
overflow stats patch into its parent.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363
====================
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4caeccb4

xen-netback: Aggregate TX unmap operations · e9275f5e

Committed by Zoltan Kiss on Mar 06, 2014

Unmapping causes TLB flushing, therefore we should do it in the largest
possible batches. However we shouldn't starve the guest for too long. So if
the guest has space for at least two big packets and we don't have at least a
quarter ring to unmap, delay it for at most 1 millisecond.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9275f5e
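The batching rule stated in this commit message can be expressed as a small predicate. This is only a sketch of the stated condition: `should_delay_unmap()` and all of its parameters are hypothetical names, not xen-netback identifiers, and the 1 millisecond cap on the delay is left to the caller.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: delay the unmap only when the guest still has
 * room for two big packets AND the pending batch is under a quarter ring.
 */
bool should_delay_unmap(unsigned int pending_unmaps,
			unsigned int ring_size,
			unsigned int guest_free_slots,
			unsigned int slots_per_big_packet)
{
	bool guest_has_room = guest_free_slots >= 2 * slots_per_big_packet;
	bool batch_is_small = pending_unmaps < ring_size / 4;

	return guest_has_room && batch_is_small;
}
```

Both conditions must hold: a large batch is worth flushing immediately, and a guest that is short on slots should not be kept waiting for a bigger batch.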

xen-netback: Timeout packets in RX path · 09350788

Committed by Zoltan Kiss on Mar 06, 2014

A malicious or buggy guest can leave its queue filled indefinitely, in which
case qdisc starts to queue packets for that VIF. If those packets came from
another guest, this can block its slots and prevent shutdown. To avoid that, we
make sure the queue is drained every 10 seconds.
In the worst case the QDisc queue usually takes 3 rounds to flush.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09350788

xen-netback: Handle guests with too many frags · e3377f36

Committed by Zoltan Kiss on Mar 06, 2014

Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to
handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
- create a new skb
- map the leftover slots to its frags (no linear buffer here!)
- chain it to the previous through skb_shinfo(skb)->frag_list
- map them
- copy and coalesce the frags into a brand new one and send it to the stack
- unmap the 2 old skb's pages

It also introduces new stat counters, which help determine how often the guest
sends a packet with more than MAX_SKB_FRAGS frags.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise malicious guests can block
other guests by not releasing their sent packets.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3377f36

xen-netback: Add stat counters for zerocopy · 1bb332af

Committed by Zoltan Kiss on Mar 06, 2014

These counters help determine how often the buffers had to be copied. Also
they help find out if packets are leaked, as if "sent != success + fail",
there are probably packets never freed up properly.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bb332af

xen-netback: Remove old TX grant copy definitions and fix indentations · 62bad319

Committed by Zoltan Kiss on Mar 06, 2014

These became obsolete with grant mapping. I intentionally left the
indentation this way to improve the readability of previous patches.

NOTE: if bisect brought you here, you should apply the series up until
"xen-netback: Timeout packets in RX path", otherwise Windows guests can't work
properly and malicious guests can block other guests by not releasing their sent
packets.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62bad319

xen-netback: Introduce TX grant mapping · f53c3fe8

Committed by Zoltan Kiss on Mar 06, 2014

This patch introduces grant mapping on the netback TX path. It replaces grant
copy operations, ditching grant copy coalescing along the way. Another solution
for copy coalescing is introduced in "xen-netback: Handle guests with too many
frags"; older guests and Windows can break until that patch is applied.
There is a callback (xenvif_zerocopy_callback) from the core stack to release
the slots back to the guests when kfree_skb or skb_orphan_frags is called. It
feeds a separate dealloc thread, as scheduling the NAPI instance from there is
inefficient, therefore we can't do dealloc from the instance.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f53c3fe8

xen-netback: Handle foreign mapped pages on the guest RX path · 3e2234b3

Committed by Zoltan Kiss on Mar 06, 2014

The RX path needs to know if the SKB fragments are stored on pages from another
domain.
Logically this patch should come after introducing the grant mapping itself, as
it makes sense only after that. But to keep bisectability, I moved it here. It
shouldn't change any functionality here. xenvif_zerocopy_callback and
ubuf_to_vif are just stubs here; they will be introduced properly later on.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e2234b3

xen-netback: Minor refactoring of netback code · 121fa4b7

Committed by Zoltan Kiss on Mar 06, 2014

This patch contains a few bits of refactoring before introducing the grant
mapping changes:
- introducing xenvif_tx_pending_slots_available(), as this is used several
  times, and will be used more often
- rename the thread to vifX.Y-guest-rx, to signify it does RX work from the
  guest point of view
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

121fa4b7

xen-netback: Use skb->cb for pending_idx · 8f13dd96

Committed by Zoltan Kiss on Mar 06, 2014

Storing the pending_idx at the first byte of the linear buffer never looked
good; skb->cb is a more proper place for this. It also prevents the header from
being directly grant copied there, and we don't have the pending_idx after we
copied the header here, so it's time to change it.
It also introduces helpers for the RX side.
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8f13dd96
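The idea of keeping per-packet driver state in the control buffer rather than in the packet data can be sketched in userspace as follows. `struct fake_skb`, `struct tx_cb` and the two accessors are hypothetical; only the 48-byte size matches the `cb` array of the kernel's `struct sk_buff`. The sketch uses `memcpy` instead of the pointer cast typical of kernel code so that it stays valid under strict aliasing rules.

```c
#include <stdint.h>
#include <string.h>

#define CB_SIZE 48  /* same size as sk_buff's cb[] scratch area */

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	char cb[CB_SIZE];       /* private scratch space for the current owner */
	unsigned char *data;    /* packet bytes, left untouched by the helpers */
};

/* Hypothetical per-packet state kept in cb[]. */
struct tx_cb {
	uint16_t pending_idx;
};

_Static_assert(sizeof(struct tx_cb) <= CB_SIZE, "tx_cb must fit in cb[]");

void set_pending_idx(struct fake_skb *skb, uint16_t idx)
{
	struct tx_cb cb = { .pending_idx = idx };

	memcpy(skb->cb, &cb, sizeof(cb));
}

uint16_t get_pending_idx(const struct fake_skb *skb)
{
	struct tx_cb cb;

	memcpy(&cb, skb->cb, sizeof(cb));
	return cb.pending_idx;
}
```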

openeuler / Kernel synced successfully about 1 year ago
