提交 · 6d9c02171a8081f3746ea8cf7c67b0063a2272db · openeuler / Kernel

27 2月, 2018 12 次提交

ixgbevf: add build_skb support · 6d9c0217

由 Emil Tantilov 提交于 1月 30, 2018

Add support for build_skb() similar to:
commit 6f429223 ("ixgbe: Add support for build_skb")
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

6d9c0217

ixgbevf: break out Rx buffer page management · 925f5690

由 Emil Tantilov 提交于 1月 30, 2018

Based on commit e0142726 ("igb: Break out Rx buffer page management")

Consolidate Rx code paths to reduce duplication when we expand them in
the future.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

925f5690

ixgbevf: allocate the rings as part of q_vector · 21c046e4

由 Emil Tantilov 提交于 1月 30, 2018

Make it so that all rings allocations are made as part of q_vector.
The advantage to this is that we can keep all of the memory related to
a single interrupt in one page.

The goal is to bring the logic of handling rings closer to ixgbe.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

21c046e4

ixgbevf: make sure all frames fit minimum size requirements · 5cc0f1c0

由 Emil Tantilov 提交于 1月 30, 2018

Similar to commit a50c29dd
("ixgbe: Make certain that all frames fit minimum size requirements")

Make sure that any packet we attempt to transmit will meet minimum
size requirements.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

5cc0f1c0

ixgbevf: add support for padding packet · 1ab37e12

由 Emil Tantilov 提交于 1月 30, 2018

Following the logic from commit 2de6aa3a
("ixgbe: Add support for padding packet")

Add support for providing a buffer with headroom and tail room
to allow for shared info, NET_SKB_PAD, and NET_IP_ALIGN.  With this
combined with the DMA changes we can start using build_skb to build frames
around an incoming Rx buffer instead of having to memcpy the headers.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

1ab37e12

ixgbevf: setup queue counts · f2d00eca

由 Emil Tantilov 提交于 1月 30, 2018

Add calls for netif_set_real_num_t/rx_queues() in ixgbevf_open().
Make sure that calls to ixgbevf_open() are rtnl protected and improve
the error handling when setting up multiple queues.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f2d00eca

ixgbevf: add support for using order 1 pages to receive large frames · f15c5ba5

由 Emil Tantilov 提交于 1月 30, 2018

Based on commit 8649aaef
("igb: Add support for using order 1 pages to receive large frames")

Add support for using 3K buffers in order 1 page. We are reserving 1K for
now to have space available for future tail room and head room when we
enable build_skb support.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

f15c5ba5

ixgbevf: add ethtool private flag for legacy Rx · bc04347f

由 Emil Tantilov 提交于 1月 30, 2018

Introduce legacy-rx private flag that will allow switching between the
old and new (build_skb based) Rx code paths. The implementation is the
same as in commit e0891298
("igb: Add support for ethtool private flag to allow use of legacy Rx")

This provides a means of validating the legacy Rx path in the event that
we are forced to fall back.  At some point in the future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

bc04347f

ixgbevf: use page_address offset from page · 9913db03

由 Emil Tantilov 提交于 1月 30, 2018

Based on commit 3456fd53
("igb: Use page_address offset from page instead of masking virtual address")

Update the handling of page addresses so that we always refer to them using
a void pointer, and try to use the consistent name of va indicating we are
working with a virtual address.
Signed-off-by: NEmil Tantilov <emil.s.tantilov@intel.com>
Tested-by: NKrishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

9913db03

ixgbe: prevent ptp_rx_hang from running when in FILTER_ALL mode · 6704a3ab

由 Jacob Keller 提交于 1月 29, 2018

On hardware which supports timestamping all packets, the timestamps are
recorded in the packet buffer, and the driver no longer uses or reads
the registers. This makes the logic for checking and clearing Rx
timestamp hangs meaningless.

If we run the ixgbe_ptp_rx_hang() function in this case, then the driver
will continuously spam the log output with "Clearing Rx timestamp hang".
These messages are spurious, and confusing to end users.

The original code in commit a9763f3c ("ixgbe: Update PTP to support
X550EM_x devices", 2015-12-03) did have a flag PTP_RX_TIMESTAMP_IN_REGISTER
which was intended to be used to avoid the Rx timestamp hang check,
however it did not actually check the flag before calling the function.

Do so now in order to stop the checks and prevent the spurious log
messages.

Fixes: a9763f3c ("ixgbe: Update PTP to support X550EM_x devices", 2015-12-03)
Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

6704a3ab

ixgbe: Avoid to write the RETA table when unnecessary · 60f4b645

由 Tonghao Zhang 提交于 1月 28, 2018

If indir == 0 in the ixgbe_set_rxfh(), it is unnecessary
to write the HW. Because redirection table is not changed.
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

60f4b645

ixgbe: remove redundant initialization of 'pool' · 8f611fb0

由 Colin Ian King 提交于 1月 16, 2018

Variable pool is being assigned zero and then in the following for-loop
is it being set to zero again. Remove the redundant first assignment.

Cleans up clang warning:
drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c:61:2: warning: Value stored
to 'pool' is never read
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

8f611fb0

24 2月, 2018 4 次提交

net: fib_rules: Add new attribute to set protocol · 1b71af60

由 Donald Sharp 提交于 2月 23, 2018

For ages iproute2 has used `struct rtmsg` as the ancillary header for
FIB rules and in the process set the protocol value to RTPROT_BOOT.
Until ca56209a66 ("net: Allow a rule to track originating protocol")
the kernel rules code ignored the protocol value sent from userspace
and always returned 0 in notifications. To avoid incompatibility with
existing iproute2, send the protocol as a new attribute.

Fixes: cac56209 ("net: Allow a rule to track originating protocol")
Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b71af60

r8169: simplify and improve check for dash · 9dbe7896

由 Heiner Kallweit 提交于 2月 22, 2018

r8168_check_dash() returns false anyway for all chip versions not
supporting dash. So we can simplify the check conditions.

In addition change the check functions to return bool instead of int,
because they actually return a bool value.
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9dbe7896

r8169: disable WOL per default · 7edf6d31

由 Heiner Kallweit 提交于 2月 22, 2018

Currently, if BIOS enables WOL in the chip, settings are inconsistent
because the device isn't marked as wakeup-enabled (if not done
explicitly via userspace tools). This causes issues with suspend/
resume because mdio_bus_phy_may_suspend() checks whether device is
wakeup-enabled. In detail MDIO bus access in phy_suspend() can fail
because the MDIO bus is disabled.

In the history of the driver we find two competing approaches:
8f9d5138 "r8169: remember WOL preferences on driver load" prefers
to preserve what the BIOS may have set, whilst bde135a6
"r8169: only enable PCI wakeups when WOL is active" disabled PCI
wakeup per default to work around a bug on one platform.

Seems like nobody complained after the latter patch about non-working
WOL, what makes me think that nobody uses WOL w/o configuring it
explicitly.

My opinion:
Vast majority of users doesn't use WOL even if the BIOS enables it in
the chip. And having WOL being active keeps the PHY(s) from powering
down if being idle.
If somebody needs WOL, he can enable it during boot, e.g. by
configuring systemd.link/WakeOnLan.

Therefore, to make WOL consistent again, disable it per default.
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7edf6d31

gianfar: simplify FCS handling and fix memory leak · d903ec77

由 Andy Spencer 提交于 2月 22, 2018

Previously, buffer descriptors containing only the frame check sequence
(FCS) were skipped and not added to the skb. However, the page reference
count was still incremented, leading to a memory leak.

Fixing this inside gfar_add_rx_frag() is difficult due to reserved
memory handling and page reuse. Instead, move the FCS handling to
gfar_process_frame() and trim off the FCS before passing the skb up the
networking stack.
Signed-off-by: NAndy Spencer <aspencer@spacex.com>
Signed-off-by: NJim Gruen <jgruen@spacex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d903ec77

23 2月, 2018 12 次提交

macvlan: fix use-after-free in macvlan_common_newlink() · 4e14bf42

由 Alexey Kodanev 提交于 2月 22, 2018

The following use-after-free was reported by KASan when running
LTP macvtap01 test on 4.16-rc2:

[10642.528443] BUG: KASAN: use-after-free in
               macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
[10642.626607] Read of size 8 at addr ffff880ba49f2100 by task ip/18450
...
[10642.963873] Call Trace:
[10642.994352]  dump_stack+0x5c/0x7c
[10643.035325]  print_address_description+0x75/0x290
[10643.092938]  kasan_report+0x28d/0x390
[10643.137971]  ? macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
[10643.207963]  macvlan_common_newlink+0x12ef/0x14a0 [macvlan]
[10643.275978]  macvtap_newlink+0x171/0x260 [macvtap]
[10643.334532]  rtnl_newlink+0xd4f/0x1300
...
[10646.256176] Allocated by task 18450:
[10646.299964]  kasan_kmalloc+0xa6/0xd0
[10646.343746]  kmem_cache_alloc_trace+0xf1/0x210
[10646.397826]  macvlan_common_newlink+0x6de/0x14a0 [macvlan]
[10646.464386]  macvtap_newlink+0x171/0x260 [macvtap]
[10646.522728]  rtnl_newlink+0xd4f/0x1300
...
[10647.022028] Freed by task 18450:
[10647.061549]  __kasan_slab_free+0x138/0x180
[10647.111468]  kfree+0x9e/0x1c0
[10647.147869]  macvlan_port_destroy+0x3db/0x650 [macvlan]
[10647.211411]  rollback_registered_many+0x5b9/0xb10
[10647.268715]  rollback_registered+0xd9/0x190
[10647.319675]  register_netdevice+0x8eb/0xc70
[10647.370635]  macvlan_common_newlink+0xe58/0x14a0 [macvlan]
[10647.437195]  macvtap_newlink+0x171/0x260 [macvtap]

Commit d02fd6e7 ("macvlan: Fix one possible double free") handles
the case when register_netdevice() invokes ndo_uninit() on error and
as a result free the port. But 'macvlan_port_get_rtnl(dev))' check
(returns dev->rx_handler_data), which was added by this commit in order
to prevent double free, is not quite correct:

* for macvlan it always returns NULL because 'lowerdev' is the one that
  was used to register rx handler (port) in macvlan_port_create() as
  well as to unregister it in macvlan_port_destroy().
* for macvtap it always returns a valid pointer because macvtap registers
  its own rx handler before macvlan_common_newlink().

Fixes: d02fd6e7 ("macvlan: Fix one possible double free")
Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e14bf42

dsa: ptp: mark dummy helpers as 'inline' · 46182452

由 Arnd Bergmann 提交于 2月 22, 2018

Declaring a static function in a header leads to a warning every
time that header gets included without the function being used:

In file included from drivers/net/dsa/mv88e6xxx/chip.c:42:
drivers/net/dsa/mv88e6xxx/ptp.h:92:13: error: 'mv88e6xxx_hwtstamp_work' defined but not used [-Werror=unused-function]
 static long mv88e6xxx_hwtstamp_work(struct ptp_clock_info *ptp)
In file included from drivers/net/dsa/mv88e6xxx/chip.c:38:
drivers/net/dsa/mv88e6xxx/global2.h:355:12: error: 'mv88e6xxx_g2_wait' defined but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
            ^~~~~~~~~~~~~~~~~
drivers/net/dsa/mv88e6xxx/global2.h:350:12: error: 'mv88e6xxx_g2_update' defined but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_update(struct mv88e6xxx_chip *chip, int reg, u16 update)
            ^~~~~~~~~~~~~~~~~~~
drivers/net/dsa/mv88e6xxx/global2.h:345:12: error: 'mv88e6xxx_g2_write' defined but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_write(struct mv88e6xxx_chip *chip, int reg, u16 val)
            ^~~~~~~~~~~~~~~~~~
drivers/net/dsa/mv88e6xxx/global2.h:340:12: error: 'mv88e6xxx_g2_read' defined but not used [-Werror=unused-function]
 static int mv88e6xxx_g2_read(struct mv88e6xxx_chip *chip, int reg, u16 *val)

This marks all such functions in dsa inline to make sure we don't warn
about them.

Fixes: c6fe0ad2 ("net: dsa: mv88e6xxx: add rx/tx timestamping support")
Fixes: 0d632c3d ("net: dsa: mv88e6xxx: add accessors for PTP/TAI registers")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Acked-by: NRichard Cochran <richardcochran@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46182452

net: aquantia: Fix error handling in aq_pci_probe() · 370c1052

由 Dan Carpenter 提交于 2月 22, 2018

We should check "self->aq_hw" for allocation failure, and also we should
free it on the error paths.

Fixes: 23ee07ad ("net: aquantia: Cleanup pci functions module")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

370c1052

nfp: advertise firmware for mixed 10G/25G mode · 70271dad

由 Dirk van der Merwe 提交于 2月 21, 2018

The AMDA0099-0001 platform can support the 1x10G + 1x25G mixed mode
operation. Recently, firmware has been added for this configuration
mode.
Signed-off-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70271dad

aquantia: add Makefiles to all directories · 420b9358

由 Jakub Kicinski 提交于 2月 21, 2018

To be able to build separate objects we need to provide
Kbuild with a Makefile in each directory.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

420b9358

nfp: add Makefiles to all directories · f7308991

由 Jakub Kicinski 提交于 2月 21, 2018

To be able to build separate objects we need to provide
Kbuild with a Makefile in each directory.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7308991

ibmvnic: Split counters for scrq/pools/napi · 82e3be32

由 Nathan Fontenot 提交于 2月 21, 2018

The approach of one counter to rule them all when tracking the number
of active sub-crqs, pools, and napi has problems handling some failover
scenarios. This is due to the split in initializing the sub crqs,
pools and napi in different places and the placement of updating
the active counts.

This patch simplifies this by having a counter for tx and rx
sub-crqs, pools, and napi.
Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82e3be32

net: dsa: mv88e6xxx: scratch registers and external MDIO pins · 2510babc

由 Andrew Lunn 提交于 2月 22, 2018

MV88E6352 and later switches support GPIO control through the "Scratch
& Misc" global2 register. Two of the pins controlled this way on the
mv88e6390 family are the external MDIO pins. They can either by used
as part of the MII interface for port 0, GPIOs, or MDIO. Add a
function to configure them for MDIO, if possible, and call it when
registering the external MDIO bus.
Suggested-by: NRussell King <rmk@armlinux.org.uk>
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2510babc

ibmvnic: Fix TX descriptor tracking · aa902947

由 Thomas Falcon 提交于 2月 21, 2018

With the recent change, transmissions that only needed
one descriptor were being missed. The result is that such
packets were tracked as outstanding transmissions but never
removed when its completion notification was received.

Fixes: ffc385b9 ("ibmvnic: Keep track of supplementary TX descriptors")
Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa902947

ibmvnic: Fix early release of login buffer · a2c0f039

由 Thomas Falcon 提交于 2月 21, 2018

The login buffer is released before the driver can perform
sanity checks between resources the driver requested and what
firmware will provide. Don't release the login buffer until
the sanity check is performed.

Fixes: 34f0f4e3 ("ibmvnic: Fix login buffer memory leaks")
Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2c0f039

net/smc9194: Remove bogus CONFIG_MAC reference · 83090e7d

由 Finn Thain 提交于 2月 22, 2018

AFAIK the only version of smc9194.c with Mac support is the one in the
linux-mac68k CVS repo, which never made it to the mainline.

Despite that, from v2.3.45, arch/m68k/config.in listed CONFIG_SMC9194
under CONFIG_MAC. This mistake got carried over into Kconfig in v2.5.55.
(See pre-git era "[PATCH] add m68k dependencies to net driver config".)
Signed-off-by: NFinn Thain <fthain@telegraphics.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83090e7d

smsc75xx: fix smsc75xx_set_features() · 88e80c62

由 Eric Dumazet 提交于 2月 20, 2018

If an attempt is made to disable RX checksums, USB adapter is changed
but netdev->features is not, because smsc75xx_set_features() returns a
non zero value.

This throws errors from netdev_rx_csum_fault() :
<devname>: hw csum failure
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88e80c62

22 2月, 2018 12 次提交

ipvlan: selects master_l3 device instead of depending on it · 218798f4

由 Matteo Croce 提交于 2月 21, 2018

The L3 Master device is just a glue between the core networking code and
device drivers, so it should be selected automatically rather than
requiring to be enabled explicitly.
Signed-off-by: NMatteo Croce <mcroce@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

218798f4

ipvlan: drop ipv6 dependency · 94333fac

由 Matteo Croce 提交于 2月 21, 2018

IPVlan has an hard dependency on IPv6, refactor the ipvlan code to allow
compiling it with IPv6 disabled, move duplicate code into addr_equal()
and refactor series of if-else into a switch.
Signed-off-by: NMatteo Croce <mcroce@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94333fac

net: Allow a rule to track originating protocol · cac56209

由 Donald Sharp 提交于 2月 20, 2018

Allow a rule that is being added/deleted/modified or
dumped to contain the originating protocol's id.

The protocol is handled just like a routes originating
protocol is.  This is especially useful because there
is starting to be a plethora of different user space
programs adding rules.

Allow the vrf device to specify that the kernel is the originator
of the rule created for this device.
Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cac56209

amd-xgbe: Restore PCI interrupt enablement setting on resume · cfd092f2

由 Tom Lendacky 提交于 2月 20, 2018

After resuming from suspend, the PCI device support must re-enable the
interrupt setting so that interrupts are actually delivered.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfd092f2

ibmvnic: Correct goto target for tx irq initialization failure · af9090c2

由 Nathan Fontenot 提交于 2月 20, 2018

When a failure occurs during initialization of the tx sub crq
irqs, we should branch to the cleanup of the tx irqs. The current
code branches to the rx irq cleanup and attempts to cleanup the
rx irqs which have not been initialized.
Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af9090c2

virtio_net: fix ndo_xdp_xmit crash towards dev not ready for XDP · 8dcc5b0a

由 Jesper Dangaard Brouer 提交于 2月 20, 2018

When a driver implements the ndo_xdp_xmit() function, there is
(currently) no generic way to determine whether it is safe to call.

It is e.g. unsafe to call the drivers ndo_xdp_xmit, if it have not
allocated the needed XDP TX queues yet.  This is the case for
virtio_net, which first allocates the XDP TX queues once an XDP/bpf
prog is attached (in virtnet_xdp_set()).

Thus, a crash will occur for virtio_net when redirecting to another
virtio_net device's ndo_xdp_xmit, which have not attached a XDP prog.
The sample xdp_redirect_map tries to attach a dummy XDP prog to take
this into account, but it can also easily fail if the virtio_net (or
actually underlying vhost driver) have not allocated enough extra
queues for the device.

Allocating more queue this is currently a manual config.
Hint for libvirt XML add:

  <driver name='vhost' queues='16'>
    <host mrg_rxbuf='off'/>
    <guest tso4='off' tso6='off' ecn='off' ufo='off'/>
  </driver>

The solution in this patch is to check that the device have loaded an
XDP/bpf prog before proceeding.  This is similar to the check
performed in driver ixgbe.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dcc5b0a

virtio_net: fix memory leak in XDP_REDIRECT · 11b7d897

由 Jesper Dangaard Brouer 提交于 2月 20, 2018

XDP_REDIRECT calling xdp_do_redirect() can fail for multiple reasons
(which can be inspected by tracepoints). The current semantics is that
on failure the driver calling xdp_do_redirect() must handle freeing or
recycling the page associated with this frame.  This can be seen as an
optimization, as drivers usually have an optimized XDP_DROP code path
for frame recycling in place already.

The virtio_net driver didn't handle when xdp_do_redirect() failed.
This caused a memory leak as the page refcnt wasn't decremented on
failures.

The function __virtnet_xdp_xmit() did handle one type of failure,
when the xmit queue virtqueue_add_outbuf() is full, which "hides"
releasing a refcnt on the page.  Instead the function __virtnet_xdp_xmit()
must follow API of xdp_do_redirect(), which on errors leave it up to
the caller to free the page, of the failed send operation.

Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11b7d897

virtio_net: fix XDP code path in receive_small() · 95dbe9e7

由 Jesper Dangaard Brouer 提交于 2月 20, 2018

When configuring virtio_net to use the code path 'receive_small()',
in-order to get correct XDP_REDIRECT support, I discovered TCP packets
would get silently dropped when loading an XDP program action XDP_PASS.

The bug seems to be that receive_small() when XDP is loaded check that
hdr->hdr.flags is zero, which seems wrong as hdr.flags contains the
flags VIRTIO_NET_HDR_F_* :
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 /* Use csum_start, csum_offset */
 #define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */

TCP got dropped as it had the VIRTIO_NET_HDR_F_DATA_VALID flag set.

The flags that are relevant here are the VIRTIO_NET_HDR_GSO_* flags
stored in hdr->hdr.gso_type. Thus, the fix is just check that none of
the gso_type flags have been set.

Fixes: bb91accf ("virtio-net: XDP support for small buffers")
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95dbe9e7

virtio_net: disable XDP_REDIRECT in receive_mergeable() case · 7324f539

由 Jesper Dangaard Brouer 提交于 2月 20, 2018

The virtio_net code have three different RX code-paths in receive_buf().
Two of these code paths can handle XDP, but one of them is broken for
at least XDP_REDIRECT.

Function(1): receive_big() does not support XDP.
Function(2): receive_small() support XDP fully and uses build_skb().
Function(3): receive_mergeable() broken XDP_REDIRECT uses napi_alloc_skb().

The simple explanation is that receive_mergeable() is broken because
it uses napi_alloc_skb(), which violates XDP given XDP assumes packet
header+data in single page and enough tail room for skb_shared_info.

The longer explaination is that receive_mergeable() tries to
work-around and satisfy these XDP requiresments e.g. by having a
function xdp_linearize_page() that allocates and memcpy RX buffers
around (in case packet is scattered across multiple rx buffers).  This
does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
we have not implemented bpf_xdp_adjust_tail yet).

The XDP_REDIRECT action combined with cpumap is broken, and cause hard
to debug crashes.  The main issue is that the RX packet does not have
the needed tail-room (SKB_DATA_ALIGN(skb_shared_info)), causing
skb_shared_info to overlap the next packets head-room (in which cpumap
stores info).

Reproducing depend on the packet payload length and if RX-buffer size
happened to have tail-room for skb_shared_info or not.  But to make
this even harder to troubleshoot, the RX-buffer size is runtime
dynamically change based on an Exponentially Weighted Moving Average
(EWMA) over the packet length, when refilling RX rings.

This patch only disable XDP_REDIRECT support in receive_mergeable()
case, because it can cause a real crash.

IMHO we should consider NOT supporting XDP in receive_mergeable() at
all, because the principles behind XDP are to gain speed by (1) code
simplicity, (2) sacrificing memory and (3) where possible moving
runtime checks to setup time.  These principles are clearly being
violated in receive_mergeable(), that e.g. runtime track average
buffer size to save memory consumption.

In the longer run, we should consider introducing a separate receive
function when attaching an XDP program, and also change the memory
model to be compatible with XDP when attaching an XDP prog.

Fixes: 186b3c99 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7324f539

ibmvnic: Allocate max queues stats buffers · abcae546

由 Nathan Fontenot 提交于 2月 19, 2018

To avoid losing any stats when the number of sub-crqs change, allocate
the max number of stats buffers so a stats buffer exists all possible
sub-crqs.
Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abcae546

ibmvnic: Make napi usage dynamic · 86f669b2

由 Nathan Fontenot 提交于 2月 19, 2018

In order to handle the number of rx sub crqs changing during a driver
reset, the ibmvnic driver also needs to update the number of napi.
To do this the code to init and free napi's is moved to their own
routines so they can be called during the reset process.
Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86f669b2

ibmvnic: Free and re-allocate scrqs when tx/rx scrqs change · d7c0ef36

由 Nathan Fontenot 提交于 2月 19, 2018

When the driver resets it is possible that the number of tx/rx
sub-crqs can change. This patch handles this so that the driver does
not try to access non-existent sub-crqs.

The count for releasing sub crqs depends on the adapter state. The
active queue count is not set in probe, so if we are relasing in probe
state we use the request queue count.

Additionally, a parameter is added to release_sub_crqs() so that
we know if the h_call to free the sub-crq needs to be made. In
the reset path we have to do a reset of the main crq, which is
a free followed by a register of the main crq. The free of main
crq results in all of the sub crq's being free'ed. When updating
sub-crq count in the reset path we do not want to h_free the
sub-crqs, they are already free'ed.
Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7c0ef36

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功