提交 337d1ccb 编写于 作者: D David S. Miller

Merge branch 'Add-gve-driver'

Catherine Sullivan says:

====================
Add gve driver

This patch series adds the gve driver which will support the
Compute Engine Virtual NIC that will be available in the future.

v2:
- Patch 1:
  - Remove gve_size_assert.h and use static_assert instead.
  - Loop forever instead of bugging if the device won't reset
  - Use module_pci_driver
- Patch 2:
  - Use be16_to_cpu in the RX Seq No define
  - Remove unneeded ndo_change_mtu
- Patch 3:
  - No Changes
- Patch 4:
  - Instead of checking netif_carrier_ok in ethtool stats, just make sure

v3:
- Patch 1:
  - Remove X86 dep
- Patch 2:
  - No changes
- Patch 3:
  - No changes
- Patch 4:
  - Remove unneeded memsets in ethtool stats

v4:
- Patch 1:
  - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
  - Explicitly add padding to gve_adminq_set_driver_parameter
  - Use static where appropriate
- Patch 2:
  - Use u64_stats_sync
  - Explicity add padding to gve_adminq_create_rx_queue
  - Fix some enianness typing issues found by kbuild
  - Use static where appropriate
  - Remove unused variables
- Patch 3:
  - Use io[read|write]32be instead of [read|write]l(cpu_to_be32())
- Patch 4:
  - Use u64_stats_sync
  - Use static where appropriate
Warnings reported by:
Reported-by: Nkbuild test robot <lkp@intel.com>
Reported-by: NJulia Lawall <julia.lawall@lip6.fr>
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
.. SPDX-License-Identifier: GPL-2.0+
==============================================================
Linux kernel driver for Compute Engine Virtual Ethernet (gve):
==============================================================
Supported Hardware
===================
The GVE driver binds to a single PCI device id used by the virtual
Ethernet device found in some Compute Engine VMs.
+--------------+----------+---------+
|Field | Value | Comments|
+==============+==========+=========+
|Vendor ID | `0x1AE0` | Google |
+--------------+----------+---------+
|Device ID | `0x0042` | |
+--------------+----------+---------+
|Sub-vendor ID | `0x1AE0` | Google |
+--------------+----------+---------+
|Sub-device ID | `0x0058` | |
+--------------+----------+---------+
|Revision ID | `0x0` | |
+--------------+----------+---------+
|Device Class | `0x200` | Ethernet|
+--------------+----------+---------+
PCI Bars
========
The gVNIC PCI device exposes three 32-bit memory BARS:
- Bar0 - Device configuration and status registers.
- Bar1 - MSI-X vector table
- Bar2 - IRQ, RX and TX doorbells
Device Interactions
===================
The driver interacts with the device in the following ways:
- Registers
- A block of MMIO registers
- See gve_register.h for more detail
- Admin Queue
- See description below
- Reset
- At any time the device can be reset
- Interrupts
- See supported interrupts below
- Transmit and Receive Queues
- See description below
Registers
---------
All registers are MMIO and big endian.
The registers are used for initializing and configuring the device as well as
querying device status in response to management interrupts.
Admin Queue (AQ)
----------------
The Admin Queue is a PAGE_SIZE memory block, treated as an array of AQ
commands, used by the driver to issue commands to the device and set up
resources.The driver and the device maintain a count of how many commands
have been submitted and executed. To issue AQ commands, the driver must do
the following (with proper locking):
1) Copy new commands into next available slots in the AQ array
2) Increment its counter by he number of new commands
3) Write the counter into the GVE_ADMIN_QUEUE_DOORBELL register
4) Poll the ADMIN_QUEUE_EVENT_COUNTER register until it equals
the value written to the doorbell, or until a timeout.
The device will update the status field in each AQ command reported as
executed through the ADMIN_QUEUE_EVENT_COUNTER register.
Device Resets
-------------
A device reset is triggered by writing 0x0 to the AQ PFN register.
This causes the device to release all resources allocated by the
driver, including the AQ itself.
Interrupts
----------
The following interrupts are supported by the driver:
Management Interrupt
~~~~~~~~~~~~~~~~~~~~
The management interrupt is used by the device to tell the driver to
look at the GVE_DEVICE_STATUS register.
The handler for the management irq simply queues the service task in
the workqueue to check the register and acks the irq.
Notification Block Interrupts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The notification block interrupts are used to tell the driver to poll
the queues associated with that interrupt.
The handler for these irqs schedule the napi for that block to run
and poll the queues.
Traffic Queues
--------------
gVNIC's queues are composed of a descriptor ring and a buffer and are
assigned to a notification block.
The descriptor rings are power-of-two-sized ring buffers consisting of
fixed-size descriptors. They advance their head pointer using a __be32
doorbell located in Bar2. The tail pointers are advanced by consuming
descriptors in-order and updating a __be32 counter. Both the doorbell
and the counter overflow to zero.
Each queue's buffers must be registered in advance with the device as a
queue page list, and packet data can only be put in those pages.
Transmit
~~~~~~~~
gve maps the buffers for transmit rings into a FIFO and copies the packets
into the FIFO before sending them to the NIC.
Receive
~~~~~~~
The buffers for receive rings are put into a data ring that is the same
length as the descriptor ring and the head and tail pointers advance over
the rings together.
......@@ -21,6 +21,7 @@ Contents:
intel/i40e
intel/iavf
intel/ice
google/gve
mellanox/mlx5
.. only:: subproject
......
......@@ -6720,6 +6720,15 @@ L: linux-input@vger.kernel.org
S: Maintained
F: drivers/input/touchscreen/goodix.c
GOOGLE ETHERNET DRIVERS
M: Catherine Sullivan <csully@google.com>
R: Sagi Shahar <sagis@google.com>
R: Jon Olson <jonolson@google.com>
L: netdev@vger.kernel.org
S: Supported
F: Documentation/networking/device_drivers/google/gve.txt
F: drivers/net/ethernet/google
GPD POCKET FAN DRIVER
M: Hans de Goede <hdegoede@redhat.com>
L: platform-driver-x86@vger.kernel.org
......
......@@ -76,6 +76,7 @@ source "drivers/net/ethernet/ezchip/Kconfig"
source "drivers/net/ethernet/faraday/Kconfig"
source "drivers/net/ethernet/freescale/Kconfig"
source "drivers/net/ethernet/fujitsu/Kconfig"
source "drivers/net/ethernet/google/Kconfig"
source "drivers/net/ethernet/hisilicon/Kconfig"
source "drivers/net/ethernet/hp/Kconfig"
source "drivers/net/ethernet/huawei/Kconfig"
......
......@@ -39,6 +39,7 @@ obj-$(CONFIG_NET_VENDOR_EZCHIP) += ezchip/
obj-$(CONFIG_NET_VENDOR_FARADAY) += faraday/
obj-$(CONFIG_NET_VENDOR_FREESCALE) += freescale/
obj-$(CONFIG_NET_VENDOR_FUJITSU) += fujitsu/
obj-$(CONFIG_NET_VENDOR_GOOGLE) += google/
obj-$(CONFIG_NET_VENDOR_HISILICON) += hisilicon/
obj-$(CONFIG_NET_VENDOR_HP) += hp/
obj-$(CONFIG_NET_VENDOR_HUAWEI) += huawei/
......
#
# Google network device configuration
#
config NET_VENDOR_GOOGLE
bool "Google Devices"
default y
help
If you have a network (Ethernet) device belonging to this class, say Y.
Note that the answer to this question doesn't directly affect the
kernel: saying N will just cause the configurator to skip all
the questions about Google devices. If you say Y, you will be asked
for your specific device in the following questions.
if NET_VENDOR_GOOGLE
config GVE
tristate "Google Virtual NIC (gVNIC) support"
depends on PCI_MSI
help
This driver supports Google Virtual NIC (gVNIC)"
To compile this driver as a module, choose M here.
The module will be called gve.
endif #NET_VENDOR_GOOGLE
#
# Makefile for the Google network device drivers.
#
obj-$(CONFIG_GVE) += gve/
# Makefile for the Google virtual Ethernet (gve) driver
obj-$(CONFIG_GVE) += gve.o
gve-objs := gve_main.o gve_tx.o gve_rx.o gve_ethtool.o gve_adminq.o
/* SPDX-License-Identifier: (GPL-2.0 OR MIT)
* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
#ifndef _GVE_H_
#define _GVE_H_
#include <linux/dma-mapping.h>
#include <linux/netdevice.h>
#include <linux/pci.h>
#include <linux/u64_stats_sync.h>
#include "gve_desc.h"
#ifndef PCI_VENDOR_ID_GOOGLE
#define PCI_VENDOR_ID_GOOGLE 0x1ae0
#endif
#define PCI_DEV_ID_GVNIC 0x0042
#define GVE_REGISTER_BAR 0
#define GVE_DOORBELL_BAR 2
/* Driver can alloc up to 2 segments for the header and 2 for the payload. */
#define GVE_TX_MAX_IOVEC 4
/* 1 for management, 1 for rx, 1 for tx */
#define GVE_MIN_MSIX 3
/* Each slot in the desc ring has a 1:1 mapping to a slot in the data ring */
struct gve_rx_desc_queue {
struct gve_rx_desc *desc_ring; /* the descriptor ring */
dma_addr_t bus; /* the bus for the desc_ring */
u32 cnt; /* free-running total number of completed packets */
u32 fill_cnt; /* free-running total number of descriptors posted */
u32 mask; /* masks the cnt to the size of the ring */
u8 seqno; /* the next expected seqno for this desc*/
};
/* The page info for a single slot in the RX data queue */
struct gve_rx_slot_page_info {
struct page *page;
void *page_address;
u32 page_offset; /* offset to write to in page */
};
/* A list of pages registered with the device during setup and used by a queue
* as buffers
*/
struct gve_queue_page_list {
u32 id; /* unique id */
u32 num_entries;
struct page **pages; /* list of num_entries pages */
dma_addr_t *page_buses; /* the dma addrs of the pages */
};
/* Each slot in the data ring has a 1:1 mapping to a slot in the desc ring */
struct gve_rx_data_queue {
struct gve_rx_data_slot *data_ring; /* read by NIC */
dma_addr_t data_bus; /* dma mapping of the slots */
struct gve_rx_slot_page_info *page_info; /* page info of the buffers */
struct gve_queue_page_list *qpl; /* qpl assigned to this queue */
u32 mask; /* masks the cnt to the size of the ring */
u32 cnt; /* free-running total number of completed packets */
};
struct gve_priv;
/* An RX ring that contains a power-of-two sized desc and data ring. */
struct gve_rx_ring {
struct gve_priv *gve;
struct gve_rx_desc_queue desc;
struct gve_rx_data_queue data;
u64 rbytes; /* free-running bytes received */
u64 rpackets; /* free-running packets received */
u32 q_num; /* queue index */
u32 ntfy_id; /* notification block index */
struct gve_queue_resources *q_resources; /* head and tail pointer idx */
dma_addr_t q_resources_bus; /* dma address for the queue resources */
struct u64_stats_sync statss; /* sync stats for 32bit archs */
};
/* A TX desc ring entry */
union gve_tx_desc {
struct gve_tx_pkt_desc pkt; /* first desc for a packet */
struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
};
/* Tracks the memory in the fifo occupied by a segment of a packet */
struct gve_tx_iovec {
u32 iov_offset; /* offset into this segment */
u32 iov_len; /* length */
u32 iov_padding; /* padding associated with this segment */
};
/* Tracks the memory in the fifo occupied by the skb. Mapped 1:1 to a desc
* ring entry but only used for a pkt_desc not a seg_desc
*/
struct gve_tx_buffer_state {
struct sk_buff *skb; /* skb for this pkt */
struct gve_tx_iovec iov[GVE_TX_MAX_IOVEC]; /* segments of this pkt */
};
/* A TX buffer - each queue has one */
struct gve_tx_fifo {
void *base; /* address of base of FIFO */
u32 size; /* total size */
atomic_t available; /* how much space is still available */
u32 head; /* offset to write at */
struct gve_queue_page_list *qpl; /* QPL mapped into this FIFO */
};
/* A TX ring that contains a power-of-two sized desc ring and a FIFO buffer */
struct gve_tx_ring {
/* Cacheline 0 -- Accessed & dirtied during transmit */
struct gve_tx_fifo tx_fifo;
u32 req; /* driver tracked head pointer */
u32 done; /* driver tracked tail pointer */
/* Cacheline 1 -- Accessed & dirtied during gve_clean_tx_done */
__be32 last_nic_done ____cacheline_aligned; /* NIC tail pointer */
u64 pkt_done; /* free-running - total packets completed */
u64 bytes_done; /* free-running - total bytes completed */
/* Cacheline 2 -- Read-mostly fields */
union gve_tx_desc *desc ____cacheline_aligned;
struct gve_tx_buffer_state *info; /* Maps 1:1 to a desc */
struct netdev_queue *netdev_txq;
struct gve_queue_resources *q_resources; /* head and tail pointer idx */
u32 mask; /* masks req and done down to queue size */
/* Slow-path fields */
u32 q_num ____cacheline_aligned; /* queue idx */
u32 stop_queue; /* count of queue stops */
u32 wake_queue; /* count of queue wakes */
u32 ntfy_id; /* notification block index */
dma_addr_t bus; /* dma address of the descr ring */
dma_addr_t q_resources_bus; /* dma address of the queue resources */
struct u64_stats_sync statss; /* sync stats for 32bit archs */
} ____cacheline_aligned;
/* Wraps the info for one irq including the napi struct and the queues
* associated with that irq.
*/
struct gve_notify_block {
__be32 irq_db_index; /* idx into Bar2 - set by device, must be 1st */
char name[IFNAMSIZ + 16]; /* name registered with the kernel */
struct napi_struct napi; /* kernel napi struct for this block */
struct gve_priv *priv;
struct gve_tx_ring *tx; /* tx rings on this block */
struct gve_rx_ring *rx; /* rx rings on this block */
} ____cacheline_aligned;
/* Tracks allowed and current queue settings */
struct gve_queue_config {
u16 max_queues;
u16 num_queues; /* current */
};
/* Tracks the available and used qpl IDs */
struct gve_qpl_config {
u32 qpl_map_size; /* map memory size */
unsigned long *qpl_id_map; /* bitmap of used qpl ids */
};
struct gve_priv {
struct net_device *dev;
struct gve_tx_ring *tx; /* array of tx_cfg.num_queues */
struct gve_rx_ring *rx; /* array of rx_cfg.num_queues */
struct gve_queue_page_list *qpls; /* array of num qpls */
struct gve_notify_block *ntfy_blocks; /* array of num_ntfy_blks */
dma_addr_t ntfy_block_bus;
struct msix_entry *msix_vectors; /* array of num_ntfy_blks + 1 */
char mgmt_msix_name[IFNAMSIZ + 16];
u32 mgmt_msix_idx;
__be32 *counter_array; /* array of num_event_counters */
dma_addr_t counter_array_bus;
u16 num_event_counters;
u16 tx_desc_cnt; /* num desc per ring */
u16 rx_desc_cnt; /* num desc per ring */
u16 tx_pages_per_qpl; /* tx buffer length */
u16 rx_pages_per_qpl; /* rx buffer length */
u64 max_registered_pages;
u64 num_registered_pages; /* num pages registered with NIC */
u32 rx_copybreak; /* copy packets smaller than this */
u16 default_num_queues; /* default num queues to set up */
struct gve_queue_config tx_cfg;
struct gve_queue_config rx_cfg;
struct gve_qpl_config qpl_cfg; /* map used QPL ids */
u32 num_ntfy_blks; /* spilt between TX and RX so must be even */
struct gve_registers __iomem *reg_bar0; /* see gve_register.h */
__be32 __iomem *db_bar2; /* "array" of doorbells */
u32 msg_enable; /* level for netif* netdev print macros */
struct pci_dev *pdev;
/* metrics */
u32 tx_timeo_cnt;
/* Admin queue - see gve_adminq.h*/
union gve_adminq_command *adminq;
dma_addr_t adminq_bus_addr;
u32 adminq_mask; /* masks prod_cnt to adminq size */
u32 adminq_prod_cnt; /* free-running count of AQ cmds executed */
struct workqueue_struct *gve_wq;
struct work_struct service_task;
unsigned long service_task_flags;
unsigned long state_flags;
};
enum gve_service_task_flags {
GVE_PRIV_FLAGS_DO_RESET = BIT(1),
GVE_PRIV_FLAGS_RESET_IN_PROGRESS = BIT(2),
GVE_PRIV_FLAGS_PROBE_IN_PROGRESS = BIT(3),
};
enum gve_state_flags {
GVE_PRIV_FLAGS_ADMIN_QUEUE_OK = BIT(1),
GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK = BIT(2),
GVE_PRIV_FLAGS_DEVICE_RINGS_OK = BIT(3),
GVE_PRIV_FLAGS_NAPI_ENABLED = BIT(4),
};
static inline bool gve_get_do_reset(struct gve_priv *priv)
{
return test_bit(GVE_PRIV_FLAGS_DO_RESET, &priv->service_task_flags);
}
static inline void gve_set_do_reset(struct gve_priv *priv)
{
set_bit(GVE_PRIV_FLAGS_DO_RESET, &priv->service_task_flags);
}
static inline void gve_clear_do_reset(struct gve_priv *priv)
{
clear_bit(GVE_PRIV_FLAGS_DO_RESET, &priv->service_task_flags);
}
static inline bool gve_get_reset_in_progress(struct gve_priv *priv)
{
return test_bit(GVE_PRIV_FLAGS_RESET_IN_PROGRESS,
&priv->service_task_flags);
}
static inline void gve_set_reset_in_progress(struct gve_priv *priv)
{
set_bit(GVE_PRIV_FLAGS_RESET_IN_PROGRESS, &priv->service_task_flags);
}
static inline void gve_clear_reset_in_progress(struct gve_priv *priv)
{
clear_bit(GVE_PRIV_FLAGS_RESET_IN_PROGRESS, &priv->service_task_flags);
}
static inline bool gve_get_probe_in_progress(struct gve_priv *priv)
{
return test_bit(GVE_PRIV_FLAGS_PROBE_IN_PROGRESS,
&priv->service_task_flags);
}
static inline void gve_set_probe_in_progress(struct gve_priv *priv)
{
set_bit(GVE_PRIV_FLAGS_PROBE_IN_PROGRESS, &priv->service_task_flags);
}
static inline void gve_clear_probe_in_progress(struct gve_priv *priv)
{
clear_bit(GVE_PRIV_FLAGS_PROBE_IN_PROGRESS, &priv->service_task_flags);
}
static inline bool gve_get_admin_queue_ok(struct gve_priv *priv)
{
return test_bit(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK, &priv->state_flags);
}
static inline void gve_set_admin_queue_ok(struct gve_priv *priv)
{
set_bit(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK, &priv->state_flags);
}
static inline void gve_clear_admin_queue_ok(struct gve_priv *priv)
{
clear_bit(GVE_PRIV_FLAGS_ADMIN_QUEUE_OK, &priv->state_flags);
}
static inline bool gve_get_device_resources_ok(struct gve_priv *priv)
{
return test_bit(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK, &priv->state_flags);
}
static inline void gve_set_device_resources_ok(struct gve_priv *priv)
{
set_bit(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK, &priv->state_flags);
}
static inline void gve_clear_device_resources_ok(struct gve_priv *priv)
{
clear_bit(GVE_PRIV_FLAGS_DEVICE_RESOURCES_OK, &priv->state_flags);
}
static inline bool gve_get_device_rings_ok(struct gve_priv *priv)
{
return test_bit(GVE_PRIV_FLAGS_DEVICE_RINGS_OK, &priv->state_flags);
}
static inline void gve_set_device_rings_ok(struct gve_priv *priv)
{
set_bit(GVE_PRIV_FLAGS_DEVICE_RINGS_OK, &priv->state_flags);
}
static inline void gve_clear_device_rings_ok(struct gve_priv *priv)
{
clear_bit(GVE_PRIV_FLAGS_DEVICE_RINGS_OK, &priv->state_flags);
}
static inline bool gve_get_napi_enabled(struct gve_priv *priv)
{
return test_bit(GVE_PRIV_FLAGS_NAPI_ENABLED, &priv->state_flags);
}
static inline void gve_set_napi_enabled(struct gve_priv *priv)
{
set_bit(GVE_PRIV_FLAGS_NAPI_ENABLED, &priv->state_flags);
}
static inline void gve_clear_napi_enabled(struct gve_priv *priv)
{
clear_bit(GVE_PRIV_FLAGS_NAPI_ENABLED, &priv->state_flags);
}
/* Returns the address of the ntfy_blocks irq doorbell
*/
static inline __be32 __iomem *gve_irq_doorbell(struct gve_priv *priv,
struct gve_notify_block *block)
{
return &priv->db_bar2[be32_to_cpu(block->irq_db_index)];
}
/* Returns the index into ntfy_blocks of the given tx ring's block
*/
static inline u32 gve_tx_idx_to_ntfy(struct gve_priv *priv, u32 queue_idx)
{
return queue_idx;
}
/* Returns the index into ntfy_blocks of the given rx ring's block
*/
static inline u32 gve_rx_idx_to_ntfy(struct gve_priv *priv, u32 queue_idx)
{
return (priv->num_ntfy_blks / 2) + queue_idx;
}
/* Returns the number of tx queue page lists
*/
static inline u32 gve_num_tx_qpls(struct gve_priv *priv)
{
return priv->tx_cfg.num_queues;
}
/* Returns the number of rx queue page lists
*/
static inline u32 gve_num_rx_qpls(struct gve_priv *priv)
{
return priv->rx_cfg.num_queues;
}
/* Returns a pointer to the next available tx qpl in the list of qpls
*/
static inline
struct gve_queue_page_list *gve_assign_tx_qpl(struct gve_priv *priv)
{
int id = find_first_zero_bit(priv->qpl_cfg.qpl_id_map,
priv->qpl_cfg.qpl_map_size);
/* we are out of tx qpls */
if (id >= gve_num_tx_qpls(priv))
return NULL;
set_bit(id, priv->qpl_cfg.qpl_id_map);
return &priv->qpls[id];
}
/* Returns a pointer to the next available rx qpl in the list of qpls
*/
static inline
struct gve_queue_page_list *gve_assign_rx_qpl(struct gve_priv *priv)
{
int id = find_next_zero_bit(priv->qpl_cfg.qpl_id_map,
priv->qpl_cfg.qpl_map_size,
gve_num_tx_qpls(priv));
/* we are out of rx qpls */
if (id == priv->qpl_cfg.qpl_map_size)
return NULL;
set_bit(id, priv->qpl_cfg.qpl_id_map);
return &priv->qpls[id];
}
/* Unassigns the qpl with the given id
*/
static inline void gve_unassign_qpl(struct gve_priv *priv, int id)
{
clear_bit(id, priv->qpl_cfg.qpl_id_map);
}
/* Returns the correct dma direction for tx and rx qpls
*/
static inline enum dma_data_direction gve_qpl_dma_dir(struct gve_priv *priv,
int id)
{
if (id < gve_num_tx_qpls(priv))
return DMA_TO_DEVICE;
else
return DMA_FROM_DEVICE;
}
/* Returns true if the max mtu allows page recycling */
static inline bool gve_can_recycle_pages(struct net_device *dev)
{
/* We can't recycle the pages if we can't fit a packet into half a
* page.
*/
return dev->max_mtu <= PAGE_SIZE / 2;
}
/* buffers */
int gve_alloc_page(struct device *dev, struct page **page, dma_addr_t *dma,
enum dma_data_direction);
void gve_free_page(struct device *dev, struct page *page, dma_addr_t dma,
enum dma_data_direction);
/* tx handling */
netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev);
bool gve_tx_poll(struct gve_notify_block *block, int budget);
int gve_tx_alloc_rings(struct gve_priv *priv);
void gve_tx_free_rings(struct gve_priv *priv);
__be32 gve_tx_load_event_counter(struct gve_priv *priv,
struct gve_tx_ring *tx);
/* rx handling */
void gve_rx_write_doorbell(struct gve_priv *priv, struct gve_rx_ring *rx);
bool gve_rx_poll(struct gve_notify_block *block, int budget);
int gve_rx_alloc_rings(struct gve_priv *priv);
void gve_rx_free_rings(struct gve_priv *priv);
bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
netdev_features_t feat);
/* Reset */
void gve_schedule_reset(struct gve_priv *priv);
int gve_reset(struct gve_priv *priv, bool attempt_teardown);
int gve_adjust_queues(struct gve_priv *priv,
struct gve_queue_config new_rx_config,
struct gve_queue_config new_tx_config);
/* exported by ethtool.c */
extern const struct ethtool_ops gve_ethtool_ops;
/* needed by ethtool */
extern const char gve_version_str[];
#endif /* _GVE_H_ */
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
#include <linux/etherdevice.h>
#include <linux/pci.h>
#include "gve.h"
#include "gve_adminq.h"
#include "gve_register.h"
#define GVE_MAX_ADMINQ_RELEASE_CHECK 500
#define GVE_ADMINQ_SLEEP_LEN 20
#define GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK 100
int gve_adminq_alloc(struct device *dev, struct gve_priv *priv)
{
priv->adminq = dma_alloc_coherent(dev, PAGE_SIZE,
&priv->adminq_bus_addr, GFP_KERNEL);
if (unlikely(!priv->adminq))
return -ENOMEM;
priv->adminq_mask = (PAGE_SIZE / sizeof(union gve_adminq_command)) - 1;
priv->adminq_prod_cnt = 0;
/* Setup Admin queue with the device */
iowrite32be(priv->adminq_bus_addr / PAGE_SIZE,
&priv->reg_bar0->adminq_pfn);
gve_set_admin_queue_ok(priv);
return 0;
}
void gve_adminq_release(struct gve_priv *priv)
{
int i = 0;
/* Tell the device the adminq is leaving */
iowrite32be(0x0, &priv->reg_bar0->adminq_pfn);
while (ioread32be(&priv->reg_bar0->adminq_pfn)) {
/* If this is reached the device is unrecoverable and still
* holding memory. Continue looping to avoid memory corruption,
* but WARN so it is visible what is going on.
*/
if (i == GVE_MAX_ADMINQ_RELEASE_CHECK)
WARN(1, "Unrecoverable platform error!");
i++;
msleep(GVE_ADMINQ_SLEEP_LEN);
}
gve_clear_device_rings_ok(priv);
gve_clear_device_resources_ok(priv);
gve_clear_admin_queue_ok(priv);
}
void gve_adminq_free(struct device *dev, struct gve_priv *priv)
{
if (!gve_get_admin_queue_ok(priv))
return;
gve_adminq_release(priv);
dma_free_coherent(dev, PAGE_SIZE, priv->adminq, priv->adminq_bus_addr);
gve_clear_admin_queue_ok(priv);
}
static void gve_adminq_kick_cmd(struct gve_priv *priv, u32 prod_cnt)
{
iowrite32be(prod_cnt, &priv->reg_bar0->adminq_doorbell);
}
static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
{
int i;
for (i = 0; i < GVE_MAX_ADMINQ_EVENT_COUNTER_CHECK; i++) {
if (ioread32be(&priv->reg_bar0->adminq_event_counter)
== prod_cnt)
return true;
msleep(GVE_ADMINQ_SLEEP_LEN);
}
return false;
}
static int gve_adminq_parse_err(struct device *dev, u32 status)
{
if (status != GVE_ADMINQ_COMMAND_PASSED &&
status != GVE_ADMINQ_COMMAND_UNSET)
dev_err(dev, "AQ command failed with status %d\n", status);
switch (status) {
case GVE_ADMINQ_COMMAND_PASSED:
return 0;
case GVE_ADMINQ_COMMAND_UNSET:
dev_err(dev, "parse_aq_err: err and status both unset, this should not be possible.\n");
return -EINVAL;
case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
case GVE_ADMINQ_COMMAND_ERROR_DATALOSS:
case GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION:
case GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE:
return -EAGAIN;
case GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS:
case GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR:
case GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT:
case GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND:
case GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE:
case GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR:
return -EINVAL;
case GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED:
return -ETIME;
case GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED:
case GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED:
return -EACCES;
case GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED:
return -ENOMEM;
case GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED:
return -ENOTSUPP;
default:
dev_err(dev, "parse_aq_err: unknown status code %d\n", status);
return -EINVAL;
}
}
/* This function is not threadsafe - the caller is responsible for any
* necessary locks.
*/
int gve_adminq_execute_cmd(struct gve_priv *priv,
union gve_adminq_command *cmd_orig)
{
union gve_adminq_command *cmd;
u32 status = 0;
u32 prod_cnt;
cmd = &priv->adminq[priv->adminq_prod_cnt & priv->adminq_mask];
priv->adminq_prod_cnt++;
prod_cnt = priv->adminq_prod_cnt;
memcpy(cmd, cmd_orig, sizeof(*cmd_orig));
gve_adminq_kick_cmd(priv, prod_cnt);
if (!gve_adminq_wait_for_cmd(priv, prod_cnt)) {
dev_err(&priv->pdev->dev, "AQ command timed out, need to reset AQ\n");
return -ENOTRECOVERABLE;
}
memcpy(cmd_orig, cmd, sizeof(*cmd));
status = be32_to_cpu(READ_ONCE(cmd->status));
return gve_adminq_parse_err(&priv->pdev->dev, status);
}
/* The device specifies that the management vector can either be the first irq
* or the last irq. ntfy_blk_msix_base_idx indicates the first irq assigned to
* the ntfy blks. It if is 0 then the management vector is last, if it is 1 then
* the management vector is first.
*
* gve arranges the msix vectors so that the management vector is last.
*/
#define GVE_NTFY_BLK_BASE_MSIX_IDX 0
int gve_adminq_configure_device_resources(struct gve_priv *priv,
dma_addr_t counter_array_bus_addr,
u32 num_counters,
dma_addr_t db_array_bus_addr,
u32 num_ntfy_blks)
{
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES);
cmd.configure_device_resources =
(struct gve_adminq_configure_device_resources) {
.counter_array = cpu_to_be64(counter_array_bus_addr),
.num_counters = cpu_to_be32(num_counters),
.irq_db_addr = cpu_to_be64(db_array_bus_addr),
.num_irq_dbs = cpu_to_be32(num_ntfy_blks),
.irq_db_stride = cpu_to_be32(sizeof(priv->ntfy_blocks[0])),
.ntfy_blk_msix_base_idx =
cpu_to_be32(GVE_NTFY_BLK_BASE_MSIX_IDX),
};
return gve_adminq_execute_cmd(priv, &cmd);
}
int gve_adminq_deconfigure_device_resources(struct gve_priv *priv)
{
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES);
return gve_adminq_execute_cmd(priv, &cmd);
}
int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_index)
{
struct gve_tx_ring *tx = &priv->tx[queue_index];
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_TX_QUEUE);
cmd.create_tx_queue = (struct gve_adminq_create_tx_queue) {
.queue_id = cpu_to_be32(queue_index),
.reserved = 0,
.queue_resources_addr = cpu_to_be64(tx->q_resources_bus),
.tx_ring_addr = cpu_to_be64(tx->bus),
.queue_page_list_id = cpu_to_be32(tx->tx_fifo.qpl->id),
.ntfy_id = cpu_to_be32(tx->ntfy_id),
};
return gve_adminq_execute_cmd(priv, &cmd);
}
int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_index)
{
struct gve_rx_ring *rx = &priv->rx[queue_index];
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_CREATE_RX_QUEUE);
cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) {
.queue_id = cpu_to_be32(queue_index),
.index = cpu_to_be32(queue_index),
.reserved = 0,
.ntfy_id = cpu_to_be32(rx->ntfy_id),
.queue_resources_addr = cpu_to_be64(rx->q_resources_bus),
.rx_desc_ring_addr = cpu_to_be64(rx->desc.bus),
.rx_data_ring_addr = cpu_to_be64(rx->data.data_bus),
.queue_page_list_id = cpu_to_be32(rx->data.qpl->id),
};
return gve_adminq_execute_cmd(priv, &cmd);
}
int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_index)
{
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_TX_QUEUE);
cmd.destroy_tx_queue = (struct gve_adminq_destroy_tx_queue) {
.queue_id = cpu_to_be32(queue_index),
};
return gve_adminq_execute_cmd(priv, &cmd);
}
int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_index)
{
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESTROY_RX_QUEUE);
cmd.destroy_rx_queue = (struct gve_adminq_destroy_rx_queue) {
.queue_id = cpu_to_be32(queue_index),
};
return gve_adminq_execute_cmd(priv, &cmd);
}
int gve_adminq_describe_device(struct gve_priv *priv)
{
struct gve_device_descriptor *descriptor;
union gve_adminq_command cmd;
dma_addr_t descriptor_bus;
int err = 0;
u8 *mac;
u16 mtu;
memset(&cmd, 0, sizeof(cmd));
descriptor = dma_alloc_coherent(&priv->pdev->dev, PAGE_SIZE,
&descriptor_bus, GFP_KERNEL);
if (!descriptor)
return -ENOMEM;
cmd.opcode = cpu_to_be32(GVE_ADMINQ_DESCRIBE_DEVICE);
cmd.describe_device.device_descriptor_addr =
cpu_to_be64(descriptor_bus);
cmd.describe_device.device_descriptor_version =
cpu_to_be32(GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION);
cmd.describe_device.available_length = cpu_to_be32(PAGE_SIZE);
err = gve_adminq_execute_cmd(priv, &cmd);
if (err)
goto free_device_descriptor;
priv->tx_desc_cnt = be16_to_cpu(descriptor->tx_queue_entries);
if (priv->tx_desc_cnt * sizeof(priv->tx->desc[0]) < PAGE_SIZE) {
netif_err(priv, drv, priv->dev, "Tx desc count %d too low\n",
priv->tx_desc_cnt);
err = -EINVAL;
goto free_device_descriptor;
}
priv->rx_desc_cnt = be16_to_cpu(descriptor->rx_queue_entries);
if (priv->rx_desc_cnt * sizeof(priv->rx->desc.desc_ring[0])
< PAGE_SIZE ||
priv->rx_desc_cnt * sizeof(priv->rx->data.data_ring[0])
< PAGE_SIZE) {
netif_err(priv, drv, priv->dev, "Rx desc count %d too low\n",
priv->rx_desc_cnt);
err = -EINVAL;
goto free_device_descriptor;
}
priv->max_registered_pages =
be64_to_cpu(descriptor->max_registered_pages);
mtu = be16_to_cpu(descriptor->mtu);
if (mtu < ETH_MIN_MTU) {
netif_err(priv, drv, priv->dev, "MTU %d below minimum MTU\n",
mtu);
err = -EINVAL;
goto free_device_descriptor;
}
priv->dev->max_mtu = mtu;
priv->num_event_counters = be16_to_cpu(descriptor->counters);
ether_addr_copy(priv->dev->dev_addr, descriptor->mac);
mac = descriptor->mac;
netif_info(priv, drv, priv->dev, "MAC addr: %pM\n", mac);
priv->tx_pages_per_qpl = be16_to_cpu(descriptor->tx_pages_per_qpl);
priv->rx_pages_per_qpl = be16_to_cpu(descriptor->rx_pages_per_qpl);
if (priv->rx_pages_per_qpl < priv->rx_desc_cnt) {
netif_err(priv, drv, priv->dev, "rx_pages_per_qpl cannot be smaller than rx_desc_cnt, setting rx_desc_cnt down to %d.\n",
priv->rx_pages_per_qpl);
priv->rx_desc_cnt = priv->rx_pages_per_qpl;
}
priv->default_num_queues = be16_to_cpu(descriptor->default_num_queues);
free_device_descriptor:
dma_free_coherent(&priv->pdev->dev, sizeof(*descriptor), descriptor,
descriptor_bus);
return err;
}
int gve_adminq_register_page_list(struct gve_priv *priv,
struct gve_queue_page_list *qpl)
{
struct device *hdev = &priv->pdev->dev;
u32 num_entries = qpl->num_entries;
u32 size = num_entries * sizeof(qpl->page_buses[0]);
union gve_adminq_command cmd;
dma_addr_t page_list_bus;
__be64 *page_list;
int err;
int i;
memset(&cmd, 0, sizeof(cmd));
page_list = dma_alloc_coherent(hdev, size, &page_list_bus, GFP_KERNEL);
if (!page_list)
return -ENOMEM;
for (i = 0; i < num_entries; i++)
page_list[i] = cpu_to_be64(qpl->page_buses[i]);
cmd.opcode = cpu_to_be32(GVE_ADMINQ_REGISTER_PAGE_LIST);
cmd.reg_page_list = (struct gve_adminq_register_page_list) {
.page_list_id = cpu_to_be32(qpl->id),
.num_pages = cpu_to_be32(num_entries),
.page_address_list_addr = cpu_to_be64(page_list_bus),
};
err = gve_adminq_execute_cmd(priv, &cmd);
dma_free_coherent(hdev, size, page_list, page_list_bus);
return err;
}
int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id)
{
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_UNREGISTER_PAGE_LIST);
cmd.unreg_page_list = (struct gve_adminq_unregister_page_list) {
.page_list_id = cpu_to_be32(page_list_id),
};
return gve_adminq_execute_cmd(priv, &cmd);
}
int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu)
{
union gve_adminq_command cmd;
memset(&cmd, 0, sizeof(cmd));
cmd.opcode = cpu_to_be32(GVE_ADMINQ_SET_DRIVER_PARAMETER);
cmd.set_driver_param = (struct gve_adminq_set_driver_parameter) {
.parameter_type = cpu_to_be32(GVE_SET_PARAM_MTU),
.parameter_value = cpu_to_be64(mtu),
};
return gve_adminq_execute_cmd(priv, &cmd);
}
/* SPDX-License-Identifier: (GPL-2.0 OR MIT)
* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
#ifndef _GVE_ADMINQ_H
#define _GVE_ADMINQ_H
#include <linux/build_bug.h>
/* Admin queue opcodes */
enum gve_adminq_opcodes {
GVE_ADMINQ_DESCRIBE_DEVICE = 0x1,
GVE_ADMINQ_CONFIGURE_DEVICE_RESOURCES = 0x2,
GVE_ADMINQ_REGISTER_PAGE_LIST = 0x3,
GVE_ADMINQ_UNREGISTER_PAGE_LIST = 0x4,
GVE_ADMINQ_CREATE_TX_QUEUE = 0x5,
GVE_ADMINQ_CREATE_RX_QUEUE = 0x6,
GVE_ADMINQ_DESTROY_TX_QUEUE = 0x7,
GVE_ADMINQ_DESTROY_RX_QUEUE = 0x8,
GVE_ADMINQ_DECONFIGURE_DEVICE_RESOURCES = 0x9,
GVE_ADMINQ_SET_DRIVER_PARAMETER = 0xB,
};
/* Admin queue status codes */
enum gve_adminq_statuses {
GVE_ADMINQ_COMMAND_UNSET = 0x0,
GVE_ADMINQ_COMMAND_PASSED = 0x1,
GVE_ADMINQ_COMMAND_ERROR_ABORTED = 0xFFFFFFF0,
GVE_ADMINQ_COMMAND_ERROR_ALREADY_EXISTS = 0xFFFFFFF1,
GVE_ADMINQ_COMMAND_ERROR_CANCELLED = 0xFFFFFFF2,
GVE_ADMINQ_COMMAND_ERROR_DATALOSS = 0xFFFFFFF3,
GVE_ADMINQ_COMMAND_ERROR_DEADLINE_EXCEEDED = 0xFFFFFFF4,
GVE_ADMINQ_COMMAND_ERROR_FAILED_PRECONDITION = 0xFFFFFFF5,
GVE_ADMINQ_COMMAND_ERROR_INTERNAL_ERROR = 0xFFFFFFF6,
GVE_ADMINQ_COMMAND_ERROR_INVALID_ARGUMENT = 0xFFFFFFF7,
GVE_ADMINQ_COMMAND_ERROR_NOT_FOUND = 0xFFFFFFF8,
GVE_ADMINQ_COMMAND_ERROR_OUT_OF_RANGE = 0xFFFFFFF9,
GVE_ADMINQ_COMMAND_ERROR_PERMISSION_DENIED = 0xFFFFFFFA,
GVE_ADMINQ_COMMAND_ERROR_UNAUTHENTICATED = 0xFFFFFFFB,
GVE_ADMINQ_COMMAND_ERROR_RESOURCE_EXHAUSTED = 0xFFFFFFFC,
GVE_ADMINQ_COMMAND_ERROR_UNAVAILABLE = 0xFFFFFFFD,
GVE_ADMINQ_COMMAND_ERROR_UNIMPLEMENTED = 0xFFFFFFFE,
GVE_ADMINQ_COMMAND_ERROR_UNKNOWN_ERROR = 0xFFFFFFFF,
};
#define GVE_ADMINQ_DEVICE_DESCRIPTOR_VERSION 1
/* All AdminQ command structs should be naturally packed. The static_assert
* calls make sure this is the case at compile time.
*/
struct gve_adminq_describe_device {
__be64 device_descriptor_addr;
__be32 device_descriptor_version;
__be32 available_length;
};
static_assert(sizeof(struct gve_adminq_describe_device) == 16);
struct gve_device_descriptor {
__be64 max_registered_pages;
__be16 reserved1;
__be16 tx_queue_entries;
__be16 rx_queue_entries;
__be16 default_num_queues;
__be16 mtu;
__be16 counters;
__be16 tx_pages_per_qpl;
__be16 rx_pages_per_qpl;
u8 mac[ETH_ALEN];
__be16 num_device_options;
__be16 total_length;
u8 reserved2[6];
};
static_assert(sizeof(struct gve_device_descriptor) == 40);
struct device_option {
__be32 option_id;
__be32 option_length;
};
static_assert(sizeof(struct device_option) == 8);
struct gve_adminq_configure_device_resources {
__be64 counter_array;
__be64 irq_db_addr;
__be32 num_counters;
__be32 num_irq_dbs;
__be32 irq_db_stride;
__be32 ntfy_blk_msix_base_idx;
};
static_assert(sizeof(struct gve_adminq_configure_device_resources) == 32);
struct gve_adminq_register_page_list {
__be32 page_list_id;
__be32 num_pages;
__be64 page_address_list_addr;
};
static_assert(sizeof(struct gve_adminq_register_page_list) == 16);
struct gve_adminq_unregister_page_list {
__be32 page_list_id;
};
static_assert(sizeof(struct gve_adminq_unregister_page_list) == 4);
struct gve_adminq_create_tx_queue {
__be32 queue_id;
__be32 reserved;
__be64 queue_resources_addr;
__be64 tx_ring_addr;
__be32 queue_page_list_id;
__be32 ntfy_id;
};
static_assert(sizeof(struct gve_adminq_create_tx_queue) == 32);
struct gve_adminq_create_rx_queue {
__be32 queue_id;
__be32 index;
__be32 reserved;
__be32 ntfy_id;
__be64 queue_resources_addr;
__be64 rx_desc_ring_addr;
__be64 rx_data_ring_addr;
__be32 queue_page_list_id;
u8 padding[4];
};
static_assert(sizeof(struct gve_adminq_create_rx_queue) == 48);
/* Queue resources that are shared with the device */
struct gve_queue_resources {
union {
struct {
__be32 db_index; /* Device -> Guest */
__be32 counter_index; /* Device -> Guest */
};
u8 reserved[64];
};
};
static_assert(sizeof(struct gve_queue_resources) == 64);
struct gve_adminq_destroy_tx_queue {
__be32 queue_id;
};
static_assert(sizeof(struct gve_adminq_destroy_tx_queue) == 4);
struct gve_adminq_destroy_rx_queue {
__be32 queue_id;
};
static_assert(sizeof(struct gve_adminq_destroy_rx_queue) == 4);
/* GVE Set Driver Parameter Types */
enum gve_set_driver_param_types {
GVE_SET_PARAM_MTU = 0x1,
};
struct gve_adminq_set_driver_parameter {
__be32 parameter_type;
u8 reserved[4];
__be64 parameter_value;
};
static_assert(sizeof(struct gve_adminq_set_driver_parameter) == 16);
union gve_adminq_command {
struct {
__be32 opcode;
__be32 status;
union {
struct gve_adminq_configure_device_resources
configure_device_resources;
struct gve_adminq_create_tx_queue create_tx_queue;
struct gve_adminq_create_rx_queue create_rx_queue;
struct gve_adminq_destroy_tx_queue destroy_tx_queue;
struct gve_adminq_destroy_rx_queue destroy_rx_queue;
struct gve_adminq_describe_device describe_device;
struct gve_adminq_register_page_list reg_page_list;
struct gve_adminq_unregister_page_list unreg_page_list;
struct gve_adminq_set_driver_parameter set_driver_param;
};
};
u8 reserved[64];
};
static_assert(sizeof(union gve_adminq_command) == 64);
int gve_adminq_alloc(struct device *dev, struct gve_priv *priv);
void gve_adminq_free(struct device *dev, struct gve_priv *priv);
void gve_adminq_release(struct gve_priv *priv);
int gve_adminq_execute_cmd(struct gve_priv *priv,
union gve_adminq_command *cmd_orig);
int gve_adminq_describe_device(struct gve_priv *priv);
int gve_adminq_configure_device_resources(struct gve_priv *priv,
dma_addr_t counter_array_bus_addr,
u32 num_counters,
dma_addr_t db_array_bus_addr,
u32 num_ntfy_blks);
int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
int gve_adminq_create_tx_queue(struct gve_priv *priv, u32 queue_id);
int gve_adminq_destroy_tx_queue(struct gve_priv *priv, u32 queue_id);
int gve_adminq_create_rx_queue(struct gve_priv *priv, u32 queue_id);
int gve_adminq_destroy_rx_queue(struct gve_priv *priv, u32 queue_id);
int gve_adminq_register_page_list(struct gve_priv *priv,
struct gve_queue_page_list *qpl);
int gve_adminq_unregister_page_list(struct gve_priv *priv, u32 page_list_id);
int gve_adminq_set_mtu(struct gve_priv *priv, u64 mtu);
#endif /* _GVE_ADMINQ_H */
/* SPDX-License-Identifier: (GPL-2.0 OR MIT)
* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
/* GVE Transmit Descriptor formats */
#ifndef _GVE_DESC_H_
#define _GVE_DESC_H_
#include <linux/build_bug.h>
/* A note on seg_addrs
*
* Base addresses encoded in seg_addr are not assumed to be physical
* addresses. The ring format assumes these come from some linear address
* space. This could be physical memory, kernel virtual memory, user virtual
* memory. gVNIC uses lists of registered pages. Each queue is assumed
* to be associated with a single such linear address space to ensure a
* consistent meaning for seg_addrs posted to its rings.
*/
struct gve_tx_pkt_desc {
u8 type_flags; /* desc type is lower 4 bits, flags upper */
u8 l4_csum_offset; /* relative offset of L4 csum word */
u8 l4_hdr_offset; /* Offset of start of L4 headers in packet */
u8 desc_cnt; /* Total descriptors for this packet */
__be16 len; /* Total length of this packet (in bytes) */
__be16 seg_len; /* Length of this descriptor's segment */
__be64 seg_addr; /* Base address (see note) of this segment */
} __packed;
struct gve_tx_seg_desc {
u8 type_flags; /* type is lower 4 bits, flags upper */
u8 l3_offset; /* TSO: 2 byte units to start of IPH */
__be16 reserved;
__be16 mss; /* TSO MSS */
__be16 seg_len;
__be64 seg_addr;
} __packed;
/* GVE Transmit Descriptor Types */
#define GVE_TXD_STD (0x0 << 4) /* Std with Host Address */
#define GVE_TXD_TSO (0x1 << 4) /* TSO with Host Address */
#define GVE_TXD_SEG (0x2 << 4) /* Seg with Host Address */
/* GVE Transmit Descriptor Flags for Std Pkts */
#define GVE_TXF_L4CSUM BIT(0) /* Need csum offload */
#define GVE_TXF_TSTAMP BIT(2) /* Timestamp required */
/* GVE Transmit Descriptor Flags for TSO Segs */
#define GVE_TXSF_IPV6 BIT(1) /* IPv6 TSO */
/* GVE Receive Packet Descriptor */
/* The start of an ethernet packet comes 2 bytes into the rx buffer.
* gVNIC adds this padding so that both the DMA and the L3/4 protocol header
* access is aligned.
*/
#define GVE_RX_PAD 2
struct gve_rx_desc {
u8 padding[48];
__be32 rss_hash; /* Receive-side scaling hash (Toeplitz for gVNIC) */
__be16 mss;
__be16 reserved; /* Reserved to zero */
u8 hdr_len; /* Header length (L2-L4) including padding */
u8 hdr_off; /* 64-byte-scaled offset into RX_DATA entry */
__sum16 csum; /* 1's-complement partial checksum of L3+ bytes */
__be16 len; /* Length of the received packet */
__be16 flags_seq; /* Flags [15:3] and sequence number [2:0] (1-7) */
} __packed;
static_assert(sizeof(struct gve_rx_desc) == 64);
/* As with the Tx ring format, the qpl_offset entries below are offsets into an
* ordered list of registered pages.
*/
struct gve_rx_data_slot {
/* byte offset into the rx registered segment of this slot */
__be64 qpl_offset;
};
/* GVE Recive Packet Descriptor Seq No */
#define GVE_SEQNO(x) (be16_to_cpu(x) & 0x7)
/* GVE Recive Packet Descriptor Flags */
#define GVE_RXFLG(x) cpu_to_be16(1 << (3 + (x)))
#define GVE_RXF_FRAG GVE_RXFLG(3) /* IP Fragment */
#define GVE_RXF_IPV4 GVE_RXFLG(4) /* IPv4 */
#define GVE_RXF_IPV6 GVE_RXFLG(5) /* IPv6 */
#define GVE_RXF_TCP GVE_RXFLG(6) /* TCP Packet */
#define GVE_RXF_UDP GVE_RXFLG(7) /* UDP Packet */
#define GVE_RXF_ERR GVE_RXFLG(8) /* Packet Error Detected */
/* GVE IRQ */
#define GVE_IRQ_ACK BIT(31)
#define GVE_IRQ_MASK BIT(30)
#define GVE_IRQ_EVENT BIT(29)
static inline bool gve_needs_rss(__be16 flag)
{
if (flag & GVE_RXF_FRAG)
return false;
if (flag & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
return true;
return false;
}
static inline u8 gve_next_seqno(u8 seq)
{
return (seq + 1) == 8 ? 1 : seq + 1;
}
#endif /* _GVE_DESC_H_ */
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
#include <linux/rtnetlink.h>
#include "gve.h"
static void gve_get_drvinfo(struct net_device *netdev,
struct ethtool_drvinfo *info)
{
struct gve_priv *priv = netdev_priv(netdev);
strlcpy(info->driver, "gve", sizeof(info->driver));
strlcpy(info->version, gve_version_str, sizeof(info->version));
strlcpy(info->bus_info, pci_name(priv->pdev), sizeof(info->bus_info));
}
static void gve_set_msglevel(struct net_device *netdev, u32 value)
{
struct gve_priv *priv = netdev_priv(netdev);
priv->msg_enable = value;
}
static u32 gve_get_msglevel(struct net_device *netdev)
{
struct gve_priv *priv = netdev_priv(netdev);
return priv->msg_enable;
}
static const char gve_gstrings_main_stats[][ETH_GSTRING_LEN] = {
"rx_packets", "tx_packets", "rx_bytes", "tx_bytes",
"rx_dropped", "tx_dropped", "tx_timeouts",
};
#define GVE_MAIN_STATS_LEN ARRAY_SIZE(gve_gstrings_main_stats)
#define NUM_GVE_TX_CNTS 5
#define NUM_GVE_RX_CNTS 2
static void gve_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
{
struct gve_priv *priv = netdev_priv(netdev);
char *s = (char *)data;
int i;
if (stringset != ETH_SS_STATS)
return;
memcpy(s, *gve_gstrings_main_stats,
sizeof(gve_gstrings_main_stats));
s += sizeof(gve_gstrings_main_stats);
for (i = 0; i < priv->rx_cfg.num_queues; i++) {
snprintf(s, ETH_GSTRING_LEN, "rx_desc_cnt[%u]", i);
s += ETH_GSTRING_LEN;
snprintf(s, ETH_GSTRING_LEN, "rx_desc_fill_cnt[%u]", i);
s += ETH_GSTRING_LEN;
}
for (i = 0; i < priv->tx_cfg.num_queues; i++) {
snprintf(s, ETH_GSTRING_LEN, "tx_req[%u]", i);
s += ETH_GSTRING_LEN;
snprintf(s, ETH_GSTRING_LEN, "tx_done[%u]", i);
s += ETH_GSTRING_LEN;
snprintf(s, ETH_GSTRING_LEN, "tx_wake[%u]", i);
s += ETH_GSTRING_LEN;
snprintf(s, ETH_GSTRING_LEN, "tx_stop[%u]", i);
s += ETH_GSTRING_LEN;
snprintf(s, ETH_GSTRING_LEN, "tx_event_counter[%u]", i);
s += ETH_GSTRING_LEN;
}
}
static int gve_get_sset_count(struct net_device *netdev, int sset)
{
struct gve_priv *priv = netdev_priv(netdev);
switch (sset) {
case ETH_SS_STATS:
return GVE_MAIN_STATS_LEN +
(priv->rx_cfg.num_queues * NUM_GVE_RX_CNTS) +
(priv->tx_cfg.num_queues * NUM_GVE_TX_CNTS);
default:
return -EOPNOTSUPP;
}
}
static void
gve_get_ethtool_stats(struct net_device *netdev,
struct ethtool_stats *stats, u64 *data)
{
struct gve_priv *priv = netdev_priv(netdev);
u64 rx_pkts, rx_bytes, tx_pkts, tx_bytes;
unsigned int start;
int ring;
int i;
ASSERT_RTNL();
for (rx_pkts = 0, rx_bytes = 0, ring = 0;
ring < priv->rx_cfg.num_queues; ring++) {
if (priv->rx) {
do {
u64_stats_fetch_begin(&priv->rx[ring].statss);
rx_pkts += priv->rx[ring].rpackets;
rx_bytes += priv->rx[ring].rbytes;
} while (u64_stats_fetch_retry(&priv->rx[ring].statss,
start));
}
}
for (tx_pkts = 0, tx_bytes = 0, ring = 0;
ring < priv->tx_cfg.num_queues; ring++) {
if (priv->tx) {
do {
u64_stats_fetch_begin(&priv->tx[ring].statss);
tx_pkts += priv->tx[ring].pkt_done;
tx_bytes += priv->tx[ring].bytes_done;
} while (u64_stats_fetch_retry(&priv->tx[ring].statss,
start));
}
}
i = 0;
data[i++] = rx_pkts;
data[i++] = tx_pkts;
data[i++] = rx_bytes;
data[i++] = tx_bytes;
/* Skip rx_dropped and tx_dropped */
i += 2;
data[i++] = priv->tx_timeo_cnt;
i = GVE_MAIN_STATS_LEN;
/* walk RX rings */
if (priv->rx) {
for (ring = 0; ring < priv->rx_cfg.num_queues; ring++) {
struct gve_rx_ring *rx = &priv->rx[ring];
data[i++] = rx->desc.cnt;
data[i++] = rx->desc.fill_cnt;
}
} else {
i += priv->rx_cfg.num_queues * NUM_GVE_RX_CNTS;
}
/* walk TX rings */
if (priv->tx) {
for (ring = 0; ring < priv->tx_cfg.num_queues; ring++) {
struct gve_tx_ring *tx = &priv->tx[ring];
data[i++] = tx->req;
data[i++] = tx->done;
data[i++] = tx->wake_queue;
data[i++] = tx->stop_queue;
data[i++] = be32_to_cpu(gve_tx_load_event_counter(priv,
tx));
}
} else {
i += priv->tx_cfg.num_queues * NUM_GVE_TX_CNTS;
}
}
static void gve_get_channels(struct net_device *netdev,
struct ethtool_channels *cmd)
{
struct gve_priv *priv = netdev_priv(netdev);
cmd->max_rx = priv->rx_cfg.max_queues;
cmd->max_tx = priv->tx_cfg.max_queues;
cmd->max_other = 0;
cmd->max_combined = 0;
cmd->rx_count = priv->rx_cfg.num_queues;
cmd->tx_count = priv->tx_cfg.num_queues;
cmd->other_count = 0;
cmd->combined_count = 0;
}
static int gve_set_channels(struct net_device *netdev,
struct ethtool_channels *cmd)
{
struct gve_priv *priv = netdev_priv(netdev);
struct gve_queue_config new_tx_cfg = priv->tx_cfg;
struct gve_queue_config new_rx_cfg = priv->rx_cfg;
struct ethtool_channels old_settings;
int new_tx = cmd->tx_count;
int new_rx = cmd->rx_count;
gve_get_channels(netdev, &old_settings);
/* Changing combined is not allowed allowed */
if (cmd->combined_count != old_settings.combined_count)
return -EINVAL;
if (!new_rx || !new_tx)
return -EINVAL;
if (!netif_carrier_ok(netdev)) {
priv->tx_cfg.num_queues = new_tx;
priv->rx_cfg.num_queues = new_rx;
return 0;
}
new_tx_cfg.num_queues = new_tx;
new_rx_cfg.num_queues = new_rx;
return gve_adjust_queues(priv, new_rx_cfg, new_tx_cfg);
}
static void gve_get_ringparam(struct net_device *netdev,
struct ethtool_ringparam *cmd)
{
struct gve_priv *priv = netdev_priv(netdev);
cmd->rx_max_pending = priv->rx_desc_cnt;
cmd->tx_max_pending = priv->tx_desc_cnt;
cmd->rx_pending = priv->rx_desc_cnt;
cmd->tx_pending = priv->tx_desc_cnt;
}
static int gve_user_reset(struct net_device *netdev, u32 *flags)
{
struct gve_priv *priv = netdev_priv(netdev);
if (*flags == ETH_RESET_ALL) {
*flags = 0;
return gve_reset(priv, true);
}
return -EOPNOTSUPP;
}
const struct ethtool_ops gve_ethtool_ops = {
.get_drvinfo = gve_get_drvinfo,
.get_strings = gve_get_strings,
.get_sset_count = gve_get_sset_count,
.get_ethtool_stats = gve_get_ethtool_stats,
.set_msglevel = gve_set_msglevel,
.get_msglevel = gve_get_msglevel,
.set_channels = gve_set_channels,
.get_channels = gve_get_channels,
.get_link = ethtool_op_get_link,
.get_ringparam = gve_get_ringparam,
.reset = gve_user_reset,
};
此差异已折叠。
/* SPDX-License-Identifier: (GPL-2.0 OR MIT)
* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
#ifndef _GVE_REGISTER_H_
#define _GVE_REGISTER_H_
/* Fixed Configuration Registers */
struct gve_registers {
__be32 device_status;
__be32 driver_status;
__be32 max_tx_queues;
__be32 max_rx_queues;
__be32 adminq_pfn;
__be32 adminq_doorbell;
__be32 adminq_event_counter;
u8 reserved[3];
u8 driver_version;
};
enum gve_device_status_flags {
GVE_DEVICE_STATUS_RESET_MASK = BIT(1),
GVE_DEVICE_STATUS_LINK_STATUS_MASK = BIT(2),
};
#endif /* _GVE_REGISTER_H_ */
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
#include "gve.h"
#include "gve_adminq.h"
#include <linux/etherdevice.h>
static void gve_rx_remove_from_block(struct gve_priv *priv, int queue_idx)
{
struct gve_notify_block *block =
&priv->ntfy_blocks[gve_rx_idx_to_ntfy(priv, queue_idx)];
block->rx = NULL;
}
static void gve_rx_free_ring(struct gve_priv *priv, int idx)
{
struct gve_rx_ring *rx = &priv->rx[idx];
struct device *dev = &priv->pdev->dev;
size_t bytes;
u32 slots;
gve_rx_remove_from_block(priv, idx);
bytes = sizeof(struct gve_rx_desc) * priv->rx_desc_cnt;
dma_free_coherent(dev, bytes, rx->desc.desc_ring, rx->desc.bus);
rx->desc.desc_ring = NULL;
dma_free_coherent(dev, sizeof(*rx->q_resources),
rx->q_resources, rx->q_resources_bus);
rx->q_resources = NULL;
gve_unassign_qpl(priv, rx->data.qpl->id);
rx->data.qpl = NULL;
kfree(rx->data.page_info);
slots = rx->data.mask + 1;
bytes = sizeof(*rx->data.data_ring) * slots;
dma_free_coherent(dev, bytes, rx->data.data_ring,
rx->data.data_bus);
rx->data.data_ring = NULL;
netif_dbg(priv, drv, priv->dev, "freed rx ring %d\n", idx);
}
static void gve_setup_rx_buffer(struct gve_rx_slot_page_info *page_info,
struct gve_rx_data_slot *slot,
dma_addr_t addr, struct page *page)
{
page_info->page = page;
page_info->page_offset = 0;
page_info->page_address = page_address(page);
slot->qpl_offset = cpu_to_be64(addr);
}
static int gve_prefill_rx_pages(struct gve_rx_ring *rx)
{
struct gve_priv *priv = rx->gve;
u32 slots;
int i;
/* Allocate one page per Rx queue slot. Each page is split into two
* packet buffers, when possible we "page flip" between the two.
*/
slots = rx->data.mask + 1;
rx->data.page_info = kvzalloc(slots *
sizeof(*rx->data.page_info), GFP_KERNEL);
if (!rx->data.page_info)
return -ENOMEM;
rx->data.qpl = gve_assign_rx_qpl(priv);
for (i = 0; i < slots; i++) {
struct page *page = rx->data.qpl->pages[i];
dma_addr_t addr = i * PAGE_SIZE;
gve_setup_rx_buffer(&rx->data.page_info[i],
&rx->data.data_ring[i], addr, page);
}
return slots;
}
static void gve_rx_add_to_block(struct gve_priv *priv, int queue_idx)
{
u32 ntfy_idx = gve_rx_idx_to_ntfy(priv, queue_idx);
struct gve_notify_block *block = &priv->ntfy_blocks[ntfy_idx];
struct gve_rx_ring *rx = &priv->rx[queue_idx];
block->rx = rx;
rx->ntfy_id = ntfy_idx;
}
static int gve_rx_alloc_ring(struct gve_priv *priv, int idx)
{
struct gve_rx_ring *rx = &priv->rx[idx];
struct device *hdev = &priv->pdev->dev;
u32 slots, npages;
int filled_pages;
size_t bytes;
int err;
netif_dbg(priv, drv, priv->dev, "allocating rx ring\n");
/* Make sure everything is zeroed to start with */
memset(rx, 0, sizeof(*rx));
rx->gve = priv;
rx->q_num = idx;
slots = priv->rx_pages_per_qpl;
rx->data.mask = slots - 1;
/* alloc rx data ring */
bytes = sizeof(*rx->data.data_ring) * slots;
rx->data.data_ring = dma_alloc_coherent(hdev, bytes,
&rx->data.data_bus,
GFP_KERNEL);
if (!rx->data.data_ring)
return -ENOMEM;
filled_pages = gve_prefill_rx_pages(rx);
if (filled_pages < 0) {
err = -ENOMEM;
goto abort_with_slots;
}
rx->desc.fill_cnt = filled_pages;
/* Ensure data ring slots (packet buffers) are visible. */
dma_wmb();
/* Alloc gve_queue_resources */
rx->q_resources =
dma_alloc_coherent(hdev,
sizeof(*rx->q_resources),
&rx->q_resources_bus,
GFP_KERNEL);
if (!rx->q_resources) {
err = -ENOMEM;
goto abort_filled;
}
netif_dbg(priv, drv, priv->dev, "rx[%d]->data.data_bus=%lx\n", idx,
(unsigned long)rx->data.data_bus);
/* alloc rx desc ring */
bytes = sizeof(struct gve_rx_desc) * priv->rx_desc_cnt;
npages = bytes / PAGE_SIZE;
if (npages * PAGE_SIZE != bytes) {
err = -EIO;
goto abort_with_q_resources;
}
rx->desc.desc_ring = dma_alloc_coherent(hdev, bytes, &rx->desc.bus,
GFP_KERNEL);
if (!rx->desc.desc_ring) {
err = -ENOMEM;
goto abort_with_q_resources;
}
rx->desc.mask = slots - 1;
rx->desc.cnt = 0;
rx->desc.seqno = 1;
gve_rx_add_to_block(priv, idx);
return 0;
abort_with_q_resources:
dma_free_coherent(hdev, sizeof(*rx->q_resources),
rx->q_resources, rx->q_resources_bus);
rx->q_resources = NULL;
abort_filled:
kfree(rx->data.page_info);
abort_with_slots:
bytes = sizeof(*rx->data.data_ring) * slots;
dma_free_coherent(hdev, bytes, rx->data.data_ring, rx->data.data_bus);
rx->data.data_ring = NULL;
return err;
}
int gve_rx_alloc_rings(struct gve_priv *priv)
{
int err = 0;
int i;
for (i = 0; i < priv->rx_cfg.num_queues; i++) {
err = gve_rx_alloc_ring(priv, i);
if (err) {
netif_err(priv, drv, priv->dev,
"Failed to alloc rx ring=%d: err=%d\n",
i, err);
break;
}
}
/* Unallocate if there was an error */
if (err) {
int j;
for (j = 0; j < i; j++)
gve_rx_free_ring(priv, j);
}
return err;
}
void gve_rx_free_rings(struct gve_priv *priv)
{
int i;
for (i = 0; i < priv->rx_cfg.num_queues; i++)
gve_rx_free_ring(priv, i);
}
void gve_rx_write_doorbell(struct gve_priv *priv, struct gve_rx_ring *rx)
{
u32 db_idx = be32_to_cpu(rx->q_resources->db_index);
iowrite32be(rx->desc.fill_cnt, &priv->db_bar2[db_idx]);
}
static enum pkt_hash_types gve_rss_type(__be16 pkt_flags)
{
if (likely(pkt_flags & (GVE_RXF_TCP | GVE_RXF_UDP)))
return PKT_HASH_TYPE_L4;
if (pkt_flags & (GVE_RXF_IPV4 | GVE_RXF_IPV6))
return PKT_HASH_TYPE_L3;
return PKT_HASH_TYPE_L2;
}
static struct sk_buff *gve_rx_copy(struct net_device *dev,
struct napi_struct *napi,
struct gve_rx_slot_page_info *page_info,
u16 len)
{
struct sk_buff *skb = napi_alloc_skb(napi, len);
void *va = page_info->page_address + GVE_RX_PAD +
page_info->page_offset;
if (unlikely(!skb))
return NULL;
__skb_put(skb, len);
skb_copy_to_linear_data(skb, va, len);
skb->protocol = eth_type_trans(skb, dev);
return skb;
}
static struct sk_buff *gve_rx_add_frags(struct net_device *dev,
struct napi_struct *napi,
struct gve_rx_slot_page_info *page_info,
u16 len)
{
struct sk_buff *skb = napi_get_frags(napi);
if (unlikely(!skb))
return NULL;
skb_add_rx_frag(skb, 0, page_info->page,
page_info->page_offset +
GVE_RX_PAD, len, PAGE_SIZE / 2);
return skb;
}
static void gve_rx_flip_buff(struct gve_rx_slot_page_info *page_info,
struct gve_rx_data_slot *data_ring)
{
u64 addr = be64_to_cpu(data_ring->qpl_offset);
page_info->page_offset ^= PAGE_SIZE / 2;
addr ^= PAGE_SIZE / 2;
data_ring->qpl_offset = cpu_to_be64(addr);
}
static bool gve_rx(struct gve_rx_ring *rx, struct gve_rx_desc *rx_desc,
netdev_features_t feat)
{
struct gve_rx_slot_page_info *page_info;
struct gve_priv *priv = rx->gve;
struct napi_struct *napi = &priv->ntfy_blocks[rx->ntfy_id].napi;
struct net_device *dev = priv->dev;
struct sk_buff *skb;
int pagecount;
u16 len;
u32 idx;
/* drop this packet */
if (unlikely(rx_desc->flags_seq & GVE_RXF_ERR))
return true;
len = be16_to_cpu(rx_desc->len) - GVE_RX_PAD;
idx = rx->data.cnt & rx->data.mask;
page_info = &rx->data.page_info[idx];
/* gvnic can only receive into registered segments. If the buffer
* can't be recycled, our only choice is to copy the data out of
* it so that we can return it to the device.
*/
#if PAGE_SIZE == 4096
if (len <= priv->rx_copybreak) {
/* Just copy small packets */
skb = gve_rx_copy(dev, napi, page_info, len);
goto have_skb;
}
if (unlikely(!gve_can_recycle_pages(dev))) {
skb = gve_rx_copy(dev, napi, page_info, len);
goto have_skb;
}
pagecount = page_count(page_info->page);
if (pagecount == 1) {
/* No part of this page is used by any SKBs; we attach
* the page fragment to a new SKB and pass it up the
* stack.
*/
skb = gve_rx_add_frags(dev, napi, page_info, len);
if (!skb)
return true;
/* Make sure the kernel stack can't release the page */
get_page(page_info->page);
/* "flip" to other packet buffer on this page */
gve_rx_flip_buff(page_info, &rx->data.data_ring[idx]);
} else if (pagecount >= 2) {
/* We have previously passed the other half of this
* page up the stack, but it has not yet been freed.
*/
skb = gve_rx_copy(dev, napi, page_info, len);
} else {
WARN(pagecount < 1, "Pagecount should never be < 1");
return false;
}
#else
skb = gve_rx_copy(dev, napi, page_info, len);
#endif
have_skb:
/* We didn't manage to allocate an skb but we haven't had any
* reset worthy failures.
*/
if (!skb)
return true;
rx->data.cnt++;
if (likely(feat & NETIF_F_RXCSUM)) {
/* NIC passes up the partial sum */
if (rx_desc->csum)
skb->ip_summed = CHECKSUM_COMPLETE;
else
skb->ip_summed = CHECKSUM_NONE;
skb->csum = csum_unfold(rx_desc->csum);
}
/* parse flags & pass relevant info up */
if (likely(feat & NETIF_F_RXHASH) &&
gve_needs_rss(rx_desc->flags_seq))
skb_set_hash(skb, be32_to_cpu(rx_desc->rss_hash),
gve_rss_type(rx_desc->flags_seq));
if (skb_is_nonlinear(skb))
napi_gro_frags(napi);
else
napi_gro_receive(napi, skb);
return true;
}
static bool gve_rx_work_pending(struct gve_rx_ring *rx)
{
struct gve_rx_desc *desc;
__be16 flags_seq;
u32 next_idx;
next_idx = rx->desc.cnt & rx->desc.mask;
desc = rx->desc.desc_ring + next_idx;
flags_seq = desc->flags_seq;
/* Make sure we have synchronized the seq no with the device */
smp_rmb();
return (GVE_SEQNO(flags_seq) == rx->desc.seqno);
}
bool gve_clean_rx_done(struct gve_rx_ring *rx, int budget,
netdev_features_t feat)
{
struct gve_priv *priv = rx->gve;
struct gve_rx_desc *desc;
u32 cnt = rx->desc.cnt;
u32 idx = cnt & rx->desc.mask;
u32 work_done = 0;
u64 bytes = 0;
desc = rx->desc.desc_ring + idx;
while ((GVE_SEQNO(desc->flags_seq) == rx->desc.seqno) &&
work_done < budget) {
netif_info(priv, rx_status, priv->dev,
"[%d] idx=%d desc=%p desc->flags_seq=0x%x\n",
rx->q_num, idx, desc, desc->flags_seq);
netif_info(priv, rx_status, priv->dev,
"[%d] seqno=%d rx->desc.seqno=%d\n",
rx->q_num, GVE_SEQNO(desc->flags_seq),
rx->desc.seqno);
bytes += be16_to_cpu(desc->len) - GVE_RX_PAD;
if (!gve_rx(rx, desc, feat))
gve_schedule_reset(priv);
cnt++;
idx = cnt & rx->desc.mask;
desc = rx->desc.desc_ring + idx;
rx->desc.seqno = gve_next_seqno(rx->desc.seqno);
work_done++;
}
if (!work_done)
return false;
u64_stats_update_begin(&rx->statss);
rx->rpackets += work_done;
rx->rbytes += bytes;
u64_stats_update_end(&rx->statss);
rx->desc.cnt = cnt;
rx->desc.fill_cnt += work_done;
/* restock desc ring slots */
dma_wmb(); /* Ensure descs are visible before ringing doorbell */
gve_rx_write_doorbell(priv, rx);
return gve_rx_work_pending(rx);
}
bool gve_rx_poll(struct gve_notify_block *block, int budget)
{
struct gve_rx_ring *rx = block->rx;
netdev_features_t feat;
bool repoll = false;
feat = block->napi.dev->features;
/* If budget is 0, do all the work */
if (budget == 0)
budget = INT_MAX;
if (budget > 0)
repoll |= gve_clean_rx_done(rx, budget, feat);
else
repoll |= gve_rx_work_pending(rx);
return repoll;
}
// SPDX-License-Identifier: (GPL-2.0 OR MIT)
/* Google virtual Ethernet (gve) driver
*
* Copyright (C) 2015-2019 Google, Inc.
*/
#include "gve.h"
#include "gve_adminq.h"
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/vmalloc.h>
#include <linux/skbuff.h>
static inline void gve_tx_put_doorbell(struct gve_priv *priv,
struct gve_queue_resources *q_resources,
u32 val)
{
iowrite32be(val, &priv->db_bar2[be32_to_cpu(q_resources->db_index)]);
}
/* gvnic can only transmit from a Registered Segment.
* We copy skb payloads into the registered segment before writing Tx
* descriptors and ringing the Tx doorbell.
*
* gve_tx_fifo_* manages the Registered Segment as a FIFO - clients must
* free allocations in the order they were allocated.
*/
static int gve_tx_fifo_init(struct gve_priv *priv, struct gve_tx_fifo *fifo)
{
fifo->base = vmap(fifo->qpl->pages, fifo->qpl->num_entries, VM_MAP,
PAGE_KERNEL);
if (unlikely(!fifo->base)) {
netif_err(priv, drv, priv->dev, "Failed to vmap fifo, qpl_id = %d\n",
fifo->qpl->id);
return -ENOMEM;
}
fifo->size = fifo->qpl->num_entries * PAGE_SIZE;
atomic_set(&fifo->available, fifo->size);
fifo->head = 0;
return 0;
}
static void gve_tx_fifo_release(struct gve_priv *priv, struct gve_tx_fifo *fifo)
{
WARN(atomic_read(&fifo->available) != fifo->size,
"Releasing non-empty fifo");
vunmap(fifo->base);
}
static int gve_tx_fifo_pad_alloc_one_frag(struct gve_tx_fifo *fifo,
size_t bytes)
{
return (fifo->head + bytes < fifo->size) ? 0 : fifo->size - fifo->head;
}
static bool gve_tx_fifo_can_alloc(struct gve_tx_fifo *fifo, size_t bytes)
{
return (atomic_read(&fifo->available) <= bytes) ? false : true;
}
/* gve_tx_alloc_fifo - Allocate fragment(s) from Tx FIFO
* @fifo: FIFO to allocate from
* @bytes: Allocation size
* @iov: Scatter-gather elements to fill with allocation fragment base/len
*
* Returns number of valid elements in iov[] or negative on error.
*
* Allocations from a given FIFO must be externally synchronized but concurrent
* allocation and frees are allowed.
*/
static int gve_tx_alloc_fifo(struct gve_tx_fifo *fifo, size_t bytes,
struct gve_tx_iovec iov[2])
{
size_t overflow, padding;
u32 aligned_head;
int nfrags = 0;
if (!bytes)
return 0;
/* This check happens before we know how much padding is needed to
* align to a cacheline boundary for the payload, but that is fine,
* because the FIFO head always start aligned, and the FIFO's boundaries
* are aligned, so if there is space for the data, there is space for
* the padding to the next alignment.
*/
WARN(!gve_tx_fifo_can_alloc(fifo, bytes),
"Reached %s when there's not enough space in the fifo", __func__);
nfrags++;
iov[0].iov_offset = fifo->head;
iov[0].iov_len = bytes;
fifo->head += bytes;
if (fifo->head > fifo->size) {
/* If the allocation did not fit in the tail fragment of the
* FIFO, also use the head fragment.
*/
nfrags++;
overflow = fifo->head - fifo->size;
iov[0].iov_len -= overflow;
iov[1].iov_offset = 0; /* Start of fifo*/
iov[1].iov_len = overflow;
fifo->head = overflow;
}
/* Re-align to a cacheline boundary */
aligned_head = L1_CACHE_ALIGN(fifo->head);
padding = aligned_head - fifo->head;
iov[nfrags - 1].iov_padding = padding;
atomic_sub(bytes + padding, &fifo->available);
fifo->head = aligned_head;
if (fifo->head == fifo->size)
fifo->head = 0;
return nfrags;
}
/* gve_tx_free_fifo - Return space to Tx FIFO
* @fifo: FIFO to return fragments to
* @bytes: Bytes to free
*/
static void gve_tx_free_fifo(struct gve_tx_fifo *fifo, size_t bytes)
{
atomic_add(bytes, &fifo->available);
}
static void gve_tx_remove_from_block(struct gve_priv *priv, int queue_idx)
{
struct gve_notify_block *block =
&priv->ntfy_blocks[gve_tx_idx_to_ntfy(priv, queue_idx)];
block->tx = NULL;
}
static int gve_clean_tx_done(struct gve_priv *priv, struct gve_tx_ring *tx,
u32 to_do, bool try_to_wake);
static void gve_tx_free_ring(struct gve_priv *priv, int idx)
{
struct gve_tx_ring *tx = &priv->tx[idx];
struct device *hdev = &priv->pdev->dev;
size_t bytes;
u32 slots;
gve_tx_remove_from_block(priv, idx);
slots = tx->mask + 1;
gve_clean_tx_done(priv, tx, tx->req, false);
netdev_tx_reset_queue(tx->netdev_txq);
dma_free_coherent(hdev, sizeof(*tx->q_resources),
tx->q_resources, tx->q_resources_bus);
tx->q_resources = NULL;
gve_tx_fifo_release(priv, &tx->tx_fifo);
gve_unassign_qpl(priv, tx->tx_fifo.qpl->id);
tx->tx_fifo.qpl = NULL;
bytes = sizeof(*tx->desc) * slots;
dma_free_coherent(hdev, bytes, tx->desc, tx->bus);
tx->desc = NULL;
vfree(tx->info);
tx->info = NULL;
netif_dbg(priv, drv, priv->dev, "freed tx queue %d\n", idx);
}
static void gve_tx_add_to_block(struct gve_priv *priv, int queue_idx)
{
int ntfy_idx = gve_tx_idx_to_ntfy(priv, queue_idx);
struct gve_notify_block *block = &priv->ntfy_blocks[ntfy_idx];
struct gve_tx_ring *tx = &priv->tx[queue_idx];
block->tx = tx;
tx->ntfy_id = ntfy_idx;
}
static int gve_tx_alloc_ring(struct gve_priv *priv, int idx)
{
struct gve_tx_ring *tx = &priv->tx[idx];
struct device *hdev = &priv->pdev->dev;
u32 slots = priv->tx_desc_cnt;
size_t bytes;
/* Make sure everything is zeroed to start */
memset(tx, 0, sizeof(*tx));
tx->q_num = idx;
tx->mask = slots - 1;
/* alloc metadata */
tx->info = vzalloc(sizeof(*tx->info) * slots);
if (!tx->info)
return -ENOMEM;
/* alloc tx queue */
bytes = sizeof(*tx->desc) * slots;
tx->desc = dma_alloc_coherent(hdev, bytes, &tx->bus, GFP_KERNEL);
if (!tx->desc)
goto abort_with_info;
tx->tx_fifo.qpl = gve_assign_tx_qpl(priv);
/* map Tx FIFO */
if (gve_tx_fifo_init(priv, &tx->tx_fifo))
goto abort_with_desc;
tx->q_resources =
dma_alloc_coherent(hdev,
sizeof(*tx->q_resources),
&tx->q_resources_bus,
GFP_KERNEL);
if (!tx->q_resources)
goto abort_with_fifo;
netif_dbg(priv, drv, priv->dev, "tx[%d]->bus=%lx\n", idx,
(unsigned long)tx->bus);
tx->netdev_txq = netdev_get_tx_queue(priv->dev, idx);
gve_tx_add_to_block(priv, idx);
return 0;
abort_with_fifo:
gve_tx_fifo_release(priv, &tx->tx_fifo);
abort_with_desc:
dma_free_coherent(hdev, bytes, tx->desc, tx->bus);
tx->desc = NULL;
abort_with_info:
vfree(tx->info);
tx->info = NULL;
return -ENOMEM;
}
int gve_tx_alloc_rings(struct gve_priv *priv)
{
int err = 0;
int i;
for (i = 0; i < priv->tx_cfg.num_queues; i++) {
err = gve_tx_alloc_ring(priv, i);
if (err) {
netif_err(priv, drv, priv->dev,
"Failed to alloc tx ring=%d: err=%d\n",
i, err);
break;
}
}
/* Unallocate if there was an error */
if (err) {
int j;
for (j = 0; j < i; j++)
gve_tx_free_ring(priv, j);
}
return err;
}
void gve_tx_free_rings(struct gve_priv *priv)
{
int i;
for (i = 0; i < priv->tx_cfg.num_queues; i++)
gve_tx_free_ring(priv, i);
}
/* gve_tx_avail - Calculates the number of slots available in the ring
* @tx: tx ring to check
*
* Returns the number of slots available
*
* The capacity of the queue is mask + 1. We don't need to reserve an entry.
**/
static inline u32 gve_tx_avail(struct gve_tx_ring *tx)
{
return tx->mask + 1 - (tx->req - tx->done);
}
static inline int gve_skb_fifo_bytes_required(struct gve_tx_ring *tx,
struct sk_buff *skb)
{
int pad_bytes, align_hdr_pad;
int bytes;
int hlen;
hlen = skb_is_gso(skb) ? skb_checksum_start_offset(skb) +
tcp_hdrlen(skb) : skb_headlen(skb);
pad_bytes = gve_tx_fifo_pad_alloc_one_frag(&tx->tx_fifo,
hlen);
/* We need to take into account the header alignment padding. */
align_hdr_pad = L1_CACHE_ALIGN(hlen) - hlen;
bytes = align_hdr_pad + pad_bytes + skb->len;
return bytes;
}
/* The most descriptors we could need are 3 - 1 for the headers, 1 for
* the beginning of the payload at the end of the FIFO, and 1 if the
* payload wraps to the beginning of the FIFO.
*/
#define MAX_TX_DESC_NEEDED 3
/* Check if sufficient resources (descriptor ring space, FIFO space) are
* available to transmit the given number of bytes.
*/
static inline bool gve_can_tx(struct gve_tx_ring *tx, int bytes_required)
{
return (gve_tx_avail(tx) >= MAX_TX_DESC_NEEDED &&
gve_tx_fifo_can_alloc(&tx->tx_fifo, bytes_required));
}
/* Stops the queue if the skb cannot be transmitted. */
static int gve_maybe_stop_tx(struct gve_tx_ring *tx, struct sk_buff *skb)
{
int bytes_required;
bytes_required = gve_skb_fifo_bytes_required(tx, skb);
if (likely(gve_can_tx(tx, bytes_required)))
return 0;
/* No space, so stop the queue */
tx->stop_queue++;
netif_tx_stop_queue(tx->netdev_txq);
smp_mb(); /* sync with restarting queue in gve_clean_tx_done() */
/* Now check for resources again, in case gve_clean_tx_done() freed
* resources after we checked and we stopped the queue after
* gve_clean_tx_done() checked.
*
* gve_maybe_stop_tx() gve_clean_tx_done()
* nsegs/can_alloc test failed
* gve_tx_free_fifo()
* if (tx queue stopped)
* netif_tx_queue_wake()
* netif_tx_stop_queue()
* Need to check again for space here!
*/
if (likely(!gve_can_tx(tx, bytes_required)))
return -EBUSY;
netif_tx_start_queue(tx->netdev_txq);
tx->wake_queue++;
return 0;
}
static void gve_tx_fill_pkt_desc(union gve_tx_desc *pkt_desc,
struct sk_buff *skb, bool is_gso,
int l4_hdr_offset, u32 desc_cnt,
u16 hlen, u64 addr)
{
/* l4_hdr_offset and csum_offset are in units of 16-bit words */
if (is_gso) {
pkt_desc->pkt.type_flags = GVE_TXD_TSO | GVE_TXF_L4CSUM;
pkt_desc->pkt.l4_csum_offset = skb->csum_offset >> 1;
pkt_desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
} else if (likely(skb->ip_summed == CHECKSUM_PARTIAL)) {
pkt_desc->pkt.type_flags = GVE_TXD_STD | GVE_TXF_L4CSUM;
pkt_desc->pkt.l4_csum_offset = skb->csum_offset >> 1;
pkt_desc->pkt.l4_hdr_offset = l4_hdr_offset >> 1;
} else {
pkt_desc->pkt.type_flags = GVE_TXD_STD;
pkt_desc->pkt.l4_csum_offset = 0;
pkt_desc->pkt.l4_hdr_offset = 0;
}
pkt_desc->pkt.desc_cnt = desc_cnt;
pkt_desc->pkt.len = cpu_to_be16(skb->len);
pkt_desc->pkt.seg_len = cpu_to_be16(hlen);
pkt_desc->pkt.seg_addr = cpu_to_be64(addr);
}
static void gve_tx_fill_seg_desc(union gve_tx_desc *seg_desc,
struct sk_buff *skb, bool is_gso,
u16 len, u64 addr)
{
seg_desc->seg.type_flags = GVE_TXD_SEG;
if (is_gso) {
if (skb_is_gso_v6(skb))
seg_desc->seg.type_flags |= GVE_TXSF_IPV6;
seg_desc->seg.l3_offset = skb_network_offset(skb) >> 1;
seg_desc->seg.mss = cpu_to_be16(skb_shinfo(skb)->gso_size);
}
seg_desc->seg.seg_len = cpu_to_be16(len);
seg_desc->seg.seg_addr = cpu_to_be64(addr);
}
static int gve_tx_add_skb(struct gve_tx_ring *tx, struct sk_buff *skb)
{
int pad_bytes, hlen, hdr_nfrags, payload_nfrags, l4_hdr_offset;
union gve_tx_desc *pkt_desc, *seg_desc;
struct gve_tx_buffer_state *info;
bool is_gso = skb_is_gso(skb);
u32 idx = tx->req & tx->mask;
int payload_iov = 2;
int copy_offset;
u32 next_idx;
int i;
info = &tx->info[idx];
pkt_desc = &tx->desc[idx];
l4_hdr_offset = skb_checksum_start_offset(skb);
/* If the skb is gso, then we want the tcp header in the first segment
* otherwise we want the linear portion of the skb (which will contain
* the checksum because skb->csum_start and skb->csum_offset are given
* relative to skb->head) in the first segment.
*/
hlen = is_gso ? l4_hdr_offset + tcp_hdrlen(skb) :
skb_headlen(skb);
info->skb = skb;
/* We don't want to split the header, so if necessary, pad to the end
* of the fifo and then put the header at the beginning of the fifo.
*/
pad_bytes = gve_tx_fifo_pad_alloc_one_frag(&tx->tx_fifo, hlen);
hdr_nfrags = gve_tx_alloc_fifo(&tx->tx_fifo, hlen + pad_bytes,
&info->iov[0]);
WARN(!hdr_nfrags, "hdr_nfrags should never be 0!");
payload_nfrags = gve_tx_alloc_fifo(&tx->tx_fifo, skb->len - hlen,
&info->iov[payload_iov]);
gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
1 + payload_nfrags, hlen,
info->iov[hdr_nfrags - 1].iov_offset);
skb_copy_bits(skb, 0,
tx->tx_fifo.base + info->iov[hdr_nfrags - 1].iov_offset,
hlen);
copy_offset = hlen;
for (i = payload_iov; i < payload_nfrags + payload_iov; i++) {
next_idx = (tx->req + 1 + i - payload_iov) & tx->mask;
seg_desc = &tx->desc[next_idx];
gve_tx_fill_seg_desc(seg_desc, skb, is_gso,
info->iov[i].iov_len,
info->iov[i].iov_offset);
skb_copy_bits(skb, copy_offset,
tx->tx_fifo.base + info->iov[i].iov_offset,
info->iov[i].iov_len);
copy_offset += info->iov[i].iov_len;
}
return 1 + payload_nfrags;
}
netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev)
{
struct gve_priv *priv = netdev_priv(dev);
struct gve_tx_ring *tx;
int nsegs;
WARN(skb_get_queue_mapping(skb) > priv->tx_cfg.num_queues,
"skb queue index out of range");
tx = &priv->tx[skb_get_queue_mapping(skb)];
if (unlikely(gve_maybe_stop_tx(tx, skb))) {
/* We need to ring the txq doorbell -- we have stopped the Tx
* queue for want of resources, but prior calls to gve_tx()
* may have added descriptors without ringing the doorbell.
*/
/* Ensure tx descs from a prior gve_tx are visible before
* ringing doorbell.
*/
dma_wmb();
gve_tx_put_doorbell(priv, tx->q_resources, tx->req);
return NETDEV_TX_BUSY;
}
nsegs = gve_tx_add_skb(tx, skb);
netdev_tx_sent_queue(tx->netdev_txq, skb->len);
skb_tx_timestamp(skb);
/* give packets to NIC */
tx->req += nsegs;
if (!netif_xmit_stopped(tx->netdev_txq) && netdev_xmit_more())
return NETDEV_TX_OK;
/* Ensure tx descs are visible before ringing doorbell */
dma_wmb();
gve_tx_put_doorbell(priv, tx->q_resources, tx->req);
return NETDEV_TX_OK;
}
#define GVE_TX_START_THRESH PAGE_SIZE
static int gve_clean_tx_done(struct gve_priv *priv, struct gve_tx_ring *tx,
u32 to_do, bool try_to_wake)
{
struct gve_tx_buffer_state *info;
u64 pkts = 0, bytes = 0;
size_t space_freed = 0;
struct sk_buff *skb;
int i, j;
u32 idx;
for (j = 0; j < to_do; j++) {
idx = tx->done & tx->mask;
netif_info(priv, tx_done, priv->dev,
"[%d] %s: idx=%d (req=%u done=%u)\n",
tx->q_num, __func__, idx, tx->req, tx->done);
info = &tx->info[idx];
skb = info->skb;
/* Mark as free */
if (skb) {
info->skb = NULL;
bytes += skb->len;
pkts++;
dev_consume_skb_any(skb);
/* FIFO free */
for (i = 0; i < ARRAY_SIZE(info->iov); i++) {
space_freed += info->iov[i].iov_len +
info->iov[i].iov_padding;
info->iov[i].iov_len = 0;
info->iov[i].iov_padding = 0;
}
}
tx->done++;
}
gve_tx_free_fifo(&tx->tx_fifo, space_freed);
u64_stats_update_begin(&tx->statss);
tx->bytes_done += bytes;
tx->pkt_done += pkts;
u64_stats_update_end(&tx->statss);
netdev_tx_completed_queue(tx->netdev_txq, pkts, bytes);
/* start the queue if we've stopped it */
#ifndef CONFIG_BQL
/* Make sure that the doorbells are synced */
smp_mb();
#endif
if (try_to_wake && netif_tx_queue_stopped(tx->netdev_txq) &&
likely(gve_can_tx(tx, GVE_TX_START_THRESH))) {
tx->wake_queue++;
netif_tx_wake_queue(tx->netdev_txq);
}
return pkts;
}
__be32 gve_tx_load_event_counter(struct gve_priv *priv,
struct gve_tx_ring *tx)
{
u32 counter_index = be32_to_cpu((tx->q_resources->counter_index));
return READ_ONCE(priv->counter_array[counter_index]);
}
bool gve_tx_poll(struct gve_notify_block *block, int budget)
{
struct gve_priv *priv = block->priv;
struct gve_tx_ring *tx = block->tx;
bool repoll = false;
u32 nic_done;
u32 to_do;
/* If budget is 0, do all the work */
if (budget == 0)
budget = INT_MAX;
/* Find out how much work there is to be done */
tx->last_nic_done = gve_tx_load_event_counter(priv, tx);
nic_done = be32_to_cpu(tx->last_nic_done);
if (budget > 0) {
/* Do as much work as we have that the budget will
* allow
*/
to_do = min_t(u32, (nic_done - tx->done), budget);
gve_clean_tx_done(priv, tx, to_do, true);
}
/* If we still have work we want to repoll */
repoll |= (nic_done != tx->done);
return repoll;
}
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册