提交 · 5.10.0-17.0.0 · openeuler / Kernel

11 11月, 2021 40 次提交

net: phy: realtek: net: Fix less than zero comparison of a u16 · 9fda9b0c

由 Colin Ian King 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit f25247d8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4H7E6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f25247d88708

----------------------------------------------------------------------

The comparisons of the u16 values priv->phycr1 and priv->phycr2 to less
than zero always false because they are unsigned. Fix this by using an
int for the assignment and less than zero check.

Addresses-Coverity: ("Unsigned compared against 0")
Fixes: 0a4355c2 ("net: phy: realtek: add dt property to disable CLKOUT clock")
Fixes: d90db36a ("net: phy: realtek: add dt property to enable ALDPS mode")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9fda9b0c

net: phy: realtek: add dt property to enable ALDPS mode · 1a2a1573

由 Joakim Zhang 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit d90db36a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4H7E6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d90db36a9e74

----------------------------------------------------------------------

If enable Advance Link Down Power Saving (ALDPS) mode, it will change
crystal/clock behavior, which cause RXC clock stop for dozens to hundreds
of miliseconds. This is comfirmed by Realtek engineer. For some MACs, it
needs RXC clock to support RX logic, after this patch, PHY can generate
continuous RXC clock during auto-negotiation.

ALDPS default is disabled after hardware reset, it's more reasonable to
add a property to enable this feature, since ALDPS would introduce side effect.
This patch adds dt property "realtek,aldps-enable" to enable ALDPS mode
per users' requirement.

Jisheng Zhang enables this feature, changes the default behavior. Since
mine patch breaks the rule that new implementation should not break
existing design, so Cc'ed let him know to see if it can be accepted.

Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: NJoakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1a2a1573

net: phy: realtek: add dt property to disable CLKOUT clock · 68267f63

由 Joakim Zhang 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 0a4355c2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4H7E6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a4355c2b7f8

----------------------------------------------------------------------

CLKOUT is enabled by default after PHY hardware reset, this patch adds
"realtek,clkout-disable" property for user to disable CLKOUT clock
to save PHY power.

Per RTL8211F guide, a PHY reset should be issued after setting these
bits in PHYCR2 register. After this patch, CLKOUT clock output to be
disabled.
Signed-off-by: NJoakim Zhang <qiangqing.zhang@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
(fix conflicts: adjust the location of RTL8211F_CLKOUT_EN)
(fix conflicts: replace rtl8211e_config_init by rtl821x_resume)
(fix conflicts: replace .read_status by .ack_interrupt)
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

68267f63

openeuler_defconfig: Build HISI PMU drivers as modules. · eebf2413

由 l00525341 提交于 11月 11, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4A1WQ

------------------------------------------------------------------------

Build HISI PMU drivers as modules.
Signed-off-by: NQi Liu <liuqi115@huawei.com>
Reviewed-by: NShaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

eebf2413

configs: add config BMA to config files · a14123a6

由 Qi Deng 提交于 11月 11, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA

-----------------------------------------

Make CONFIG_BMA=m, except euleros_defconfig.

Link: https://lkml.org/lkml/2020/6/22/752Signed-off-by: NQi Deng <dengqi7@huawei.com>
Reviewed-by: NQidong Wang <wangqindong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a14123a6

Huawei BMA: Adding Huawei BMA driver: cdev_veth_drv · 8e695222

由 Qi Deng 提交于 11月 11, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA

-----------------------------------------

The BMA software is a system management software offered by Huawei. It supports
the status monitoring, performance monitoring, and event monitoring of various
components, including server CPUs, memory, hard disks, NICs, IB cards, PCIe
cards, RAID controller cards, and optical modules.

This cdev_veth_drv driver is one of the communication drivers used by BMA
software. It depends on the host_edma_drv driver. It will create a char device
once loaded, offering interfaces (open, close, read, write and poll) to BMA to
send/receive RESTful messages between BMC software. When the message is longer
than 1KB, it will be cut into packets of 1KB. The other side, BMC's cdev_veth
driver, will assemble these packets back into original mesages.

Link: https://lkml.org/lkml/2020/6/22/752Signed-off-by: NQi Deng <dengqi7@huawei.com>
Reviewed-by: NQidong Wang <wangqindong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8e695222

Huawei BMA: Adding Huawei BMA driver: host_kbox_drv · a5b9149c

由 Qi Deng 提交于 11月 11, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA

-----------------------------------------

The BMA software is a system management software offered by Huawei. It supports
the status monitoring, performance monitoring, and event monitoring of various
components, including server CPUs, memory, hard disks, NICs, IB cards, PCIe
cards, RAID controller cards, and optical modules.

The host_kbox_drv driver serves the function of a black box. When a panic or mce
event happen to the system, it will record the event time, system's status and
system logs and send them to BMC before the OS shutdown. This driver depends on
the host_edms_drv driver.

Link: https://lkml.org/lkml/2020/6/22/752Signed-off-by: NQi Deng <dengqi7@huawei.com>
Reviewed-by: NQidong Wang <wangqindong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a5b9149c

Huawei BMA: Adding Huawei BMA driver: host_veth_drv · 5110b9f4

由 Qi Deng 提交于 11月 11, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA

-----------------------------------------

The BMA software is a system management software offered by Huawei. It supports
the status monitoring, performance monitoring, and event monitoring of various
components, including server CPUs, memory, hard disks, NICs, IB cards, PCIe
cards, RAID controller cards, and optical modules.

This host_veth_drv driver is one of the communication drivers used by BMA
software. It depends on the host_edma_drv driver. The host_veth_drv driver will
create a virtual net device "veth" once loaded. BMA software will use it to
send/receive RESTful messages to/from BMC software.

Link: https://lkml.org/lkml/2020/6/22/752Signed-off-by: NQi Deng <dengqi7@huawei.com>
Reviewed-by: NQidong Wang <wangqindong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5110b9f4

Huawei BMA: Adding Huawei BMA driver: host_cdev_drv · 109f8498

由 Qi Deng 提交于 11月 11, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA

-----------------------------------------

The BMA software is a system management software offered by Huawei. It supports
the status monitoring, performance monitoring, and event monitoring of various
components, including server CPUs, memory, hard disks, NICs, IB cards, PCIe
cards, RAID controller cards, and optical modules.

This host_cdev_drv driver is one of the communication driver used by BMA
software. It depends on the host_edma_drv driver. The host_cdev_drv driver will
create 4 char devices(hwbmc0, hwbmc1, hwbmc2, hwbmc3) once loaded. These char
devices offer interfaces, including open, close, read, write and poll, to upper
level applications. BMA uses them to send/receive ipmi commons to/from BMC.

Link: https://lkml.org/lkml/2020/6/22/752Signed-off-by: NQi Deng <dengqi7@huawei.com>
Reviewed-by: NQidong Wang <wangqindong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

109f8498

Huawei BMA: Adding Huawei BMA driver: host_edma_drv · 915ded04

由 Qi Deng 提交于 11月 11, 2021

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO
CVE: NA

-----------------------------------------

The BMA software is a system management software offered by Huawei. It supports
the status monitoring, performance monitoring, and event monitoring of various
components, including server CPUs, memory, hard disks, NICs, IB cards, PCIe
cards, RAID controller cards, and optical modules.

This host_edma_drv driver is a PCIe driver used by Huawei BMA software. The main
function of it is to control the PCIe bus between BMA software and Huawei 1711
chip. The chip will then process the data and display to users. This
host_edma_drv driver offers API to send/receive data for other BMA drivers which
want to use the PCIe channel in different ways(eg.host_cdev_drv, host_veth_drv).

Link: https://lkml.org/lkml/2020/6/22/752Signed-off-by: NQi Deng <dengqi7@huawei.com>
Reviewed-by: NQidong Wang <wangqindong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

915ded04

page_pool: disable dma mapping support for 32-bit arch with 64-bit DMA · bf9e5e0f

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-master
commit d00e60ee
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d00e60ee54b1

----------------------------------------------------------------------

As the 32-bit arch with 64-bit DMA seems to rare those days,
and page pool might carry a lot of code and complexity for
systems that possibly.

So disable dma mapping support for such systems, if drivers
really want to work on such systems, they have to implement
their own DMA-mapping fallback tracking outside page_pool.
Reviewed-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bf9e5e0f

page_pool: use relaxed atomic for release side accounting · 2f687b4c

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 7fb9b66d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7fb9b66dc9ce

----------------------------------------------------------------------

There is no need to synchronize the account updating, so
use the relaxed atomic to avoid some memory barrier in the
data path.
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2f687b4c

net: hns3: add option to turn off page pool feature · a385b1bd

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.15-rc2
commit f7ec554b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f7ec554b73c5

----------------------------------------------------------------------

When page pool is added to the hns3 driver, it is always
enabled unconditionally, which means spilt page handling
in the hns3 driver is dead code.

As there is a requirement to test the performance between
spilt page handling in driver and page pool, so add a module
param to support disabling the page pool.

When the page pool is proved to perform better in most case,
the spilt page handling in driver can be removed.

Fixes: 93188e96 ("net: hns3: support skb's frag page recycling based on page pool")
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a385b1bd

net: hns3: support skb's frag page recycling based on page pool · 47ffdc55

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 93188e96
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=93188e9642c3

----------------------------------------------------------------------

This patch adds skb's frag page recycling support based on
the frag page support in page pool.

The performance improves above 10~20% for single thread iperf
TCP flow with IOMMU disabled when iperf server and irq/NAPI
have a different CPU.

The performance improves about 135%(14Gbit to 33Gbit) for single
thread iperf TCP flow when IOMMU is in strict mode and iperf
server shares the same cpu with irq/NAPI.
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

47ffdc55

page_pool: add frag page recycling support in page pool · 26bd5962

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 53e0961d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53e0961da1c7

----------------------------------------------------------------------

Currently page pool only support page recycling when there
is only one user of the page, and the split page reusing
implemented in the most driver can not use the page pool as
bing-pong way of reusing requires the multi user support in
page pool.

Those reusing or recycling has below limitations:
1. page from page pool can only be used be one user in order
   for the page recycling to happen.
2. Bing-pong way of reusing in most driver does not support
   multi desc using different part of the same page in order
   to save memory.

So add multi-users support and frag page recycling in page
pool to overcome the above limitation.
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

26bd5962

page_pool: add interface to manipulate frag count in page pool · 97e1a546

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 0e9d2a0a
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e9d2a0a3a83

----------------------------------------------------------------------

For 32 bit systems with 64 bit dma, dma_addr[1] is used to
store the upper 32 bit dma addr, those system should be rare
those days.

For normal system, the dma_addr[1] in 'struct page' is not
used, so we can reuse dma_addr[1] for storing frag count,
which means how many frags this page might be splited to.

In order to simplify the page frag support in the page pool,
the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to indicate
the 32 bit systems with 64 bit dma, and the page frag support
in page pool is disabled for such system.

The newly added page_pool_set_frag_count() is called to reserve
the maximum frag count before any page frag is passed to the
user. The page_pool_atomic_sub_frag_count_return() is called
when user is done with the page frag.
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

97e1a546

page_pool: keep pp info as long as page pool owns the page · 156a66de

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 57f05bc2
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=57f05bc2ab24

----------------------------------------------------------------------

Currently, page->pp is cleared and set everytime the page
is recycled, which is unnecessary.

So only set the page->pp when the page is added to the page
pool and only clear it when the page is released from the
page pool.

This is also a preparation to support allocating frag page
in page pool.
Reviewed-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

156a66de

page_pool: mask the page->signature before the checking · 64e670dd

由 Yunsheng Lin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc6
commit 0fa32ca4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0fa32ca438b4

----------------------------------------------------------------------

As mentioned in commit c07aea3e ("mm: add a signature in
struct page"):
"The page->signature field is aliased to page->lru.next and
page->compound_head."

And as the comment in page_is_pfmemalloc():
"lru.next has bit 1 set if the page is allocated from the
pfmemalloc reserves. Callers may simply overwrite it if they
do not need to preserve that information."

The page->signature is OR’ed with PP_SIGNATURE when a page is
allocated in page pool, see __page_pool_alloc_pages_slow(),
and page->signature is checked directly with PP_SIGNATURE in
page_pool_return_skb_page(), which might cause resoure leaking
problem for a page from page pool if bit 1 of lru.next is set
for a pfmemalloc page. What happens here is that the original
pp->signature is OR'ed with PP_SIGNATURE after the allocation
in order to preserve any existing bits(such as the bit 1, used
to indicate a pfmemalloc page), so when those bits are present,
those page is not considered to be from page pool and the DMA
mapping of those pages will be left stale.

As bit 0 is for page->compound_head, So mask both bit 0/1 before
the checking in page_pool_return_skb_page(). And we will return
those pfmemalloc pages back to the page allocator after cleaning
up the DMA mapping.

Fixes: 6a5bcd84 ("page_pool: Allow drivers to hint on SKB recycling")
Reviewed-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

64e670dd

skbuff: Fix a potential race while recycling page_pool packets · b0d0c96d

由 Ilias Apalodimas 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc3
commit 2cc3aeb5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2cc3aeb5eccc

----------------------------------------------------------------------

As Alexander points out, when we are trying to recycle a cloned/expanded
SKB we might trigger a race.  The recycling code relies on the
pp_recycle bit to trigger,  which we carry over to cloned SKBs.
If that cloned SKB gets expanded or if we get references to the frags,
call skb_release_data() and overwrite skb->head, we are creating separate
instances accessing the same page frags.  Since the skb_release_data()
will first try to recycle the frags,  there's a potential race between
the original and cloned SKB, since both will have the pp_recycle bit set.

Fix this by explicitly those SKBs not recyclable.
The atomic_sub_return effectively limits us to a single release case,
and when we are calling skb_release_data we are also releasing the
option to perform the recycling, or releasing the pages from the page pool.

Fixes: 6a5bcd84 ("page_pool: Allow drivers to hint on SKB recycling")
Reported-by: NAlexander Duyck <alexanderduyck@fb.com>
Suggested-by: NAlexander Duyck <alexanderduyck@fb.com>
Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b0d0c96d

net: ti: add pp skb recycling support · 1318c08a

由 Lorenzo Bianconi 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit a078d981
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a078d981f863

----------------------------------------------------------------------

As already done for mvneta and mvpp2, enable skb recycling for ti
ethernet drivers

ti driver on net-next:
----------------------
[perf top]
 47.15%  [kernel]     [k] _raw_spin_unlock_irqrestore
 11.77%  [kernel]     [k] __cpdma_chan_free
  3.16%  [kernel]     [k] ___bpf_prog_run
  2.52%  [kernel]     [k] cpsw_rx_vlan_encap
  2.34%  [kernel]     [k] __netif_receive_skb_core
  2.27%  [kernel]     [k] free_unref_page
  2.26%  [kernel]     [k] kmem_cache_free
  2.24%  [kernel]     [k] kmem_cache_alloc
  1.69%  [kernel]     [k] __softirqentry_text_start
  1.61%  [kernel]     [k] cpsw_rx_handler
  1.19%  [kernel]     [k] page_pool_release_page
  1.19%  [kernel]     [k] clear_bits_ll
  1.15%  [kernel]     [k] page_frag_free
  1.06%  [kernel]     [k] __dma_page_dev_to_cpu
  0.99%  [kernel]     [k] memset
  0.94%  [kernel]     [k] __alloc_pages_bulk
  0.92%  [kernel]     [k] kfree_skb
  0.85%  [kernel]     [k] packet_rcv
  0.78%  [kernel]     [k] page_address
  0.75%  [kernel]     [k] v7_dma_inv_range
  0.71%  [kernel]     [k] __lock_text_start

[iperf3 tcp]
[  5]   0.00-10.00  sec   873 MBytes   732 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   866 MBytes   726 Mbits/sec        receiver

ti + skb recycling:
-------------------
[perf top]
 40.58%  [kernel]    [k] _raw_spin_unlock_irqrestore
 16.18%  [kernel]    [k] __softirqentry_text_start
 10.33%  [kernel]    [k] __cpdma_chan_free
  2.62%  [kernel]    [k] ___bpf_prog_run
  2.05%  [kernel]    [k] cpsw_rx_vlan_encap
  2.00%  [kernel]    [k] kmem_cache_alloc
  1.86%  [kernel]    [k] __netif_receive_skb_core
  1.80%  [kernel]    [k] kmem_cache_free
  1.63%  [kernel]    [k] cpsw_rx_handler
  1.12%  [kernel]    [k] cpsw_rx_mq_poll
  1.11%  [kernel]    [k] page_pool_put_page
  1.04%  [kernel]    [k] _raw_spin_unlock
  0.97%  [kernel]    [k] clear_bits_ll
  0.90%  [kernel]    [k] packet_rcv
  0.88%  [kernel]    [k] __dma_page_dev_to_cpu
  0.85%  [kernel]    [k] kfree_skb
  0.80%  [kernel]    [k] memset
  0.71%  [kernel]    [k] __lock_text_start
  0.66%  [kernel]    [k] v7_dma_inv_range
  0.64%  [kernel]    [k] gen_pool_free_owner

[iperf3 tcp]
[  5]   0.00-10.00  sec   884 MBytes   742 Mbits/sec    0   sender
[  5]   0.00-10.01  sec   878 MBytes   735 Mbits/sec        receiver
Tested-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1318c08a

mvpp2: prefetch page · 583dda2d

由 Matteo Croce 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 2f128eb3
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f128eb3308a

----------------------------------------------------------------------

Most of the time during the RX is caused by the compound_head() call
done at the end of the RX loop:

       │     build_skb():
       [...]
       │     static inline struct page *compound_head(struct page *page)
       │     {
       │     unsigned long head = READ_ONCE(page->compound_head);
 65.23 │       ldr  x2, [x1, #8]

Prefetch the page struct as soon as possible, to speedup the RX path
noticeabily by a ~3-4% packet rate in a drop test.

       │     build_skb():
       [...]
       │     static inline struct page *compound_head(struct page *page)
       │     {
       │     unsigned long head = READ_ONCE(page->compound_head);
 17.92 │       ldr  x2, [x1, #8]
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

583dda2d

mvpp2: prefetch right address · 78343012

由 Matteo Croce 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit d8ea89fe
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8ea89fe8a49

----------------------------------------------------------------------

In the RX buffer, the received data starts after a headroom used to
align the IP header and to allow prepending headers efficiently.
The prefetch() should take this into account, and prefetch from
the very start of the received data.

We can see that ether_addr_equal_64bits(), which is the first function
to access the data, drops from the top of the perf top output.

prefetch(data):

Overhead  Shared Object     Symbol
  11.64%  [kernel]          [k] eth_type_trans

prefetch(data + MVPP2_MH_SIZE + MVPP2_SKB_HEADROOM):

Overhead  Shared Object     Symbol
  13.42%  [kernel]          [k] build_skb
  10.35%  [mvpp2]           [k] mvpp2_rx
   9.35%  [kernel]          [k] __netif_receive_skb_core
   8.24%  [kernel]          [k] kmem_cache_free
   7.97%  [kernel]          [k] dev_gro_receive
   7.68%  [kernel]          [k] page_pool_put_page
   7.32%  [kernel]          [k] kmem_cache_alloc
   7.09%  [mvpp2]           [k] mvpp2_bm_pool_put
   3.36%  [kernel]          [k] eth_type_trans

Also, move the eth_type_trans() call a bit down, to give the RAM more
time to prefetch the data.
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

78343012

mvneta: recycle buffers · 9eb94bb2

由 Matteo Croce 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit e4017570
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e4017570daee

----------------------------------------------------------------------

Use the new recycling API for page_pool.
In a drop rate test, the packet rate increased by 10%,
from 296 Kpps to 326 Kpps.

perf top on a stock system shows:

Overhead  Shared Object     Symbol
  23.66%  [kernel]          [k] __pi___inval_dcache_area
  22.85%  [mvneta]          [k] mvneta_rx_swbm
   7.54%  [kernel]          [k] kmem_cache_alloc
   6.49%  [kernel]          [k] eth_type_trans
   3.94%  [kernel]          [k] dev_gro_receive
   3.91%  [kernel]          [k] __netif_receive_skb_core
   3.91%  [kernel]          [k] kmem_cache_free
   3.76%  [kernel]          [k] page_pool_release_page
   3.56%  [kernel]          [k] free_unref_page
   2.40%  [kernel]          [k] build_skb
   1.49%  [kernel]          [k] skb_release_data
   1.45%  [kernel]          [k] __alloc_pages_bulk
   1.30%  [kernel]          [k] page_frag_free

And this is the same output with recycling enabled:

Overhead  Shared Object     Symbol
  26.41%  [kernel]          [k] __pi___inval_dcache_area
  25.00%  [mvneta]          [k] mvneta_rx_swbm
   8.14%  [kernel]          [k] kmem_cache_alloc
   6.84%  [kernel]          [k] eth_type_trans
   4.44%  [kernel]          [k] __netif_receive_skb_core
   4.38%  [kernel]          [k] kmem_cache_free
   4.16%  [kernel]          [k] dev_gro_receive
   3.21%  [kernel]          [k] page_pool_put_page
   2.41%  [kernel]          [k] build_skb
   1.82%  [kernel]          [k] skb_release_data
   1.61%  [kernel]          [k] napi_gro_receive
   1.25%  [kernel]          [k] page_pool_refill_alloc_cache
   1.16%  [kernel]          [k] __netif_receive_skb_list_core

We can see that page_pool_release_page(), free_unref_page() and
__alloc_pages_bulk() are no longer on top of the list when receiving
traffic.

The test was done with mausezahn on the TX side with 64 byte raw
ethernet frames.
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9eb94bb2

mvpp2: recycle buffers · 7d5df3e5

由 Matteo Croce 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 133637fc
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=133637fcfab2

----------------------------------------------------------------------

Use the new recycling API for page_pool.
In a drop rate test, the packet rate is almost doubled,
from 1110 Kpps to 2128 Kpps.

perf top on a stock system shows:

Overhead  Shared Object     Symbol
  34.88%  [kernel]          [k] page_pool_release_page
   8.06%  [kernel]          [k] free_unref_page
   6.42%  [mvpp2]           [k] mvpp2_rx
   6.07%  [kernel]          [k] eth_type_trans
   5.18%  [kernel]          [k] __netif_receive_skb_core
   4.95%  [kernel]          [k] build_skb
   4.88%  [kernel]          [k] kmem_cache_free
   3.97%  [kernel]          [k] kmem_cache_alloc
   3.45%  [kernel]          [k] dev_gro_receive
   2.73%  [kernel]          [k] page_frag_free
   2.07%  [kernel]          [k] __alloc_pages_bulk
   1.99%  [kernel]          [k] arch_local_irq_save
   1.84%  [kernel]          [k] skb_release_data
   1.20%  [kernel]          [k] netif_receive_skb_list_internal

With packet rate stable at 1100 Kpps:

tx: 0 bps 0 pps rx: 532.7 Mbps 1110 Kpps
tx: 0 bps 0 pps rx: 532.6 Mbps 1110 Kpps
tx: 0 bps 0 pps rx: 532.4 Mbps 1109 Kpps
tx: 0 bps 0 pps rx: 532.1 Mbps 1109 Kpps
tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps
tx: 0 bps 0 pps rx: 531.9 Mbps 1108 Kpps

And this is the same output with recycling enabled:

Overhead  Shared Object     Symbol
  12.91%  [kernel]          [k] eth_type_trans
  12.54%  [mvpp2]           [k] mvpp2_rx
   9.67%  [kernel]          [k] build_skb
   9.63%  [kernel]          [k] __netif_receive_skb_core
   8.44%  [kernel]          [k] page_pool_put_page
   8.07%  [kernel]          [k] kmem_cache_free
   7.79%  [kernel]          [k] kmem_cache_alloc
   6.86%  [kernel]          [k] dev_gro_receive
   3.19%  [kernel]          [k] skb_release_data
   2.41%  [kernel]          [k] netif_receive_skb_list_internal
   2.18%  [kernel]          [k] page_pool_refill_alloc_cache
   1.76%  [kernel]          [k] napi_gro_receive
   1.61%  [kernel]          [k] kfree_skb
   1.20%  [kernel]          [k] dma_sync_single_for_device
   1.16%  [mvpp2]           [k] mvpp2_poll
   1.12%  [mvpp2]           [k] mvpp2_read

With packet rate above 2100 Kpps:

tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1021 Mbps 2127 Kpps
tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1021 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1022 Mbps 2128 Kpps
tx: 0 bps 0 pps rx: 1022 Mbps 2129 Kpps

The major performance increase is explained by the fact that the most CPU
consuming functions (page_pool_release_page, page_frag_free and
free_unref_page) are no longer called on a per packet basis.

The test was done by sending to the macchiatobin 64 byte ethernet frames
with an invalid ethertype, so the packets are dropped early in the RX path.
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

7d5df3e5

page_pool: Allow drivers to hint on SKB recycling · fe4cd4ee

由 Ilias Apalodimas 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 6a5bcd84
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6a5bcd84e886

----------------------------------------------------------------------

Up to now several high speed NICs have custom mechanisms of recycling
the allocated memory they use for their payloads.
Our page_pool API already has recycling capabilities that are always
used when we are running in 'XDP mode'. So let's tweak the API and the
kernel network stack slightly and allow the recycling to happen even
during the standard operation.
The API doesn't take into account 'split page' policies used by those
drivers currently, but can be extended once we have users for that.

The idea is to be able to intercept the packet on skb_release_data().
If it's a buffer coming from our page_pool API recycle it back to the
pool for further usage or just release the packet entirely.

To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
a field in struct page (page->pp) to store the page_pool pointer.
Storing the information in page->pp allows us to recycle both SKBs and
their fragments.
We could have skipped the skb bit entirely, since identical information
can bederived from struct page. However, in an effort to affect the free path
as less as possible, reading a single bit in the skb which is already
in cache, is better that trying to derive identical information for the
page stored data.

The driver or page_pool has to take care of the sync operations on it's own
during the buffer recycling since the buffer is, after opting-in to the
recycling, never unmapped.

Since the gain on the drivers depends on the architecture, we are not
enabling recycling by default if the page_pool API is used on a driver.
In order to enable recycling the driver must call skb_mark_for_recycle()
to store the information we need for recycling in page->pp and
enabling the recycling bit, or page_pool_store_mem_info() for a fragment.
Co-developed-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
(fix conflicts: replace skb_get_kcov_handle by skb_csum_is_sctp)
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fe4cd4ee

skbuff: add a parameter to __skb_frag_unref · 2556a6f0

由 Matteo Croce 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit c420c989
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c420c98982fa

----------------------------------------------------------------------

This is a prerequisite patch, the next one is enabling recycling of
skbs and fragments. Add an extra argument on __skb_frag_unref() to
handle recycling, and update the current users of the function with that.
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2556a6f0

mm: add a signature in struct page · 66151639

由 Matteo Croce 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.14-rc1
commit c07aea3e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c07aea3ef4d4

----------------------------------------------------------------------

This is needed by the page_pool to avoid recycling a page not allocated
via page_pool.

The page->signature field is aliased to page->lru.next and
page->compound_head, but it can't be set by mistake because the
signature value is a bad pointer, and can't trigger a false positive
in PageTail() because the last bit is 0.
Co-developed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NMatteo Croce <mcroce@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

66151639

net: page_pool: simplify page recycling condition tests · 078c748c

由 Alexander Lobakin 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.12-rc1-dontuse
commit 05656132
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=05656132a874

----------------------------------------------------------------------

pool_page_reusable() is a leftover from pre-NUMA-aware times. For now,
this function is just a redundant wrapper over page_is_pfmemalloc(),
so inline it into its sole call site.
Signed-off-by: NAlexander Lobakin <alobakin@pm.me>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

078c748c

skbuff: Call skb_zcopy_clear() before unref'ing fragments · f91d76c8

由 Jonathan Lemon 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.12-rc1-dontuse
commit 70c43167
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=70c4316749f6

----------------------------------------------------------------------

RX zerocopy fragment pages which are not allocated from the
system page pool require special handling.  Give the callback
in skb_zcopy_clear() a chance to process them first.
Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f91d76c8

net: page_pool: Add bulk support for ptr_ring · 9b9ef692

由 Lorenzo Bianconi 提交于 11月 11, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 78862447
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7886244736a4

----------------------------------------------------------------------

Introduce the capability to batch page_pool ptr_ring refill since it is
usually run inside the driver NAPI tx completion loop.
Suggested-by: NJesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.orgReviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NJunxin Chen <chenjunxin1@huawei.com>
(fix conflicts: delete modifications to net/core/xdp.c)
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9b9ef692

MAINTAINERS: update for DAMON · 125e8f0b