1. 04 Jul 2022, 1 commit
  2. 13 May 2022, 1 commit
  3. 15 Apr 2022, 1 commit
  4. 12 Apr 2022, 1 commit
  5. 03 Mar 2022, 3 commits
  6. 03 Feb 2022, 1 commit
    • page_pool: Refactor page_pool to enable fragmenting after allocation · 52cc6ffc
      Alexander Duyck committed
      This change is meant to permit a driver to perform "fragmenting" of the
      page from within the driver instead of the current model which requires
      pre-partitioning the page. The main motivation behind this is to support
      use cases where the page will be split up by the driver after DMA instead
      of before.
      
      With this change it becomes possible to start using page pool to replace
      some of the existing use cases where multiple references were being used
      for a single page, but the number needed was unknown as the size could be
      dynamic.
      
      For example, with this code it would be possible to do something like
      the following to handle allocation:
        page = page_pool_alloc_pages(pool, gfp);
        if (!page)
          return NULL;
        page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX);
        rx_buf->page = page;
        rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX;
      
      Then we would process a received buffer by handling it with:
        rx_buf->pagecnt_bias--;
      
      Once the page has been fully consumed we could then flush the remaining
      instances with:
        if (page_pool_defrag_page(page, rx_buf->pagecnt_bias))
          continue;
        page_pool_put_defragged_page(pool, page, -1, !!budget);
      
      The general idea is that we want to have the ability to allocate a page
      with excess fragment count and then trim off the unneeded fragments.
      Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 10 Jan 2022, 1 commit
  8. 06 Jan 2022, 2 commits
  9. 18 Nov 2021, 1 commit
  10. 15 Oct 2021, 1 commit
  11. 24 Aug 2021, 1 commit
  12. 10 Aug 2021, 3 commits
    • page_pool: add frag page recycling support in page pool · 53e0961d
      Yunsheng Lin committed
      Currently page pool only supports page recycling when there
      is a single user of the page, and the split-page reuse
      implemented in most drivers cannot use the page pool, as the
      ping-pong way of reusing requires multi-user support in
      page pool.
      
      These reuse and recycling schemes have the following limitations:
      1. A page from page pool can only be used by one user in order
         for the page recycling to happen.
      2. The ping-pong way of reusing in most drivers does not allow
         multiple descriptors to use different parts of the same page
         in order to save memory.
      
      So add multi-user support and frag page recycling in page
      pool to overcome the above limitations.
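      
      As a rough usage sketch (assuming a driver context; dev, rx_buf_len
      and rx_buf are hypothetical names), a driver would create the pool
      with PP_FLAG_PAGE_FRAG and take fragments via page_pool_alloc_frag():
        struct page_pool_params pp_params = {
                .flags     = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG,
                .order     = 0,
                .pool_size = 1024,
                .nid       = NUMA_NO_NODE,
                .dev       = dev,
                .dma_dir   = DMA_FROM_DEVICE,
        };
        struct page_pool *pool = page_pool_create(&pp_params);
        unsigned int offset;
        struct page *page = page_pool_alloc_frag(pool, &offset,
                                                 rx_buf_len, GFP_ATOMIC);
        /* frags share the page; the pool recycles it once all frags
         * have been returned */
        if (page)
                rx_buf->dma = page_pool_get_dma_addr(page) + offset;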
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • page_pool: add interface to manipulate frag count in page pool · 0e9d2a0a
      Yunsheng Lin committed
      On 32-bit systems with 64-bit DMA, dma_addr[1] is used to store
      the upper 32 bits of the DMA address; such systems should be rare
      these days.
      
      On a normal system, dma_addr[1] in 'struct page' is not used,
      so we can reuse it for storing the frag count, which means how
      many frags this page might be split into.
      
      In order to simplify the page frag support in the page pool,
      the PAGE_POOL_DMA_USE_PP_FRAG_COUNT macro is added to identify
      32-bit systems with 64-bit DMA, and the page frag support in
      page pool is disabled for such systems.
      
      The newly added page_pool_set_frag_count() is called to reserve
      the maximum frag count before any page frag is passed to the
      user, and page_pool_atomic_sub_frag_count_return() is called
      when the user is done with the page frag.
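      
      A simplified sketch of these helpers (the in-tree version also
      short-circuits the atomic op for the last remaining user;
      pp_frag_count aliases dma_addr[1] in 'struct page'):
        #define PAGE_POOL_DMA_USE_PP_FRAG_COUNT \
                (sizeof(dma_addr_t) > sizeof(unsigned long))
        
        static inline void page_pool_set_frag_count(struct page *page, long nr)
        {
                /* reserve nr references up front, one per future frag user */
                atomic_long_set(&page->pp_frag_count, nr);
        }
        
        static inline long
        page_pool_atomic_sub_frag_count_return(struct page *page, long nr)
        {
                long ret = atomic_long_sub_return(nr, &page->pp_frag_count);
        
                WARN_ON(ret < 0);
                return ret;  /* 0 => last user gone, page can be recycled */
        }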
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • page_pool: keep pp info as long as page pool owns the page · 57f05bc2
      Yunsheng Lin committed
      Currently, page->pp is cleared and set every time the page
      is recycled, which is unnecessary.
      
      So set page->pp only when the page is added to the page pool,
      and clear it only when the page is released from the page pool.
      
      This is also preparation for supporting frag page allocation
      in page pool.
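      
      Conceptually, the pp info handling collapses into a pair of helpers
      along these lines (a sketch of the shape of the change, not a
      verbatim copy):
        static void page_pool_set_pp_info(struct page_pool *pool,
                                          struct page *page)
        {
                /* runs once, when the pool takes ownership of the page */
                page->pp = pool;
                page->pp_magic |= PP_SIGNATURE;
        }
        
        static void page_pool_clear_pp_info(struct page *page)
        {
                /* runs once, when the page leaves the pool for good */
                page->pp_magic = 0;
                page->pp = NULL;
        }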
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  13. 09 Aug 2021, 1 commit
    • page_pool: mask the page->signature before the checking · 0fa32ca4
      Yunsheng Lin committed
      As mentioned in commit c07aea3e ("mm: add a signature in
      struct page"):
      "The page->signature field is aliased to page->lru.next and
      page->compound_head."
      
      And as the comment in page_is_pfmemalloc():
      "lru.next has bit 1 set if the page is allocated from the
      pfmemalloc reserves. Callers may simply overwrite it if they
      do not need to preserve that information."
      
      The page->signature is OR'ed with PP_SIGNATURE when a page is
      allocated in page pool, see __page_pool_alloc_pages_slow(),
      and page->signature is checked directly against PP_SIGNATURE in
      page_pool_return_skb_page(), which might cause a resource-leak
      problem for a page from page pool if bit 1 of lru.next is set
      for a pfmemalloc page. What happens here is that the original
      page->signature is OR'ed with PP_SIGNATURE after the allocation
      in order to preserve any existing bits (such as bit 1, used
      to indicate a pfmemalloc page), so when those bits are present,
      the page is not considered to be from page pool and the DMA
      mapping of such pages is left stale.
      
      As bit 0 is for page->compound_head, mask out both bits 0 and 1
      before the check in page_pool_return_skb_page(), and return
      pfmemalloc pages back to the page allocator after cleaning
      up the DMA mapping.
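      
      The fix amounts to masking the low two bits before the comparison,
      roughly:
        /* in page_pool_return_skb_page(): bits 0/1 of pp_magic may carry
         * compound_head/pfmemalloc state, so ignore them when checking
         * whether the page came from a page pool */
        if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
                return false;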
      
      Fixes: 6a5bcd84 ("page_pool: Allow drivers to hint on SKB recycling")
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 08 Jun 2021, 2 commits
    • page_pool: Allow drivers to hint on SKB recycling · 6a5bcd84
      Ilias Apalodimas committed
      Up to now several high speed NICs have custom mechanisms of recycling
      the allocated memory they use for their payloads.
      Our page_pool API already has recycling capabilities that are always
      used when we are running in 'XDP mode'. So let's tweak the API and the
      kernel network stack slightly and allow the recycling to happen even
      during the standard operation.
      The API doesn't take into account 'split page' policies used by those
      drivers currently, but can be extended once we have users for that.
      
      The idea is to be able to intercept the packet on skb_release_data().
      If it's a buffer coming from our page_pool API recycle it back to the
      pool for further usage or just release the packet entirely.
      
      To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and
      a field in struct page (page->pp) to store the page_pool pointer.
      Storing the information in page->pp allows us to recycle both SKBs and
      their fragments.
      We could have skipped the skb bit entirely, since identical information
      can be derived from struct page. However, in an effort to affect the
      free path as little as possible, reading a single bit in the skb, which
      is already in cache, is better than trying to derive the same
      information from the data stored in the page.
      
      The driver or page_pool has to take care of the sync operations on its
      own during buffer recycling, since the buffer is never unmapped after
      opting in to recycling.
      
      Since the gain on the drivers depends on the architecture, we are not
      enabling recycling by default merely because a driver uses the
      page_pool API. In order to enable recycling, the driver must call
      skb_mark_for_recycle() to store the information we need for recycling
      in page->pp and set the recycling bit, or page_pool_store_mem_info()
      for a fragment.
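      
      A sketch of the opt-in on a driver's rx path (skb, page, pool,
      headroom and pkt_len are hypothetical driver variables; the
      three-argument form of skb_mark_for_recycle() matches this commit,
      later kernels reduced it to a single argument):
        skb = build_skb(page_address(page), PAGE_SIZE);
        if (skb) {
                skb_reserve(skb, headroom);
                skb_put(skb, pkt_len);
                /* sets skb->pp_recycle and records the pool in page->pp */
                skb_mark_for_recycle(skb, page, pool);
        }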
      Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Co-developed-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mm: add a signature in struct page · c07aea3e
      Matteo Croce committed
      This is needed by the page_pool to avoid recycling a page not allocated
      via page_pool.
      
      The page->signature field is aliased to page->lru.next and
      page->compound_head, but it can't be set by mistake because the
      signature value is a bad pointer, and can't trigger a false positive
      in PageTail() because the last bit is 0.
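      
      For reference, the signature introduced by this commit is built on a
      poison pointer, so a valid kernel pointer can never match it and the
      low bit stays 0:
        #define PP_SIGNATURE    (0x40 + POISON_POINTER_DELTA)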
      Co-developed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Matteo Croce <mcroce@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 15 May 2021, 1 commit
  16. 01 May 2021, 2 commits
  17. 05 Feb 2021, 1 commit
  18. 14 Nov 2020, 1 commit
  19. 30 Mar 2020, 1 commit
  20. 21 Feb 2020, 1 commit
  21. 14 Feb 2020, 1 commit
  22. 03 Jan 2020, 2 commits
    • page_pool: help compiler remove code in case CONFIG_NUMA=n · f13fc107
      Jesper Dangaard Brouer committed
      When the kernel is compiled without NUMA support, the page_pool NUMA
      config setting (pool->p.nid) doesn't make any practical sense, but
      the compiler cannot see that it can remove the code paths.
      
      This patch avoids reading the pool->p.nid setting in case of
      !CONFIG_NUMA, in the allocation and NUMA-check code, which helps the
      compiler see the optimisation potential. The update code is left
      intact to keep the API the same.
      
       $ ./scripts/bloat-o-meter net/core/page_pool.o-numa-enabled \
                                 net/core/page_pool.o-numa-disabled
       add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-113 (-113)
       Function                                     old     new   delta
       page_pool_create                             401     398      -3
       __page_pool_alloc_pages_slow                 439     426     -13
       page_pool_refill_alloc_cache                 425     328     -97
       Total: Before=3611, After=3498, chg -3.13%
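      
       The pattern in the allocation path looks roughly like this (mirroring
       what the patch does in __page_pool_alloc_pages_slow()):
        #ifdef CONFIG_NUMA
                page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
        #else
                /* with CONFIG_NUMA=n the nid read disappears entirely */
                page = alloc_pages(gfp, pool->p.order);
        #endif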
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: handle page recycle for NUMA_NO_NODE condition · 44768dec
      Jesper Dangaard Brouer committed
      The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is
      not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.
      
      The goal of the NUMA changes in commit d5394610 ("page_pool: Don't
      recycle non-reusable pages") was to have RX pages that belong to the
      same NUMA node as the CPU processing RX packets during softirq/NAPI,
      as illustrated by the performance measurements.
      
      This patch moves the NAPI checks out of the fast path, and at the
      same time solves the NUMA_NO_NODE issue.
      
      First, realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE
      will look up the current CPU nid (NUMA id) via numa_mem_id(), which is
      used as the preferred nid. It is only in rare situations, where
      e.g. a NUMA zone runs dry, that a page doesn't get allocated from the
      preferred nid. The page_pool API allows drivers to control the nid
      themselves via controlling pool->p.nid.
      
      This patch moves the NAPI check to when the alloc cache is refilled,
      via dequeuing/consuming pages from the ptr_ring. Thus, we can allow
      placing pages from remote NUMA nodes into the ptr_ring, as the
      dequeue/consume step will check the NUMA node. All current drivers
      using page_pool will alloc/refill the RX ring from the same CPU
      running the softirq/NAPI process.
      
      Drivers that control the nid explicitly also use page_pool_update_nid()
      when changing the nid at runtime. To speed up the transition to a new
      nid, the alloc cache is now flushed on nid changes. This forces pages
      to come from the ptr_ring, which does the appropriate nid check.
      
      For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA
      node, we accept that transitioning the alloc cache doesn't happen
      immediately. The preferred nid changes at runtime by consulting
      numa_mem_id() based on the CPU processing RX packets.
      
      Notice, to avoid stressing the page buddy allocator and to avoid doing
      too much work under softirq with preempt disabled, the NUMA check at
      ptr_ring dequeue will break the refill cycle when detecting a NUMA
      mismatch. This causes a slower transition, but it is done on purpose.
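      
      A condensed sketch of the refill-time check (locking elided; the
      in-tree page_pool_refill_alloc_cache() has this same shape):
        pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id()
                                                 : pool->p.nid;
        
        do {
                page = __ptr_ring_consume(r);
                if (unlikely(!page))
                        break;
        
                if (likely(page_to_nid(page) == pref_nid)) {
                        pool->alloc.cache[pool->alloc.count++] = page;
                } else {
                        /* NUMA mismatch: release the page and break out,
                         * deliberately slowing the transition */
                        __page_pool_return_page(pool, page);
                        page = NULL;
                        break;
                }
        } while (pool->alloc.count < PP_ALLOC_CACHE_REFILL);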
      
      Fixes: d5394610 ("page_pool: Don't recycle non-reusable pages")
      Reported-by: Li RongQing <lirongqing@baidu.com>
      Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 21 Nov 2019, 3 commits
    • net: page_pool: add the possibility to sync DMA memory for device · e68bc756
      Lorenzo Bianconi committed
      Introduce the following parameters in order to add the possibility to sync
      DMA memory for device before putting allocated pages in the page_pool
      caches:
      - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
        the driver gets from page_pool will be DMA-synced-for-device according
        to the length provided by the device driver. Please note that
        DMA-sync-for-CPU is still the device driver's responsibility
      - offset: DMA address offset where the DMA engine starts copying rx data
      - max_len: maximum DMA memory size page_pool is allowed to flush. This
        is currently used in the __page_pool_alloc_pages_slow routine when
        pages are allocated from the page allocator
      These parameters are supposed to be set by device drivers.
      
      This optimization reduces the length of the DMA-sync-for-device.
      The optimization is valid because pages are initially
      DMA-synced-for-device, as defined via max_len. At RX time, the driver
      will perform a DMA-sync-for-CPU on the memory for the packet length.
      What is important is the memory occupied by the packet payload,
      because this is the area the CPU is allowed to read and modify. As we
      don't track cache lines written into by the CPU, we simply use the
      packet payload length as dma_sync_size at page_pool recycle time. This
      also takes into account any tail extension.
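      
      A hypothetical driver setup using these parameters (dev and
      rx_headroom are placeholder names):
        struct page_pool_params pp_params = {
                .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                .order     = 0,
                .pool_size = 256,
                .nid       = NUMA_NO_NODE,
                .dev       = dev,
                .dma_dir   = DMA_FROM_DEVICE,
                /* sync starts at offset and covers at most max_len bytes */
                .offset    = rx_headroom,
                .max_len   = PAGE_SIZE - rx_headroom,
        };
        struct page_pool *pool = page_pool_create(&pp_params);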
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Don't recycle non-reusable pages · d5394610
      Saeed Mahameed committed
      A page is NOT reusable when at least one of the following is true:
      1) it was allocated when the system was under some pressure
         (page_is_pfmemalloc);
      2) it belongs to a different NUMA node than pool->p.nid.
      
      To update pool->p.nid, users should call page_pool_update_nid().
      
      Holding on to such pages in the pool will hurt the consumer's
      performance when the pool migrates to a different NUMA node.
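      
      These two rules condense into a single predicate, matching the shape
      of the pool_page_reusable() check this commit adds:
        static bool pool_page_reusable(struct page_pool *pool,
                                       struct page *page)
        {
                return !page_is_pfmemalloc(page) &&
                       page_to_nid(page) == pool->p.nid;
        }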
      
      Performance testing:
      XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
      while migrating rx ring irq from close to far numa:
      
      mlx5 internal page cache was locally disabled to get pure page pool
      results.
      
      CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
      NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
      
      XDP Drop/TX single core:
      NUMA  | XDP  | Before    | After
      ---------------------------------------
      Close | Drop | 11   Mpps | 10.9 Mpps
      Far   | Drop | 4.4  Mpps | 5.8  Mpps
      
      Close | TX   | 6.5 Mpps  | 6.5 Mpps
      Far   | TX   | 3.5 Mpps  | 4  Mpps
      
      The improvement is about 30% in drop packet rate and 15% in tx packet
      rate for the far-NUMA test.
      No degradation for the close-NUMA tests.
      
      TCP single/multi cpu/stream:
      NUMA  | #cpu | Before  | After
      --------------------------------------
      Close | 1    | 18 Gbps | 18 Gbps
      Far   | 1    | 15 Gbps | 18 Gbps
      Close | 12   | 80 Gbps | 80 Gbps
      Far   | 12   | 68 Gbps | 80 Gbps
      
      In all test cases we see improvement for the far-NUMA case, and no
      impact on the close-NUMA case.
      
      The impact of adding a check per page is negligible and shows no
      performance degradation whatsoever. Functionality-wise, it also seems
      more correct and more robust for page pool to verify when pages should
      be recycled, since page pool can't guarantee where pages are coming
      from.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: Add API to update numa node · bc836748
      Saeed Mahameed committed
      Add page_pool_update_nid() to be called by page pool consumers when
      they detect NUMA node changes.
      
      It will update the page pool nid value to start allocating from the
      new effective NUMA node.
      
      This mitigates the page pool allocating pages from the wrong NUMA node
      (the one where the pool was originally allocated) and holding on to
      pages that belong to a different NUMA node, which causes performance
      degradation.
      
      For pages that are already being consumed and could be returned to the
      pool by the consumer, the next patch will add a per-page check to
      avoid recycling them back to the pool, returning them to the page
      allocator instead.
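      
      A sketch of a consumer-side call site (rq and pool_nid are
      hypothetical names; the check runs in the softirq context that
      processes RX):
        /* in a NAPI poll loop: follow the CPU's current NUMA node */
        int cur_nid = numa_mem_id();
        
        if (unlikely(rq->pool_nid != cur_nid)) {
                page_pool_update_nid(rq->page_pool, cur_nid);
                rq->pool_nid = cur_nid;
        }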
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 19 Nov 2019, 1 commit
    • page_pool: add destroy attempts counter and rename tracepoint · 7c9e6942
      Jesper Dangaard Brouer committed
      When Jonathan changed the page_pool to become responsible for its
      own shutdown via a deferred work queue, the disconnect_cnt
      counter was removed from the xdp memory model tracepoint.
      
      This patch changes the page_pool_inflight tracepoint name to
      page_pool_release, because it reflects the new responsibility
      better, and it reintroduces a counter that reflects the number of
      times page_pool_release has been tried.
      
      The counter is also used by the code to only empty the alloc
      cache once. With a stuck work queue running every second and the
      counter being 64-bit, it would overrun in approx 584 billion
      years. For comparison, the Earth's life expectancy is 7.5 billion
      years, before the Sun will engulf, and destroy, the Earth.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 17 Nov 2019, 1 commit
  26. 16 Aug 2019, 2 commits
  27. 09 Jul 2019, 1 commit
  28. 19 Jun 2019, 2 commits