1. 01 Apr, 2021 1 commit
    • xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model · 622d1369
      Ong Boon Leong committed
      xdp_return_frame() may be called outside of NAPI context to return
      xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with
      napi_direct = false. For the page_pool memory model, __xdp_return() calls
      xdp_return_frame_no_direct() unconditionally, and the false negative
      kernel BUG below was thrown under a preempt-rt build:
      
      [  430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884
      [  430.451678] caller is __xdp_return+0x1ff/0x2e0
      [  430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G     U      E     5.12.0-rc2+ #45
      
      Changes in v2:
       - This patch fixes the issue by ensuring xdp_return_frame_no_direct() is
         only called if napi_direct = true, as recommended by
         Jesper Dangaard Brouer. Thanks!
      
      Fixes: 2539650f ("xdp: Helpers for disabling napi_direct of xdp_return_frame")
      Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
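
The v2 change described above can be illustrated with a small userspace sketch. It is not the kernel code: `mock_return_frame_no_direct()`, `no_direct_lookups`, `no_direct_flag` and `effective_napi_direct()` are hypothetical stand-ins, and the counter only exists to show that the per-CPU style lookup is skipped when `napi_direct` is false.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: in the kernel the flag lives in per-CPU state,
 * which is why reading it from preemptible context triggers the BUG. */
static int no_direct_lookups;   /* how many times the flag was read */
static bool no_direct_flag;     /* mocked "no direct" flag value */

static bool mock_return_frame_no_direct(void)
{
    no_direct_lookups++;
    return no_direct_flag;
}

/* Returns the napi_direct value that would be used for the page_pool return. */
static bool effective_napi_direct(bool napi_direct)
{
    /* Short-circuit: the flag is only read when napi_direct is true. */
    if (napi_direct && mock_return_frame_no_direct())
        napi_direct = false;
    return napi_direct;
}
```

With `napi_direct = false`, as passed by xdp_return_frame(), the mocked lookup is never reached.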
  2. 04 Feb, 2021 1 commit
  3. 21 Jan, 2021 2 commits
  4. 09 Dec, 2020 1 commit
    • xdp: Remove the xdp_attachment_flags_ok() callback · 998f1729
      Toke Høiland-Jørgensen committed
      Since commit 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF
      programs in net_device"), the XDP program attachment info is now maintained
      in the core code. This interacts badly with the xdp_attachment_flags_ok()
      check that prevents unloading an XDP program with different load flags than
      it was loaded with. In practice, two kinds of failures are seen:
      
      - An XDP program loaded without specifying a mode (and which then ends up
        in driver mode) cannot be unloaded if the program mode is specified on
        unload.
      
      - The dev_xdp_uninstall() hook always calls the driver callback with the
        mode set to the type of the program but an empty flags argument, which
        means the flags_ok() check prevents the program from being removed,
        leading to bpf prog reference leaks.
      
      The original reason this check was added was to avoid ambiguity when
      multiple programs were loaded. With the way the checks are done in the core
      now, this is quite simple to enforce in the core code, so let's add a check
      there and get rid of the xdp_attachment_flags_ok() callback entirely.
      
      Fixes: 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
  5. 01 Dec, 2020 2 commits
  6. 14 Nov, 2020 2 commits
  7. 26 Jul, 2020 1 commit
  8. 18 Jun, 2020 1 commit
  9. 22 May, 2020 3 commits
  10. 15 May, 2020 1 commit
    • xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame · 34cc0b33
      Jesper Dangaard Brouer committed
      Use a hole in struct xdp_frame when adding member frame_sz, which keeps
      the same sizeof struct (32 bytes).
      
      Drivers ixgbe and sfc had bug cases where the necessary/expected
      tailroom was not reserved. This can lead to some hard-to-catch memory
      corruption issues. With the driver's frame_sz available, this can be
      detected when the packet length/end via xdp->data_end exceeds the
      xdp_data_hard_end pointer, which accounts for the reserved tailroom.
      
      When detecting this driver issue, simply fail the conversion with NULL,
      which results in feedback to the driver (failing xdp_do_redirect()), causing
      the driver to drop the packet. Given the lack of consistent XDP stats, this
      can be hard to troubleshoot. And given this is a driver bug, we want to
      generate some more noise in the form of a WARN stack dump (to ID the driver
      code that inlined convert_to_xdp_frame).
      
      Inlining the WARN macro is problematic, because it adds an asm
      instruction (ud2 on Intel CPUs) which influences instruction cache
      prefetching. Thus, introduce xdp_warn and the macro XDP_WARN to avoid this
      and at the same time make identifying the function and line of this
      inlined function easier.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/158945337313.97035.10015729316710496600.stgit@firesoul
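
The tailroom check described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct sketch_buff`, `sketch_data_hard_end()`, `sketch_tailroom_ok()` and the `SKETCH_TAILROOM` value are hypothetical, and the real reserved tailroom depends on kernel structures not shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed reserved tailroom in bytes, for illustration only. */
#define SKETCH_TAILROOM 320u

/* Hypothetical, reduced view of a receive buffer. */
struct sketch_buff {
    unsigned char *data_hard_start; /* start of the whole frame */
    unsigned char *data_end;        /* end of packet data */
    size_t frame_sz;                /* total frame size given by the driver */
};

/* Last address packet data may reach: frame end minus reserved tailroom. */
static unsigned char *sketch_data_hard_end(const struct sketch_buff *b)
{
    return b->data_hard_start + b->frame_sz - SKETCH_TAILROOM;
}

/* False when packet data runs into the tailroom, i.e. the driver bug case
 * in which the conversion would be failed. */
static bool sketch_tailroom_ok(const struct sketch_buff *b)
{
    return b->data_end <= sketch_data_hard_end(b);
}
```

For a 2048-byte frame and the assumed 320 bytes of tailroom, packet data ending at offset 1000 passes the check, while data ending at offset 2000 does not.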
  11. 21 Feb, 2020 1 commit
  12. 10 Dec, 2019 1 commit
  13. 05 Dec, 2019 1 commit
  14. 19 Nov, 2019 1 commit
    • xdp: remove memory poison on free for struct xdp_mem_allocator · c491eae8
      Jesper Dangaard Brouer committed
      When looking at the details I realised that the memory poison in
      __xdp_mem_allocator_rcu_free doesn't make sense. This is because the
      SLUB allocator uses the first 16 bytes (on 64 bit) for its freelist,
      which overlap with the members in struct xdp_mem_allocator that were
      updated. Thus, SLUB already does the "poisoning" for us.
      
      I still believe that poisoning memory makes sense in other cases.
      The kernel has gained different use-after-free detection mechanisms, but
      enabling those is associated with a huge overhead. Experience is that
      debugging facilities can change the timing so much that a race
      condition will not be provoked when enabled. Thus, I'm still in favour
      of poisoning memory where it makes sense.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 17 Nov, 2019 1 commit
  16. 12 Oct, 2019 1 commit
  17. 09 Jul, 2019 1 commit
  18. 26 Jun, 2019 1 commit
  19. 19 Jun, 2019 5 commits
    • xdp: add tracepoints for XDP mem · f033b688
      Jesper Dangaard Brouer committed
      These tracepoints make it easier to troubleshoot XDP mem id disconnect.
      
      The xdp:mem_disconnect tracepoint cannot be replaced via kprobe. It is
      placed at the last stable place for the pointer to struct xdp_mem_allocator,
      just before it's scheduled for RCU removal. It also extracts info on
      'safe_to_remove' and 'force'.
      
      Detailed info about in-flight pages is not available at this layer. The next
      patch will add the tracepoints needed at the page_pool layer for this.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: force mem allocator removal and periodic warning · d956a048
      Jesper Dangaard Brouer committed
      If bugs exist or are introduced later, e.g. by drivers misusing the API,
      then we want to warn about the issue, such that developers notice. This patch
      will generate a bit of noise in the form of a periodic pr_warn every 30 seconds.
      
      It is not nice to have this stall warning running forever. Thus, this patch
      will (after 120 attempts) force disconnect the mem id (from the rhashtable)
      and free the page_pool object. This will cause a fallback to put_page() as
      before, which only potentially leaks DMA-mappings, if objects are really
      stuck for this long. In that unlikely case, a WARN_ONCE should show us the
      call stack.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
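
The retry policy in the commit above (keep retrying, then force removal after 120 attempts) can be sketched as a small decision function. This is a simplified illustration, not the kernel implementation: `sketch_disconnect_step()`, the `sketch_result` enum and `SKETCH_MAX_ATTEMPTS` are hypothetical names, and the periodic warning and the scheduling of retries are left out.

```c
/* Attempt limit taken from the commit message; the name is hypothetical. */
#define SKETCH_MAX_ATTEMPTS 120

enum sketch_result {
    SKETCH_RETRY,    /* pages still in flight, try again later */
    SKETCH_REMOVED,  /* nothing in flight, safe to remove */
    SKETCH_FORCED    /* gave up waiting, removal is forced */
};

/* Decide what one disconnect attempt does.
 * inflight: number of pages not yet returned to the pool
 * attempts: number of attempts made so far, including this one */
static enum sketch_result sketch_disconnect_step(int inflight, int attempts)
{
    if (inflight == 0)
        return SKETCH_REMOVED;
    if (attempts >= SKETCH_MAX_ATTEMPTS)
        return SKETCH_FORCED;   /* stuck pages may leak DMA mappings */
    return SKETCH_RETRY;
}
```

The forced path corresponds to the case where the commit expects a WARN_ONCE to show the call stack.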
    • xdp: tracking page_pool resources and safe removal · 99c07c43
      Jesper Dangaard Brouer committed
      This patch is needed before we can allow drivers to use page_pool for
      DMA-mappings. Today with page_pool and the XDP return API, it is possible to
      remove the page_pool object (from rhashtable) while there are still
      in-flight packet-pages. This is safely handled via RCU, and failed lookups in
      __xdp_return() fall back to calling put_page() when the page_pool object is gone.
      In case the page is still DMA mapped, this will result in the page not getting
      correctly DMA unmapped.
      
      To solve this, the page_pool is extended with tracking in-flight pages. And
      XDP disconnect system queries page_pool and waits, via workqueue, for all
      in-flight pages to be returned.
      
      To avoid killing performance when tracking in-flight pages, the implementation
      uses two (unsigned) counters that are placed on different cache-lines and
      can be used to deduce in-flight packets. This is done by mapping the
      unsigned "sequence" counters onto signed Two's complement arithmetic
      operations. This is e.g. used by the kernel's time_after macros, described in
      kernel commits 1ba3aab3 and 5a581b36, and also explained in RFC1982.
      
      The trick is that these two incrementing counters only need to be read and
      compared when checking if it's safe to free the page_pool structure, which
      will only happen when the driver has disconnected the RX/alloc side. Thus, on a
      non-fast-path.
      
      It is chosen that page_pool tracking is also enabled for the non-DMA
      use-case, as this can be used for statistics later.
      
      After this patch, using page_pool requires a more strict resource "release",
      e.g. via page_pool_release_page() that was introduced in this patchset, and
      previous patches implement/fix this more strict requirement.
      
      Drivers no longer call page_pool_destroy(). Drivers already call
      xdp_rxq_info_unreg(), which calls xdp_rxq_info_unreg_mem_model(), which will
      attempt to disconnect the mem id and, if the attempt fails, schedule the
      disconnect for later via a delayed workqueue.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
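
The two-counter trick from the commit above can be shown with a short userspace sketch. The names `struct sketch_pool`, `hold_cnt`, `release_cnt` and `sketch_inflight()` are assumptions made for this illustration; the commit message does not name the counters. The point is that the unsigned difference, read as a signed value, stays correct even after one counter has wrapped around.

```c
#include <stdint.h>

/* Hypothetical pool with two free-running unsigned counters. In the design
 * described above they sit on different cache-lines; that layout detail is
 * omitted here. */
struct sketch_pool {
    uint32_t hold_cnt;     /* incremented when a page is handed out */
    uint32_t release_cnt;  /* incremented when a page comes back */
};

/* Number of pages currently in flight.
 * The subtraction is done modulo 2^32 and the result is interpreted as a
 * signed two's complement value, as in serial number arithmetic. */
static int32_t sketch_inflight(const struct sketch_pool *p)
{
    return (int32_t)(p->hold_cnt - p->release_cnt);
}
```

For example, if `release_cnt` is 4294967293 and `hold_cnt` has already wrapped around to 5, the modular difference is 8, which is the number of pages still outstanding.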
    • xdp: page_pool related fix to cpumap · 6bf071bf
      Jesper Dangaard Brouer committed
      When converting an xdp_frame into an SKB and sending this into the network
      stack, the underlying XDP memory model needs to release associated
      resources, because the network stack doesn't have callbacks for XDP memory
      models. The only memory model that needs this is page_pool, when a driver
      uses the DMA-mapping feature.
      
      Introduce page_pool_release_page(), which basically does the same as
      page_pool_unmap_page(). Add xdp_release_frame() as the XDP memory model
      interface for calling it if the memory model matches MEM_TYPE_PAGE_POOL, to
      save the function call overhead for others. Have cpumap call
      xdp_release_frame() before xdp_scrub_frame().
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xdp: fix leak of IDA cyclic id if rhashtable_insert_slow fails · 516a7593
      Jesper Dangaard Brouer committed
      Fix error handling case, where inserting ID with rhashtable_insert_slow
      fails in xdp_rxq_info_reg_mem_model, which leads to never releasing the IDA
      ID, as the lookup in xdp_rxq_info_unreg_mem_model fails and thus
      ida_simple_remove() is never called.
      
      Fix by releasing ID via ida_simple_remove(), and mark xdp_rxq->mem.id with
      zero, which is already checked in xdp_rxq_info_unreg_mem_model().
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 05 Jun, 2019 1 commit
  21. 01 Sep, 2018 1 commit
  22. 30 Aug, 2018 2 commits
  23. 17 Aug, 2018 1 commit
    • net/xdp: Fix suspicious RCU usage warning · 21b172ee
      Tariq Toukan committed
      Fix the warning below by calling rhashtable_lookup_fast.
      Also, make some code movements for better quality and human
      readability.
      
      [  342.450870] WARNING: suspicious RCU usage
      [  342.455856] 4.18.0-rc2+ #17 Tainted: G           O
      [  342.462210] -----------------------------
      [  342.467202] ./include/linux/rhashtable.h:481 suspicious rcu_dereference_check() usage!
      [  342.476568]
      [  342.476568] other info that might help us debug this:
      [  342.476568]
      [  342.486978]
      [  342.486978] rcu_scheduler_active = 2, debug_locks = 1
      [  342.495211] 4 locks held by modprobe/3934:
      [  342.500265]  #0: 00000000e23116b2 (mlx5_intf_mutex){+.+.}, at:
      mlx5_unregister_interface+0x18/0x90 [mlx5_core]
      [  342.511953]  #1: 00000000ca16db96 (rtnl_mutex){+.+.}, at: unregister_netdev+0xe/0x20
      [  342.521109]  #2: 00000000a46e2c4b (&priv->state_lock){+.+.}, at: mlx5e_close+0x29/0x60
      [mlx5_core]
      [  342.531642]  #3: 0000000060c5bde3 (mem_id_lock){+.+.}, at: xdp_rxq_info_unreg+0x93/0x6b0
      [  342.541206]
      [  342.541206] stack backtrace:
      [  342.547075] CPU: 12 PID: 3934 Comm: modprobe Tainted: G           O      4.18.0-rc2+ #17
      [  342.556621] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
      [  342.565606] Call Trace:
      [  342.568861]  dump_stack+0x78/0xb3
      [  342.573086]  xdp_rxq_info_unreg+0x3f5/0x6b0
      [  342.578285]  ? __call_rcu+0x220/0x300
      [  342.582911]  mlx5e_free_rq+0x38/0xc0 [mlx5_core]
      [  342.588602]  mlx5e_close_channel+0x20/0x120 [mlx5_core]
      [  342.594976]  mlx5e_close_channels+0x26/0x40 [mlx5_core]
      [  342.601345]  mlx5e_close_locked+0x44/0x50 [mlx5_core]
      [  342.607519]  mlx5e_close+0x42/0x60 [mlx5_core]
      [  342.613005]  __dev_close_many+0xb1/0x120
      [  342.617911]  dev_close_many+0xa2/0x170
      [  342.622622]  rollback_registered_many+0x148/0x460
      [  342.628401]  ? __lock_acquire+0x48d/0x11b0
      [  342.633498]  ? unregister_netdev+0xe/0x20
      [  342.638495]  rollback_registered+0x56/0x90
      [  342.643588]  unregister_netdevice_queue+0x7e/0x100
      [  342.649461]  unregister_netdev+0x18/0x20
      [  342.654362]  mlx5e_remove+0x2a/0x50 [mlx5_core]
      [  342.659944]  mlx5_remove_device+0xe5/0x110 [mlx5_core]
      [  342.666208]  mlx5_unregister_interface+0x39/0x90 [mlx5_core]
      [  342.673038]  cleanup+0x5/0xbfc [mlx5_core]
      [  342.678094]  __x64_sys_delete_module+0x16b/0x240
      [  342.683725]  ? do_syscall_64+0x1c/0x210
      [  342.688476]  do_syscall_64+0x5a/0x210
      [  342.693025]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 8d5d8852 ("xdp: rhashtable with allocator ID to pointer mapping")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  24. 10 Aug, 2018 2 commits
    • xdp: Helpers for disabling napi_direct of xdp_return_frame · 2539650f
      Toshiaki Makita committed
      We need some mechanism to disable napi_direct on calling
      xdp_return_frame_rx_napi() from some context.
      When veth gets support of XDP_REDIRECT, it will redirect packets which
      are redirected from other devices. On redirection veth will reuse
      xdp_mem_info of the redirection source device to make return_frame work.
      But in this case .ndo_xdp_xmit() called from veth redirection uses
      xdp_mem_info which is not guarded by NAPI, because the .ndo_xdp_xmit()
      is not called directly from the rxq which owns the xdp_mem_info.
      
      This approach introduces a flag in bpf_redirect_info to indicate that
      napi_direct should be disabled even when _rx_napi variant is used as
      well as helper functions to use it.
      
      A NAPI handler who wants to use this flag needs to call
      xdp_set_return_frame_no_direct() before processing packets, and call
      xdp_clear_return_frame_no_direct() after xdp_do_flush_map() before
      exiting NAPI.
      
      v4:
      - Use bpf_redirect_info for storing the flag instead of xdp_mem_info to
        avoid per-frame copy cost.
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
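
The set/clear pattern described in the commit above can be sketched with a mocked flag word. Everything named `sketch_*` or `SKETCH_*` below is hypothetical and only loosely modelled on the description: the real flag is stored in bpf_redirect_info, and the real helpers are xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct().

```c
#include <stdbool.h>

#define SKETCH_F_NO_DIRECT (1u << 0)   /* hypothetical flag bit */

/* Hypothetical stand-in for the state that carries the flag. */
struct sketch_redirect_info {
    unsigned int kern_flags;
};

static void sketch_set_no_direct(struct sketch_redirect_info *ri)
{
    ri->kern_flags |= SKETCH_F_NO_DIRECT;
}

static void sketch_clear_no_direct(struct sketch_redirect_info *ri)
{
    ri->kern_flags &= ~SKETCH_F_NO_DIRECT;
}

static bool sketch_no_direct(const struct sketch_redirect_info *ri)
{
    return (ri->kern_flags & SKETCH_F_NO_DIRECT) != 0;
}

/* Mocked NAPI handler: returns how many frames kept direct recycling. */
static int sketch_napi_poll(struct sketch_redirect_info *ri, int frames)
{
    int direct = 0;

    sketch_set_no_direct(ri);            /* before processing packets */
    for (int i = 0; i < frames; i++) {
        bool napi_direct = true;         /* the _rx_napi variant asks for it */

        if (napi_direct && sketch_no_direct(ri))
            napi_direct = false;         /* flag overrides the request */
        if (napi_direct)
            direct++;
    }
    /* In the commit, the clear happens after xdp_do_flush_map(). */
    sketch_clear_no_direct(ri);
    return direct;
}
```

Keeping the flag in one shared place rather than in each frame matches the v4 note above about avoiding a per-frame copy cost.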
    • Revert "xdp: add NULL pointer check in __xdp_return()" · eb91e4d4
      Björn Töpel committed
      This reverts commit 36e0f12b.
      
      The reverted commit adds a WARN to check against NULL entries in the
      mem_id_ht rhashtable. Any kernel path implementing the XDP (generic or
      driver) fast path is required to make a paired
      xdp_rxq_info_reg/xdp_rxq_info_unreg call for proper function. In
      addition, a driver using a different allocation scheme than the
      default MEM_TYPE_PAGE_SHARED is required to additionally call
      xdp_rxq_info_reg_mem_model.
      
      For MEM_TYPE_ZERO_COPY, an xdp_rxq_info_reg_mem_model call ensures
      that the mem_id_ht rhashtable has a properly inserted allocator id. If
      not, this would be a driver bug. A NULL pointer kernel OOPS is
      preferred to the WARN.
      Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  25. 27 Jul, 2018 1 commit
  26. 14 Jul, 2018 1 commit
    • xdp: factor out common program/flags handling from drivers · 05296620
      Jakub Kicinski committed
      Basic operations drivers perform during xdp setup and query can
      be moved to helpers in the core.  Encapsulate program and flags
      into a structure and add helpers.  Note that the structure is
      intended as the "main" program information source in the driver.
      Most drivers will additionally place the program pointer in their
      fast path or ring structures.
      
      The helpers don't have a huge impact now, but they will
      decrease the code duplication when programs can be installed
      in HW and driver at the same time.  Encapsulating the basic
      operations in helpers will hopefully also reduce the number
      of changes to drivers which adopt them.
      
      Helpers could really be static inline, but they depend on
      definition of struct netdev_bpf which means they'd have
      to be placed in netdevice.h, an already 4500 line header.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  27. 22 Jun, 2018 1 commit
    • rhashtable: remove nulls_base and related code. · 9f9a7077
      NeilBrown committed
      This "feature" is unused, undocumented, and untested and so doesn't
      really belong.  A patch is under development to properly implement
      support for detecting when a search gets diverted down a different
      chain, which is the common purpose of nulls markers.
      
      This patch actually fixes a bug too.  The table resizing allows a
      table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
      any growth beyond 2^27 is wasteful and ineffective.
      
      This patch results in NULLS_MARKER(0) being used for all chains,
      and leaves the use of rht_is_a_null() to test for it.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 05 Jun, 2018 1 commit
  29. 25 May, 2018 1 commit
    • xdp: introduce xdp_return_frame_rx_napi · 389ab7f0
      Jesper Dangaard Brouer committed
      When sending an xdp_frame through an xdp_do_redirect call, error
      cases can happen where the xdp_frame needs to be dropped, and
      returning an -errno code isn't sufficient/possible any longer
      (e.g. for the cpumap case). This is already fully supported, by simply
      calling xdp_return_frame.
      
      This patch is an optimization, which provides xdp_return_frame_rx_napi,
      which is a faster variant for these error cases.  It takes advantage of
      the protection provided by XDP RX running under NAPI protection.
      
      This change is mostly relevant for drivers using the page_pool
      allocator as it can take advantage of this. (Tested with mlx5).
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>