1. 19 Oct, 2021 3 commits
  2. 16 Oct, 2021 5 commits
  3. 13 Oct, 2021 1 commit
  4. 05 Oct, 2021 3 commits
  5. 20 Aug, 2021 1 commit
  6. 12 Aug, 2021 4 commits
    • net/mlx5: Allocate individual capability · 48f02eef
      Committed by Parav Pandit
      Currently mlx5_core_dev contains an inline array of capabilities. The
      array holds 19 valid device capabilities, 2 reserved entries and 12
      holes. For the 14 unused entries, mlx5_core_dev therefore allocates
      14 * 8K = 112K bytes of memory that is never used. As a result, the
      mlx5_core_dev structure is a little over 270 Kbytes, and its allocation
      is rounded up to the next power of 2, 512 Kbytes.

      By skipping non-existent entries,
      (a) 112 Kbytes are saved,
      (b) mlx5_core_dev reduces to 8 KB with alignment,
      (c) 350 KB are saved in alignment.

      In the future, individual capability allocation can also be used to
      skip the allocation when a capability is disabled at the device level.
      This patch prepares mlx5_core_dev to hold each capability through a
      pointer instead of an inline array.

      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
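
The pointer-per-capability idea in the message above can be illustrated with a small userspace sketch. It is not the driver code: `struct hca_cap`, `struct dev_caps`, `caps_alloc()`, `caps_free()` and `caps_bytes()` are hypothetical names, and the 4 KB size of each `cur`/`max` copy is an assumption chosen so that one entry is 8 KB, matching the 14 * 8K arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CAP_NUM    33    /* 19 valid + 2 reserved + 12 holes, per the message */
#define CAP_DWORDS 1024  /* assumption: 4 KB per copy, so 8 KB per entry */

/* Hypothetical per-type capability block (current and maximal values). */
struct hca_cap {
	uint32_t cur[CAP_DWORDS];
	uint32_t max[CAP_DWORDS];
};

/* A table of pointers replaces an inline array of CAP_NUM * 8 KB. */
struct dev_caps {
	struct hca_cap *hca[CAP_NUM];
};

/* Free every allocated entry and reset the table. */
void caps_free(struct dev_caps *caps)
{
	for (size_t i = 0; i < CAP_NUM; i++) {
		free(caps->hca[i]);
		caps->hca[i] = NULL;
	}
}

/* Allocate only the listed capability types; returns 0, or -1 on failure. */
int caps_alloc(struct dev_caps *caps, const int *types, size_t n)
{
	for (size_t i = 0; i < CAP_NUM; i++)
		caps->hca[i] = NULL;

	for (size_t i = 0; i < n; i++) {
		int t = types[i];

		if (t < 0 || t >= CAP_NUM || caps->hca[t])
			goto err;
		caps->hca[t] = calloc(1, sizeof(*caps->hca[t]));
		if (!caps->hca[t])
			goto err;
	}
	return 0;
err:
	caps_free(caps);
	return -1;
}

/* Bytes of capability storage that are actually allocated. */
size_t caps_bytes(const struct dev_caps *caps)
{
	size_t total = 0;

	for (size_t i = 0; i < CAP_NUM; i++)
		if (caps->hca[i])
			total += sizeof(*caps->hca[i]);
	return total;
}
```

With this layout, holes and reserved entries cost one NULL pointer each instead of a full 8 KB block, which is where the 112 Kbyte saving in the message comes from.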
    • net/mlx5: Reorganize current and maximal capabilities to be per-type · 5958a6fa
      Committed by Parav Pandit
      In the current code, the current and maximal capabilities are
      maintained in separate arrays which are both per type. In order to
      allow the creation of such a basic structure as a dynamically
      allocated array, we move curr and max fields to a unified
      structure so that specific capabilities can be allocated as one unit.

      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
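
As a rough illustration of the reorganization described above, the sketch below contrasts two parallel per-type arrays with a single per-type unit that holds both copies. All names (`struct caps_split`, `struct hca_cap`, `cap_unit_from_split()`) and the tiny array sizes are hypothetical and chosen only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CAP_NUM    4  /* illustrative number of capability types */
#define CAP_DWORDS 8  /* illustrative size of one capability copy */

/* Before: current and maximal values live in two parallel arrays. */
struct caps_split {
	uint32_t cur[CAP_NUM][CAP_DWORDS];
	uint32_t max[CAP_NUM][CAP_DWORDS];
};

/* After: one unit per type, so a type can be allocated on its own. */
struct hca_cap {
	uint32_t cur[CAP_DWORDS];
	uint32_t max[CAP_DWORDS];
};

/*
 * Copy one type's slice of the split layout into a standalone unit.
 * Returns NULL on a bad type or allocation failure; the caller frees it.
 */
struct hca_cap *cap_unit_from_split(const struct caps_split *old, int type)
{
	struct hca_cap *cap;

	if (type < 0 || type >= CAP_NUM)
		return NULL;

	cap = malloc(sizeof(*cap));
	if (!cap)
		return NULL;

	memcpy(cap->cur, old->cur[type], sizeof(cap->cur));
	memcpy(cap->max, old->max[type], sizeof(cap->max));
	return cap;
}
```

Both layouts occupy the same total space when every type is present; the difference is that the unified unit can be allocated, or skipped, one type at a time.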
    • net/mlx5: Delete impossible dev->state checks · 8e792700
      Committed by Leon Romanovsky
      A new mlx5_core device structure is allocated through devlink_alloc()
      with kzalloc(), which ensures that all fields, including ->state, are
      zero.

      That means the checks of that field in mlx5_init_one() are completely
      redundant, because that function is called only once, at the
      beginning of the mlx5_core_dev lifetime.
      
      PCI:
       .probe()
        -> probe_one()
         -> mlx5_init_one()
      
      The recovery flow can't run at that time or before it, because the
      relevant work is initialized later, in mlx5_init_once().
      
      Such initialization flow ensures that dev->state can't be
      MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
      checks.

      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix typo in comments · 39c538d6
      Committed by Cai Huoqing
      Fix typo:
      *vectores  ==> vectors
      *realeased  ==> released
      *erros  ==> errors
      *namepsace  ==> namespace
      *trafic  ==> traffic
      *proccessed  ==> processed
      *retore  ==> restore
      *Currenlty  ==> Currently
      *crated  ==> created
      *chane  ==> change
      *cannnot  ==> cannot
      *usuallly  ==> usually
      *failes  ==> fails
      *importent  ==> important
      *reenabled  ==> re-enabled
      *alocation  ==> allocation
      *recived  ==> received
      *tanslation  ==> translation

      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  7. 11 Aug, 2021 1 commit
  8. 10 Aug, 2021 1 commit
  9. 06 Aug, 2021 4 commits
  10. 03 Aug, 2021 1 commit
  11. 27 Jul, 2021 1 commit
    • net/mlx5e: Block LRO if firmware asks for tunneled LRO · 26ab7b38
      Committed by Maxim Mikityanskiy
      This commit does a cleanup in LRO configuration.
      
      LRO is a parameter of an RQ, but its state is changed by modifying a TIR
      related to the RQ.
      
      The current status: LRO for tunneled packets is not supported in the
      driver, inner TIRs may enable LRO on creation, but LRO status of inner
      TIRs isn't changed in mlx5e_modify_tirs_lro(). This is inconsistent, but
      as long as the firmware doesn't declare support for tunneled LRO, it
      works, because the same RQs are shared between the inner and outer TIRs.
      
      This commit does two fixes:
      
      1. If the firmware has the tunneled LRO capability, LRO is blocked
      altogether, because it's not possible to block it for inner TIRs only,
      when the same RQs are shared between inner and outer TIRs, and the
      driver won't be able to handle tunneled LRO traffic.
      
      2. mlx5e_modify_tirs_lro() is patched to modify LRO state for all TIRs,
      including inner ones, because all TIRs related to an RQ should agree on
      their LRO state.
      
      Fixes: 7b3722fa ("net/mlx5e: Support RSS for GRE tunneled packets")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
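
The two fixes listed in the message above can be modeled in a few lines of userspace C. This is only a sketch of the decision logic under simplified assumptions: `struct tir`, `lro_allowed()` and this `modify_tirs_lro()` signature are hypothetical and do not reproduce the driver's mlx5e_modify_tirs_lro().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TIR model: inner TIRs handle tunneled (inner) headers. */
struct tir {
	bool inner;
	bool lro;
};

/*
 * Fix 1: if the firmware advertises tunneled LRO, LRO is blocked
 * altogether, because inner and outer TIRs share the same RQs.
 */
bool lro_allowed(bool fw_tunneled_lro_cap)
{
	return !fw_tunneled_lro_cap;
}

/*
 * Fix 2: apply one LRO state to every TIR, inner ones included, so all
 * TIRs related to an RQ agree. Returns 0, or -1 if enabling is blocked.
 */
int modify_tirs_lro(struct tir *tirs, size_t n, bool enable,
		    bool fw_tunneled_lro_cap)
{
	if (enable && !lro_allowed(fw_tunneled_lro_cap))
		return -1;

	for (size_t i = 0; i < n; i++)
		tirs[i].lro = enable;
	return 0;
}
```

Disabling LRO is always accepted in this model; only a request to enable it is refused when the tunneled LRO capability is present.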
  12. 25 Jul, 2021 1 commit
  13. 18 Jul, 2021 1 commit
  14. 03 Jul, 2021 1 commit
  15. 26 Jun, 2021 1 commit
  16. 22 Jun, 2021 1 commit
  17. 17 Jun, 2021 1 commit
    • net/mlx5e: Don't create devices during unload flow · a5ae8fc9
      Committed by Dmytro Linkin
      Running the devlink reload command for a port in switchdev mode
      corrupts resources: the driver can't release the allocated EQ and
      reclaim memory pages, because the "rdma" auxiliary device has added
      CQs, which block the EQ from deletion.
      The erroneous sequence happens during the reload-down phase and is as
      follows:
      
      1. detach device - suspends auxiliary devices which support it, destroys
         others. During this step "eth-rep" and "rdma-rep" are destroyed,
         "eth" - suspended.
      2. disable SRIOV - moves device to legacy mode; as part of disablement -
         rescans drivers. This step adds "rdma" auxiliary device.
      3. destroy EQ table - <failure>.
      
      The driver shouldn't create any device during unload flows. To handle
      that, implement the MLX5_PRIV_FLAGS_DETACH flag: set it on device
      detach and unset it on device attach. If the flag is set, a drivers
      rescan is a no-op.
      
      Fixes: a925b5e3 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
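
The flag-guarded rescan described above might look roughly like the following userspace model. The structure, the function names and the single device counter are all hypothetical; `PRIV_FLAG_DETACH` merely stands in for MLX5_PRIV_FLAGS_DETACH, and the model treats every auxiliary device as destroyed on detach, which is a simplification of the suspend/destroy split in the message.

```c
#include <assert.h>

/* Hypothetical stand-in for MLX5_PRIV_FLAGS_DETACH. */
#define PRIV_FLAG_DETACH (1u << 0)

/* Minimal device model: a flags word and a count of auxiliary devices. */
struct dev {
	unsigned int flags;
	int aux_devices;
};

/* Detach: mark the device as unloading and drop its auxiliary devices. */
void dev_detach(struct dev *d)
{
	d->flags |= PRIV_FLAG_DETACH;
	d->aux_devices = 0;
}

/* Attach: clear the flag so later rescans may create devices again. */
void dev_attach(struct dev *d)
{
	d->flags &= ~PRIV_FLAG_DETACH;
}

/*
 * Rescan: would add an auxiliary device (e.g. "rdma"), but is a no-op
 * while the detach flag is set. Returns the number of devices created.
 */
int dev_rescan_drivers(struct dev *d)
{
	if (d->flags & PRIV_FLAG_DETACH)
		return 0;

	d->aux_devices++;
	return 1;
}
```

In the failing sequence from the message, step 2 (the rescan triggered by disabling SRIOV) would hit the early return, so no device exists to block the EQ teardown in step 3.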
  18. 15 Jun, 2021 2 commits
  19. 10 Jun, 2021 5 commits
    • net/mlx5: Bridge, add offload infrastructure · 19e9bfa0
      Committed by Vlad Buslov
      Create new files bridge.{c|h} in the en/rep directory that implement
      bridge interaction with representor netdevices and handle the required
      events/notifications, and bridge.{c|h} in the esw directory that
      implement all necessary eswitch offloading infrastructure and work at
      the vport/eswitch level. Provide a new kconfig option, MLX5_BRIDGE,
      which is automatically selected when both the kernel bridge and mlx5
      eswitch configs are enabled.
      
      Provide basic infrastructure for bridge offloads:
      
      - struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure
      that encapsulates generic bridge-offloads data (notifier blocks, ingress
      flow table/group, etc.) that is created/deleted on enable/disable eswitch
      offloads.
      
      - struct mlx5_esw_bridge - per-bridge structure that encapsulates
      per-bridge data (reference counter, FDB, egress flow table/group, etc.)
      that is created when the first eswitch representor is attached to a new
      bridge and deleted when the last representor is removed from the bridge
      as a result of a NETDEV_CHANGEUPPER event.

      The bridge tables are created with the new priority FDB_BR_OFFLOAD in
      the FDB namespace. The new priority sits between the tc-miss and slow
      path priorities. It consists of two levels: the ingress table, which
      is global per eswitch, matches incoming packets by src_mac/vid and
      redirects them to the next level (the egress table). The egress table
      is chosen according to the ingress port's bridge membership and matches
      on dst_mac/vid in order to redirect the packet to a vport, according to
      the following diagram:
      
                      +
                      |
            +---------v----------+
            |                    |
            |   FDB_TC_OFFLOAD   |
            |                    |
            +---------+----------+
                      |
                      |
            +---------v----------+
            |                    |
            |   FDB_FT_OFFLOAD   |
            |                    |
            +---------+----------+
                      |
                      |
            +---------v----------+
            |                    |
            |    FDB_TC_MISS     |
            |                    |
            +---------+----------+
                      |
      +--------------------------------------+
      |               |                      |
      |        +------+                      |
      |        |                             |
      | +------v--------+   FDB_BR_OFFLOAD   |
      | | INGRESS_TABLE |                    |
      | +------+---+----+                    |
      |        |   |      match              |
      |        |   +---------+               |
      |        |             |               |    +-------+
      |        |     +-------v-------+ match |    |       |
      |        |     | EGRESS_TABLE  +------------> vport |
      |        |     +-------+-------+       |    |       |
      |        |             |               |    +-------+
      |        |    miss     |               |
      |        +------+------+               |
      |               |                      |
      +--------------------------------------+
                      |
                      |
            +---------v----------+
            |                    |
            |   FDB_SLOW_PATH    |
            |                    |
            +---------+----------+
                      |
                      v

      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
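
The two-level match in the diagram above can be approximated by a small lookup sketch. It assumes a single bridge and a linear FDB scan, and it skips the step where the egress table is chosen from the ingress port's bridge membership; `struct fdb_entry`, `struct bridge`, `bridge_forward()` and `VPORT_MISS` are hypothetical names, not driver symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VPORT_MISS (-1)  /* packet falls through to the slow path */

/* Hypothetical FDB entry: a MAC/VLAN pair learned on a vport. */
struct fdb_entry {
	uint8_t mac[6];
	uint16_t vid;
	int vport;
};

/* Hypothetical per-bridge state, reduced to its FDB. */
struct bridge {
	const struct fdb_entry *fdb;
	size_t n;
};

static const struct fdb_entry *fdb_find(const struct bridge *br,
					const uint8_t *mac, uint16_t vid)
{
	for (size_t i = 0; i < br->n; i++)
		if (br->fdb[i].vid == vid && !memcmp(br->fdb[i].mac, mac, 6))
			return &br->fdb[i];
	return NULL;
}

/*
 * Level 1 (ingress): match on src_mac/vid; a miss goes to the slow path.
 * Level 2 (egress): match on dst_mac/vid; a hit yields the target vport.
 */
int bridge_forward(const struct bridge *br, const uint8_t *src,
		   const uint8_t *dst, uint16_t vid)
{
	const struct fdb_entry *e;

	if (!fdb_find(br, src, vid))
		return VPORT_MISS;

	e = fdb_find(br, dst, vid);
	return e ? e->vport : VPORT_MISS;
}
```

A miss at either level returns `VPORT_MISS`, mirroring the two "miss" edges in the diagram that rejoin before FDB_SLOW_PATH.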
    • net/mlx5: Create TC-miss priority and table · ec3be887
      Committed by Vlad Buslov
      In order to adhere to kernel software datapath model bridge offloads must
      come after TC and NF FDBs. Following patches in this series add new FDB
      priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload
      is implemented with unmanaged tables, its miss path is not automatically
      connected to next priority and requires the code to manually connect with
      slow table. To keep bridge offloads encapsulated and not mix it with
      eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD
      and FDB_SLOW_PATH:
      
                +
                |
      +---------v----------+
      |                    |
      |   FDB_TC_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
                |
      +---------v----------+
      |                    |
      |   FDB_FT_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
                |
      +---------v----------+
      |                    |
      |    FDB_TC_MISS     |
      |                    |
      +---------+----------+
                |
                |
                |
      +---------v----------+
      |                    |
      |   FDB_SLOW_PATH    |
      |                    |
      +---------+----------+
                |
                v
      
      Initialize the new priority with a single default empty managed table
      and use that table as the TC/NF miss path instead of the slow table.
      This approach allows bridge offloads to be created as a new FDB
      namespace priority between FDB_TC_MISS and FDB_SLOW_PATH without
      exposing its internal tables to any other modules, since the miss path
      of the managed TC-miss table is automatically wired to the next
      priority.

      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
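
The difference between managed and unmanaged miss handling can be sketched as follows. The enum mirrors the order of the diagram above, but `enum fdb_prio`, `struct prio_table` and `miss_target()` are hypothetical, and which priorities are marked managed in any given table is up to the caller of this model, not a statement about the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Priority order taken from the diagram; the enum itself is a sketch. */
enum fdb_prio {
	PRIO_TC_OFFLOAD,
	PRIO_FT_OFFLOAD,
	PRIO_TC_MISS,
	PRIO_SLOW_PATH,
	PRIO_NUM
};

/* Hypothetical description of the table backing one priority. */
struct prio_table {
	int managed;  /* managed tables get their miss path wired for them */
	int miss_to;  /* explicit miss target for unmanaged tables, or -1 */
};

/*
 * Where a miss in 'prio' lands: the next priority for a managed table,
 * the explicitly configured target for an unmanaged one, and -1 when
 * the packet leaves the chain.
 */
int miss_target(const struct prio_table *t, int prio)
{
	if (prio < 0 || prio >= PRIO_NUM)
		return -1;
	if (!t[prio].managed)
		return t[prio].miss_to;
	return prio + 1 < PRIO_NUM ? prio + 1 : -1;
}
```

Pointing the unmanaged netfilter table at `PRIO_TC_MISS` rather than at the slow path means a priority inserted after `PRIO_TC_MISS` is reached without touching the netfilter code, which is the encapsulation argument the message makes.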
    • net/mlx5: Added new parameters to reformat context · 3f3f05ab
      Committed by Yevgeny Kliteynik
      Adding the new reformat context type (INSERT_HEADER) requires adding
      two new parameters to the reformat context: reformat_param_0 and
      reformat_param_1. As defined by the HW spec, these parameters have
      different meanings for different reformat context types.
      
      The first parameter (reformat_param_0) is not new to HW spec, but it
      wasn't used by any of the supported reformats. The second parameter
      (reformat_param_1) is new to the HW spec - it was added to allow
      supporting INSERT_HEADER.
      
      For INSERT_HEADER, reformat_param_0 indicates the header used to
      reference the location of the inserted header, and reformat_param_1
      indicates the offset of the inserted header from the reference point
      defined by reformat_param_0.

      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
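
To make the anchor-plus-offset semantics of INSERT_HEADER concrete, here is a software-only sketch of inserting bytes into a packet buffer. Everything in it is an assumption for illustration: the anchor names, the caller-supplied `anchor_off[]` table, treating `param_1` as a byte offset, and the `insert_header()` helper itself; the real operation is performed by hardware according to the HW spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical anchors; the real values come from the HW spec. */
enum anchor {
	ANCHOR_MAC_START,
	ANCHOR_IP_START,
	ANCHOR_NUM
};

/* Hypothetical parameter block modeled on the two fields in the message. */
struct reformat_params {
	uint8_t param_0;   /* anchor: header used as the reference point */
	uint8_t param_1;   /* offset from the anchor (bytes, by assumption) */
	const void *data;  /* header bytes to insert */
	size_t size;       /* number of header bytes */
};

/*
 * Insert p->data into pkt (len bytes used, cap bytes available).
 * anchor_off[] gives the byte offset of each anchor within pkt.
 * Returns the new packet length, or 0 on error.
 */
size_t insert_header(uint8_t *pkt, size_t len, size_t cap,
		     const size_t *anchor_off,
		     const struct reformat_params *p)
{
	size_t pos;

	if (len > cap || p->param_0 >= ANCHOR_NUM)
		return 0;

	pos = anchor_off[p->param_0] + p->param_1;
	if (pos > len || p->size > cap - len)
		return 0;

	/* Shift the tail to make room, then copy the new header in. */
	memmove(pkt + pos + p->size, pkt + pos, len - pos);
	memcpy(pkt + pos, p->data, p->size);
	return len + p->size;
}
```

The same `param_1` value lands at a different place in the packet depending on which anchor `param_0` selects, which is why the two parameters are only meaningful together.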
    • net/mlx5: mlx5_ifc support for header insert/remove · 67133eaa
      Committed by Yevgeny Kliteynik
      Add support for HCA caps 2 that contains capabilities for the new
      insert/remove header actions.
      
      Added the required definitions for supporting the new reformat type:
      added packet reformat parameters, reformat anchors and definitions
      to allow copy/set into the inserted EMD (Embedded MetaData) tag.

      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix page reclaim for dead peer hairpin · a3e5fd93
      Committed by Dima Chumak
      When adding a hairpin flow, a firmware-side send queue is created for
      the peer net device, which claims some host memory pages for its
      internal ring buffer. If the peer net device is removed/unbound before
      the hairpin flow is deleted, then the send queue is not destroyed which
      leads to a stack trace on pci device remove:
      
      [ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
      [ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
      [ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
      [ 748.002171] ------------[ cut here ]------------
      [ 748.001177] FW pages counter is 4 after reclaiming all pages
      [ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
      [  +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
      [ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
      [ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
      [ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
      [ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
      [ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
      [ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
      [ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
      [ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
      [ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
      [ 748.001429] FS:  00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
      [ 748.001695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
      [ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 748.001654] Call Trace:
      [ 748.000576]  ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
      [ 748.001416]  ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
      [ 748.001354]  ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
      [ 748.001203]  mlx5_function_teardown+0x30/0x60 [mlx5_core]
      [ 748.001275]  mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
      [ 748.001200]  remove_one+0x5f/0xc0 [mlx5_core]
      [ 748.001075]  pci_device_remove+0x9f/0x1d0
      [ 748.000833]  device_release_driver_internal+0x1e0/0x490
      [ 748.001207]  unbind_store+0x19f/0x200
      [ 748.000942]  ? sysfs_file_ops+0x170/0x170
      [ 748.001000]  kernfs_fop_write_iter+0x2bc/0x450
      [ 748.000970]  new_sync_write+0x373/0x610
      [ 748.001124]  ? new_sync_read+0x600/0x600
      [ 748.001057]  ? lock_acquire+0x4d6/0x700
      [ 748.000908]  ? lockdep_hardirqs_on_prepare+0x400/0x400
      [ 748.001126]  ? fd_install+0x1c9/0x4d0
      [ 748.000951]  vfs_write+0x4d0/0x800
      [ 748.000804]  ksys_write+0xf9/0x1d0
      [ 748.000868]  ? __x64_sys_read+0xb0/0xb0
      [ 748.000811]  ? filp_open+0x50/0x50
      [ 748.000919]  ? syscall_enter_from_user_mode+0x1d/0x50
      [ 748.001223]  do_syscall_64+0x3f/0x80
      [ 748.000892]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 748.001026] RIP: 0033:0x7f58bcfb22f7
      [ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      [ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
      [ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
      [ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
      [ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
      [ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
      [ 748.001564] irq event stamp: 0
      [ 748.000787] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
      [ 748.001854] softirqs last  enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
      [ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [ 748.001492] ---[ end trace a6fabd773d1c51ae ]---
      
      Fix by destroying the send queue of a hairpin peer net device that is
      being removed/unbound, which returns the allocated ring buffer pages to
      the host.
      
      Fixes: 4d8fcf21 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules")
      Signed-off-by: Dima Chumak <dchumak@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
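
The page accounting behind this fix can be modeled with a simple counter. The sketch below is a userspace illustration only: `struct fw`, `struct hairpin_sq` and the helper functions are hypothetical, and the `with_fix` switch exists purely to contrast the old and new behavior on peer removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical firmware model: host pages currently held by firmware. */
struct fw {
	int pages;
};

/* Hypothetical hairpin send queue that claims pages for its ring buffer. */
struct hairpin_sq {
	struct fw *fw;
	int pages;
	bool alive;
};

/* Creating the SQ hands 'pages' host pages to the firmware. */
void sq_create(struct hairpin_sq *sq, struct fw *fw, int pages)
{
	sq->fw = fw;
	sq->pages = pages;
	sq->alive = true;
	fw->pages += pages;
}

/* Destroying the SQ returns its pages to the host (idempotent). */
void sq_destroy(struct hairpin_sq *sq)
{
	if (!sq->alive)
		return;
	sq->fw->pages -= sq->pages;
	sq->alive = false;
}

/* Peer removal path: with the fix, the peer's SQ is destroyed here. */
void peer_remove(struct hairpin_sq *sq, bool with_fix)
{
	if (with_fix)
		sq_destroy(sq);
}

/* Teardown succeeds only if the firmware no longer holds any pages. */
int reclaim_all(const struct fw *fw)
{
	return fw->pages == 0 ? 0 : -1;
}
```

Without the fix, the counter stays non-zero at teardown, which corresponds to the "FW pages counter is 4 after reclaiming all pages" warning in the trace above.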
  20. 02 Jun, 2021 1 commit
  21. 28 May, 2021 1 commit