1. 24 Feb, 2022 5 commits
  2. 03 Dec, 2021 1 commit
    •
      net/mlx5: Dynamically resize flow counters query buffer · b247f32a
      Avihai Horon authored
      The flow counters bulk query buffer is allocated once during
      mlx5_fc_init_stats(). For PFs and VFs this buffer usually takes a little
      more than 512KB of memory, which is then rounded up to the next power of
      2, i.e. 1MB. For SFs, this buffer is much smaller and takes around 128
      bytes.
      
      The buffer size determines the maximum number of flow counters that
      can be queried at a time. Thus, having a bigger buffer can improve
      performance for users that need to query many flow counters.
      
      There are cases that don't use many flow counters and don't need a big
      buffer (e.g. SFs, VFs). Since this buffer size becomes significant at
      large scale, it should be reduced in these cases.
      
      In order to reduce memory consumption while maintaining query
      performance, change the query buffer's allocation scheme to the
      following:
      - First allocate the buffer with small initial size.
      - If the number of counters surpasses the initial size, resize the
        buffer to the maximum size.
      
      The buffer only grows and is never shrunk, because users with many flow
      counters don't care about the buffer size, and we don't want to add
      resize overhead if the current number of counters drops.
      
      This solution is preferable to the current one, which is less accurate
      and only addresses SFs.
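A minimal userspace sketch of the grow-only scheme described above (the counter capacities, the 8-byte entry stride, and all names here are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical capacities; the real driver derives them from device caps. */
#define FC_BUF_INIT_COUNT 128   /* counters the initial buffer can hold */
#define FC_BUF_MAX_COUNT  4096  /* counters the full-size buffer can hold */
#define FC_ENTRY_SIZE     8     /* stand-in for one bulk-query entry */

struct fc_query_buf {
	size_t capacity;        /* counters the buffer can currently hold */
	void  *data;
};

/* First allocate the buffer with a small initial size. */
static int fc_buf_init(struct fc_query_buf *buf)
{
	buf->capacity = FC_BUF_INIT_COUNT;
	buf->data = calloc(buf->capacity, FC_ENTRY_SIZE);
	return buf->data ? 0 : -1;
}

/*
 * Grow-only resize: if the number of counters surpasses the current
 * capacity, jump straight to the maximum size.  The buffer is never
 * shrunk, so a later drop in the counter count adds no overhead.
 */
static int fc_buf_ensure(struct fc_query_buf *buf, size_t num_counters)
{
	void *bigger;

	if (num_counters <= buf->capacity)
		return 0;               /* still fits, nothing to do */

	bigger = calloc(FC_BUF_MAX_COUNT, FC_ENTRY_SIZE);
	if (!bigger)
		return -1;              /* keep the old buffer on failure */

	free(buf->data);
	buf->data = bigger;
	buf->capacity = FC_BUF_MAX_COUNT;
	return 0;
}
```

The sketch drops the old contents on resize; the real buffer is simply refilled by the next bulk query.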
      Signed-off-by: Avihai Horon <avihaih@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      b247f32a
  3. 27 Oct, 2021 1 commit
  4. 26 Oct, 2021 2 commits
  5. 21 Oct, 2021 1 commit
  6. 19 Oct, 2021 5 commits
  7. 16 Oct, 2021 5 commits
  8. 05 Oct, 2021 1 commit
  9. 28 Sep, 2021 1 commit
  10. 12 Aug, 2021 4 commits
    •
      net/mlx5: Allocate individual capability · 48f02eef
      Parav Pandit authored
      Currently mlx5_core_dev contains an array of capabilities with 19
      valid device capabilities, 2 reserved entries and 12 holes. Because of
      this, mlx5_core_dev allocates 14 * 8K = 112K bytes of memory for the 14
      unused entries, which is never used. This makes the mlx5_core_dev
      structure roughly 270KB in size, an allocation that is further aligned
      to the next power of 2, i.e. 512KB.
      
      By skipping the non-existent entries:
      (a) 112KB is saved,
      (b) mlx5_core_dev is reduced to 8KB with alignment,
      (c) 350KB is saved in alignment.
      
      In the future, individual capability allocation can be used to skip
      allocating a capability when it is disabled at the device level. This
      patch prepares mlx5_core_dev to hold the capabilities through a pointer
      instead of an inline array.
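A sketch of the pointer-based scheme (the capability IDs, count, and per-capability size below are made up for illustration; the real driver has 19 valid types and roughly 8KB per capability):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical capability IDs; the real enum has reserved entries and holes. */
enum cap_type {
	CAP_GENERAL,
	CAP_ETH,
	/* ... holes for reserved / non-existent entries ... */
	CAP_NUM = 33,
};

#define CAP_SIZE 1024	/* stand-in for the real per-capability area */

struct core_dev {
	/*
	 * Array of pointers instead of inline capability structs: a hole
	 * now costs one pointer (8 bytes), not CAP_SIZE bytes.
	 */
	void *caps[CAP_NUM];
};

/* Allocate only the capabilities that actually exist on the device. */
static int dev_alloc_cap(struct core_dev *dev, enum cap_type type)
{
	dev->caps[type] = calloc(1, CAP_SIZE);
	return dev->caps[type] ? 0 : -1;
}

static void dev_free_caps(struct core_dev *dev)
{
	for (int i = 0; i < CAP_NUM; i++)
		free(dev->caps[i]);	/* free(NULL) is a no-op for holes */
}
```

This also sets up the future step mentioned in the commit: a capability disabled at the device level simply stays a NULL pointer.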
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      48f02eef
    •
      net/mlx5: Reorganize current and maximal capabilities to be per-type · 5958a6fa
      Parav Pandit authored
      In the current code, the current and maximal capabilities are
      maintained in separate arrays, both per type. To allow such a basic
      structure to be created as a dynamically allocated array, move the curr
      and max fields into a unified structure so that a specific capability
      can be allocated as one unit.
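Sketched with hypothetical names (the field and type names below are illustrative), the move from two parallel arrays to one per-type unit looks like this:

```c
#include <assert.h>

#define CAP_NUM_TYPES 33	/* stand-in for the number of capability types */

/* Before: current and maximal values kept in separate per-type arrays. */
struct dev_caps_split {
	unsigned int curr[CAP_NUM_TYPES];
	unsigned int max[CAP_NUM_TYPES];
};

/* After: curr and max unified per type ... */
struct cap_entry {
	unsigned int cur;
	unsigned int max;
};

/* ... so that each capability can be allocated (or left NULL) as one unit. */
struct dev_caps_unified {
	struct cap_entry *caps[CAP_NUM_TYPES];
};
```

The unified form is what makes the pointer-based, per-capability allocation of the previous patch possible: one kzalloc can now cover both the current and maximal values of a single type.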
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      5958a6fa
    •
      net/mlx5: Delete impossible dev->state checks · 8e792700
      Leon Romanovsky authored
      The new mlx5_core device structure is allocated through devlink_alloc
      with kzalloc, which ensures that all fields, including ->state, are
      initialized to zero.
      
      That means that checks of that field in mlx5_init_one() are completely
      redundant, because that function is called only once, at the beginning
      of the mlx5_core_dev lifetime.
      
      PCI:
       .probe()
        -> probe_one()
         -> mlx5_init_one()
      
      The recovery flow can't run at that time or before it, because the
      relevant work is initialized later, in mlx5_init_once().
      
      Such initialization flow ensures that dev->state can't be
      MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
      checks.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      8e792700
    •
      net/mlx5: Fix typo in comments · 39c538d6
      Cai Huoqing authored
      Fix typo:
      *vectores  ==> vectors
      *realeased  ==> released
      *erros  ==> errors
      *namepsace  ==> namespace
      *trafic  ==> traffic
      *proccessed  ==> processed
      *retore  ==> restore
      *Currenlty  ==> Currently
      *crated  ==> created
      *chane  ==> change
      *cannnot  ==> cannot
      *usuallly  ==> usually
      *failes  ==> fails
      *importent  ==> important
      *reenabled  ==> re-enabled
      *alocation  ==> allocation
      *recived  ==> received
      *tanslation  ==> translation
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      39c538d6
  11. 10 Aug, 2021 1 commit
  12. 06 Aug, 2021 1 commit
  13. 17 Jun, 2021 1 commit
    •
      net/mlx5e: Don't create devices during unload flow · a5ae8fc9
      Dmytro Linkin authored
      Running the devlink reload command for a port in switchdev mode causes
      resources to be corrupted: the driver can't release the allocated EQ and
      reclaim memory pages, because the "rdma" auxiliary device has added CQs
      which block the EQ from deletion.
      The erroneous sequence happens during the reload-down phase and is as
      follows:
      
      1. detach device - suspends the auxiliary devices which support it and
         destroys the others. During this step "eth-rep" and "rdma-rep" are
         destroyed and "eth" is suspended.
      2. disable SRIOV - moves the device to legacy mode; as part of the
         disablement it rescans the drivers. This step adds the "rdma"
         auxiliary device.
      3. destroy EQ table - <failure>.
      
      The driver shouldn't create any devices during unload flows. To handle
      that, implement the MLX5_PRIV_FLAGS_DETACH flag: set it on device
      detach and clear it on device attach. If the flag is set, do a no-op on
      drivers rescan.
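The flag logic can be sketched in userspace as follows (all names and the auxiliary-device counter are stand-ins for the real MLX5_PRIV_FLAGS_DETACH handling, not the driver's code):

```c
#include <assert.h>

/* Hypothetical flag bit mirroring MLX5_PRIV_FLAGS_DETACH */
#define PRIV_FLAGS_DETACH (1UL << 0)

struct dev_priv {
	unsigned long flags;
	int num_aux_devs;	/* stand-in for registered auxiliary devices */
};

/* Detach sets the flag and suspends/destroys the auxiliary devices. */
static void dev_detach(struct dev_priv *priv)
{
	priv->flags |= PRIV_FLAGS_DETACH;
	priv->num_aux_devs = 0;
}

/* Attach clears the flag and re-creates the auxiliary devices. */
static void dev_attach(struct dev_priv *priv)
{
	priv->flags &= ~PRIV_FLAGS_DETACH;
	priv->num_aux_devs = 1;
}

/*
 * Rescan is a no-op while the device is detached, so no "rdma"
 * auxiliary device can be created in the middle of an unload flow.
 */
static void dev_rescan_drivers(struct dev_priv *priv)
{
	if (priv->flags & PRIV_FLAGS_DETACH)
		return;
	priv->num_aux_devs++;	/* would add the missing aux device here */
}
```

With this guard, the SRIOV-disable rescan in step 2 above can no longer add a device between detach and EQ-table teardown.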
      
      Fixes: a925b5e3 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      a5ae8fc9
  14. 28 May, 2021 1 commit
  15. 19 May, 2021 1 commit
  16. 17 Apr, 2021 1 commit
  17. 03 Apr, 2021 2 commits
    •
      net/mlx5: Allocate rate limit table when rate is configured · 6b30b6d4
      Parav Pandit authored
      A device supports 128 rate limiters. A static table allocation consumes
      8KB of memory even when no rate is configured.
      
      Instead, allocate the table when at least one rate is configured.
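A sketch of the lazy allocation (the entry layout and lookup are simplified stand-ins; only the allocate-on-first-use point matches the patch):

```c
#include <assert.h>
#include <stdlib.h>

#define RL_TABLE_ENTRIES 128	/* the device supports 128 rate limiters */

struct rl_entry {
	unsigned int rate;
	unsigned int refcount;
};

struct rl_table {
	struct rl_entry *entries;	/* NULL until the first rate is set */
};

/*
 * Allocate the table lazily, when the first rate is configured, instead
 * of statically at probe time.  Until then the table costs one pointer.
 */
static struct rl_entry *rl_get_entry(struct rl_table *tbl, unsigned int rate)
{
	if (!tbl->entries) {
		tbl->entries = calloc(RL_TABLE_ENTRIES,
				      sizeof(*tbl->entries));
		if (!tbl->entries)
			return NULL;
	}
	/* Real code looks up a free or matching entry; take slot 0 here. */
	tbl->entries[0].rate = rate;
	tbl->entries[0].refcount++;
	return &tbl->entries[0];
}
```

A device that never configures a rate limit thus never pays the 8KB table cost.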
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      6b30b6d4
    •
      net/mlx5: Pack mlx5_rl_entry structure · 4c4c0a89
      Parav Pandit authored
      The mlx5_rl_entry structure is not properly packed, as shown below.
      Because of this, an array of 9144 bytes is allocated, which is then
      aligned to 16KB. Hence, pack the structure and avoid the waste.
      
      This saves 8KB per mlx5_core_dev struct.
      
      pahole -C mlx5_rl_entry  drivers/net/ethernet/mellanox/mlx5/core/en_main.o
      
      Existing layout:
      
      struct mlx5_rl_entry {
              u8                         rl_raw[48];           /*     0    48 */
              u16                        index;                /*    48     2 */
      
              /* XXX 6 bytes hole, try to pack */
      
              u64                        refcount;             /*    56     8 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              u16                        uid;                  /*    64     2 */
              u8                         dedicated:1;          /*    66: 0  1 */
      
              /* size: 72, cachelines: 2, members: 5 */
              /* sum members: 60, holes: 1, sum holes: 6 */
              /* sum bitfield members: 1 bits (0 bytes) */
              /* padding: 5 */
              /* bit_padding: 7 bits */
              /* last cacheline: 8 bytes */
      };
      
      After alignment:
      
      struct mlx5_rl_entry {
              u8                         rl_raw[48];           /*     0    48 */
              u64                        refcount;             /*    48     8 */
              u16                        index;                /*    56     2 */
              u16                        uid;                  /*    58     2 */
              u8                         dedicated:1;          /*    60: 0  1 */
      
              /* size: 64, cachelines: 1, members: 5 */
              /* padding: 3 */
              /* bit_padding: 7 bits */
      };
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      4c4c0a89
  18. 17 Mar, 2021 3 commits
  19. 13 Mar, 2021 2 commits
  20. 12 Mar, 2021 1 commit