1. 18 May 2022, 1 commit
  2. 10 May 2022, 4 commits
    • net/mlx5: Lag, add debugfs to query hardware lag state · 7f46a0b7
      Mark Bloch committed
      Lag state has become very complicated, with many modes, flags, types and
      port selection methods, and future work will add additional features.
      
      Add a debugfs to query the current lag state. A new directory named "lag"
      will be created under the mlx5 debugfs directory. As the driver has a
      debugfs directory per PCI function, the location will be: <debugfs>/mlx5/<BDF>/lag
      
      For example:
      /sys/kernel/debug/mlx5/0000:08:00.0/lag
      
      The following files are exposed:
      
      - state: Returns "active" or "disabled". If "active" it means hardware
               lag is active.
      
      - members: Returns the BDFs of all members of the lag object.
      
      - type: Returns the type of the lag currently configured. Valid only
      	if hardware lag is active.
      	* "roce" - Members are bare metal PFs.
      	* "switchdev" - Members are in switchdev mode.
      	* "multipath" - ECMP offloads.
      
      - port_sel_mode: Returns the egress port selection method, valid
      		 only if hardware lag is active.
      		 * "queue_affinity" - Egress port is selected by
      		   the QP/SQ affinity.
      		 * "hash" - Egress port is selected by hash done on
      		   each packet. Controlled by: xmit_hash_policy of the
      		   bond device.
      - flags: Returns flags that are specific to the lag @type. Valid only if
      	 hardware lag is active.
      	 * "shared_fdb" - "on" or "off"; if "on", a single FDB is used.
      
      - mapping: Returns the mapping used to select the egress port.
      	   Valid only if hardware lag is active.
      	   If @port_sel_mode is "hash", returns the active egress ports;
      	   the hash result will select only active ports.
      	   If @port_sel_mode is "queue_affinity", returns the mapping
      	   between the configured port affinity of the QP/SQ and the
      	   actual egress port. For example:
      	   * 1:1 - Mapping means that if the configured affinity is port 1,
      	           traffic will egress via port 1.
      	   * 1:2 - Mapping means that if the configured affinity is port 1,
      		   traffic will egress via port 2. This can happen
      		   if port 1 is down, or in active/backup mode when port 1
      		   is the backup.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
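      Below is a minimal, hypothetical sketch of how a read-only file such as
      "state" could be wired up with the standard debugfs/seq_file helpers. It
      only illustrates the mechanism described in the entry above and is not
      the driver's actual code; the directory handle and the lag_active flag
      are assumptions.

      #include <linux/debugfs.h>
      #include <linux/module.h>
      #include <linux/seq_file.h>

      /* Sketch only: expose a "state" file that prints "active"/"disabled". */
      static int lag_state_show(struct seq_file *file, void *priv)
      {
      	bool *active = file->private;	/* flag handed over at file creation */

      	seq_printf(file, "%s\n", *active ? "active" : "disabled");
      	return 0;
      }
      DEFINE_SHOW_ATTRIBUTE(lag_state);

      /* Called with the per-function <debugfs>/mlx5/<BDF> directory dentry. */
      static void lag_debugfs_init(struct dentry *mlx5_dbg_root, bool *lag_active)
      {
      	struct dentry *lag_dir = debugfs_create_dir("lag", mlx5_dbg_root);

      	debugfs_create_file("state", 0444, lag_dir, lag_active, &lag_state_fops);
      }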
    • net/mlx5: Support devices with more than 2 ports · 4cd14d44
      Mark Bloch committed
      Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
      to support NICs with 4 ports.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
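      The change itself is just a constant bump; as a sketch (the previous
      value of 2 and the header location are from memory, not from this log):

      /* include/linux/mlx5/driver.h (approximate location) */
      #define MLX5_MAX_PORTS 4	/* was 2; sizes per-lag port arrays in the driver */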
    • net/mlx5: Lag, expose number of lag ports · 34a30d76
      Mark Bloch committed
      Downstream patches will add support for hardware lag with
      more than 2 ports. Add a way for users to query the number of lag ports.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
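      A hedged usage sketch follows; mlx5_lag_get_num_ports() is my guess at
      the helper this commit exports (based on the title, not the diff), so
      treat the name and signature as assumptions.

      #include <linux/mlx5/driver.h>

      /* Sketch: a lag-aware consumer sizing its per-port state from the
       * number of ports in the hardware lag instead of hard-coding 2.
       */
      static void sketch_init_per_port_state(struct mlx5_core_dev *mdev)
      {
      	u8 i, num_ports = mlx5_lag_get_num_ports(mdev);	/* assumed helper */

      	for (i = 0; i < num_ports; i++) {
      		/* set up per-port resources here instead of assuming 2 ports */
      	}
      }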
    • net/mlx5: Add exit route when waiting for FW · 8324a02c
      Gavin Li committed
      Currently, removing a device needs to take the driver interface lock
      before doing any cleanup. If the driver is waiting in a loop for FW
      init, there is no way to cancel the wait; instead, device cleanup waits
      for the loop to conclude and release the lock.
      
      To allow immediate response to remove device commands, check the TEARDOWN
      flag while waiting for FW init, and exit the loop if it has been set.
      Signed-off-by: Gavin Li <gavinl@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
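      A simplified sketch of the described behavior; fw_ready() and the
      polling interval are placeholders, and while MLX5_INTERFACE_STATE_TEARDOWN
      is the flag the commit message names, the exact check in the driver may
      differ.

      #include <linux/bitops.h>
      #include <linux/delay.h>
      #include <linux/errno.h>
      #include <linux/jiffies.h>
      #include <linux/mlx5/driver.h>

      /* Sketch: wait for FW initialization, but exit immediately once the
       * TEARDOWN flag is set so device removal does not block on the loop.
       */
      static int sketch_wait_fw_init(struct mlx5_core_dev *dev, u32 max_wait_ms)
      {
      	unsigned long end = jiffies + msecs_to_jiffies(max_wait_ms);

      	while (!fw_ready(dev)) {	/* placeholder FW-initialized predicate */
      		if (test_bit(MLX5_INTERFACE_STATE_TEARDOWN, &dev->intf_state))
      			return -ENODEV;	/* removal requested: stop waiting */
      		if (time_after(jiffies, end))
      			return -EBUSY;	/* FW never became ready in time */
      		msleep(20);
      	}
      	return 0;
      }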
  3. 09 April 2022, 2 commits
  4. 18 March 2022, 2 commits
  5. 10 March 2022, 4 commits
  6. 27 February 2022, 1 commit
    • net/mlx5: Expose APIs to get/put the mlx5 core device · 1695b97b
      Yishai Hadas committed
      Expose an API to get the mlx5 core device from a given VF PCI device if
      mlx5_core is its driver.
      
      The get API keeps the intf_state_mutex locked to make sure that the
      device can't go away or be unloaded until the caller completes its work
      on the device; any flow that takes the lock is expected to hold it only
      for a short period of time.

      The put API unlocks the intf_state_mutex.
      
      The use case for these APIs is the migration flow of a VF over VFIO PCI.
      In that case the VF doesn't ride on mlx5_core, because two different PCI
      devices are involved: the PF owned by mlx5_core and the VF owned by the
      vfio driver.
      
      The mlx5_core of the PF is accessed only during the narrow window of the
      VF's ioctl that requires its services.
      
      This allows the PF driver to be more independent of the VF driver, so
      long as it doesn't reset the FW.
      
      Link: https://lore.kernel.org/all/20220224142024.147653-6-yishaih@nvidia.com
      Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
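      A hedged usage sketch of the get/put pattern; mlx5_vf_get_core_dev() and
      mlx5_vf_put_core_dev() are the names I would expect this commit to
      export, so verify them against include/linux/mlx5/driver.h before
      relying on this.

      #include <linux/errno.h>
      #include <linux/pci.h>
      #include <linux/mlx5/driver.h>

      /* Sketch: a vfio-pci style caller briefly borrowing the PF's mlx5_core
       * device to service a VF migration ioctl. The get keeps intf_state_mutex
       * held, so the window between get and put must stay short.
       */
      static int sketch_vf_migration_step(struct pci_dev *vf_pdev)
      {
      	struct mlx5_core_dev *mdev;

      	mdev = mlx5_vf_get_core_dev(vf_pdev);	/* assumed API from this commit */
      	if (!mdev)
      		return -ENOTCONN;

      	/* ... issue the short-lived migration command(s) against mdev ... */

      	mlx5_vf_put_core_dev(mdev);		/* drops intf_state_mutex */
      	return 0;
      }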
  7. 24 February 2022, 5 commits
  8. 03 December 2021, 1 commit
    • net/mlx5: Dynamically resize flow counters query buffer · b247f32a
      Avihai Horon committed
      The flow counters bulk query buffer is allocated once during
      mlx5_fc_init_stats(). For PFs and VFs this buffer usually takes a little
      more than 512KB of memory, which is then aligned up to the next power of
      2, i.e. 1MB. For SFs, this buffer is reduced and takes around 128 bytes.
      
      The buffer size determines the maximum number of flow counters that
      can be queried at a time. Thus, having a bigger buffer can improve
      performance for users that need to query many flow counters.
      
      There are cases that don't use many flow counters and don't need a big
      buffer (e.g. SFs, VFs). Since this size becomes critical at large scale,
      the buffer size should be reduced in these cases.
      
      In order to reduce memory consumption while maintaining query
      performance, change the query buffer's allocation scheme to the
      following:
      - First allocate the buffer with small initial size.
      - If the number of counters surpasses the initial size, resize the
        buffer to the maximum size.
      
      The buffer only grows and is never shrunk, because users with many flow
      counters don't care about the buffer size, and we don't want to add
      resize overhead if the current number of counters drops.
      
      This solution is preferable to the current one, which is less accurate
      and only addresses SFs.
      Signed-off-by: Avihai Horon <avihaih@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
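      A simplified sketch of the grow-only allocation scheme; the structure,
      sizes and names below are illustrative, not the driver's actual code.

      #include <linux/mm.h>
      #include <linux/slab.h>

      struct sketch_bulk_query {
      	void *buf;
      	int slots;	/* number of counters the buffer can currently hold */
      };

      /* Sketch: start small and, once the counter count exceeds the initial
       * capacity, reallocate once at the maximum size. Never shrink.
       */
      static int sketch_ensure_capacity(struct sketch_bulk_query *q, int counters,
      				  int init_slots, int max_slots, size_t slot_sz)
      {
      	int want = counters <= init_slots ? init_slots : max_slots;
      	void *buf;

      	if (q->buf && q->slots >= want)
      		return 0;	/* grow-only: current buffer is big enough */

      	buf = kvzalloc(want * slot_sz, GFP_KERNEL);
      	if (!buf)
      		return -ENOMEM;

      	kvfree(q->buf);
      	q->buf = buf;
      	q->slots = want;
      	return 0;
      }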
  9. 27 October 2021, 1 commit
  10. 26 October 2021, 2 commits
  11. 21 October 2021, 1 commit
  12. 19 October 2021, 5 commits
  13. 16 October 2021, 5 commits
  14. 05 October 2021, 1 commit
  15. 28 September 2021, 1 commit
  16. 12 August 2021, 4 commits
    • net/mlx5: Allocate individual capability · 48f02eef
      Parav Pandit committed
      Currently mlx5_core_dev contains an array of capabilities: 19 valid
      device capabilities, 2 reserved entries and 12 holes. For those 14
      unused entries, mlx5_core_dev allocates 14 * 8K = 112K bytes of memory
      that is never used. Because of this the mlx5_core_dev structure is
      roughly 270K bytes, and the allocation is further aligned up to the next
      power of 2, 512K bytes.
      
      By skipping the non-existent entries:
      (a) 112KB is saved,
      (b) mlx5_core_dev shrinks to 8KB after alignment, and
      (c) ~350KB is saved on alignment.
      
      In the future, individual capability allocation can be used to skip
      allocating a capability when it is disabled at the device level. This
      patch prepares mlx5_core_dev to hold capabilities via pointers instead
      of an inline array.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
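      A hedged before/after sketch of the data-structure change; the names,
      entry count and allocation helper below are illustrative, not the actual
      mlx5 definitions.

      #include <linux/slab.h>

      #define SKETCH_CAP_TYPES 33	/* illustrative: 19 valid + 2 reserved + 12 holes */

      struct sketch_hca_cap;		/* per-type capability data (~8K each) */

      /* Before: a large inline array embedded in mlx5_core_dev, wasting the
       * reserved entries and holes. After: an array of pointers, where only
       * capability types that actually exist get an allocation.
       */
      struct sketch_dev_caps {
      	struct sketch_hca_cap *hca[SKETCH_CAP_TYPES];
      };

      static int sketch_alloc_cap(struct sketch_dev_caps *caps, int type, size_t sz)
      {
      	caps->hca[type] = kzalloc(sz, GFP_KERNEL);	/* skipped for unused types */
      	return caps->hca[type] ? 0 : -ENOMEM;
      }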
    • net/mlx5: Reorganize current and maximal capabilities to be per-type · 5958a6fa
      Parav Pandit committed
      In the current code, the current and maximal capabilities are maintained
      in separate arrays which are both per type. To allow such a basic
      structure to be created as a dynamically allocated array, move the curr
      and max fields into a unified structure so that a specific capability
      can be allocated as one unit.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
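      A short sketch of the unified per-type layout described above; the array
      size and the struct name are illustrative.

      #include <linux/types.h>

      /* Current and maximal values for one capability type live in one unit,
       * so the pair can later be allocated (or skipped) together.
       */
      struct sketch_hca_cap {
      	u32 cur[1024];	/* illustrative size; real code sizes this from the HCA cap layout */
      	u32 max[1024];
      };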
    • net/mlx5: Delete impossible dev->state checks · 8e792700
      Leon Romanovsky committed
      The new mlx5_core device structure is allocated through devlink_alloc()
      with kzalloc, which ensures that all fields are zeroed, including
      ->state.

      That means the checks of that field in mlx5_init_one() are completely
      redundant, because that function is called only once, at the beginning
      of the mlx5_core_dev lifetime.
      
      PCI:
       .probe()
        -> probe_one()
         -> mlx5_init_one()
      
      The recovery flow can't run at that time or before it, because the
      relevant work is initialized later, in mlx5_init_once().
      
      This initialization flow ensures that dev->state can never be
      MLX5_DEVICE_STATE_UNINITIALIZED, so remove the impossible checks.
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix typo in comments · 39c538d6
      Cai Huoqing committed
      Fix typos:
      *vectores  ==> vectors
      *realeased  ==> released
      *erros  ==> errors
      *namepsace  ==> namespace
      *trafic  ==> traffic
      *proccessed  ==> processed
      *retore  ==> restore
      *Currenlty  ==> Currently
      *crated  ==> created
      *chane  ==> change
      *cannnot  ==> cannot
      *usuallly  ==> usually
      *failes  ==> fails
      *importent  ==> important
      *reenabled  ==> re-enabled
      *alocation  ==> allocation
      *recived  ==> received
      *tanslation  ==> translation
      Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>