1. 19 Sep, 2020 1 commit
    • devlink: add timeout information to status_notify · f92970c6
      Shannon Nelson committed
      Add a timeout element to the DEVLINK_CMD_FLASH_UPDATE_STATUS
      netlink message for use by a userland utility to show that
      a particular firmware flash activity may take a long but
      bounded time to finish.  Also add a handy helper for drivers
      to make use of the new timeout value.
      
      UI usage hints:
       - if non-zero, add timeout display to the end of the status line
       	[component] status_msg  ( Xm Ys : Am Bs )
           using the timeout value for Am Bs and updating the Xm Ys
           every second
       - if the timeout expires while awaiting the next update,
         display something like
       	[component] status_msg  ( timeout reached : Am Bs )
       - if new status notify messages are received, remove
         the timeout and start over
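The UI usage hints above can be sketched as a small formatter. This is only an illustration of the described display logic, not the code of any real userland utility; the function names and the `fw.mgmt` component name are assumptions made for the example.

```python
def format_duration(seconds: int) -> str:
    """Render a number of seconds as 'Xm Ys'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs}s"


def format_status_line(component: str, status_msg: str,
                       elapsed: int, timeout: int) -> str:
    """Build one status line following the UI hints (hypothetical helper).

    elapsed: seconds since the last status notify message was received.
    timeout: timeout value carried by that message, 0 meaning "none".
    """
    line = f"[{component}] {status_msg}"
    if timeout == 0:
        # No timeout was reported: show the plain status line.
        return line
    if elapsed >= timeout:
        # The timeout expired while waiting for the next update.
        return f"{line}  ( timeout reached : {format_duration(timeout)} )"
    # Otherwise show elapsed time against the timeout.
    return f"{line}  ( {format_duration(elapsed)} : {format_duration(timeout)} )"


if __name__ == "__main__":
    print(format_status_line("fw.mgmt", "Flashing", 65, 600))
```

A caller would redraw the line once per second with an increasing `elapsed`, and reset `elapsed` to 0 whenever a new status notify message arrives.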
      Signed-off-by: Shannon Nelson <snelson@pensando.io>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Sep, 2020 1 commit
  3. 10 Sep, 2020 4 commits
    • devlink: Introduce controller number · 3a2d9588
      Parav Pandit committed
      A devlink port may belong to a controller consisting of a PCI device.
      A devlink instance holds ports of two types of controllers.
      (1) A controller discovered on the same system where the eswitch resides.
      This is the case where the PCI PF/VF of a controller and the devlink eswitch
      instance are both located on a single system.
      (2) A controller located on an external host system.
      This is the case where a controller is located in one system and its
      devlink eswitch ports are located in a different system.
      
      When a devlink eswitch instance serves the devlink ports of both
      controllers together, PCI PF/VF numbers may overlap.
      Due to this a unique phys_port_name cannot be constructed.
      
      For example, in the system below, controller-0 and controller-1 each have a
      PCI PF pf0 whose eswitch ports can be present in controller-0.
      This results in a phys_port_name of "pf0" for both.
      A similar problem exists for VFs and upcoming subfunctions.
      
      An example view of two controller systems:
      
                   ---------------------------------------------------------
                   |                                                       |
                   |           --------- ---------         ------- ------- |
      -----------  |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
      | server  |  | -------   ----/---- ---/----- ------- ---/--- ---/--- |
      | pci rc  |=== | pf0 |______/________/       | pf1 |___/_______/     |
      | connect |  | -------                       -------                 |
      -----------  |     | controller_num=1 (no eswitch)                   |
                   ------|--------------------------------------------------
                   (internal wire)
                         |
                   ---------------------------------------------------------
                   | devlink eswitch ports and reps                        |
                   | ----------------------------------------------------- |
                   | |ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 | ctrl-0 |ctrl-0 | |
                   | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                   | ----------------------------------------------------- |
                   | |ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 | ctrl-1 |ctrl-1 | |
                   | |pf0    | pf0vfN | pf0sfN | pf1    | pf1vfN |pf1sfN | |
                   | ----------------------------------------------------- |
                   |                                                       |
                   |                                                       |
                   |           --------- ---------         ------- ------- |
                   |           | vf(s) | | sf(s) |         |vf(s)| |sf(s)| |
                   | -------   ----/---- ---/----- ------- ---/--- ---/--- |
                   | | pf0 |______/________/       | pf1 |___/_______/     |
                   | -------                       -------                 |
                   |                                                       |
                   |  local controller_num=0 (eswitch)                     |
                   ---------------------------------------------------------
      
      An example devlink port for external controller with controller
      number = 1 for a VF 1 of PF 0:
      
      $ devlink port show pci/0000:06:00.0/2
      pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf controller 1 pfnum 0 vfnum 1 external true splittable false
        function:
          hw_addr 00:00:00:00:00:00
      
      $ devlink port show pci/0000:06:00.0/2 -jp
      {
          "port": {
              "pci/0000:06:00.0/2": {
                  "type": "eth",
                  "netdev": "ens2f0pf0vf1",
                  "flavour": "pcivf",
                  "controller": 1,
                  "pfnum": 0,
                  "vfnum": 1,
                  "external": true,
                  "splittable": false,
                  "function": {
                      "hw_addr": "00:00:00:00:00:00"
                  }
              }
          }
      }
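To illustrate why the controller number resolves the naming overlap, the following sketch parses JSON shaped like the `devlink port show -jp` output above and keys each port by (controller, pfnum, vfnum). The second port handle, the sample values for the local controller, and the choice to treat a missing `controller` attribute as 0 are assumptions made for the example, not behaviour stated in the commit.

```python
import json


def port_key(attrs: dict) -> tuple:
    """Identify a PCI PF/VF port by controller, PF and VF number.

    Treating a missing "controller" attribute as 0 is an assumption
    of this sketch.
    """
    return (attrs.get("controller", 0), attrs["pfnum"], attrs.get("vfnum"))


def ports_by_key(devlink_json: str) -> dict:
    """Map (controller, pfnum, vfnum) to the devlink port handle."""
    ports = json.loads(devlink_json)["port"]
    return {port_key(attrs): handle for handle, attrs in ports.items()}


# Hypothetical sample: the same PF/VF numbers on two controllers.
SAMPLE = json.dumps({
    "port": {
        "pci/0000:06:00.0/2": {"flavour": "pcivf", "controller": 1,
                               "pfnum": 0, "vfnum": 1, "external": True},
        "pci/0000:06:00.0/3": {"flavour": "pcivf", "controller": 0,
                               "pfnum": 0, "vfnum": 1, "external": False},
    }
})

if __name__ == "__main__":
    for key, handle in sorted(ports_by_key(SAMPLE).items()):
        print(key, handle)
```

Without the controller component both ports would map to the same (pfnum, vfnum) pair, which is the collision the commit describes.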
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Introduce external controller flag · 05b595e9
      Parav Pandit committed
      A devlink eswitch port may represent PCI PF/VF ports of a controller.
      
      A controller is either located on the same system, or it can be an external
      controller located in the host where such a NIC is plugged in.
      
      Add the ability for a driver to specify whether a port is for an external
      controller.
      
      Use this flag in the mlx5_core driver.
      
      An example of an external controller, with VF1 of PF0 belonging to
      controller 1:
      
      $ devlink port show pci/0000:06:00.0/2
      pci/0000:06:00.0/2: type eth netdev ens2f0pf0vf1 flavour pcivf pfnum 0 vfnum 1 external true splittable false
        function:
          hw_addr 00:00:00:00:00:00
      $ devlink port show pci/0000:06:00.0/2 -jp
      {
          "port": {
              "pci/0000:06:00.0/2": {
                  "type": "eth",
                  "netdev": "ens2f0pf0vf1",
                  "flavour": "pcivf",
                  "pfnum": 0,
                  "vfnum": 1,
                  "external": true,
                  "splittable": false,
                  "function": {
                      "hw_addr": "00:00:00:00:00:00"
                  }
              }
          }
      }
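A script consuming this JSON could use the new flag to pick out ports that serve an external controller. The sketch below assumes the output shape shown above; the function name and the second port handle in the usage example are hypothetical.

```python
import json


def external_ports(devlink_json: str) -> list:
    """Return the handles of ports whose "external" attribute is true.

    Ports without the attribute are treated as not external, which is
    an assumption of this sketch.
    """
    ports = json.loads(devlink_json)["port"]
    return sorted(handle for handle, attrs in ports.items()
                  if attrs.get("external", False))


if __name__ == "__main__":
    sample = json.dumps({
        "port": {
            "pci/0000:06:00.0/2": {"flavour": "pcivf", "pfnum": 0,
                                   "vfnum": 1, "external": True},
            # Hypothetical second port without the flag.
            "pci/0000:06:00.0/1": {"flavour": "physical"},
        }
    })
    print(external_ports(sample))
```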
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Move structure comments outside of structure · ff03e63a
      Parav Pandit committed
      To add more fields to the PCI PF and VF port attributes, follow standard
      structure comment format.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add comment block for missing port attributes · 2efbe6ae
      Parav Pandit committed
      Add comment block for physical, PF and VF port attributes.
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 Aug, 2020 2 commits
  5. 30 Jul, 2020 1 commit
  6. 22 Jul, 2020 1 commit
  7. 11 Jul, 2020 2 commits
  8. 10 Jul, 2020 5 commits
  9. 23 Jun, 2020 3 commits
  10. 02 Jun, 2020 4 commits
  11. 31 Mar, 2020 4 commits
    • devlink: Allow setting of packet trap group parameters · c064875a
      Ido Schimmel committed
      The previous patch allowed device drivers to publish their default
      binding between packet trap policers and packet trap groups. However,
      some users might not be content with this binding and would like to
      change it.
      
      In case user space passed a packet trap policer identifier when setting
      a packet trap group, invoke the appropriate device driver callback and
      pass the new policer identifier.
      
      v2:
      * Check for presence of 'DEVLINK_ATTR_TRAP_POLICER_ID' in
        devlink_trap_group_set() and bail if not present
      * Add extack error message in case trap group was partially modified
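The flow described above (bail if no policer identifier was passed, otherwise hand the new identifier to the driver) can be modelled roughly as follows. This is a loose Python model, not the kernel's `devlink_trap_group_set()`; the attribute key, the `driver_set` callback signature, the set of known policer ids and the use of `-ENOENT` for an unknown policer are all assumptions of the sketch.

```python
import errno


class TrapGroup:
    """Minimal stand-in for a packet trap group."""

    def __init__(self, name, policer_id=None):
        self.name = name
        self.policer_id = policer_id


def trap_group_set(group, attrs, known_policers, driver_set):
    """Apply a policer binding change requested by user space.

    attrs: dict standing in for the netlink attributes of the request.
    driver_set: hypothetical driver callback, returns 0 or a negative errno.
    """
    if "policer_id" not in attrs:
        # Nothing to change for this group; bail out early.
        return 0
    policer_id = attrs["policer_id"]
    if policer_id not in known_policers:
        return -errno.ENOENT
    err = driver_set(group, policer_id)
    if err:
        # The driver rejected the change; keep the old binding.
        return err
    group.policer_id = policer_id
    return 0


if __name__ == "__main__":
    group = TrapGroup("l3_exceptions", policer_id=1)
    print(trap_group_set(group, {"policer_id": 2}, {1, 2}, lambda g, p: 0))
    print(group.policer_id)
```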
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add packet trap group parameters support · f9f54392
      Ido Schimmel committed
      Packet trap groups are used to aggregate logically related packet traps.
      Currently, these groups allow user space to batch operations such as
      setting the trap action of all member traps.
      
      In order to prevent the CPU from being overwhelmed by too many trapped
      packets, it is desirable to bind a packet trap policer to these groups.
      For example, to limit all the packets that encountered an exception
      during routing to 10Kpps.
      
      Allow device drivers to bind default packet trap policers to packet trap
      groups when the latter are registered with devlink.
      
      The next patch will enable user space to change this default binding.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add packet trap policers support · 1e8c6619
      Ido Schimmel committed
      Devices capable of offloading the kernel's datapath and performing
      functions such as bridging and routing must also be able to send (trap)
      specific packets to the kernel (i.e., the CPU) for processing.
      
      For example, a device acting as a multicast-aware bridge must be able to
      trap IGMP membership reports to the kernel for processing by the bridge
      module.
      
      In most cases, the underlying device is capable of handling packet rates
      that are several orders of magnitude higher compared to those that can
      be handled by the CPU.
      
      Therefore, in order to prevent the underlying device from overwhelming
      the CPU, devices usually include packet trap policers that are able to
      police the trapped packets to rates that can be handled by the CPU.
      
      This patch allows capable device drivers to register their supported
      packet trap policers with devlink. User space can then tune the
      parameters of these policers (currently, rate and burst size) and read
      from the device the number of packets that were dropped by the policer,
      if supported.
      
      Subsequent patches in the series will allow device drivers to create a
      default binding between these policers and packet trap groups and allow
      user space to change the binding.
      
      v2:
      * Add 'strict_start_type' in devlink policy
      * Have device drivers provide max/min rate/burst size for each policer.
        Use them to check validity of user provided parameters
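The validity check mentioned in the v2 notes can be pictured as a range check against the limits the driver registered for each policer. The sketch below is only an illustration under that reading; the class layout, field names and the use of `ValueError` are assumptions, not the devlink implementation.

```python
from dataclasses import dataclass


@dataclass
class Policer:
    """A packet trap policer with driver-provided limits (hypothetical layout)."""
    policer_id: int
    rate: int        # packets per second
    burst: int       # packets
    min_rate: int
    max_rate: int
    min_burst: int
    max_burst: int


def policer_set(policer: Policer, rate=None, burst=None) -> None:
    """Validate user-provided parameters, then apply them.

    Parameters left as None keep their current value. Nothing is
    modified if any value is out of range.
    """
    new_rate = policer.rate if rate is None else rate
    new_burst = policer.burst if burst is None else burst
    if not policer.min_rate <= new_rate <= policer.max_rate:
        raise ValueError(f"rate {new_rate} outside "
                         f"[{policer.min_rate}, {policer.max_rate}]")
    if not policer.min_burst <= new_burst <= policer.max_burst:
        raise ValueError(f"burst {new_burst} outside "
                         f"[{policer.min_burst}, {policer.max_burst}]")
    policer.rate, policer.burst = new_rate, new_burst


if __name__ == "__main__":
    p = Policer(1, rate=1000, burst=128, min_rate=1, max_rate=8000,
                min_burst=8, max_burst=65536)
    policer_set(p, rate=2000)
    print(p.rate, p.burst)
```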
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Implicitly set auto recover flag when registering health reporter · ba7d16c7
      Eran Ben Elisha committed
      When a health reporter is registered with devlink, devlink will implicitly set
      auto recover if and only if the reporter has a recover method. There is no
      reason to explicitly get the auto recover flag from the driver.
      
      Remove this flag from all drivers that called
      devlink_health_reporter_create.
      
      All existing health reporters set auto recovery to true if they have a
      recover method.
      
      The administrator can still unset auto recover via a netlink command, as before
      this patch.
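The rule described in this commit fits in a few lines. The sketch below models it in Python; the class and attribute names are assumptions and do not mirror the kernel structures.

```python
class HealthReporter:
    """Toy model of a health reporter registration."""

    def __init__(self, name, recover=None):
        self.name = name
        self.recover = recover
        # Auto recover is set if and only if a recover method exists;
        # the driver no longer passes the flag explicitly.
        self.auto_recover = recover is not None

    def set_auto_recover(self, enabled: bool) -> None:
        """Administrator override, in the spirit of the netlink command."""
        if enabled and self.recover is None:
            raise ValueError("reporter has no recover method")
        self.auto_recover = enabled


if __name__ == "__main__":
    tx = HealthReporter("tx", recover=lambda: 0)
    fw = HealthReporter("fw")
    print(tx.auto_recover, fw.auto_recover)
```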
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 28 Mar, 2020 1 commit
  13. 27 Mar, 2020 5 commits
    • devlink: implement DEVLINK_CMD_REGION_NEW · b9a17abf
      Jacob Keller committed
      Implement support for the DEVLINK_CMD_REGION_NEW command for creating
      snapshots. This new command parallels the existing
      DEVLINK_CMD_REGION_DEL.
      
      In order for DEVLINK_CMD_REGION_NEW to work for a region, the new
      ".snapshot" operation must be implemented in the region's ops structure.
      
      The desired snapshot id must be provided. This helps avoid confusion on
      the purpose of DEVLINK_CMD_REGION_NEW, and keeps the API simpler.
      
      The requested id will be inserted into the xarray tracking the number of
      snapshots using each id. If this id is already used by another snapshot
      on any region, an error will be returned.
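The command flow described above can be modelled roughly like this. It is a simplified Python picture, not the kernel handler: a plain dict stands in for the xarray, and the class, function names and specific errno values are assumptions of the sketch.

```python
import errno


class Region:
    """Toy region; snapshot_op plays the role of the ".snapshot" operation."""

    def __init__(self, name, snapshot_op=None):
        self.name = name
        self.snapshot_op = snapshot_op
        self.snapshots = {}


def region_new(region, used_ids, snapshot_id):
    """Create a snapshot with the id requested by user space.

    used_ids: dict shared by all regions of one devlink instance,
    mapping snapshot id to the number of snapshots using it.
    """
    if region.snapshot_op is None:
        # The region does not implement the snapshot operation.
        return -errno.EOPNOTSUPP
    if snapshot_id in used_ids:
        # The id is already used by a snapshot on some region.
        return -errno.EEXIST
    used_ids[snapshot_id] = 1
    region.snapshots[snapshot_id] = region.snapshot_op()
    return 0


if __name__ == "__main__":
    ids = {}
    nvm = Region("nvm-flash", snapshot_op=lambda: b"\x00" * 4)
    print(region_new(nvm, ids, 5), region_new(nvm, ids, 5))
```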
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: track snapshot id usage count using an xarray · 12102436
      Jacob Keller committed
      Each snapshot created for a devlink region must have an id. These ids
      are supposed to be unique per "event" that caused the snapshot to be
      created. Drivers call devlink_region_snapshot_id_get to obtain a new id
      to use for a new event trigger. The id values are tracked per devlink,
      so that the same id number can be used if a triggering event creates
      multiple snapshots on different regions.
      
      There is no mechanism for snapshot ids to ever be reused. Introduce an
      xarray to store the count of how many snapshots are using a given id,
      replacing the snapshot_id field previously used for picking the next id.
      
      The devlink_region_snapshot_id_get() function will use xa_alloc to
      insert an initial value of 1 at an available slot between 0 and
      U32_MAX.
      
      The new __devlink_snapshot_id_increment() and
      __devlink_snapshot_id_decrement() functions will be used to track how
      many snapshots currently use an id.
      
      Drivers must now call devlink_snapshot_id_put() in order to release
      their reference of the snapshot id after adding region snapshots.
      
      By tracking the total number of snapshots using a given id, it is
      possible for the decrement() function to erase the id from the xarray
      when it is not in use.
      
      With this method, a snapshot id can become reused again once all
      snapshots that referred to it have been deleted via
      DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.
      
      This work also paves the way to introduce a mechanism for userspace to
      request a snapshot.
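The reference-counting scheme described above can be sketched with a dict standing in for the xarray. The method names loosely follow the functions named in the commit message, but the class itself, the lowest-free-id search and the exception type are assumptions of this illustration.

```python
class SnapshotIdTable:
    """Toy model: snapshot id -> number of users of that id."""

    def __init__(self, max_id=2**32 - 1):
        self._counts = {}
        self._max_id = max_id

    def id_get(self) -> int:
        """Allocate a free id with an initial count of 1 (the caller's reference)."""
        candidate = 0
        while candidate in self._counts:
            candidate += 1
        if candidate > self._max_id:
            raise OverflowError("no free snapshot id")
        self._counts[candidate] = 1
        return candidate

    def increment(self, snapshot_id: int) -> None:
        """A snapshot using this id was added to a region."""
        self._counts[snapshot_id] += 1

    def decrement(self, snapshot_id: int) -> None:
        """Drop one user; erase the id once nobody uses it."""
        self._counts[snapshot_id] -= 1
        if self._counts[snapshot_id] == 0:
            del self._counts[snapshot_id]

    def id_put(self, snapshot_id: int) -> None:
        """The driver releases the reference it got from id_get()."""
        self.decrement(snapshot_id)


if __name__ == "__main__":
    table = SnapshotIdTable()
    first = table.id_get()      # driver obtains an id for an event
    table.increment(first)      # one region snapshot uses it
    table.id_put(first)         # driver is done adding snapshots
    second = table.id_get()     # first id is still in use
    table.decrement(first)      # the snapshot is deleted
    print(first, second, table.id_get())
```

In this model an id becomes available again only after both the driver's reference and every snapshot using it are gone, which matches the reuse condition stated in the commit message.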
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: report error once U32_MAX snapshot ids have been used · 7ef19d3b
      Jacob Keller committed
      The devlink_snapshot_id_get() function returns a snapshot id. The
      snapshot id is a u32, so there is no way to indicate an error code.
      
      A future change is going to possibly add additional cases where this
      function could fail. Refactor the function to return the snapshot id in
      an argument, so that it can return zero or an error value.
      
      This ensures that snapshot ids cannot be confused with error values, and
      aids in the future refactor of snapshot id allocation management.
      
      Because there is no current way to release previously used snapshot ids,
      add a simple check ensuring that an error is reported in case the
      snapshot_id would overflow.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: convert snapshot destructor callback to region op · a0a09f6b
      Jacob Keller committed
      It does not make sense that two snapshots for a given region would use
      different destructors. Simplify snapshot creation by adding
      a .destructor op for regions.
      
      This operation will replace the data_destructor for the snapshot
      creation, and makes snapshot creation easier.
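The idea of a per-region destructor can be pictured as follows: the region carries one ops object, and every snapshot of that region is released through the same callback, so snapshot creation no longer takes a destructor argument. This is a Python illustration only; the class names and method signatures are assumptions, not the kernel API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class RegionOps:
    """Operations shared by all snapshots of a region (hypothetical layout)."""
    name: str
    destructor: Callable[[Any], None]


class Region:
    def __init__(self, ops: RegionOps):
        self.ops = ops
        self.snapshots = {}

    def snapshot_create(self, snapshot_id: int, data: Any) -> None:
        # No per-snapshot destructor is passed in.
        self.snapshots[snapshot_id] = data

    def snapshot_del(self, snapshot_id: int) -> None:
        data = self.snapshots.pop(snapshot_id)
        # Every snapshot is released through the region's single destructor.
        self.ops.destructor(data)


if __name__ == "__main__":
    freed = []
    region = Region(RegionOps(name="cr-space", destructor=freed.append))
    region.snapshot_create(0, "dump-0")
    region.snapshot_del(0)
    print(freed)
```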
      Noticed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: prepare to support region operations · e8937681
      Jacob Keller committed
      Modify the devlink region code in preparation for adding new operations
      on regions.
      
      Create a devlink_region_ops structure, and move the name pointer from
      within the devlink_region structure into the ops structure (similar to
      the devlink_health_reporter_ops).
      
      This prepares the regions to enable support of additional operations in
      the future such as requesting snapshots, or accessing the region
      directly without a snapshot.
      
      In order to re-use the constant strings in the mlx4 driver their
      declaration must be changed to 'const char * const' to ensure the
      compiler realizes that both the data and the pointer cannot change.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 24 Mar, 2020 2 commits
  15. 21 Mar, 2020 1 commit
  16. 26 Feb, 2020 2 commits
  17. 25 Feb, 2020 1 commit