1. 05 Apr, 2018 4 commits
    • M
      IB/uverbs: Add modify ESP flow_action · 7d12f8d5
      Matan Barak committed
      flow_actions of ESP type could be modified during runtime. This could be
      common, for example, when the ESN should be changed. Add a new
      UVERBS_FLOW_ACTION_ESP_MODIFY method for changing ESP parameters of an
      existing ESP flow_action.
      The new method uses the UVERBS_FLOW_ACTION_ESP_CREATE attributes, but
      adds a new IB_FLOW_ACTION_ESP_FLAGS_MOD_ESP_ATTRS flag, which means
      ESP_ATTRS should be changed.
      In addition, we add a new FLOW_ACTION_ESP_REPLAY_NONE replay type that
      could be used when one wants to disable replay protection over a
      specific flow_action.
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      7d12f8d5
    • B
      IB/uverbs: Introduce egress flow steering · 21e82d3e
      Boris Pismenny committed
      The egress flag indicates that this flow steering rule is for egress
      traffic. The scope of an egress rule is port-wide, meaning all packets
      originating from that port that match the steering rule specification
      will be affected by this steering rule's action.
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      21e82d3e
    • M
      IB/uverbs: Add action_handle flow steering specification · 9b828441
      Matan Barak committed
      Binding a flow_action to a flow steering rule requires using a new
      specification. Therefore, add an IB_FLOW_SPEC_ACTION_HANDLE flow
      specification.
      
      Flow steering rules could use flow_action(s), and because of that we need
      to avoid deleting flow_action(s) as long as they're being used.
      Moreover, when the attached rules are deleted, the action_handle reference
      count should be decremented. Introduce a new mechanism of flow
      resources to keep track of the attached action_handle(s). Later on, this
      mechanism should be extended to other attached flow steering resources
      like flow counters.
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      9b828441
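The reference-counting rule described in the action_handle commit above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct flow_action`, `attach_rule()`, `detach_rule()` and `destroy_action()` are hypothetical names, not the kernel's API, and the real code tracks attached resources per flow rule.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a flow_action object; not the kernel struct. */
struct flow_action {
    int  usecnt;     /* number of flow rules currently bound to the action */
    bool destroyed;
};

/* Binding a flow rule to the action takes a reference. */
void attach_rule(struct flow_action *a)
{
    a->usecnt++;
}

/* Deleting the rule drops the reference again. */
void detach_rule(struct flow_action *a)
{
    assert(a->usecnt > 0);
    a->usecnt--;
}

/* Destruction is refused while any rule still uses the action. */
int destroy_action(struct flow_action *a)
{
    if (a->usecnt > 0)
        return -EBUSY;
    a->destroyed = true;
    return 0;
}
```

With this shape, a destroy request issued while a rule is attached returns `-EBUSY`, and succeeds only after the last attached rule has been deleted.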
    • M
      IB/uverbs: Add flow_action create and destroy verbs · 2eb9beae
      Matan Barak committed
      A verbs application may receive and transmit packets using a data
      path pipeline. Sometimes, the first stage in the receive pipeline or
      the last stage in the transmit pipeline involves transforming a
      packet, either in order to make it easier for later stages to process
      it or to prepare it for transmission over the wire. Such a transformation
      could be stripping/encapsulating the packet (e.g. VXLAN),
      decrypting/encrypting it (e.g. IPsec), altering headers, doing some
      complex FPGA changes, etc.
      
      Some hardware could do such transformations without software data path
      intervention at all. The flow steering API supports steering a
      packet (either to a QP or dropping it) and some simple actions that
      leave the packet unchanged (e.g. tagging a packet). Complex actions that
      may change the packet could bloat the flow steering API extensively.
      Sometimes the same action should be applied to several flows.
      In this case, it's easier to bind several flows to the same action and
      modify it than to change all matching flows.
      
      Introduce a new flow_action object that abstracts any packet
      transformation (out of a standard and well defined set of actions).
      This flow_action object could be tied to a flow steering rule via a
      new specification.
      
      Currently, we support the esp flow_action, which encrypts or decrypts a
      packet according to the given parameters. However, we present a
      flexible schema that could be used for other transformation actions tied
      to flow rules.
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      2eb9beae
  2. 04 Apr, 2018 3 commits
    • P
      RDMA: Use ib_gid_attr during GID modification · 414448d2
      Parav Pandit committed
      Now that ib_gid_attr contains device, port and index, simplify the
      provider APIs add_gid() and del_gid() to use device, port and index
      fields from the ib_gid_attr attributes structure.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      414448d2
    • P
      IB/core: Refactor GID modify code for RoCE · 598ff6ba
      Parav Pandit committed
      Code is refactored to prepare separate functions for RoCE which can do more
      complex operations related to reference counting, while still
      maintaining code readability. This includes:
      (a) Simplification to not perform netdevice checks and modifications
      for IB link layer.
      (b) Do not add a RoCE GID entry which has a NULL netdevice; instead return
      an error.
      (c) If GID addition fails at provider level add_gid(), do not add the
      entry in the cache and keep the entry marked as INVALID.
      (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they
      can be used even for modifying default GIDs. This avoids some code
      duplication in modifying default GIDs.
      (e) The find_gid() routine refers to the data entry flags to qualify a GID
      as valid or invalid rather than depending on attributes and zeroness
      of the GID content.
      (f) gid_table_reserve_default() sets the GID default attribute at the
      beginning while setting up the GID table. There is no need to use the
      default_gid flag in low level functions such as write_gid(), add_gid(),
      del_gid(), as they never need to update the DEFAULT property of the GID
      entry during GID table update.
      
      As a result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer
      searchable as described below.
      
      A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB
      spec version 1.3 section 4.1.1, point (6) whose snippet is below.
      
      "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as
      the Reserved GID. It shall never be assigned to any endport. It shall
      not be used as a destination address or in a global routing header
      (GRH)."
      
      GID table cache now only stores valid GID entries. Before this patch,
      Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using
      ib_find_cached_gid_by_port() and other similar find routines.
      
      Zero GID is no longer searchable as it shall not be present in a GRH or
      path record entry as described in IB spec version 1.3 section 4.1.1,
      point (6), section 12.7.10 and section 12.7.20.
      
      ib_cache_update() is simplified to check the link layer once, use a unified
      locking scheme for all link layers, and drop the temporary GID table
      allocation/free logic.
      
      Additionally,
      (a) Expand ib_gid_attr to store port and index so that GID query
      routines can get port and index information from the attribute structure.
      (b) Expand ib_gid_attr to store device as well so that in future code when
      GID reference counting is done, device is used to reach back to the GID
      table entry.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      598ff6ba
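Point (e) of the GID refactor above, qualifying an entry by its flags rather than by the zeroness of the GID bytes, can be sketched as follows. This is a simplified user-space illustration: `struct gid_entry`, `ENTRY_VALID` and this `find_gid()` signature are assumptions made for the example and do not match the kernel's internal table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16

/* Hypothetical entry flag and layout; the kernel's GID table differs. */
#define ENTRY_VALID 0x1u

struct gid_entry {
    uint8_t  gid[GID_LEN];
    unsigned flags;
};

/* Return the index of a matching valid entry, or -ENOENT if none exists. */
int find_gid(const struct gid_entry *tbl, int n, const uint8_t *gid)
{
    assert(tbl != NULL && gid != NULL);
    for (int i = 0; i < n; i++) {
        /* Validity comes from the flag, not from the GID bytes. */
        if (!(tbl[i].flags & ENTRY_VALID))
            continue;
        if (memcmp(tbl[i].gid, gid, GID_LEN) == 0)
            return i;
    }
    return -ENOENT;
}
```

Because unused slots are never marked valid, a lookup for the all-zero Reserved GID finds nothing even though the unused slots happen to contain zero bytes.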
    • P
      RDMA/core: Update query_gid documentation for HCA drivers · 72e1ff0f
      Parav Pandit committed
      query_gid() should return the right GID value for iWARP and IB link layers.
      It is a no-op for the RoCE link layer. Update the documentation to reflect
      this.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      72e1ff0f
  3. 28 Mar, 2018 1 commit
  4. 20 Mar, 2018 3 commits
  5. 16 Mar, 2018 1 commit
  6. 15 Mar, 2018 1 commit
  7. 09 Mar, 2018 1 commit
  8. 02 Feb, 2018 1 commit
  9. 30 Jan, 2018 4 commits
  10. 16 Jan, 2018 2 commits
    • P
      RDMA/core: Clarify rdma_ah_find_type · a6532e71
      Parav Pandit committed
      iWARP does not use rdma_ah_attr_type, and for this reason we do not have an
      RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iWARP
      ports and for clarity it shouldn't have a special test for iWARP.
      
      This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
      when wrongly called on an iWARP port.
      
      Fixes: 44c58487 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      a6532e71
    • B
      IB/core: Fix ib_wc structure size to remain in 64 bytes boundary · cd2a6e7d
      Bodong Wang committed
      The change of slid from u16 to u32 made sizeof(struct ib_wc) cross
      the 64B boundary, which causes more cache misses. This patch
      rearranges the fields so that the size stays at 64B.
      
      Pahole output before this change:
      
      struct ib_wc {
              union {
                      u64                wr_id;                /*           8 */
                      struct ib_cqe *    wr_cqe;               /*           8 */
              };                                               /*     0     8 */
              enum ib_wc_status          status;               /*     8     4 */
              enum ib_wc_opcode          opcode;               /*    12     4 */
              u32                        vendor_err;           /*    16     4 */
              u32                        byte_len;             /*    20     4 */
              struct ib_qp *             qp;                   /*    24     8 */
              union {
                      __be32             imm_data;             /*           4 */
                      u32                invalidate_rkey;      /*           4 */
              } ex;                                            /*    32     4 */
              u32                        src_qp;               /*    36     4 */
              int                        wc_flags;             /*    40     4 */
              u16                        pkey_index;           /*    44     2 */
      
              /* XXX 2 bytes hole, try to pack */
      
              u32                        slid;                 /*    48     4 */
              u8                         sl;                   /*    52     1 */
              u8                         dlid_path_bits;       /*    53     1 */
              u8                         port_num;             /*    54     1 */
              u8                         smac[6];              /*    55     6 */
      
              /* XXX 1 byte hole, try to pack */
      
              u16                        vlan_id;              /*    62     2 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              u8                         network_hdr_type;     /*    64     1 */
      
              /* size: 72, cachelines: 2, members: 17 */
              /* sum members: 62, holes: 2, sum holes: 3 */
              /* padding: 7 */
              /* last cacheline: 8 bytes */
      };
      
      Pahole output after this change:
      
      struct ib_wc {
              union {
                      u64                wr_id;                /*           8 */
                      struct ib_cqe *    wr_cqe;               /*           8 */
              };                                               /*     0     8 */
              enum ib_wc_status          status;               /*     8     4 */
              enum ib_wc_opcode          opcode;               /*    12     4 */
              u32                        vendor_err;           /*    16     4 */
              u32                        byte_len;             /*    20     4 */
              struct ib_qp *             qp;                   /*    24     8 */
              union {
                      __be32             imm_data;             /*           4 */
                      u32                invalidate_rkey;      /*           4 */
              } ex;                                            /*    32     4 */
              u32                        src_qp;               /*    36     4 */
              u32                        slid;                 /*    40     4 */
              int                        wc_flags;             /*    44     4 */
              u16                        pkey_index;           /*    48     2 */
              u8                         sl;                   /*    50     1 */
              u8                         dlid_path_bits;       /*    51     1 */
              u8                         port_num;             /*    52     1 */
              u8                         smac[6];              /*    53     6 */
      
              /* XXX 1 byte hole, try to pack */
      
              u16                        vlan_id;              /*    60     2 */
              u8                         network_hdr_type;     /*    62     1 */
      
              /* size: 64, cachelines: 1, members: 17 */
              /* sum members: 62, holes: 1, sum holes: 1 */
              /* padding: 1 */
      };
      
      Cc: <stable@vger.kernel.org> # v4.13
      Fixes: 7db20ecd ("IB/core: Change wc.slid from 16 to 32 bits")
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      cd2a6e7d
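The effect of the field reordering in the ib_wc commit above can be reproduced with a simplified mirror of the two layouts. This is a sketch, not the kernel definition: `struct wc_before` and `struct wc_after` are hypothetical names, pointers and enums are replaced by fixed-width integers, and the sizes in the comments assume an ABI where `uint64_t` is 8-byte aligned (for example x86-64).

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the change: slid sits after the 2-byte pkey_index. */
struct wc_before {
    uint64_t wr_id;
    uint32_t status;
    uint32_t opcode;
    uint32_t vendor_err;
    uint32_t byte_len;
    uint64_t qp;            /* stands in for struct ib_qp * */
    uint32_t ex;
    uint32_t src_qp;
    int32_t  wc_flags;
    uint16_t pkey_index;    /* followed by a 2-byte hole */
    uint32_t slid;
    uint8_t  sl;
    uint8_t  dlid_path_bits;
    uint8_t  port_num;
    uint8_t  smac[6];       /* followed by a 1-byte hole */
    uint16_t vlan_id;
    uint8_t  network_hdr_type;
};                          /* 72 bytes, two cache lines */

/* Layout after the change: slid moved up next to the other 4-byte fields. */
struct wc_after {
    uint64_t wr_id;
    uint32_t status;
    uint32_t opcode;
    uint32_t vendor_err;
    uint32_t byte_len;
    uint64_t qp;
    uint32_t ex;
    uint32_t src_qp;
    uint32_t slid;
    int32_t  wc_flags;
    uint16_t pkey_index;
    uint8_t  sl;
    uint8_t  dlid_path_bits;
    uint8_t  port_num;
    uint8_t  smac[6];       /* followed by a 1-byte hole */
    uint16_t vlan_id;
    uint8_t  network_hdr_type;
};                          /* 64 bytes, one cache line */

/* Bytes saved per work completion by the reordering. */
size_t wc_bytes_saved(void)
{
    return sizeof(struct wc_before) - sizeof(struct wc_after);
}
```

Moving `slid` ahead of `wc_flags` removes the 2-byte hole after `pkey_index`, which is enough to pull `network_hdr_type` back below offset 64 and drop the 7 bytes of tail padding.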
  11. 09 Jan, 2018 2 commits
    • D
      {net, IB}/mlx5: Manage port association for multiport RoCE · 32f69e4b
      Daniel Jurgens committed
      When mlx5_ib_add is called, determine if the mlx5 core device being
      added is capable of dual port RoCE operation. If it is, determine
      whether it is a master device or a slave device using the
      num_vhca_ports and affiliate_nic_vport_criteria capabilities.
      
      If the device is a slave, attempt to find a master device to affiliate it
      with. Devices that can be affiliated will share a system image guid. If
      none are found, place it on a list of unaffiliated ports. If a master is
      found, bind the port to it by configuring the port affiliation in the NIC
      vport context.
      
      Similarly, when mlx5_ib_remove is called, determine the port type. If it's
      a slave port, unaffiliate it from the master device, otherwise just
      remove it from the unaffiliated port list.
      
      The IB device is registered as a multiport device, even if a 2nd port is
      not available for affiliation. When the 2nd port is affiliated later, the
      GID cache must be refreshed in order to get the default GIDs for the 2nd
      port in the cache. Export roce_rescan_device to provide a mechanism to
      refresh the cache after a new port is bound.
      
      In a multiport configuration, all commands related to IB objects (QP, MR,
      PD, etc.) should flow through the master mlx5_core_dev. Other commands
      must be sent to the slave port mlx5_core_mdev. An interface is provided
      to get the correct mdev for non IB object commands.
      Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      32f69e4b
    • M
      IB/core: Introduce driver QP type · 8011c1e3
      Moni Shoua committed
      Vendors can implement types of QPs that are not described in the
      InfiniBand specification. To still be able to use the IB/core layer
      services (e.g. user object management) without tainting this layer with
      driver proprietary logic, a new QP type is added: IB_QPT_DRIVER. This
      will be a general QP type whose true nature the core layer does not know.
      When a command like create_qp() is passed to a hardware driver, the extra
      data that is required is taken from the driver channel.
      Downstream patches from this series will use that QP type in the mlx5
      driver.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      8011c1e3
  12. 19 Dec, 2017 2 commits
  13. 14 Nov, 2017 3 commits
  14. 11 Nov, 2017 1 commit
    • N
      IB/core: Add PCI write end padding flags for WQ and QP · e1d2e887
      Noa Osherovich committed
      There are root complexes that are able to optimize their
      performance when incoming data is a multiple of full cache lines.
      
      PCI write end padding is the device's ability to pad the ending of
      incoming packets (scatter) to a full cache line such that the last
      upstream write generated by an incoming packet will be a full cache
      line.
      
      Add a relevant entry to ib_device_cap_flags to report such capability
      of an RDMA device.
      
      Add the QP and WQ create flags:
       * A QP/WQ created with a scatter end padding flag will cause
         HW to pad the last upstream write generated by a packet to a cache line.
      
      Users should consider several factors before activating this feature:
      - In case of high CPU memory load (which may in turn cause PCI back
        pressure), if a large percentage of the writes are partial cache
        lines, this feature should be checked as an optional solution.
      - This feature might reduce performance if most packets are between one
        and two cache lines and PCIe throughput has reached its maximum
        capacity. E.g. a 65B packet from the network port will lead to a 128B
        write on PCIe, which may cause traffic on PCIe to reach high
        throughput.
      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      e1d2e887
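The 65B to 128B figure quoted in the PCI write end padding commit above follows from rounding a write up to a whole number of cache lines. A minimal sketch of that arithmetic, assuming a 64-byte cache line (the commit does not state the line size) and using the hypothetical helper name `pad_to_cache_line()`:

```c
#include <assert.h>
#include <stddef.h>

/* Round a write length up to a whole number of cache lines. */
size_t pad_to_cache_line(size_t len, size_t line)
{
    assert(line != 0);
    return (len + line - 1) / line * line;
}
```

With a 64-byte line, a 65-byte packet is padded to 128 bytes, nearly doubling the PCIe traffic for that packet, while a 64-byte packet is left unchanged.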
  15. 19 Oct, 2017 2 commits
  16. 25 Sep, 2017 2 commits
  17. 09 Sep, 2017 1 commit
  18. 31 Aug, 2017 1 commit
    • M
      IB/core: Add new ioctl interface · fac9658c
      Matan Barak committed
      In this ioctl interface, processing a command starts by parsing the
      command's properties and fetching the appropriate user objects
      before calling the handler.
      
      Parsing and validation are done according to a specifier declared by
      the driver's code. In the driver, all supported objects are declared.
      These objects are separated into different object namespaces. Dividing
      objects into namespaces is done at initialization by using the higher
      bits of the object ids. This initialization can mix objects declared
      in different places into one parsing tree used in this ioctl interface.
      
      For each object we list all supported methods. Similarly to objects,
      methods are separated into method namespaces too. Namespacing is done
      similarly to the objects case. This could be used in order to add
      methods to an existing object.
      
      Each method has a specific handler, which could be either a default
      handler or a driver specific handler.
      Along with the handler, a set of attributes is specified as well.
      Similarly to objects and methods, attributes are namespaced and hashed
      by their ids at initialization too. All supported attributes are
      subject to automatic fetching and validation. These attributes include
      the command, response and the method's related objects' ids.
      
      When these entities (objects, methods and attributes) are used, the
      high bits of the entities' ids are used in order to calculate the hash
      bucket index. Then, these high bits are masked out in order to have a
      zero based index. Since we use these high bits for both bucketing and
      namespacing, we get a compact representation and O(1) array access.
      This is mandatory for efficient dispatching.
      
      Each attribute has a type (PTR_IN, PTR_OUT, IDR or FD) and a length.
      Attributes could be validated through some properties, like:
      (*) Minimum size / Exact size
      (*) Fops for FD
      (*) Object type for IDR
      
      If an IDR/fd attribute is specified, the kernel also states the object
      type and the required access (NEW, WRITE, READ or DESTROY).
      All uobject/fd management is done automatically by the infrastructure,
      meaning the infrastructure will fail concurrent commands when at
      least one of them requires exclusive access (WRITE/DESTROY),
      synchronize actions with device removals (dissociate context events)
      and take care of reference counting (increase/decrease) for concurrent
      action invocations. The reference counts on the actual kernel objects
      shall be handled by the handlers.
      
       objects
      +--------+
      |        |
      |        |   methods                                                                +--------+
      |        |   ns         method      method_spec                           +-----+   |len     |
      +--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
      | object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
      +--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
      |        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
      |        |  |      |                                    +------------+
      |        |  +------+
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      +--------+
      
      [d] = Hash ids to groups using the high order bits
      
      The right types table is also chosen by using the high bits from
      the ids. Currently we have either default or driver specific groups.
      
      Once validation and object fetching (or creation) are completed, we call
      the handler:
      int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
                     struct uverbs_attr_bundle *ctx);
      
      ctx bundles attributes of different namespaces. Each element there
      is an array of attributes which corresponds to one namespace of
      attributes. For example, in the common case:
      
       ctx                               core
      +----------------------------+     +------------+
      | core:                      +---> | valid      |
      +----------------------------+     | cmd_attr   |
      | driver:                    |     +------------+
      |----------------------------+--+  | valid      |
                                      |  | cmd_attr   |
                                      |  +------------+
                                      |  | valid      |
                                      |  | obj_attr   |
                                      |  +------------+
                                      |
                                      |  drivers
                                      |  +------------+
                                      +> | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | obj_attr   |
                                         +------------+
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      fac9658c
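The bucketing scheme described in the ioctl commit above, where the high bits of an id select a namespace and the remaining bits give a zero based index, can be sketched as follows. The bit split (`ID_NS_SHIFT`, `ID_NS_MASK`), the struct names and `lookup_attr()` are assumptions made for this example and are not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split: the top 4 bits of a 16-bit id select the namespace. */
#define ID_NS_SHIFT 12
#define ID_NS_MASK  0xF000u

/* Hypothetical attribute specification. */
struct attr_spec {
    int type;
    int len;
};

/* One bucket per namespace, e.g. bucket 0 = default, bucket 1 = driver. */
struct attr_bucket {
    const struct attr_spec *attrs;
    unsigned                num;
};

/* O(1) lookup: the high bits pick the bucket, the low bits index into it. */
const struct attr_spec *lookup_attr(const struct attr_bucket *buckets,
                                    unsigned num_buckets, uint16_t id)
{
    unsigned ns  = (id & ID_NS_MASK) >> ID_NS_SHIFT;
    unsigned idx = id & ~ID_NS_MASK;   /* mask out the namespace bits */

    assert(buckets != NULL);
    if (ns >= num_buckets || idx >= buckets[ns].num)
        return NULL;
    return &buckets[ns].attrs[idx];
}
```

Under this split, id `0x0001` resolves to the second attribute of the default bucket and `0x1000` to the first attribute of the driver bucket; no search is needed in either case.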
  19. 29 Aug, 2017 3 commits
  20. 25 Aug, 2017 2 commits