1. 20 Mar, 2018 2 commits
  2. 16 Mar, 2018 1 commit
  3. 15 Mar, 2018 1 commit
  4. 09 Mar, 2018 1 commit
  5. 02 Feb, 2018 1 commit
  6. 30 Jan, 2018 4 commits
  7. 16 Jan, 2018 2 commits
    • RDMA/core: Clarify rdma_ah_find_type · a6532e71
      Committed by Parav Pandit
      iWARP does not use rdma_ah_attr_type, and for this reason we do not have a
      RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iWARP
      ports, and for clarity it shouldn't have a special test for iWARP.
      
      This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB
      when wrongly called on an iWARP port.
      
      Fixes: 44c58487 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
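The behavior change described in this commit can be sketched in user-space C. This is a minimal illustration, not the kernel code: `port_proto`, `ah_attr_type`, and both functions are hypothetical stand-ins for the kernel's per-port protocol checks and `rdma_ah_find_type`.

```c
/* Hypothetical stand-in for the kernel's per-port protocol queries. */
enum port_proto { PORT_IB, PORT_ROCE, PORT_IWARP };

/* Hypothetical stand-in for enum rdma_ah_attr_type (no iWARP member). */
enum ah_attr_type { AH_ATTR_TYPE_IB, AH_ATTR_TYPE_ROCE };

/* Before: iWARP had a special test and was reported as RoCE. */
enum ah_attr_type ah_find_type_before(enum port_proto proto)
{
	if (proto == PORT_ROCE || proto == PORT_IWARP)
		return AH_ATTR_TYPE_ROCE;
	return AH_ATTR_TYPE_IB;
}

/* After: only RoCE is tested; anything else falls through to IB. */
enum ah_attr_type ah_find_type_after(enum port_proto proto)
{
	if (proto == PORT_ROCE)
		return AH_ATTR_TYPE_ROCE;
	return AH_ATTR_TYPE_IB;
}
```

Callers are still expected not to invoke the lookup on an iWARP port at all; the sketch only shows what a wrong call would return before and after.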
    • IB/core: Fix ib_wc structure size to remain in 64 bytes boundary · cd2a6e7d
      Committed by Bodong Wang
      Changing slid from u16 to u32 made sizeof(struct ib_wc) cross the
      64B boundary, which causes more cache misses. This patch rearranges
      the fields to keep the size at 64B.
      
      Pahole output before this change:
      
      struct ib_wc {
              union {
                      u64                wr_id;                /*           8 */
                      struct ib_cqe *    wr_cqe;               /*           8 */
              };                                               /*     0     8 */
              enum ib_wc_status          status;               /*     8     4 */
              enum ib_wc_opcode          opcode;               /*    12     4 */
              u32                        vendor_err;           /*    16     4 */
              u32                        byte_len;             /*    20     4 */
              struct ib_qp *             qp;                   /*    24     8 */
              union {
                      __be32             imm_data;             /*           4 */
                      u32                invalidate_rkey;      /*           4 */
              } ex;                                            /*    32     4 */
              u32                        src_qp;               /*    36     4 */
              int                        wc_flags;             /*    40     4 */
              u16                        pkey_index;           /*    44     2 */
      
              /* XXX 2 bytes hole, try to pack */
      
              u32                        slid;                 /*    48     4 */
              u8                         sl;                   /*    52     1 */
              u8                         dlid_path_bits;       /*    53     1 */
              u8                         port_num;             /*    54     1 */
              u8                         smac[6];              /*    55     6 */
      
              /* XXX 1 byte hole, try to pack */
      
              u16                        vlan_id;              /*    62     2 */
              /* --- cacheline 1 boundary (64 bytes) --- */
              u8                         network_hdr_type;     /*    64     1 */
      
              /* size: 72, cachelines: 2, members: 17 */
              /* sum members: 62, holes: 2, sum holes: 3 */
              /* padding: 7 */
              /* last cacheline: 8 bytes */
      };
      
      Pahole output after this change:
      
      struct ib_wc {
              union {
                      u64                wr_id;                /*           8 */
                      struct ib_cqe *    wr_cqe;               /*           8 */
              };                                               /*     0     8 */
              enum ib_wc_status          status;               /*     8     4 */
              enum ib_wc_opcode          opcode;               /*    12     4 */
              u32                        vendor_err;           /*    16     4 */
              u32                        byte_len;             /*    20     4 */
              struct ib_qp *             qp;                   /*    24     8 */
              union {
                      __be32             imm_data;             /*           4 */
                      u32                invalidate_rkey;      /*           4 */
              } ex;                                            /*    32     4 */
              u32                        src_qp;               /*    36     4 */
              u32                        slid;                 /*    40     4 */
              int                        wc_flags;             /*    44     4 */
              u16                        pkey_index;           /*    48     2 */
              u8                         sl;                   /*    50     1 */
              u8                         dlid_path_bits;       /*    51     1 */
              u8                         port_num;             /*    52     1 */
              u8                         smac[6];              /*    53     6 */
      
              /* XXX 1 byte hole, try to pack */
      
              u16                        vlan_id;              /*    60     2 */
              u8                         network_hdr_type;     /*    62     1 */
      
              /* size: 64, cachelines: 1, members: 17 */
              /* sum members: 62, holes: 1, sum holes: 1 */
              /* padding: 1 */
      };
      
      Cc: <stable@vger.kernel.org> # v4.13
      Fixes: 7db20ecd ("IB/core: Change wc.slid from 16 to 32 bits")
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
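The two pahole layouts above can be reproduced with plain C structs. The sketch below is a user-space approximation that assumes an LP64 target (8-byte pointers, 4-byte enums); the struct and enum names are hypothetical, and `void *` stands in for the kernel pointer types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum wc_status { WC_STATUS_PLACEHOLDER };   /* hypothetical 4-byte enums */
enum wc_opcode { WC_OPCODE_PLACEHOLDER };

/* Field order before the patch: slid (u32) sits after pkey_index (u16). */
struct wc_before {
	union { uint64_t wr_id; void *wr_cqe; };
	enum wc_status status;
	enum wc_opcode opcode;
	uint32_t vendor_err;
	uint32_t byte_len;
	void *qp;
	union { uint32_t imm_data; uint32_t invalidate_rkey; } ex;
	uint32_t src_qp;
	int wc_flags;
	uint16_t pkey_index;
	uint32_t slid;              /* forces a 2-byte hole before it */
	uint8_t sl;
	uint8_t dlid_path_bits;
	uint8_t port_num;
	uint8_t smac[6];
	uint16_t vlan_id;
	uint8_t network_hdr_type;   /* spills into a second cache line */
};

/* Field order after the patch: slid moved up next to the other u32s. */
struct wc_after {
	union { uint64_t wr_id; void *wr_cqe; };
	enum wc_status status;
	enum wc_opcode opcode;
	uint32_t vendor_err;
	uint32_t byte_len;
	void *qp;
	union { uint32_t imm_data; uint32_t invalidate_rkey; } ex;
	uint32_t src_qp;
	uint32_t slid;
	int wc_flags;
	uint16_t pkey_index;
	uint8_t sl;
	uint8_t dlid_path_bits;
	uint8_t port_num;
	uint8_t smac[6];
	uint16_t vlan_id;
	uint8_t network_hdr_type;
};

/* Compile-time guard: the reordered struct must fit one 64-byte line. */
static_assert(sizeof(struct wc_after) <= 64, "wc_after exceeds one cache line");
```

A compile-time assertion like the last line is one way to keep a later field change from silently pushing the structure past 64 bytes again.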
  8. 09 Jan, 2018 2 commits
    • {net, IB}/mlx5: Manage port association for multiport RoCE · 32f69e4b
      Committed by Daniel Jurgens
      When mlx5_ib_add is called determine if the mlx5 core device being
      added is capable of dual port RoCE operation. If it is, determine
      whether it is a master device or a slave device using the
      num_vhca_ports and affiliate_nic_vport_criteria capabilities.
      
      If the device is a slave, attempt to find a master device to affiliate it
      with. Devices that can be affiliated will share a system image guid. If
      none are found place it on a list of unaffiliated ports. If a master is
      found bind the port to it by configuring the port affiliation in the NIC
      vport context.
      
      Similarly when mlx5_ib_remove is called determine the port type. If it's
      a slave port, unaffiliate it from the master device, otherwise just
      remove it from the unaffiliated port list.
      
      The IB device is registered as a multiport device, even if a 2nd port is
      not available for affiliation. When the 2nd port is affiliated later the
      GID cache must be refreshed in order to get the default GIDs for the 2nd
      port in the cache. Export roce_rescan_device to provide a mechanism to
      refresh the cache after a new port is bound.
      
      In a multiport configuration all IB object (QP, MR, PD, etc.) related
      commands should flow through the master mlx5_core_dev. Other commands
      must be sent to the slave port mlx5_core_mdev, so an interface is
      provided to get the correct mdev for non-IB object commands.
      Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
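The add-time affiliation flow described in this commit can be sketched as follows. This is a simplified user-space model, not the mlx5 code: `port_dev`, `port_add`, the fixed-size arrays, and the choice to rescan waiting slaves when a master arrives are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 8

/* Hypothetical model of a core device that may be a master or a slave. */
struct port_dev {
	uint64_t sys_image_guid;   /* devices that can affiliate share this */
	int is_master;
	struct port_dev *master;   /* set once a slave port is bound */
};

static struct port_dev *masters[MAX_PORTS];
static size_t n_masters;
static struct port_dev *unaffiliated[MAX_PORTS];
static size_t n_unaffiliated;

/* Returns 0 on success, -1 if the bookkeeping arrays are full. */
int port_add(struct port_dev *dev)
{
	size_t i;

	if (dev->is_master) {
		if (n_masters == MAX_PORTS)
			return -1;
		masters[n_masters++] = dev;
		/* Assumption: bind slaves that were waiting for this master. */
		for (i = 0; i < n_unaffiliated; ) {
			if (unaffiliated[i]->sys_image_guid == dev->sys_image_guid) {
				unaffiliated[i]->master = dev;
				unaffiliated[i] = unaffiliated[--n_unaffiliated];
			} else {
				i++;
			}
		}
		return 0;
	}

	/* Slave: look for a master with the same system image GUID. */
	for (i = 0; i < n_masters; i++) {
		if (masters[i]->sys_image_guid == dev->sys_image_guid) {
			dev->master = masters[i];
			return 0;
		}
	}

	/* No master yet: park the port on the unaffiliated list. */
	if (n_unaffiliated == MAX_PORTS)
		return -1;
	unaffiliated[n_unaffiliated++] = dev;
	return 0;
}
```

In the real driver, binding also configures the port affiliation in the NIC vport context and triggers a GID cache refresh; the sketch leaves both out.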
    • IB/core: Introduce driver QP type · 8011c1e3
      Committed by Moni Shoua
      Vendors can implement types of QPs that are not described in the
      InfiniBand specification. To still be able to use the IB/core layer
      services (e.g. user object management) without tainting this layer with
      driver proprietary logic, a new QP type is added - IB_QPT_DRIVER. This
      is a general QP type whose true nature the core layer does not know.
      When a command like create_qp() is passed to a hardware driver, the extra
      data that is required is taken from the driver channel.
      Downstream patches from this series will use that QP type in the mlx5
      driver.
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  9. 19 Dec, 2017 2 commits
  10. 14 Nov, 2017 3 commits
  11. 11 Nov, 2017 1 commit
    • IB/core: Add PCI write end padding flags for WQ and QP · e1d2e887
      Committed by Noa Osherovich
      There are root complexes that are able to optimize their
      performance when incoming data is multiple full cache lines.
      
      PCI write end padding is the device's ability to pad the ending of
      incoming packets (scatter) to full cache line such that the last
      upstream write generated by an incoming packet will be a full cache
      line.
      
      Add a relevant entry to ib_device_cap_flags to report such capability
      of an RDMA device.
      
      Add the QP and WQ create flags:
       * A QP/WQ created with a scatter end padding flag will cause
         HW to pad the last upstream write generated by a packet to cache line.
      
      Users should consider several factors before activating this feature:
      - In case of high CPU memory load (which may cause PCI back pressure in
        turn), if a large percent of the writes are partial cache line, this
        feature should be checked as an optional solution.
      - This feature might reduce performance if most packets are between one
        and two cache lines and PCIe throughput has reached its maximum
        capacity. E.g. 65B packet from the network port will lead to 128B
        write on PCIe, which may cause traffic on PCIe to reach high
        throughput.
      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
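The 65B to 128B example in this commit message is a round-up to the next cache-line multiple. A minimal sketch of that arithmetic, with a hypothetical helper name and the cache-line size passed in as a parameter:

```c
#include <stddef.h>

/*
 * Size of the data written upstream when the tail of a packet is padded
 * to a full cache line. cache_line must be non-zero.
 */
size_t pad_to_cache_line(size_t len, size_t cache_line)
{
	return (len + cache_line - 1) / cache_line * cache_line;
}
```

With a 64-byte cache line, a 65-byte packet yields a 128-byte write, nearly doubling the PCIe traffic for that packet, while a 64-byte packet is unchanged. This is why the message warns about workloads where most packets fall between one and two cache lines.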
  12. 19 Oct, 2017 2 commits
  13. 25 Sep, 2017 2 commits
  14. 09 Sep, 2017 1 commit
  15. 31 Aug, 2017 1 commit
    • IB/core: Add new ioctl interface · fac9658c
      Committed by Matan Barak
      In this ioctl interface, processing the command starts from
      properties of the command and fetching the appropriate user objects
      before calling the handler.
      
      Parsing and validation are done according to a specifier declared by
      the driver's code. In the driver, all supported objects are declared.
      These objects are separated into different object namespaces. Dividing
      objects into namespaces is done at initialization by using the higher
      bits of the object ids. This initialization can mix objects declared
      in different places into one parsing tree used in this ioctl interface.
      
      For each object we list all supported methods. Similarly to objects,
      methods are separated into method namespaces too. Namespacing is done
      similarly to the objects case. This could be used in order to add
      methods to an existing object.
      
      Each method has a specific handler, which could be either a default
      handler or a driver specific handler.
      Along with the handler, a set of attributes is specified as well.
      Similarly to objects and methods, attributes are namespaced and hashed
      by their ids at initialization too. All supported attributes are
      subject to automatic fetching and validation. These attributes include
      the command, response and the method's related objects' ids.
      
      When these entities (objects, methods and attributes) are used, the
      high bits of the entities' ids are used in order to calculate the hash
      bucket index. Then, these high bits are masked out in order to have a
      zero based index. Since we use these high bits for both bucketing and
      namespacing, we get a compact representation and O(1) array access.
      This is mandatory for efficient dispatching.
      
      Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
      Attributes could be validated through some attributes, like:
      (*) Minimum size / Exact size
      (*) Fops for FD
      (*) Object type for IDR
      
      If an IDR/fd attribute is specified, the kernel also states the object
      type and the required access (NEW, WRITE, READ or DESTROY).
      All uobject/fd management is done automatically by the infrastructure,
      meaning - the infrastructure will fail concurrent commands that at
      least one of them requires concurrent access (WRITE/DESTROY),
      synchronize actions with device removals (dissociate context events)
      and take care of reference counting (increase/decrease) for concurrent
      actions invocation. The reference counts on the actual kernel objects
      shall be handled by the handlers.
      
       objects
      +--------+
      |        |
      |        |   methods                                                                +--------+
      |        |   ns         method      method_spec                           +-----+   |len     |
      +--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
      | object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
      +--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
      |        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
      |        |  |      |                                    +------------+
      |        |  +------+
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      +--------+
      
      [d] = Hash ids to groups using the high order bits
      
      The right types table is also chosen by using the high bits from
      the ids. Currently we have either default or driver specific groups.
      
      Once validation and object fetching (or creation) are completed, we call
      the handler:
      int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
                     struct uverbs_attr_bundle *ctx);
      
      ctx bundles attributes of different namespaces. Each element there
      is an array of attributes which corresponds to one namespace of
      attributes. For example, in the common case:
      
       ctx                               core
      +----------------------------+     +------------+
      | core:                      +---> | valid      |
      +----------------------------+     | cmd_attr   |
      | driver:                    |     +------------+
      |----------------------------+--+  | valid      |
                                      |  | cmd_attr   |
                                      |  +------------+
                                      |  | valid      |
                                      |  | obj_attr   |
                                      |  +------------+
                                      |
                                      |  drivers
                                      |  +------------+
                                      +> | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | obj_attr   |
                                         +------------+
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
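The id scheme this commit describes (high bits select the bucket, the remaining bits form a zero-based index) can be illustrated with a small lookup. This is a sketch under stated assumptions: the 4-bit namespace in the top of a 16-bit id, and the names `ID_NS_MASK`, `ID_NS_SHIFT`, `attr_spec`, `attr_bucket`, and `find_attr`, are chosen for the example and are not taken from the commit text.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed split of a 16-bit id: top 4 bits namespace, low 12 bits index. */
#define ID_NS_SHIFT 12
#define ID_NS_MASK  0xF000u

/* Hypothetical attribute specifier and per-namespace bucket. */
struct attr_spec {
	int type;
	uint16_t len;
};

struct attr_bucket {
	size_t num;
	const struct attr_spec *attrs;
};

/* High bits pick the bucket (e.g. 0 = default, 1 = driver specific). */
unsigned id_namespace(uint16_t id)
{
	return (id & ID_NS_MASK) >> ID_NS_SHIFT;
}

/* Masking the high bits out leaves a zero-based index into the bucket. */
unsigned id_index(uint16_t id)
{
	return id & ~ID_NS_MASK;
}

/* Two array accesses, no searching: O(1) dispatch. */
const struct attr_spec *find_attr(const struct attr_bucket *buckets,
				  size_t num_buckets, uint16_t id)
{
	unsigned ns = id_namespace(id);
	unsigned idx = id_index(id);

	if (ns >= num_buckets || idx >= buckets[ns].num)
		return NULL;
	return &buckets[ns].attrs[idx];
}
```

Because the same bits serve both as the namespace tag and as the bucket selector, a driver can extend an existing object with its own attributes simply by declaring them with the driver namespace bits set.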
  16. 29 Aug, 2017 3 commits
  17. 25 Aug, 2017 3 commits
  18. 23 Aug, 2017 1 commit
  19. 19 Aug, 2017 1 commit
  20. 10 Aug, 2017 2 commits
    • RDMA: Simplify get firmware interface · 9abb0d1b
      Committed by Leon Romanovsky
      There is a need to forward the FW version to user space
      applications through RDMA netlink. In order to make it safe, there
      is a need to declare nla_policy and limit the size of the FW string.
      
      The new define IB_FW_VERSION_NAME_MAX will limit the size of
      FW version string. That define was chosen to be equal to
      ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
      by that value indirectly.
      
      The introduction of this define allows us to remove the string size
      from get_fw_str function signature.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
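The effect of a fixed maximum on the callback signature can be sketched like this. It is a user-space illustration only: `fake_dev`, `get_fw_str`, and `FW_VERSION_NAME_MAX` are hypothetical names, and the value 32 is assumed to mirror ETHTOOL_FWVERS_LEN.

```c
#include <stdio.h>

/* Assumed to mirror ETHTOOL_FWVERS_LEN; includes the terminating NUL. */
#define FW_VERSION_NAME_MAX 32

/* Hypothetical device carrying a three-part firmware version. */
struct fake_dev {
	unsigned major, minor, sub;
};

/*
 * No length parameter: every caller provides a buffer of
 * FW_VERSION_NAME_MAX bytes, and the callback never writes more.
 */
void get_fw_str(const struct fake_dev *dev, char *str)
{
	snprintf(str, FW_VERSION_NAME_MAX, "%u.%u.%04u",
		 dev->major, dev->minor, dev->sub);
}
```

A caller declares `char buf[FW_VERSION_NAME_MAX];` and passes it directly. Since the bound is a shared constant, a netlink policy can advertise the same maximum for the string attribute.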
    • RDMA/core: Add and expose static device index · ecc82c53
      Committed by Leon Romanovsky
      This patch adds a static device index, in similar fashion to the one
      already available in the netdev world (struct net->ifindex).
      
      In downstream patches, RDMA netlink will use this idx-to-ib_device
      conversion, so as part of this commit, we are exposing the translation
      function to be visible for IB/core users.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
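An index-to-device translation of the kind this commit exposes might look like the sketch below. The names (`rdma_dev`, `dev_register`, `dev_get_by_index`), the singly linked list, and the monotonically increasing counter are assumptions for illustration; locking and index reuse are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record with an index fixed at registration time. */
struct rdma_dev {
	uint32_t index;
	const char *name;
	struct rdma_dev *next;
};

static struct rdma_dev *dev_list;
static uint32_t next_index;

/* Assign the next free index once; it does not change afterwards. */
void dev_register(struct rdma_dev *dev)
{
	dev->index = next_index++;
	dev->next = dev_list;
	dev_list = dev;
}

/* Translate an index back to its device, or NULL if none matches. */
struct rdma_dev *dev_get_by_index(uint32_t index)
{
	struct rdma_dev *dev;

	for (dev = dev_list; dev; dev = dev->next)
		if (dev->index == index)
			return dev;
	return NULL;
}
```

A netlink handler would read the index from the request, call the lookup, and fail the request if it returns NULL.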
  21. 09 Aug, 2017 4 commits