1. 09 1月, 2018 1 次提交
    • M
      IB/core: Introduce driver QP type · 8011c1e3
      Moni Shoua 提交于
      Vendors can implement type of QPs that are not described in the
      InfiniBand specification. To still be able to use the IB/core layer
      services (e.g. user object management) without tainting this layer with
      driver proprietary logic, a new QP type is added - IB_QPT_DRIVER. This
      will be a general QP type that the core layer doesn't know about its true nature.
      When a command like create_qp() is passed to a hardware driver the extra
      data that is required is taken from the driver channel.
      Downstream patches from this series will use that QP type in the mlx5
      driver.
      Signed-off-by: NMoni Shoua <monis@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      8011c1e3
  2. 19 12月, 2017 2 次提交
  3. 14 11月, 2017 3 次提交
  4. 11 11月, 2017 1 次提交
    • N
      IB/core: Add PCI write end padding flags for WQ and QP · e1d2e887
      Noa Osherovich 提交于
      There are root complexes that are able to optimize their
      performance when incoming data is multiple full cache lines.
      
      PCI write end padding is the device's ability to pad the ending of
      incoming packets (scatter) to full cache line such that the last
      upstream write generated by an incoming packet will be a full cache
      line.
      
      Add a relevant entry to ib_device_cap_flags to report such capability
      of an RDMA device.
      
      Add the QP and WQ create flags:
       * A QP/WQ created with a scatter end padding flag will cause
         HW to pad the last upstream write generated by a packet to cache line.
      
      User should consider several factors before activating this feature:
      - In case of high CPU memory load (which may cause PCI back pressure in
        turn), if a large percent of the writes are partial cache line, this
        feature should be checked as an optional solution.
      - This feature might reduce performance if most packets are between one
        and two cache lines and PCIe throughput has reached its maximum
        capacity. E.g. 65B packet from the network port will lead to 128B
        write on PCIe, which may cause traffic on PCIe to reach high
        throughput.
      Signed-off-by: NNoa Osherovich <noaos@mellanox.com>
      Reviewed-by: NMajd Dibbiny <majd@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      e1d2e887
  5. 19 10月, 2017 2 次提交
  6. 25 9月, 2017 2 次提交
  7. 09 9月, 2017 1 次提交
  8. 31 8月, 2017 1 次提交
    • M
      IB/core: Add new ioctl interface · fac9658c
      Matan Barak 提交于
      In this ioctl interface, processing the command starts from
      properties of the command and fetching the appropriate user objects
      before calling the handler.
      
      Parsing and validation is done according to a specifier declared by
      the driver's code. In the driver, all supported objects are declared.
      These objects are separated to different object namepsaces. Dividing
      objects to namespaces is done at initialization by using the higher
      bits of the object ids. This initialization can mix objects declared
      in different places to one parsing tree using in this ioctl interface.
      
      For each object we list all supported methods. Similarly to objects,
      methods are separated to method namespaces too. Namespacing is done
      similarly to the objects case. This could be used in order to add
      methods to an existing object.
      
      Each method has a specific handler, which could be either a default
      handler or a driver specific handler.
      Along with the handler, a bunch of attributes are specified as well.
      Similarly to objects and method, attributes are namespaced and hashed
      by their ids at initialization too. All supported attributes are
      subject to automatic fetching and validation. These attributes include
      the command, response and the method's related objects' ids.
      
      When these entities (objects, methods and attributes) are used, the
      high bits of the entities ids are used in order to calculate the hash
      bucket index. Then, these high bits are masked out in order to have a
      zero based index. Since we use these high bits for both bucketing and
      namespacing, we get a compact representation and O(1) array access.
      This is mandatory for efficient dispatching.
      
      Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
      Attributes could be validated through some attributes, like:
      (*) Minimum size / Exact size
      (*) Fops for FD
      (*) Object type for IDR
      
      If an IDR/fd attribute is specified, the kernel also states the object
      type and the required access (NEW, WRITE, READ or DESTROY).
      All uobject/fd management is done automatically by the infrastructure,
      meaning - the infrastructure will fail concurrent commands that at
      least one of them requires concurrent access (WRITE/DESTROY),
      synchronize actions with device removals (dissociate context events)
      and take care of reference counting (increase/decrease) for concurrent
      actions invocation. The reference counts on the actual kernel objects
      shall be handled by the handlers.
      
       objects
      +--------+
      |        |
      |        |   methods                                                                +--------+
      |        |   ns         method      method_spec                           +-----+   |len     |
      +--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
      | object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
      +--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
      |        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
      |        |  |      |                                    +------------+
      |        |  +------+
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      +--------+
      
      [d] = Hash ids to groups using the high order bits
      
      The right types table is also chosen by using the high bits from
      the ids. Currently we have either default or driver specific groups.
      
      Once validation and object fetching (or creation) completed, we call
      the handler:
      int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
                     struct uverbs_attr_bundle *ctx);
      
      ctx bundles attributes of different namespaces. Each element there
      is an array of attributes which corresponds to one namespaces of
      attributes. For example, in the usually used case:
      
       ctx                               core
      +----------------------------+     +------------+
      | core:                      +---> | valid      |
      +----------------------------+     | cmd_attr   |
      | driver:                    |     +------------+
      |----------------------------+--+  | valid      |
                                      |  | cmd_attr   |
                                      |  +------------+
                                      |  | valid      |
                                      |  | obj_attr   |
                                      |  +------------+
                                      |
                                      |  drivers
                                      |  +------------+
                                      +> | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | obj_attr   |
                                         +------------+
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      fac9658c
  9. 29 8月, 2017 3 次提交
  10. 25 8月, 2017 3 次提交
  11. 23 8月, 2017 1 次提交
  12. 19 8月, 2017 1 次提交
  13. 10 8月, 2017 2 次提交
    • L
      RDMA: Simplify get firmware interface · 9abb0d1b
      Leon Romanovsky 提交于
      There is a need to forward FW version to user space
      application through RDMA netlink. In order to make it safe, there
      is need to declare nla_policy and limit the size of FW string.
      
      The new define IB_FW_VERSION_NAME_MAX will limit the size of
      FW version string. That define was chosen to be equal to
      ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
      by that value indirectly.
      
      The introduction of this define allows us to remove the string size
      from get_fw_str function signature.
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      9abb0d1b
    • L
      RDMA/core: Add and expose static device index · ecc82c53
      Leon Romanovsky 提交于
      This patch adds static device index in similar fashion to
      already available in netdev world (struct net->ifindex).
      
      In downstream patches, the RDMA nelink will use this idx-to-ib_device
      conversion, so as part of this commit, we are exposing the translation
      function to be visible for IB/core users.
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      ecc82c53
  14. 09 8月, 2017 4 次提交
  15. 24 7月, 2017 3 次提交
  16. 18 7月, 2017 2 次提交
  17. 06 7月, 2017 1 次提交
  18. 28 6月, 2017 2 次提交
  19. 24 5月, 2017 1 次提交
    • D
      IB/core: Enforce PKey security on QPs · d291f1a6
      Daniel Jurgens 提交于
      Add new LSM hooks to allocate and free security contexts and check for
      permission to access a PKey.
      
      Allocate and free a security context when creating and destroying a QP.
      This context is used for controlling access to PKeys.
      
      When a request is made to modify a QP that changes the port, PKey index,
      or alternate path, check that the QP has permission for the PKey in the
      PKey table index on the subnet prefix of the port. If the QP is shared
      make sure all handles to the QP also have access.
      
      Store which port and PKey index a QP is using. After the reset to init
      transition the user can modify the port, PKey index and alternate path
      independently. So port and PKey settings changes can be a merge of the
      previous settings and the new ones.
      
      In order to maintain access control if there are PKey table or subnet
      prefix change keep a list of all QPs are using each PKey index on
      each port. If a change occurs all QPs using that device and port must
      have access enforced for the new cache settings.
      
      These changes add a transaction to the QP modify process. Association
      with the old port and PKey index must be maintained if the modify fails,
      and must be removed if it succeeds. Association with the new port and
      PKey index must be established prior to the modify and removed if the
      modify fails.
      
      1. When a QP is modified to a particular Port, PKey index or alternate
         path insert that QP into the appropriate lists.
      
      2. Check permission to access the new settings.
      
      3. If step 2 grants access attempt to modify the QP.
      
      4a. If steps 2 and 3 succeed remove any prior associations.
      
      4b. If ether fails remove the new setting associations.
      
      If a PKey table or subnet prefix changes walk the list of QPs and
      check that they have permission. If not send the QP to the error state
      and raise a fatal error event. If it's a shared QP make sure all the
      QPs that share the real_qp have permission as well. If the QP that
      owns a security structure is denied access the security structure is
      marked as such and the QP is added to an error_list. Once the moving
      the QP to error is complete the security structure mark is cleared.
      
      Maintaining the lists correctly turns QP destroy into a transaction.
      The hardware driver for the device frees the ib_qp structure, so while
      the destroy is in progress the ib_qp pointer in the ib_qp_security
      struct is undefined. When the destroy process begins the ib_qp_security
      structure is marked as destroying. This prevents any action from being
      taken on the QP pointer. After the QP is destroyed successfully it
      could still listed on an error_list wait for it to be processed by that
      flow before cleaning up the structure.
      
      If the destroy fails the QPs port and PKey settings are reinserted into
      the appropriate lists, the destroying flag is cleared, and access control
      is enforced, in case there were any cache changes during the destroy
      flow.
      
      To keep the security changes isolated a new file is used to hold security
      related functionality.
      Signed-off-by: NDaniel Jurgens <danielj@mellanox.com>
      Acked-by: NDoug Ledford <dledford@redhat.com>
      [PM: merge fixup in ib_verbs.h and uverbs_cmd.c]
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      d291f1a6
  20. 23 5月, 2017 1 次提交
  21. 10 5月, 2017 1 次提交
  22. 02 5月, 2017 2 次提交