1. 29 Aug, 2013 2 commits
    • IB/core: Infrastructure for extensible uverbs commands · 400dbc96
      Committed by Igor Ivanov
      Add infrastructure to support extended uverbs capabilities in a
      forward/backward-compatible manner.  Uverbs command opcodes which are
      based on the verbs extensions approach should be greater than or equal to
      IB_USER_VERBS_CMD_THRESHOLD.  They have a new header format and are
      processed a bit differently.
      
      Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means
      it needs to have additional arguments, we will be able to add them without creating
      a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version.
      
      This patch by itself doesn't provide the whole scheme, which also depends
      on adding a comp_mask field to each extended uverbs command struct.
      
      The new header framework allows for future extension of the CMD arguments
      (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) of an existing
      new command (that is, a command that supports the new uverbs command header format
      suggested in this patch) without bumping the ABI version, while maintaining backward
      and forward compatibility with new and old libibverbs versions.
      
      In the uverbs command we are passing both the uverbs arguments and the provider arguments.
      We split ib_uverbs_cmd_hdr.in_words into ib_uverbs_cmd_hdr.in_words, which now carries only
      the uverbs input argument struct size, and ib_uverbs_cmd_hdr.provider_in_words, which carries
      the provider input argument size. The same goes for the response (the uverbs CMD output argument).
      
      For example, take the create_cq call and the mlx4_ib provider:
      
      The uverbs layer gets libibverbs' struct ibv_create_cq (named struct ib_uverbs_create_cq
      in the kernel), mlx4_ib gets libmlx4's struct mlx4_create_cq (which includes struct
      ibv_create_cq and is named struct mlx4_ib_create_cq in the kernel), and
      in_words = sizeof(mlx4_create_cq)/4.
      
      Thus ib_uverbs_cmd_hdr.in_words carries both the uverbs and the mlx4_ib input argument sizes,
      and uverbs assumes it knows the size of its own input argument, struct ibv_create_cq.
      
      Now, if we wish to add a variable to struct ibv_create_cq, we can add a comp_mask field
      to the struct, which is basically a bit field indicating which fields exist in the struct
      (as done for the libibverbs API extension). But we also need a way to tell the total
      size of the struct rather than assume it is predefined (since we may get different
      struct sizes from different user libibverbs versions), so that we know at which point the
      provider input argument (struct mlx4_create_cq) begins. The same goes for extending the
      provider struct mlx4_create_cq. Thus we split ib_uverbs_cmd_hdr.in_words into
      ib_uverbs_cmd_hdr.in_words, which now carries only the uverbs input argument struct size, and
      ib_uverbs_cmd_hdr.provider_in_words, which carries the provider (mlx4_ib) input argument size.
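
The split described above can be illustrated with a minimal user-space sketch in C. The struct `ext_cmd_hdr`, its field widths, and the helper names are hypothetical and chosen for illustration only; they are not the layout introduced by the patch. The only properties taken from the commit message are that sizes are counted in 32-bit words and that the uverbs and provider sizes are carried separately.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extended header; all sizes are counted in 32-bit words. */
struct ext_cmd_hdr {
    uint32_t command;
    uint16_t in_words;           /* uverbs input struct size only */
    uint16_t out_words;          /* uverbs response size only */
    uint16_t provider_in_words;  /* provider input struct size */
    uint16_t provider_out_words; /* provider response size */
};

/* Byte offset (from the start of the payload) where the provider input begins. */
static size_t provider_in_offset(const struct ext_cmd_hdr *hdr)
{
    return (size_t)hdr->in_words * 4;
}

/* Total input payload size in bytes: uverbs part plus provider part. */
static size_t total_in_bytes(const struct ext_cmd_hdr *hdr)
{
    return ((size_t)hdr->in_words + hdr->provider_in_words) * 4;
}

/* Returns 1 if the optional field identified by 'bit' is flagged in comp_mask. */
static int has_field(uint64_t comp_mask, unsigned int bit)
{
    return bit < 64 && ((comp_mask >> bit) & 1);
}
```

Because the uverbs size is explicit, a receiver built against an older, shorter uverbs struct can still locate the provider argument without assuming a fixed struct size.
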
      Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Add receive flow steering support · 319a441d
      Committed by Hadar Hen Zion
      The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
      QPs, which receive plain Ethernet packets, specifically packets that
      don't carry any QPN to be matched by the receiving side.  Applications
      using these QPs must be provided with a method to program some
      steering rule with the HW so packets arriving at the local port can be
      routed to them.
      
      This patch adds ib_create_flow(), which allows providing a flow
      specification for a QP.  When there's a match between the
      specification and a received packet, the packet is forwarded to that
      QP, in the same way one uses ib_attach_multicast() for IB UD
      multicast handling.
      
      Flow specifications are provided as instances of struct ib_flow_spec_yyy,
      which describe L2, L3 and L4 headers.  Currently specs for Ethernet, IPv4,
      TCP and UDP are defined.  Flow specs are made of values and masks.
      
      The input to ib_create_flow() is a struct ib_flow_attr, which contains
      a few mandatory control elements and optional flow specs.
      
          struct ib_flow_attr {
                  enum ib_flow_attr_type type;
                  u16      size;
                  u16      priority;
                  u32      flags;
                  u8       num_of_specs;
                  u8       port;
                  /* Following are the optional layers according to user request
                   * struct ib_flow_spec_yyy
                   * struct ib_flow_spec_zzz
                   */
          };
      
      As these specs are eventually coming from user space, they are defined and
      used in a way which allows adding new spec types without kernel/user ABI
      change, just with a little API enhancement which defines the newly added spec.
      
      The flow spec structures are defined as TLV (Type-Length-Value)
      entries, which allows calling ib_create_flow() with a variable-length
      list of optional specs.
      
      For the actual processing of ib_flow_attr, the driver uses the
      mandatory num_of_specs and size fields along with the TLV nature of
      the specs.
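
As a rough illustration of how a consumer might walk such a TLV list, here is a small user-space C sketch. The `spec_hdr` layout and the `count_specs()` helper are assumptions made for this example and are not the kernel's definitions; the sketch only relies on each entry carrying its own type and size, so entries of unknown type can be skipped by length.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common prefix shared by every spec entry. */
struct spec_hdr {
    uint32_t type;
    uint16_t size;      /* total size of this entry in bytes, header included */
    uint16_t reserved;
};

/*
 * Walk num_of_specs TLV entries packed back to back in buf[0..len).
 * Returns how many entries have type == wanted, or -1 if the list is
 * malformed (an entry is truncated or claims an impossible size).
 */
static int count_specs(const unsigned char *buf, size_t len,
                       unsigned int num_of_specs, uint32_t wanted)
{
    size_t off = 0;
    int found = 0;

    for (unsigned int i = 0; i < num_of_specs; i++) {
        struct spec_hdr hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        if (hdr.type == wanted)
            found++;
        off += hdr.size;    /* skip by length, even for unknown types */
    }
    return found;
}
```

Validating every entry against the remaining length matters here because, as the message notes, the specs ultimately come from user space.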
      
      Steering rules are processed in an order determined by the domain over
      which the rule is set and by the rule priority.  All rules set by user
      space applications fall into the IB_FLOW_DOMAIN_USER domain; other domains
      could be used by future IPoIB RFS and ethtool flow-steering interface
      implementations.  A lower numerical value for the priority field means
      higher priority.
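
The ordering rule can be sketched as a comparator in C. The `rule` struct and the choice that a lower domain value is processed first are assumptions for illustration (the message does not state how domains are ranked against each other); only "lower priority value means higher priority" comes from the text above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical steering rule record used only for this sketch. */
struct rule {
    int      domain;    /* assumed: lower value is processed first */
    uint16_t priority;  /* lower value means higher priority */
    int      id;
};

/* qsort() comparator: order by domain, then by priority within a domain. */
static int rule_cmp(const void *a, const void *b)
{
    const struct rule *x = a;
    const struct rule *y = b;

    if (x->domain != y->domain)
        return (x->domain > y->domain) - (x->domain < y->domain);
    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Sort rules into processing order. */
static void sort_rules(struct rule *rules, size_t n)
{
    qsort(rules, n, sizeof(*rules), rule_cmp);
}
```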
      
      The returned value from ib_create_flow() is a struct ib_flow, which
      contains a database pointer (handle) provided by the HW driver to be
      used when calling ib_destroy_flow().
      
      Applications that offload TCP/IP traffic can also be written over IB
      UD QPs.  The ib_create_flow() / ib_destroy_flow() API is designed to
      support UD QPs too.  A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING
      to denote support for flow steering.
      
      The ib_flow_attr enum type supports usage of flow steering for promiscuous
      and sniffer purposes:
      
          IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification
      
          IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
              all Ethernet traffic which isn't steered to any QP
      
          IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast
      
          IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic
      
      The ALL_DEFAULT and MC_DEFAULT rule options are valid only for the Ethernet link type.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 01 Aug, 2013 7 commits
    • IPoIB: Fix pkey change flow for virtualization environments · c2904141
      Committed by Erez Shitrit
      IPoIB's required behaviour with respect to the pkey used by the device is the following:
      
      - For "parent" interfaces (e.g. ib0, ib1, etc.), which are created
        automatically as a result of hot-plug events from the IB core, the
        driver needs to take whatever pkey value it finds in index 0, and
        stick to that index.
      
      - For child interfaces (e.g. ib0.8001, etc.) created by admin directive,
        the driver needs to use and stick to the value provided during its
        creation.
      
      In an SR-IOV environment it's possible for the VF probe to take place
      before the cloud management software provisions the suitable pkey for
      the VF in the paravirtualized PKEY table index 0. When this is the case,
      the VF IB stack will find an invalid pkey, which is all zeros, in
      index 0.
      
      Moreover, the cloud management software can assign the pkey value at
      index 0 at any time during the guest life cycle.
      
      The correct behavior for IPoIB to address these requirements for
      parent interfaces is to use PKEY_CHANGE event as trigger to optionally
      re-init the device pkey value and re-create all the relevant resources
      accordingly, if the value of the pkey in index 0 has changed (from
      invalid to valid or from valid value X to invalid value Y).
      
      This patch enhances the heavy flushing code, which is triggered by a pkey
      change event, to behave correctly for parent devices. For child
      devices, the code remains the same: it chases the pkey value rather
      than the index.
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IPoIB: Make sure child devices use valid/proper pkeys · 3d790a4c
      Committed by Or Gerlitz
      Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for
      child devices.
      
      Also, make sure to always set the full membership bit for the pkey of
      devices created by rtnl link ops.
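
The two checks can be sketched in a few lines of C. The helper names are hypothetical; the values come from the message above (0x0000 and 0x8000 are the invalid pkeys) together with the IB convention that the top bit of a pkey is the full-membership bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER_BIT 0x8000u  /* top bit marks full membership */
#define PKEY_BASE_MASK       0x7fffu  /* low 15 bits hold the partition value */

/* A pkey whose low 15 bits are all zero (0x0000 or 0x8000) is invalid. */
static bool pkey_is_invalid(uint16_t pkey)
{
    return (pkey & PKEY_BASE_MASK) == 0;
}

/* Force the full-membership bit on, leaving the partition value untouched. */
static uint16_t pkey_full_member(uint16_t pkey)
{
    return (uint16_t)(pkey | PKEY_FULL_MEMBER_BIT);
}
```
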
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Create QP1 using the pkey index which contains the default pkey · ef5ed416
      Committed by Jack Morgenstein
      Currently, QP1 is created using pkey index 0. This patch simply looks
      for the index containing the default pkey, rather than hard-coding
      pkey index 0.
      
      This change will have no effect in native mode, since QP0 and QP1 are
      created before the SM configures the port, so pkey table will still be
      the default table defined by the IB Spec, in C10-123: "If non-volatile
      storage is not used to hold P_Key Table contents, then if a PM
      (Partition Manager) is not present, and prior to PM initialization of
      the P_Key Table, the P_Key Table must act as if it contains a single
      valid entry, at P_Key_ix = 0, containing the default partition
      key. All other entries in the P_Key Table must be invalid."
      
      Thus, in the native mode case, the driver will find the default pkey
      at index 0 (so it will be no different than the hard-coding).
      
      However, in SR-IOV mode, for VFs, the pkey table may be
      paravirtualized, so that the VF's pkey index zero may not necessarily
      be mapped to the real pkey index 0. For VFs, therefore, it is
      important to find the virtual index which maps to the real default
      pkey.
      
      This commit does the following for QP1 creation:
      
      1. Find the pkey index containing the default pkey, and use that index
         if found.  ib_find_pkey() returns the index of the
         limited-membership default pkey (0x7FFF) if the full-member default
         pkey is not in the table.
      
      2. If neither form of the default pkey is found, use pkey index 0
         (previous behavior).
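
The two steps above can be sketched in user-space C over a plain array standing in for the port's pkey table. `find_pkey_index()` here is a simplified stand-in written for this example, not the kernel's ib_find_pkey(); it only mirrors the behaviour described in the message (prefer the full-member entry, otherwise accept the limited-member one).

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_PKEY_FULL 0xffffu

/*
 * Return the index of the entry matching 'pkey' in its low 15 bits.
 * A full-member entry (top bit set) wins; otherwise the first
 * limited-member match is returned. Returns -1 if nothing matches.
 */
static int find_pkey_index(const uint16_t *table, size_t len, uint16_t pkey)
{
    int partial = -1;

    for (size_t i = 0; i < len; i++) {
        if ((table[i] & 0x7fff) != (pkey & 0x7fff))
            continue;
        if (table[i] & 0x8000)
            return (int)i;
        if (partial < 0)
            partial = (int)i;
    }
    return partial;
}

/* Index to use for QP1: the default pkey's index, or 0 if it is absent. */
static size_t qp1_pkey_index(const uint16_t *table, size_t len)
{
    int idx = find_pkey_index(table, len, DEFAULT_PKEY_FULL);

    return idx >= 0 ? (size_t)idx : 0;
}
```
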
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • mlx5_core: Variable may be used uninitialized · 618af384
      Committed by Andi Shyti
      In the sq_overhead() function, if qp_type is equal to IB_QPT_RC, size
      will be used uninitialized.
      Signed-off-by: Andi Shyti <andi@etezian.org>
      Acked-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext() · 92b0ca7c
      Committed by Dan Carpenter
      We don't set "resp.reserved".  Since it's at the end of the struct
      that means we don't have to copy it to the user.
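
One way to express "don't copy the trailing field" is to bound the copy at the field's offset. The following user-space C sketch uses a hypothetical `resp` struct and memcpy() in place of the kernel's copy-to-user helper; it illustrates the idea and is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response struct whose last member is never filled in. */
struct resp {
    uint32_t a;
    uint16_t b;
    uint16_t reserved;  /* left unset by the producer */
};

/*
 * Copy only the bytes that precede 'reserved', so whatever stale data
 * sits in that field never reaches the destination buffer.
 * Returns the number of bytes copied, or 0 if dst is too small.
 */
static size_t copy_resp(unsigned char *dst, size_t dst_len,
                        const struct resp *r)
{
    size_t n = offsetof(struct resp, reserved);

    if (dst_len < n)
        return 0;
    memcpy(dst, r, n);
    return n;
}
```

An alternative with the same effect would be to zero the whole struct before filling it; truncating the copy simply avoids touching the field at all.
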
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/mlx5: Fix error return code in init_one() · 281d1a92
      Committed by Wei Yongjun
      Fix to return a negative error code from the error handling case
      instead of 0, as done elsewhere in this function.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/mlx4: Use default pkey when creating tunnel QPs · 3eac103f
      Committed by Jack Morgenstein
      When creating tunnel QPs for special QP tunneling, look for the
      default pkey in the slave's virtual pkey table.  If it is present, use
      the real pkey index where the default pkey is located.
      
      If the default pkey is not found in the pkey table, use the real pkey
      index which is stored at index 0 in the slave's virtual pkey table
      (this is the current behavior).
      
      This change is required to support cloud computing, where the
      paravirtualized index of the default pkey is moved to index 1 or
      higher.  The pkey at paravirtualized index 0 is used for the default
      IPoIB interface created by the VF.
      
      It's possible for the pkey value at paravirtualized index 0 to be
      invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey
      index 127, which contains pkey = 0).
      
      At some point after the VF probe, the cloud computing interface at the
      hypervisor maps virtual index 0 for the VF to the pkey index
      containing the pkey that IPoIB will use in its operation.  However,
      when the tunnel QP is created, the pkey at the slave's virtual index 0
      is still mapped to the invalid pkey index, so tunnel QP creation
      fails.
      
      This commit causes the hypervisor to search for the default pkey in
      the slave's pkey table -- and this pkey is present in the table (at
      index > 0) at tunnel QP creation time, so that the tunnel QP creation
      will succeed.
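
The lookup described in this message can be sketched in user-space C as follows. The `virt2phys` array, the function name, and the exact-match test against 0xFFFF are assumptions for illustration; the message only says that the hypervisor searches the slave's virtual pkey table for the default pkey and otherwise falls back to the real index stored at virtual index 0.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_PKEY 0xffffu

/*
 * virt2phys[v] is the real pkey index that the slave's virtual index v
 * maps to; phys_table holds the real pkey values.
 * Returns the real index holding the default pkey if any virtual index
 * maps to it, else the real index behind virtual index 0.
 */
static unsigned int tunnel_pkey_real_index(const uint8_t *virt2phys,
                                           size_t nvirt,
                                           const uint16_t *phys_table,
                                           size_t nphys)
{
    if (nvirt == 0)
        return 0;

    for (size_t v = 0; v < nvirt; v++) {
        uint8_t p = virt2phys[v];

        if (p < nphys && phys_table[p] == DEFAULT_PKEY)
            return p;
    }
    return virt2phys[0];    /* previous behaviour */
}
```

With virtual index 0 mapped to real index 127 (pkey 0) and virtual index 1 mapped to the real default pkey, the search returns the latter, which is the case the message describes.
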
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  3. 31 Jul, 2013 9 commits
  4. 27 Jul, 2013 1 commit
  5. 12 Jul, 2013 4 commits
  6. 09 Jul, 2013 3 commits
  7. 08 Jul, 2013 6 commits
  8. 07 Jul, 2013 4 commits
    • iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser · 186a9647
      Committed by Nicholas Bellinger
      This patch adds target_get_sess_cmd reference counting for
      iscsit_handle_task_mgt_cmd(), and adds a target_put_sess_cmd()
      for the failure case.
      
      It also fixes a bug where ISCSI_OP_SCSI_TMFUNC type commands
      were leaking iscsi_cmd->i_conn_node and eventually triggering
      an OOPS during struct isert_conn shutdown.
      
      Cc: stable@vger.kernel.org  # 3.10+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • iscsi-target: Fix iscsit_sequence_cmd reject handling for iser · 561bf158
      Committed by Nicholas Bellinger
      This patch moves ISCSI_OP_REJECT failures into iscsit_sequence_cmd()
      in order to avoid external iscsit_reject_cmd() reject usage for all
      PDU types.
      
      It also updates PDU-specific handlers for traditional iscsi-target
      code to not reset the session after posting an ISCSI_OP_REJECT during
      setup.
      
      (v2: Fix CMDSN_LOWER_THAN_EXP for ISCSI_OP_SCSI to call
           target_put_sess_cmd() after iscsit_sequence_cmd() failure)
      
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: stable@vger.kernel.org  # 3.10+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • iscsi-target: Fix iscsit_add_reject* usage for iser · ba159914
      Committed by Nicholas Bellinger
      This patch changes iscsit_add_reject() + iscsit_add_reject_from_cmd()
      usage to not sleep on iscsi_cmd->reject_comp, to address a use-after-free
      bug in v3.10 with iser-target code.
      
      It saves ->reject_reason for use within iscsit_build_reject() so the
      correct value is used for both transport cases.  It also drops the legacy
      fail_conn parameter usage throughout iscsi-target code and adds
      two helper functions, iscsit_add_reject_cmd() and iscsit_reject_cmd(),
      along with various small cleanups.
      
      (v2: Re-enable target_put_sess_cmd() to be called from
           iscsit_add_reject_from_cmd() for rejects invoked after
           target_get_sess_cmd() has been called)
      
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: stable@vger.kernel.org  # 3.10+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • iser-target: Fix isert_put_reject payload buffer post · 3df8f68a
      Committed by Nicholas Bellinger
      This patch adds the missing isert_put_reject() logic to post
      an outgoing payload buffer to hold the 48 bytes of original PDU
      header request payload for the rejected cmd.
      
      It also fixes ISTATE_SEND_REJECT handling in isert_response_completion()
      -> isert_do_control_comp() code, and drops incorrect iscsi_cmd_t->reject_comp
      usage.
      
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: stable@vger.kernel.org  # 3.10+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
  9. 04 Jul, 2013 1 commit
  10. 02 Jul, 2013 3 commits