1. 17 7月, 2014 1 次提交
  2. 14 7月, 2014 1 次提交
  3. 09 7月, 2014 3 次提交
  4. 12 6月, 2014 3 次提交
  5. 11 6月, 2014 6 次提交
    • H
      iw_cxgb4: don't truncate the recv window size · b408ff28
      Hariprasad Shenai 提交于
      Fixed a bug that shows up with recv window sizes that exceed the size of
      the RCV_BUFSIZ field in opt0 (>= 1024K).  If the recv window exceeds
      this, then we specify the max possible in opt0, add add the rest in via
      a RX_DATA_ACK credits.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b408ff28
    • H
      iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections · 92e7ae71
      Hariprasad Shenai 提交于
      Select the appropriate hw mtu index and initial sequence number to optimize
      hw memory performance.
      
      Add new cxgb4_best_aligned_mtu() which allows callers to provide enough
      information to be used to [possibly] select an MTU which will result in the
      TCP Data Segment Size (AKA Maximum Segment Size) to be an aligned value.
      
      If an RTR message exhange is required, then align the ISS to 8B - 1 + 4, so
      that after the SYN the send seqno will align on a 4B boundary. The RTR
      message exchange will leave the send seqno aligned on an 8B boundary.
      If an RTR is not required, then align the ISS to 8B - 1.  The goal is
      to have the send seqno be 8B aligned when we send the first FPDU.
      
      Based on original work by Casey Leedom <leeedom@chelsio.com> and
      Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92e7ae71
    • H
      iw_cxgb4: Allocate and use IQs specifically for indirect interrupts · cf38be6d
      Hariprasad Shenai 提交于
      Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA
      RXQs, which also handle direct interrupts for offload CPLs during RDMA
      connection setup/teardown.  The intended T4 usage model, however, is to
      have indirect interrupts flow through dedicated IQs. IE not to mix
      indirect interrupts with CPL messages in an IQ.  This patch adds the
      concept of RDMA concentrator IQs, or CIQs, setup and maintained by the
      LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will
      flow through the LLD's RDMA RXQs, and CQ interrupts flow through the
      CIQs.
      
      Design:
      
      cxgb4 creates and exports an array of CIQs for the RDMA ULD.  These IQs
      are sized according to the max available CQs available at adapter init.
      In addition, these IQs don't need FL buffers since they only service
      indirect interrupts.  One CIQ is setup per RX channel similar to the
      RDMA RXQs.
      
      iw_cxgb4 will utilize these CIQs based on the vector value passed into
      create_cq().  The num_comp_vectors advertised by iw_cxgb4 will be the
      number of CIQs configured, and thus the vector value will be the index
      into the array of CIQs.
      
      Based on original work by Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf38be6d
    • S
      RDMA/cxgb4: Add support for iWARP Port Mapper user space service · 9eccfe10
      Steve Wise 提交于
      Based on original work by Vipul Pandya.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      
      [ Fix htons -> ntohs to make sparse happy.  - Roland ]
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      9eccfe10
    • T
    • T
      RDMA/core: Add support for iWARP Port Mapper user space service · 30dc5e63
      Tatyana Nikolova 提交于
      This patch adds iWARP Port Mapper (IWPM) Version 2 support.  The iWARP
      Port Mapper implementation is based on the port mapper specification
      section in the Sockets Direct Protocol paper -
      http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf
      
      Existing iWARP RDMA providers use the same IP address as the native
      TCP/IP stack when creating RDMA connections.  They need a mechanism to
      claim the TCP ports used for RDMA connections to prevent TCP port
      collisions when other host applications use TCP ports.  The iWARP Port
      Mapper provides a standard mechanism to accomplish this.  Without this
      service it is possible for RDMA application to bind/listen on the same
      port which is already being used by native TCP host application.  If
      that happens the incoming TCP connection data can be passed to the
      RDMA stack with error.
      
      The iWARP Port Mapper solution doesn't contain any changes to the
      existing network stack in the kernel space.  All the changes are
      contained with the infiniband tree and also in user space.
      
      The iWARP Port Mapper service is implemented as a user space daemon
      process.  Source for the IWPM service is located at
      http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary
      
      The iWARP driver (port mapper client) sends to the IWPM service the
      local IP address and TCP port it has received from the RDMA
      application, when starting a connection.  The IWPM service performs a
      socket bind from user space to get an available TCP port, called a
      mapped port, and communicates it back to the client.  In that sense,
      the IWPM service is used to map the TCP port, which the RDMA
      application uses to any port available from the host TCP port
      space. The mapped ports are used in iWARP RDMA connections to avoid
      collisions with native TCP stack which is aware that these ports are
      taken. When an RDMA connection using a mapped port is terminated, the
      client notifies the IWPM service, which then releases the TCP port.
      
      The message exchange between the IWPM service and the iWARP drivers
      (between user space and kernel space) is implemented using netlink
      sockets.
      
      1) Netlink interface functions are added: ibnl_unicast() and
         ibnl_mulitcast() for sending netlink messages to user space
      
      2) The signature of the existing ibnl_put_msg() is changed to be more
         generic
      
      3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
         corresponding to the two iWarp drivers - nes and cxgb4 which use
         the IWPM service
      
      4) Enums are added to enumerate the attributes in the netlink
         messages, which are exchanged between the user space IWPM service
         and the iWARP drivers
      Signed-off-by: NTatyana Nikolova <tatyana.e.nikolova@intel.com>
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Reviewed-by: NPJ Waskiewicz <pj.waskiewicz@solidfire.com>
      
      [ Fold in range checking fixes and nlh_next removal as suggested by Dan
        Carpenter and Steve Wise.  Fix sparse endianness in hash.  - Roland ]
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      30dc5e63
  6. 10 6月, 2014 1 次提交
  7. 07 6月, 2014 1 次提交
    • B
      IB/umad: Fix use-after-free on close · 60e1751c
      Bart Van Assche 提交于
      Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
      triggers a use-after-free.  __fput() invokes f_op->release() before it
      invokes cdev_put().  Make sure that the ib_umad_device structure is
      freed by the cdev_put() call instead of f_op->release().  This avoids
      that changing the port mode from IB into Ethernet and back to IB
      followed by restarting opensmd triggers the following kernel oops:
      
          general protection fault: 0000 [#1] PREEMPT SMP
          RIP: 0010:[<ffffffff810cc65c>]  [<ffffffff810cc65c>] module_put+0x2c/0x170
          Call Trace:
           [<ffffffff81190f20>] cdev_put+0x20/0x30
           [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
           [<ffffffff8118e35e>] ____fput+0xe/0x10
           [<ffffffff810723bc>] task_work_run+0xac/0xe0
           [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
           [<ffffffff814b8398>] int_signal+0x12/0x17
      
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Reviewed-by: NYann Droneaud <ydroneaud@opteya.com>
      Cc: <stable@vger.kernel.org> # 3.x: 8ec0a0e6: IB/umad: Fix error handling
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      60e1751c
  8. 06 6月, 2014 2 次提交
    • H
      IB/core: Fix kobject leak on device register error flow · 584482ac
      Haggai Eran 提交于
      The ports kobject isn't being released during error flow in device
      registration.  This patch refactors the ports kobject cleanup into a
      single function called from both the error flow in device registration
      and from the unregistration function.
      
      A couple of attributes aren't being deleted (iw_stats_group, and
      ib_class_attributes).  While this may be handled implicitly by the
      destruction of their kobjects, it seems better to handle all the
      attributes the same way.
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      
      [ Make free_port_list_attributes() static.  - Roland ]
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      584482ac
    • Y
      RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp · b7dfa889
      Yann Droneaud 提交于
      The i386 ABI disagrees with most other ABIs regarding alignment of
      data types larger than 4 bytes: on most ABIs a padding must be added
      at end of the structures, while it is not required on i386.
      
      So for most ABI struct c4iw_alloc_ucontext_resp gets implicitly padded
      to be aligned on a 8 bytes multiple, while for i386, such padding is
      not added.
      
      The tool pahole can be used to find such implicit padding:
      
        $ pahole --anon_include \
                 --nested_anon_include \
                 --recursive \
                 --class_name c4iw_alloc_ucontext_resp \
                 drivers/infiniband/hw/cxgb4/iw_cxgb4.o
      
      Then, structure layout can be compared between i386 and x86_64:
      
        +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
        --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
        @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp {
                __u64                      status_page_key;      /*     0     8 */
                __u32                      status_page_size;     /*     8     4 */
      
        -       /* size: 12, cachelines: 1, members: 2 */
        -       /* last cacheline: 12 bytes */
        +       /* size: 16, cachelines: 1, members: 2 */
        +       /* padding: 4 */
        +       /* last cacheline: 16 bytes */
         };
      
      This ABI disagreement will make an x86_64 kernel try to write past the
      buffer provided by an i386 binary.
      
      When boundary check will be implemented, the x86_64 kernel will refuse
      to write past the i386 userspace provided buffer and the uverbs will
      fail.
      
      If the structure is on a page boundary and the next page is not
      mapped, ib_copy_to_udata() will fail and the uverb will fail.
      
      Additionally, as reported by Dan Carpenter, without the implicit
      padding being properly cleared, an information leak would take place
      in most architectures.
      
      This patch adds an explicit padding to struct c4iw_alloc_ucontext_resp,
      and, like 92b0ca7c ("IB/mlx5: Fix stack info leak in
      mlx5_ib_alloc_ucontext()"), makes function c4iw_alloc_ucontext()
      not writting this padding field to userspace. This way, x86_64 kernel
      will be able to write struct c4iw_alloc_ucontext_resp as expected by
      unpatched and patched i386 libcxgb4.
      
      Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
      Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain
      Link: http://marc.info/?i=20140328082428.GH25192@mwanda
      Cc: <stable@vger.kernel.org>
      Fixes: 05eb2389 ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes")
      Reported-by: NYann Droneaud <ydroneaud@opteya.com>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
      Acked-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      b7dfa889
  9. 05 6月, 2014 3 次提交
    • H
      IB/core: Fix port kobject deletion during error flow · cad6d02a
      Haggai Eran 提交于
      When encountering an error during the add_port function, adding a port
      to sysfs, the port kobject is freed without being deleted from sysfs.
      
      Instead of freeing it directly, the patch uses kobject_put to release
      the kobject and delete it.
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      cad6d02a
    • H
      IB/core: Remove unneeded kobject_get/put calls · 373c0ea1
      Haggai Eran 提交于
      The ib_core module will call kobject_get on the parent object of each
      kobject it creates.  This is redundant since kobject_add does that
      anyway.
      
      As a side effect, this patch should fix leaking the ports kobject and
      the device kobject during unregister flow, since the previous code
      didn't seem to take into account the kobject_get calls on behalf of
      the child kobjects.
      Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      373c0ea1
    • R
      IB/core: Fix sparse warnings about redeclared functions · 8385fd84
      Roland Dreier 提交于
      Fix a few functions that are declared with __attribute_const__ in the
      ib_verbs.h header file but defined without it in verbs.c.  This gets rid
      of the following sparse warnings:
      
          drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers
          drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers
          drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers
          drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      8385fd84
  10. 04 6月, 2014 2 次提交
    • N
      iser-target: Add missing target_put_sess_cmd for ImmedateData failure · 6cc44a6f
      Nicholas Bellinger 提交于
      This patch addresses a bug where an early exception for SCSI WRITE
      with ImmediateData=Yes was missing the target_put_sess_cmd() call
      to drop the extra se_cmd->cmd_kref reference obtained during the
      normal iscsit_setup_scsi_cmd() codepath execution.
      
      This bug was manifesting itself during session shutdown within
      isert_cq_rx_comp_err() where target_wait_for_sess_cmds() would
      end up waiting indefinately for the last se_cmd->cmd_kref put to
      occur for the failed SCSI WRITE + ImmediateData descriptors.
      
      This fix follows what traditional iscsi-target code already does
      for the same failure case within iscsit_get_immediate_data().
      Reported-by: NSagi Grimberg <sagig@dev.mellanox.co.il>
      Cc: Sagi Grimberg <sagig@dev.mellanox.co.il>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      6cc44a6f
    • R
      IB/mad: Fix sparse warning about gfp_t use · 5343c00d
      Roland Dreier 提交于
      Properly convert gfp_t & result to bool to fix:
      
          drivers/infiniband/core/sa_query.c:621:33: warning: incorrect type in initializer (different base types)
          drivers/infiniband/core/sa_query.c:621:33:    expected bool [unsigned] [usertype] preload
          drivers/infiniband/core/sa_query.c:621:33:    got restricted gfp_t
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      5343c00d
  11. 03 6月, 2014 4 次提交
    • J
      IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO · 40f2287b
      Jiri Kosina 提交于
      Modify the various routines used to allocate memory resources which
      serve QPs in mlx4 to get an input GFP directive.  Have the Ethernet
      driver to use GFP_KERNEL in it's QP allocations as done prior to this
      commit, and the IB driver to use GFP_NOIO when the IB verbs
      IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      40f2287b
    • O
      IB: Add a QP creation flag to use GFP_NOIO allocations · 09b93088
      Or Gerlitz 提交于
      This addresses a problem where NFS client writes over IPoIB connected
      mode may deadlock on memory allocation/writeback.
      
      The problem is not directly memory reclamation.  There is an indirect
      dependency between network filesystems writing back pages and
      ipoib_cm_tx_init() due to how a kworker is used.  Page reclaim cannot
      make forward progress until ipoib_cm_tx_init() succeeds and it is
      stuck in page reclaim itself waiting for network transmission.
      Ordinarily this situation may be avoided by having the caller use
      GFP_NOFS but ipoib_cm_tx_init() does not have that information.
      
      To address this, take a general approach and add a new QP creation
      flag that tells the low-level hardware driver to use GFP_NOIO for the
      memory allocations related to the new QP.
      
      Use the new flag in the ipoib connected mode path, and if the driver
      doesn't support it, re-issue the QP creation without the flag.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      09b93088
    • O
      IB: Return error for unsupported QP creation flags · 60093dc0
      Or Gerlitz 提交于
      Fix the usnic and thw qib drivers to err when QP creation flags that
      they don't understand are provided.
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      60093dc0
    • Y
      IB: Allow build of hw/ and ulp/ subdirectories independently · 729ee4ef
      Yann Droneaud 提交于
      It is not possible to build only the drivers/infiniband/hw/ (or ulp/)
      subdirectory with command such as:
      
          $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/
      
      This fails with following error messages:
      
          make[2]: Nothing to be done for `all'.
          make[2]: Nothing to be done for `relocs'.
            CHK     include/config/kernel.release
            Using /home/ydroneaud/src/linux as source for kernel
            GEN     /home/ydroneaud/src/linux/obj-x86_64/Makefile
            CHK     include/generated/uapi/linux/version.h
            CHK     include/generated/utsrelease.h
            CALL    /home/ydroneaud/src/linux/scripts/checksyscalls.sh
          /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory
          make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'.  Stop.
          make[1]: *** [drivers/infiniband/hw/] Error 2
          make: *** [sub-make] Error 2
      
      This patch creates a Makefile in hw/ and ulp/ and moves each
      corresponding parts of drivers/infiniband/Makefile in the new
      Makefiles.
      
      It should not break build except if some hw/ drivers or ulp/ were
      allowed previously to be built while CONFIG_INFINIBAND is set to 'n',
      but according to drivers/infiniband/Kconfig, it's not possible. So it
      should be safe to apply.
      Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
      Reviewed-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      729ee4ef
  12. 02 6月, 2014 2 次提交
  13. 30 5月, 2014 8 次提交
    • Y
      RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp · b6f04d3d
      Yann Droneaud 提交于
      The i386 ABI disagrees with most other ABIs regarding alignment of
      data types larger than 4 bytes: on most ABIs a padding must be added
      at end of the structures, while it is not required on i386.
      
      So for most ABI struct c4iw_create_cq_resp gets implicitly padded
      to be aligned on a 8 bytes multiple, while for i386, such padding
      is not added.
      
      The tool pahole can be used to find such implicit padding:
      
        $ pahole --anon_include \
                 --nested_anon_include \
                 --recursive \
                 --class_name c4iw_create_cq_resp \
                 drivers/infiniband/hw/cxgb4/iw_cxgb4.o
      
      Then, structure layout can be compared between i386 and x86_64:
      
        +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
        --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
        @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
                __u32                      size;                 /*    28     4 */
                __u32                      qid_mask;             /*    32     4 */
      
        -       /* size: 36, cachelines: 1, members: 6 */
        -       /* last cacheline: 36 bytes */
        +       /* size: 40, cachelines: 1, members: 6 */
        +       /* padding: 4 */
        +       /* last cacheline: 40 bytes */
         };
      
      This ABI disagreement will make an x86_64 kernel try to write past the
      buffer provided by an i386 binary.
      
      When boundary check will be implemented, the x86_64 kernel will refuse
      to write past the i386 userspace provided buffer and the uverbs will
      fail.
      
      If the structure is on a page boundary and the next page is not
      mapped, ib_copy_to_udata() will fail and the uverb will fail.
      
      This patch adds an explicit padding at end of structure
      c4iw_create_cq_resp, and, like 92b0ca7c ("IB/mlx5: Fix stack info
      leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq()
      not writting this padding field to userspace. This way, x86_64 kernel
      will be able to write struct c4iw_create_cq_resp as expected by
      unpatched and patched i386 libcxgb4.
      
      Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
      Cc: <stable@vger.kernel.org>
      Fixes: cfdda9d7 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
      Fixes: e24a72a3 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
      Acked-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      b6f04d3d
    • J
      IB/srp: Avoid problems if a header uses pr_fmt · d236cd0e
      Joe Perches 提交于
      SRP defines pr_fmt(fmt) to be "PFX fmt", and then includes a bunch of
      header files before it gets around to defining PFX.  This causes
      problems if any of the header files do a pr_... and use pr_fmt().
      
      Fix this by using KBUILD_MODNAME instead of the private PFX.
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      d236cd0e
    • B
      IB/umad: Fix error handling · 8ec0a0e6
      Bart Van Assche 提交于
      Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL
      or if nonseekable_open() fails.
      
      Avoid leaking a kref count, that sm_sem is kept down and also that the
      IB_PORT_SM capability mask is not cleared in ib_umad_sm_open() if
      nonseekable_open() fails.
      
      Since container_of() never returns NULL, remove the code that tests
      whether container_of() returns NULL.
      
      Moving the kref_get() call from the start of ib_umad_*open() to the
      end is safe since it is the responsibility of the caller of these
      functions to ensure that the cdev pointer remains valid until at least
      when these functions return.
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Cc: <stable@vger.kernel.org>
      
      [ydroneaud@opteya.com: rework a bit to reduce the amount of code changed]
      Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
      
      [ nonseekable_open() can't actually fail, but....  - Roland ]
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      8ec0a0e6
    • J
      IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPs · 65fed8a8
      Jack Morgenstein 提交于
      This commit adds the sysfs interface for enabling QP0 on VFs for
      selected VF/port.
      
      By default, no VFs are enabled for QP0 operation.
      
      To enable QP0 operation on a VF/port, under
      /sys/class/infiniband/mlx4_x/iov/<b:d:f>/ports/x there are two new entries:
      
      - smi_enabled (read-only). Indicates whether smi is currently
        enabled for the indicated VF/port
      
      - enable_smi_admin (rw). Used by the admin to request that smi
        capability be enabled or disabled for the indicated VF/port.
        0 = disable, 1 = enable.
        The requested enablement will occur at the next reset of the
        VF (e.g. driver restart on the VM which owns the VF).
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      65fed8a8
    • J
      mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs · 99ec41d0
      Jack Morgenstein 提交于
      This commit adds the infrastructure for enabling selected VFs to
      operate SMI (QP0) MADs without restriction.
      
      Additionally, for these enabled VFs, their QP0 proxy and tunnel QPs
      are MLX QPs.  As such, they operate over VL15.  Therefore, they are
      not affected by "credit" problems or changes in the VLArb table (which
      may shut down VL0).
      
      Non-enabled VFs may only create UD proxy QP0 qps (which are forced by
      the hypervisor to send packets using the q-key it assigns and places
      in the qp-context).  Thus, non-enabled VFs will not pose a security
      risk.  The hypervisor discards any privileged MADs it receives from
      these non-enabled VFs.
      
      By default, all VFs are NOT enabled, and must explicitly be enabled
      by the administrator.
      
      The sysfs interface which operates the VF enablement infrastructure
      is provided in the next commit.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      99ec41d0
    • J
      IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responses · 97982f5a
      Jack Morgenstein 提交于
      Currently, VFs in SRIOV VFs are denied QP0 access.  The main reason
      for this decision is security, since Subnet Management Datagrams
      (SMPs) are not restricted by network partitioning and may affect the
      physical network topology.  Moreover, even the SM may be denied access
      from portions of the network by setting management keys unknown to the
      SM.
      
      However, it is desirable to grant SMI access to certain privileged
      VFs, so that certain network management activities may be conducted
      within virtual machines instead of the hypervisor.
      
      This commit does the following:
      
      1. Create QP0 tunnel QPs for all VFs.
      
      2. Discard SMI mads sent-from/received-for non-privileged VFs in the
         hypervisor MAD multiplex/demultiplex logic.  SMI mads from/for
         privileged VFs are allowed to pass.
      
      3. MAD_IFC wrapper changes/fixes.  For non-privileged VFs, only
         host-view MAD_IFC commands are allowed, and only for SMI LID-Routed
         GET mads.  For privileged VFs, there are no restrictions.
      
      This commit does not allow privileged VFs as yet.  To determine if a VF
      is privileged, it calls function mlx4_vf_smi_enabled().  This function
      returns 0 unconditionally for now.
      
      The next two commits allow defining and activating privileged VFs.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      97982f5a
    • J
      IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrapped · 61565013
      Jack Morgenstein 提交于
      mlx4_ib_modify_port is invoked in IB for resetting the Q_Key violations
      counters and for modifying the IB port capability flags.
      
      For example, when opensm is started up on the hypervisor,
      mlx4_ib_modify_port is called to set the port's IsSM flag.
      
      In multifunction mode, the SET_PORT command used in this flow should
      be wrapped (so that the PF port capability flags are also tracked,
      thus enabling the aggregate of all the VF/PF capability flags to be
      tracked properly).
      
      The procedure mlx4_SET_PORT() in main.c is also renamed to mlx4_ib_SET_PORT()
      to differentiate it from procedure mlx4_SET_PORT() in port.c.
      mlx4_ib_SET_PORT() is used exclusively by mlx4_ib_modify_port().
      
      Finally, the CM invokes ib_modify_port() to set the IsCMSupported flag
      even when running over RoCE.  Therefore, when RoCE is active,
      mlx4_ib_modify_port should return OK unconditionally (since the
      capability flags and qkey violations counter are not relevant).
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      61565013
    • V
      IB/qib: Additional Intel branding changes · 0a66d2bd
      Vinit Agnihotri 提交于
      This patches changes user visible function names containing "qlogic"
      in module init and cleanup.
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NVinit Agnihotri <vinit.abhay.agnihotri@intel.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      0a66d2bd
  14. 29 5月, 2014 3 次提交