1. 20 Mar, 2018 1 commit
  2. 15 Mar, 2018 2 commits
    • IB/mlx5: Fix cleanup order on unload · 42cea83f
      Committed by Mark Bloch
      On load we create private CQ/QP/PD resources to be used by UMR. We create
      those resources after we register ourselves as an IB device, and we destroy
      them after we unregister as an IB device. This was changed by commit
      16c1975f ("IB/mlx5: Create profile infrastructure to add and remove
      stages"), which moved the destruction before the unregistration. This
      made it possible to trigger an invalid memory access when unloading
      mlx5_ib while there are open resources:
      
      BUG: unable to handle kernel paging request at 00000001002c012c
      ...
      Call Trace:
       mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib]
       __slab_free+0x9a/0x2d0
       delay_time_func+0x10/0x10 [mlx5_ib]
       unreg_umr.isra.15+0x4b/0x50 [mlx5_ib]
       mlx5_mr_cache_free+0x46/0x150 [mlx5_ib]
       clean_mr+0xc9/0x190 [mlx5_ib]
       dereg_mr+0xba/0xf0 [mlx5_ib]
       ib_dereg_mr+0x13/0x20 [ib_core]
       remove_commit_idr_uobject+0x16/0x70 [ib_uverbs]
       uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs]
       ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs]
       ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs]
       ib_unregister_device+0xd4/0x190 [ib_core]
       __mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
       mlx5_remove_device+0xf5/0x120 [mlx5_core]
       mlx5_unregister_interface+0x37/0x90 [mlx5_core]
       mlx5_ib_cleanup+0xc/0x225 [mlx5_ib]
       SyS_delete_module+0x153/0x230
       do_syscall_64+0x62/0x110
       entry_SYSCALL_64_after_hwframe+0x21/0x86
      ...
      
      We restore the original behavior by breaking the UMR stage into two parts,
      a pre and a post IB registration stage. This way we restore the original
      functionality and maintain a clean separation of logic between stages.
      
      Fixes: 16c1975f ("IB/mlx5: Create profile infrastructure to add and remove stages")
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
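The ordering described in this commit can be illustrated with a minimal sketch. It assumes a profile is a table of stages whose `init` hooks run in ascending order on load and whose `cleanup` hooks run in descending order on unload. The stage names, the `record` helper, and `load_then_unload` are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stage names; a real profile would contain many more stages. */
enum { STAGE_PRE_IB_REG_UMR, STAGE_IB_REG, STAGE_POST_IB_REG_UMR, STAGE_MAX };

struct stage {
    void (*init)(void);    /* run on load, in ascending stage order */
    void (*cleanup)(void); /* run on unload, in descending stage order */
};

static char trace[128];

/* Append one step name to the trace so the order can be inspected. */
static void record(const char *step)
{
    assert(strlen(trace) + strlen(step) + 2 <= sizeof(trace));
    strcat(trace, step);
    strcat(trace, ";");
}

static void ib_register(void)   { record("register"); }
static void ib_unregister(void) { record("unregister"); }
static void umr_create(void)    { record("create_umr"); }
static void umr_destroy(void)   { record("destroy_umr"); }

/* The UMR work is split: cleanup sits before IB registration in the table,
 * init sits after it, so teardown happens only after unregistration. */
static const struct stage profile[STAGE_MAX] = {
    [STAGE_PRE_IB_REG_UMR]  = { NULL,        umr_destroy   },
    [STAGE_IB_REG]          = { ib_register, ib_unregister },
    [STAGE_POST_IB_REG_UMR] = { umr_create,  NULL          },
};

const char *load_then_unload(void)
{
    int i;

    trace[0] = '\0';
    for (i = 0; i < STAGE_MAX; i++)
        if (profile[i].init)
            profile[i].init();
    for (i = STAGE_MAX - 1; i >= 0; i--)
        if (profile[i].cleanup)
            profile[i].cleanup();
    return trace;
}
```

With this table the UMR resources are created after registration and destroyed after unregistration, which is the order the commit message says it restores.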
    • IB/mlx5: Maintain a single emergency page · c44ef998
      Committed by Ilya Lesokhin
      The mlx5 driver needs to be able to issue invalidation to ODP MRs
      even if it cannot allocate memory. To this end it preallocates
      emergency pages to use when the situation arises.
      
      This flow should be rare enough that we don't need to worry
      about contention, and therefore a single emergency page
      is good enough.

      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
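A userspace sketch of the single emergency page idea follows. It assumes that a failed allocation falls back to one statically reserved page guarded by a lock, and that the caller then works in page-sized chunks. All names (`xlt_buf_get`, `xlt_buf_put`, `xlt_buf_is_emergency`) are hypothetical, `simulate_oom` exists only to exercise the fallback, and the spinning `atomic_flag` is a stand-in for whatever lock the driver actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define EMERGENCY_PAGE_SIZE 4096

/* One preallocated page shared by all callers that hit allocation failure. */
static unsigned char emergency_page[EMERGENCY_PAGE_SIZE];
static atomic_flag emergency_busy = ATOMIC_FLAG_INIT;

/* Try a normal allocation; on failure hand out the emergency page and
 * clamp *size so the caller processes at most one page at a time. */
void *xlt_buf_get(size_t *size, int simulate_oom)
{
    void *buf;

    assert(size != NULL);
    buf = simulate_oom ? NULL : malloc(*size);
    if (buf)
        return buf;

    /* Contention is expected to be rare, so a simple lock is enough. */
    while (atomic_flag_test_and_set(&emergency_busy))
        ; /* wait until the current user releases the page */
    if (*size > EMERGENCY_PAGE_SIZE)
        *size = EMERGENCY_PAGE_SIZE;
    return emergency_page;
}

int xlt_buf_is_emergency(const void *buf)
{
    return buf == (const void *)emergency_page;
}

/* Release either kind of buffer. */
void xlt_buf_put(void *buf)
{
    if (xlt_buf_is_emergency(buf))
        atomic_flag_clear(&emergency_busy);
    else
        free(buf);
}
```

The trade-off is that concurrent callers in the fallback path serialize on the one page, which the commit message accepts because the path is expected to be rare.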
  3. 24 Feb, 2018 5 commits
  4. 15 Feb, 2018 1 commit
    • IB/mlx5: Implement fragmented completion queue (CQ) · 388ca8be
      Committed by Yonatan Cohen
      The current implementation of create CQ requires contiguous
      memory. This requirement is problematic once memory is
      fragmented or the system is low on memory, as it causes
      failures in dma_zalloc_coherent().
      
      This patch implements a new scheme of fragmented CQs to overcome
      this issue by introducing a new type, 'struct mlx5_frag_buf_ctrl',
      to allocate fragmented buffers rather than contiguous ones.
      
      Base the Completion Queues (CQs) on this new fragmented buffer.
      
      It fixes crashes such as the following:
      kworker/29:0: page allocation failure: order:6, mode:0x80d0
      CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<>] dump_stack+0x19/0x1b
      [<>] warn_alloc_failed+0x110/0x180
      [<>] __alloc_pages_slowpath+0x6b7/0x725
      [<>] __alloc_pages_nodemask+0x405/0x420
      [<>] dma_generic_alloc_coherent+0x8f/0x140
      [<>] x86_swiotlb_alloc_coherent+0x21/0x50
      [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
      [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
      [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
      [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
      [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
      [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]

      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
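To show why a fragmented buffer avoids the high-order allocation seen in the trace, here is a minimal userspace sketch. It assumes power-of-two entry and fragment sizes, so an entry index splits into a fragment number (high bits) and an offset inside that fragment (low bits). The `frag_buf` type and its functions are hypothetical and only loosely modeled on the idea behind 'struct mlx5_frag_buf_ctrl'.

```c
#include <assert.h>
#include <stdlib.h>

struct frag_buf {
    void **frags;                  /* independently allocated fragments */
    unsigned int nfrags;
    unsigned int nent;             /* total number of entries */
    unsigned int log_stride;       /* log2 of the entry size in bytes */
    unsigned int log_frag_strides; /* log2 of entries per fragment */
};

static unsigned int ilog2_u(unsigned int v)
{
    unsigned int l = 0;

    while (v >>= 1)
        l++;
    return l;
}

void frag_buf_free(struct frag_buf *fb)
{
    unsigned int i;

    if (!fb->frags)
        return;
    for (i = 0; i < fb->nfrags; i++)
        free(fb->frags[i]);
    free(fb->frags);
    fb->frags = NULL;
}

/* Allocate nent entries as many small fragments instead of one large
 * contiguous region. frag_size must be a power of two >= the entry size. */
int frag_buf_alloc(struct frag_buf *fb, unsigned int nent,
                   unsigned int log_stride, unsigned int frag_size)
{
    unsigned int per_frag, i;

    fb->nent = nent;
    fb->log_stride = log_stride;
    fb->log_frag_strides = ilog2_u(frag_size) - log_stride;
    per_frag = 1u << fb->log_frag_strides;
    fb->nfrags = (nent + per_frag - 1) / per_frag;
    fb->frags = calloc(fb->nfrags, sizeof(*fb->frags));
    if (!fb->frags)
        return -1;
    for (i = 0; i < fb->nfrags; i++) {
        fb->frags[i] = calloc(1, frag_size);
        if (!fb->frags[i]) {
            frag_buf_free(fb);
            return -1;
        }
    }
    return 0;
}

/* Map an entry index to its address: high bits pick the fragment,
 * low bits pick the entry within it. */
void *frag_buf_entry(const struct frag_buf *fb, unsigned int idx)
{
    unsigned int frag = idx >> fb->log_frag_strides;
    unsigned int off = (idx & ((1u << fb->log_frag_strides) - 1))
                       << fb->log_stride;

    assert(idx < fb->nent);
    return (char *)fb->frags[frag] + off;
}
```

For example, 256 entries of 64 bytes with 4096-byte fragments need four separate page-sized allocations rather than one 16 KiB contiguous one.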
  5. 30 Jan, 2018 1 commit
  6. 19 Jan, 2018 1 commit
  7. 09 Jan, 2018 6 commits
  8. 04 Jan, 2018 3 commits
  9. 29 Dec, 2017 3 commits
  10. 28 Dec, 2017 1 commit
  11. 11 Nov, 2017 1 commit
  12. 26 Oct, 2017 4 commits
  13. 29 Aug, 2017 1 commit
  14. 25 Aug, 2017 1 commit
  15. 24 Jul, 2017 6 commits
    • IB/mlx5: Add support for QP with a given source QPN · c2e53b2c
      Committed by Yishai Hadas
      Allow user space applications to accelerate send and receive
      traffic which is typically handled by IPoIB ULP by creating
      a UD QP with a given source QPN of the IPoIB UD QP.
      
      A UD QP with a given source QPN should basically be similar to a
      RAW QP from the point of view of its created resources.
      
      However,
      - Its TIS should point to the source QPN.
      - Modify can be done only on its state as the transport attributes
        are managed by its source QP.
      
      This patch handles the following:
      - Creating/destroying/modifying a UD QP with a given source QPN.

      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Add delay drop configuration and statistics · fe248c3a
      Committed by Maor Gottlieb
      Add a debugfs interface for monitoring the number of delay drop timeout
      events and the number of existing dropless RQs in the system.

      In addition, add a debugfs interface for configuring the global timeout
      value which is used in the SET_DELAY_DROP command.

      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Add support to dropless RQ · 03404e8a
      Committed by Maor Gottlieb
      RQs that were configured for "delay drop" will prevent packet drops
      when their WQEs are depleted.
      Marking an RQ as dropless is done by setting delay_drop_en in the RQ
      context using the CREATE_RQ command.

      Since this feature is globally activated/deactivated by using the
      SET_DELAY_DROP command on all the marked RQs, we activate/deactivate
      it according to the number of RQs with 'delay_drop' enabled.

      When the timeout expires, the feature is deactivated. Therefore
      the driver handles the delay drop timeout event and reactivates it.

      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
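The activation rule in this commit message can be sketched as a small counter-driven state machine. This is an illustration under assumptions: the global feature is switched on when the first delay-drop RQ appears, switched off when the last one goes away, and re-armed on a timeout event while such RQs still exist. The `delay_drop` struct and the `dd_*` functions are hypothetical, and `dd_set` merely stands in for issuing SET_DELAY_DROP.

```c
#include <assert.h>
#include <stdbool.h>

struct delay_drop {
    unsigned int rqs;      /* RQs created with delay drop enabled */
    bool active;           /* whether the global feature is currently on */
    unsigned int timeouts; /* timeout events seen so far */
};

/* Stand-in for sending the SET_DELAY_DROP command to the device. */
static void dd_set(struct delay_drop *dd, bool on)
{
    dd->active = on;
}

void dd_rq_created(struct delay_drop *dd)
{
    if (dd->rqs++ == 0)
        dd_set(dd, true);   /* first marked RQ: activate */
}

void dd_rq_destroyed(struct delay_drop *dd)
{
    assert(dd->rqs > 0);
    if (--dd->rqs == 0)
        dd_set(dd, false);  /* last marked RQ gone: deactivate */
}

/* On timeout the feature is considered off; re-arm it if still needed. */
void dd_timeout_event(struct delay_drop *dd)
{
    dd->timeouts++;
    dd->active = false;
    if (dd->rqs > 0)
        dd_set(dd, true);
}
```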
    • IB/mlx5: Add debug control parameters for congestion control · 4a2da0b8
      Committed by Parav Pandit
      This patch adds debug control parameters for congestion control which
      can be read or written through debugfs. They are for reaction point and
      notification point nodes.
      
      The control parameters are as follows:
       +------------------------------+-----------------------------------------+
       |      Name                    |           Description                   |
       |------------------------------+-----------------------------------------|
       |rp_clamp_tgt_rate             | When set target rate is updated to      |
       |                              | current rate                            |
       |------------------------------+-----------------------------------------|
       |rp_clamp_tgt_rate_ati         | When set update target rate based on    |
       |                              | timer as well                           |
       |------------------------------+-----------------------------------------|
       |rp_time_reset                 | time between rate increase if no        |
       |                              | CNP is received unit in usec            |
       |------------------------------+-----------------------------------------|
       |rp_byte_reset                 | Number of bytes between rate increase   |
       |                              | if no CNP is received                   |
       |------------------------------+-----------------------------------------|
       |rp_threshold                  | Threshold for reaction point rate       |
       |                              | control                                 |
       |------------------------------+-----------------------------------------|
       |rp_ai_rate                    | Rate for target rate, unit in Mbps      |
       |------------------------------+-----------------------------------------|
       |rp_hai_rate                   | Rate for hyper increase state           |
       |                              | unit in Mbps                            |
       |------------------------------+-----------------------------------------|
       |rp_min_dec_fac                | Minimum factor by which the current     |
       |                              | transmit rate can be changed when       |
       |                              | processing a CNP, unit is percentage    |
       |------------------------------+-----------------------------------------|
       |rp_min_rate                   | Minimum value for rate limit,           |
       |                              | unit in Mbps                            |
       |------------------------------+-----------------------------------------|
       |rp_rate_to_set_on_first_cnp   | Rate that is set when first CNP is      |
       |                              | received, unit is Mbps                  |
       |------------------------------+-----------------------------------------|
       |rp_dce_tcp_g                  | Used to calculate alpha                 |
       |------------------------------+-----------------------------------------|
       |rp_dce_tcp_rtt                | Time between updates of alpha value,    |
       |                              | unit is usec                            |
       |------------------------------+-----------------------------------------|
       |rp_rate_reduce_monitor_period | Minimum time between consecutive rate   |
       |                              | reductions                              |
       |------------------------------+-----------------------------------------|
       |rp_initial_alpha_value        | Initial value of alpha                  |
       |------------------------------+-----------------------------------------|
       |rp_gd                         | When CNP is received, flow rate is      |
       |                              | reduced based on gd, rp_gd is given as  |
       |                              | log2(rp_gd)                             |
       |------------------------------+-----------------------------------------|
       |np_cnp_dscp                   | dscp code point for generated cnp       |
       |------------------------------+-----------------------------------------|
       |np_cnp_prio_mode              | cnp priority mode                       |
       |------------------------------+-----------------------------------------|
       |np_cnp_prio                   | 802.1p priority for generated cnp       |
       +------------------------------+-----------------------------------------+

      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Change logic for dispatching IB events for port state · fd65f1b8
      Committed by Moni Shoua
      The old logic ignored link state. This led to missing IB events, such as
      when the link goes down on the switch while admin state is up, or to
      redundant events, such as when admin state goes up while the link is down.
      To fix that, probe the port state on NETDEV events and compare it to the
      last known state to decide if IB events need to be dispatched.
      
      Fixes: 5ec8c83e ("IB/mlx5: Port events in RoCE now rely on netdev events")
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
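The "probe and compare to last known state" rule can be sketched as follows. This is a simplified illustration with hypothetical types and names (`port_state`, `ib_event`, `on_netdev_event`). It assumes the probed state is already available and that only two port states matter.

```c
#include <assert.h>
#include <stddef.h>

enum port_state { PORT_DOWN, PORT_ACTIVE };
enum ib_event { EVENT_NONE, EVENT_PORT_ACTIVE, EVENT_PORT_ERR };

struct port {
    enum port_state last_state; /* state seen at the previous event */
};

/* Decide which IB event, if any, a netdev event should produce.
 * An event is dispatched only when the probed state actually changed. */
enum ib_event on_netdev_event(struct port *p, enum port_state probed)
{
    assert(p != NULL);
    if (probed == p->last_state)
        return EVENT_NONE;       /* redundant notification: suppress */
    p->last_state = probed;
    return probed == PORT_ACTIVE ? EVENT_PORT_ACTIVE : EVENT_PORT_ERR;
}
```

Keying the decision on the observed state rather than on the kind of netdev event is what removes both the missing and the redundant events described above.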
    • IB/mlx5: Add raw ethernet local loopback support · c85023e1
      Committed by Huy Nguyen
      Currently, unicast/multicast loopback raw ethernet
      (non-RDMA) packets are sent back to the vport.
      A unicast loopback packet is a packet whose destination
      MAC address is the same as its source MAC address.
      For multicast, the destination MAC address is in the
      vport's multicast filter list.

      Moreover, local loopback is not needed if
      there is at most one user space context.
      
      After this patch, the raw ethernet unicast and multicast
      local loopback are disabled by default. When there is more
      than one user space context, the local loopback is enabled.
      
      Note that when local loopback is disabled, raw ethernet
      packets are not looped back to the vport and are forwarded
      to the next routing level (eswitch, or multihost switch,
      or out to the wire depending on the configuration).

      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
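The enable/disable rule stated in this commit message (loopback off by default, on only while more than one user space context exists) can be sketched as a counter. The `lb_state` struct and function names are hypothetical, and the `enabled` flag stands in for whatever device command actually toggles loopback.

```c
#include <assert.h>
#include <stdbool.h>

struct lb_state {
    unsigned int user_ctx; /* number of open user space contexts */
    bool enabled;          /* local loopback currently enabled */
};

void lb_ctx_opened(struct lb_state *lb)
{
    /* The second context is the first one that could need loopback. */
    if (++lb->user_ctx == 2)
        lb->enabled = true;
}

void lb_ctx_closed(struct lb_state *lb)
{
    assert(lb->user_ctx > 0);
    /* With one context or none left, loopback is no longer needed. */
    if (--lb->user_ctx < 2)
        lb->enabled = false;
}
```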
  16. 02 Jun, 2017 1 commit
  17. 02 May, 2017 1 commit
  18. 26 Apr, 2017 1 commit