1. 16 Dec 2014 (5 commits)
    • RDMA/cxgb4: Wake up waiters after flushing the qp · 5b341808
      Steve Wise committed
      When transitioning into ERROR state, the QP was getting flushed after
      waking up any waiters.  This can cause applications to miss flushed work
      requests which can stall an NFS mount.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      5b341808
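      The fix is purely about ordering: the flush has to complete before any
      waiter blocked on the QP is released, so a waiter is guaranteed to see
      every flushed work request. A minimal kernel-style sketch of that
      ordering follows; the structure and helper names (my_qp, flush_qp_wrs)
      are invented for the illustration and are not the actual iw_cxgb4 code.

          /* Illustrative only -- not the real iw_cxgb4 symbols. */
          #include <linux/wait.h>

          struct my_qp {
              wait_queue_head_t waitq;   /* threads blocked on QP state changes */
              int flushed;               /* condition the waiters test */
          };

          /* Hypothetical helper: post a flush completion for every pending WR. */
          static void flush_qp_wrs(struct my_qp *qhp)
          {
              /* ...push flush CQEs so the application can reap them... */
          }

          static void qp_move_to_error(struct my_qp *qhp)
          {
              /* 1) Flush first, so all flushed WRs are visible on the CQs. */
              flush_qp_wrs(qhp);
              qhp->flushed = 1;

              /* 2) Only then wake the waiters; anyone sleeping in
               *    wait_event(qhp->waitq, qhp->flushed) now observes the
               *    already-flushed work requests instead of missing them. */
              wake_up(&qhp->waitq);
          }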
    • RDMA/cxgb4: Limit MRs to < 8GB for T4/T5 devices · 2550a88d
      Hariprasad Shenai committed
      T4/T5 hardware can't handle MRs >= 8GB due to a hardware bug.  So limit
      registrations to < 8GB for these devices.
      
      Based on original work by Steve Wise <swise@opengridcomputing.com>.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      2550a88d
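      The check itself is just a bound on the requested registration length.
      Below is a standalone sketch of the kind of validation the commit
      describes; the constant and function names are hypothetical, not the
      driver's.

          #include <stdint.h>
          #include <errno.h>
          #include <stdio.h>

          /* T4/T5 cannot handle memory regions of 8GB or more (hardware bug),
           * so reject such registrations up front. */
          #define MAX_MR_SIZE ((uint64_t)8 << 30)   /* 8GB, exclusive bound */

          static int validate_mr_length(uint64_t length)
          {
              if (length >= MAX_MR_SIZE)
                  return -EINVAL;    /* MR must be strictly smaller than 8GB */
              return 0;
          }

          int main(void)
          {
              printf("7GB MR: %d\n", validate_mr_length((uint64_t)7 << 30)); /* 0   */
              printf("8GB MR: %d\n", validate_mr_length((uint64_t)8 << 30)); /* -22 */
              return 0;
          }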
    • RDMA/cxgb4: Fix locking issue in process_mpa_request · 10be6b48
      Hariprasad Shenai committed
      Fix the following lockdep report:
      
          =============================================
          [ INFO: possible recursive locking detected ]
          3.17.0+ #3 Tainted: G            E
          ---------------------------------------------
          kworker/u64:3/299 is trying to acquire lock:
           (&epc->mutex){+.+.+.}, at: [<ffffffffa074e07a>]
          process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
      
          but task is already holding lock:
           (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0 [iw_cxgb4]
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
      
                 CPU0
                 ----
            lock(&epc->mutex);
            lock(&epc->mutex);
      
           *** DEADLOCK ***
      
           May be due to missing lock nesting notation
      
          3 locks held by kworker/u64:3/299:
           #0:  ("%s""iw_cxgb4"){.+.+.+}, at: [<ffffffff8106f14d>]
          process_one_work+0x13d/0x4d0
           #1:  (skb_work){+.+.+.}, at: [<ffffffff8106f14d>] process_one_work+0x13d/0x4d0
           #2:  (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0
          [iw_cxgb4]
      
          stack backtrace:
          CPU: 2 PID: 299 Comm: kworker/u64:3 Tainted: G            E  3.17.0+ #3
          Hardware name: Dell Inc. PowerEdge T110/0X744K, BIOS 1.2.1 01/28/2010
          Workqueue: iw_cxgb4 process_work [iw_cxgb4]
           ffff8800b91593d0 ffff8800b8a2f9f8 ffffffff815df107 0000000000000001
           ffff8800b9158750 ffff8800b8a2fa28 ffffffff8109f0e2 ffff8800bb768a00
           ffff8800b91593d0 ffff8800b9158750 0000000000000000 ffff8800b8a2fa88
          Call Trace:
           [<ffffffff815df107>] dump_stack+0x49/0x62
           [<ffffffff8109f0e2>] print_deadlock_bug+0xf2/0x100
           [<ffffffff810a0f04>] validate_chain+0x454/0x700
           [<ffffffff810a1574>] __lock_acquire+0x3c4/0x580
           [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
           [<ffffffff810a17cc>] lock_acquire+0x9c/0x110
           [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
           [<ffffffff815e111b>] mutex_lock_nested+0x4b/0x360
           [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
           [<ffffffff810c181a>] ? del_timer_sync+0xaa/0xd0
           [<ffffffff810c1770>] ? try_to_del_timer_sync+0x70/0x70
           [<ffffffffa074e07a>] process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
           [<ffffffffa074a3ec>] ? update_rx_credits+0xec/0x140 [iw_cxgb4]
           [<ffffffffa074e381>] rx_data+0xd1/0x1f0 [iw_cxgb4]
           [<ffffffff8109ff23>] ? mark_held_locks+0x73/0xa0
           [<ffffffff815e4b90>] ? _raw_spin_unlock_irqrestore+0x40/0x70
           [<ffffffff810a020d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
           [<ffffffff810a02dd>] ? trace_hardirqs_on+0xd/0x10
           [<ffffffffa074c931>] process_work+0x51/0x80 [iw_cxgb4]
           [<ffffffff8106f1c8>] process_one_work+0x1b8/0x4d0
           [<ffffffff8106f14d>] ? process_one_work+0x13d/0x4d0
           [<ffffffff8106f600>] worker_thread+0x120/0x3c0
           [<ffffffff8106f4e0>] ? process_one_work+0x4d0/0x4d0
           [<ffffffff81074a0e>] kthread+0xde/0x100
           [<ffffffff815e4b40>] ? _raw_spin_unlock_irq+0x30/0x40
           [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70
           [<ffffffff815e512c>] ret_from_fork+0x7c/0xb0
           [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70
      
      Based on original work by Steve Wise <swise@opengridcomputing.com>.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      10be6b48
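      The lockdep splat boils down to the same kworker context taking
      epc->mutex twice: once in rx_data() and again inside
      process_mpa_request(). A regular, non-recursive mutex self-deadlocks in
      that situation. The small userspace demo below reproduces the bug class
      with an error-checking pthread mutex, which reports the problem as
      EDEADLK instead of hanging; it only illustrates the failure mode, not
      the driver's actual fix.

          #include <pthread.h>
          #include <stdio.h>
          #include <string.h>

          int main(void)
          {
              pthread_mutex_t m;
              pthread_mutexattr_t attr;
              int ret;

              /* PTHREAD_MUTEX_ERRORCHECK makes the double lock visible as an
               * error code; a plain mutex (like the kernel's struct mutex in
               * the report above) would simply deadlock. */
              pthread_mutexattr_init(&attr);
              pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
              pthread_mutex_init(&m, &attr);

              pthread_mutex_lock(&m);        /* like rx_data() taking epc->mutex       */
              ret = pthread_mutex_lock(&m);  /* like process_mpa_request() retaking it */
              printf("second lock: %s\n", strerror(ret)); /* "Resource deadlock avoided" */

              pthread_mutex_unlock(&m);
              pthread_mutex_destroy(&m);
              pthread_mutexattr_destroy(&attr);
              return 0;
          }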
    • RDMA/cxgb4: Configure 0B MRs to match HW implementation · 123bc2a2
      Pramod Kumar committed
      0B MRs need some tweaks to work correctly with HW. When writing the
      TPTE, if the MR length is zero we now:
      
      1) turn off all permissions
      2) set the length to -1
      
      While functionality/capabilities of the MR are the same with these
      changes, it resolves a dapltest 0B RDMA Read test failure.  Based on
      original work by Steve Wise <swise@opengridcomputing.com>.
      Signed-off-by: Pramod Kumar <pramod@chelsio.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      123bc2a2
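      The change amounts to two special cases when building the TPT entry for
      a zero-length MR: drop all access permissions and write a length of -1.
      A standalone sketch of that logic is below; the structure, fields, and
      permission bits are invented for the example and do not reflect the real
      cxgb4 TPTE layout.

          #include <stdint.h>
          #include <stdio.h>

          #define PERM_LOCAL_READ   0x1   /* hypothetical permission bits */
          #define PERM_REMOTE_WRITE 0x2

          struct tpte {                    /* illustrative, not the real TPT entry */
              uint32_t perms;
              uint64_t len;
          };

          static void write_tpte(struct tpte *tpt, uint32_t perms, uint64_t len)
          {
              if (len == 0) {
                  /* 0B MR: hardware expects no permissions and length == -1 */
                  tpt->perms = 0;
                  tpt->len = (uint64_t)-1;
              } else {
                  tpt->perms = perms;
                  tpt->len = len;
              }
          }

          int main(void)
          {
              struct tpte t;

              write_tpte(&t, PERM_LOCAL_READ | PERM_REMOTE_WRITE, 0);
              printf("perms=0x%x len=0x%llx\n", t.perms, (unsigned long long)t.len);
              return 0;
          }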
    • RDMA/cxgb4: Increase epd buff size for debug interface · 63a71ba6
      Pramod Kumar committed
      IPv6 address string lengths require increasing the buffer size for
      debugfs handlers.
      Signed-off-by: Pramod Kumar <pramod@chelsio.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      63a71ba6
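      The underlying issue is simply that an IPv6 address needs up to
      INET6_ADDRSTRLEN (46) bytes as text, so a buffer sized for IPv4
      dotted-quad strings truncates the debugfs output. A small standalone
      illustration of sizing a per-endpoint line for IPv6 follows; the exact
      buffer sizes are illustrative, not the driver's actual values.

          #include <arpa/inet.h>
          #include <stdio.h>

          int main(void)
          {
              char addr[INET6_ADDRSTRLEN];   /* 46 bytes: enough for any IPv6 text form */
              struct in6_addr ip6;

              inet_pton(AF_INET6, "2001:db8::1234:5678:9abc:def0", &ip6);
              inet_ntop(AF_INET6, &ip6, addr, sizeof(addr));

              /* A debug line printing both endpoints must budget for two such
               * strings plus ports and state, e.g. roughly: */
              char line[2 * INET6_ADDRSTRLEN + 64];
              snprintf(line, sizeof(line), "ep %s:%u <-> %s:%u", addr, 4321, addr, 1234);
              printf("%s\n", line);
              return 0;
          }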
  2. 12 Dec 2014 (3 commits)
    • net/mlx4: Add A0 hybrid steering · d57febe1
      Matan Barak committed
      A0 hybrid steering is a form of high performance flow steering.
      By using this mode, mlx4 cards use a fast limited table based steering,
      in order to enable fast steering of unicast packets to a QP.
      
      In order to implement A0 hybrid steering we allocate resources
      from different zones:
      (1) General range
      (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.
      
      When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
      we try hard to allocate the QP from range (2). Otherwise, we try hard not
      to allocate from this range. However, when the system is pushed to its
      limits and one needs every resource, the allocator uses every region it can.
      
      Meaning, when we run out of raw-eth qps, the allocator allocates from the
      general range (and the special-A0 area is no longer active). If we run out
      of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
      is also exhausted, the allocator will allocate from the general range
      (and the A0 region is no longer active).
      
      Note that if a raw-eth qp is allocated from the general range, it attempts
      to allocate the range such that bits 6 and 7 (blueflame bits) in the
      QP number are not set.
      
      When the feature is used in SRIOV, the VF has to notify the PF what
      kind of QP attributes it needs. In order to do that, along with the
      "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
      to the combination of these bits, the PF tries to allocate a suitable QP.
      
      In order to maintain backward compatibility (with older PFs), the PF
      notifies which QP attributes it supports via the QUERY_FUNC_CAP command.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d57febe1
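      The allocator described above is essentially a preference-ordered search
      across QP-number zones with fallback once a zone is exhausted. The
      simplified standalone sketch below shows only that fallback logic; the
      zone names and sizes are invented, and the real allocator additionally
      handles alignment and the blueflame bit constraints.

          #include <stdio.h>

          enum zone { ZONE_RSS, ZONE_RAW_ETH, ZONE_GENERAL, ZONE_MAX };

          static int zone_free[ZONE_MAX] = { 2, 2, 8 };   /* toy capacities */

          /* Try the preferred zones in order; fall back to the general range
           * only when the special ranges are exhausted. */
          static int alloc_qpn_zone(const enum zone *prefs, int nprefs)
          {
              int i;

              for (i = 0; i < nprefs; i++) {
                  if (zone_free[prefs[i]] > 0) {
                      zone_free[prefs[i]]--;
                      return prefs[i];
                  }
              }
              return -1;   /* every zone exhausted */
          }

          int main(void)
          {
              /* An RSS QP prefers the RSS zone, then the raw-eth zone, then
               * the general range. */
              enum zone rss_order[] = { ZONE_RSS, ZONE_RAW_ETH, ZONE_GENERAL };
              int i;

              for (i = 0; i < 6; i++)
                  printf("rss qp %d -> zone %d\n", i, alloc_qpn_zone(rss_order, 3));
              return 0;
          }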
    • net/mlx4: Change QP allocation scheme · ddae0349
      Eugenia Emantayev committed
      When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
      in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.
      
      The current Ethernet driver code reserves a Tx QP range with 256b alignment.
      
      This is wrong because if there are more than 64 Tx QPs in use,
      QPNs >= base + 65 will have bits 6/7 set.
      
      This problem is not specific for the Ethernet driver, any entity that
      tries to reserve more than 64 BF-enabled QPs should fail. Also, using
      ranges is not necessary here and is wasteful.
      
      The new mechanism introduced here will support reservation for
      "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
      (when hypervisors support WC in VMs). The flow we use is:
      
      1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
         and request "BF enabled QPs" if BF is supported for the function
      
      2. In the ALLOC_RES FW command, change param1 to:
      a. param1[23:0]  - number of QPs
       b. param1[31:24] - flags controlling QPs reservation
      
      Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
      bits 6 and 7 unset in order to be used in Ethernet.
      
      Bits 24-30 of the flags are currently reserved.
      
      When a function tries to allocate a QP, it states the required attributes
      for this QP. Those attributes are considered "best-effort". If an attribute,
      such as Ethernet BF enabled QP, is a must-have attribute, the function has
       to check that the attribute is supported before trying to do the allocation.
      
      In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
      which are unsupported. If SRIOV is used, the PF validates those attributes
      and masks out unsupported attributes as well. In order to notify VFs which
      attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
      mailbox is filled by the PF, which notifies which QP allocation attributes
      it supports.
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ddae0349
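      Two concrete pieces of the scheme can be shown directly: the blueflame
      constraint on the QP number (bits 6 and 7 must be clear) and the new
      split of the ALLOC_RES param1 word into a 24-bit QP count plus 8 flag
      bits. The standalone sketch below uses invented macro names for the
      illustration.

          #include <stdint.h>
          #include <stdio.h>

          /* BF may only be used for QPNs with bits 6 and 7 unset. */
          static int qpn_is_bf_capable(uint32_t qpn)
          {
              return (qpn & 0xC0) == 0;
          }

          /* ALLOC_RES param1: bits [23:0] = number of QPs, bits [31:24] = flags;
           * bit 31 requests "Eth blueflame capable QPs". */
          #define ALLOC_FLAG_ETH_BF (1u << 31)

          static uint32_t build_param1(uint32_t nqps, uint32_t flags)
          {
              return (nqps & 0x00FFFFFF) | (flags & 0xFF000000);
          }

          int main(void)
          {
              printf("qpn 0x040 bf-capable: %d\n", qpn_is_bf_capable(0x040)); /* 0 */
              printf("qpn 0x100 bf-capable: %d\n", qpn_is_bf_capable(0x100)); /* 1 */
              printf("param1 = 0x%08x\n", build_param1(1, ALLOC_FLAG_ETH_BF));
              return 0;
          }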
    • net/mlx4_core: Use tasklet for user-space CQ completion events · 3dca0f42
      Matan Barak committed
      Previously, we've fired all our completion callbacks straight from our ISR.
      
      Some of those callbacks were lightweight (for example, mlx4_en's and
      IPoIB napi callbacks), but some of them did more work (for example,
      the user-space RDMA stack uverbs' completion handler). Besides that,
      doing more than the minimal work in an ISR is generally considered wrong;
      it can even lead to a hard lockup of the system: when the hardware
      generates a lot of completion events, the loop over those events can run
      long enough that the system watchdog detects a hard lockup.
      
      In order to avoid that, add a new way of invoking completion event
      callbacks. In the interrupt itself, we add the CQs which received a
      completion event to a per-EQ list and schedule a tasklet. In the tasklet
      context we loop over all the CQs in the list and invoke the user callback.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dca0f42
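      The pattern the commit describes is: the ISR only queues the CQ on a
      per-EQ list and schedules a tasklet, and the tasklet then walks the list
      and runs the (possibly heavy) user callbacks outside hard-IRQ context.
      The kernel-style sketch below shows that shape; the structures and
      function names are invented and this is not the actual mlx4_core code.

          #include <linux/interrupt.h>
          #include <linux/list.h>
          #include <linux/spinlock.h>

          struct my_cq {
              struct list_head tasklet_node;     /* must start list-initialized */
              void (*comp)(struct my_cq *cq);    /* user completion callback */
          };

          struct my_eq {
              spinlock_t lock;
              struct list_head cq_list;          /* CQs with pending completions */
              struct tasklet_struct task;
          };

          /* Hard-IRQ path: do the minimum and defer the callbacks. */
          static void eq_isr_completion(struct my_eq *eq, struct my_cq *cq)
          {
              unsigned long flags;

              spin_lock_irqsave(&eq->lock, flags);
              if (list_empty(&cq->tasklet_node))
                  list_add_tail(&cq->tasklet_node, &eq->cq_list);
              spin_unlock_irqrestore(&eq->lock, flags);

              tasklet_schedule(&eq->task);
          }

          /* Tasklet (softirq) context: the safe place for the heavier work. */
          static void eq_tasklet_fn(unsigned long data)
          {
              struct my_eq *eq = (struct my_eq *)data;
              struct my_cq *cq, *tmp;
              unsigned long flags;
              LIST_HEAD(pending);

              /* Grab the whole pending list under the lock, then invoke the
               * callbacks without holding it. */
              spin_lock_irqsave(&eq->lock, flags);
              list_splice_init(&eq->cq_list, &pending);
              spin_unlock_irqrestore(&eq->lock, flags);

              list_for_each_entry_safe(cq, tmp, &pending, tasklet_node) {
                  list_del_init(&cq->tasklet_node);
                  cq->comp(cq);
              }
          }

          static void eq_init(struct my_eq *eq)
          {
              spin_lock_init(&eq->lock);
              INIT_LIST_HEAD(&eq->cq_list);
              tasklet_init(&eq->task, eq_tasklet_fn, (unsigned long)eq);
          }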
  3. 09 Dec 2014 (2 commits)
  4. 24 Nov 2014 (3 commits)
  5. 23 Nov 2014 (3 commits)
  6. 22 Nov 2014 (1 commit)
  7. 20 Nov 2014 (1 commit)
  8. 14 Nov 2014 (2 commits)
  9. 12 Nov 2014 (10 commits)
  10. 11 Nov 2014 (1 commit)
  11. 03 Nov 2014 (2 commits)
  12. 31 Oct 2014 (1 commit)
    • mlx4: Avoid leaking steering rules on flow creation error flow · 571e1b2c
      Or Gerlitz committed
      If mlx4_ib_create_flow() attempts to create > 1 rules with the
      firmware, and one of these registrations fails, we leaked the
      already created flow rules.
      
      One example of the leak: when the registration of the VXLAN ghost
      steering rule fails, we didn't unregister the original rule requested
      by the user. The ghost rule mechanism was introduced in commit d2fce8a9
      ("mlx4: Set user-space raw Ethernet QPs to properly handle VXLAN traffic").
      
      While here, add a dump of the VXLAN portion of steering rules
      so it can actually be seen when flow creation fails.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      571e1b2c
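      The fix pattern here is the classic unwind-on-partial-failure loop:
      remember every rule registered so far and, if a later registration
      fails, unregister the earlier ones before returning the error. The
      simplified standalone sketch below shows the pattern; the firmware
      helpers and handle type are invented for the illustration.

          #include <stdio.h>
          #include <errno.h>

          #define MAX_RULES 4

          /* Hypothetical firmware interface for the sketch. */
          static int fw_register_rule(int rule, long *handle)
          {
              if (rule == 2)
                  return -EIO;        /* simulate a later rule (e.g. the VXLAN
                                       * ghost rule) failing to register */
              *handle = 1000 + rule;
              return 0;
          }

          static void fw_unregister_rule(long handle)
          {
              printf("unregister handle %ld\n", handle);
          }

          static int create_flow(const int *rules, int nrules)
          {
              long handles[MAX_RULES];
              int i, err = 0;

              for (i = 0; i < nrules; i++) {
                  err = fw_register_rule(rules[i], &handles[i]);
                  if (err)
                      break;
              }
              if (err) {
                  /* Unwind: drop every rule that was already registered so
                   * nothing leaks when the call fails part-way through. */
                  while (--i >= 0)
                      fw_unregister_rule(handles[i]);
              }
              return err;
          }

          int main(void)
          {
              int rules[] = { 0, 1, 2 };   /* rule 2 will fail */

              printf("create_flow -> %d\n", create_flow(rules, 3));
              return 0;
          }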
  13. 29 Oct 2014 (1 commit)
  14. 14 Oct 2014 (5 commits)