1. 20 Jan, 2016 7 commits
    • RDMA/be2net: Remove open and close entry points · 9781808c
      Devesh Sharma committed
      Recently Doug Ledford reported a deadlock happening
      between ocrdma-load sequence and NetworkManager service
      issuing "open" on be2net interface.
      
      
      The deadlock happens when any be2net hook (e.g. open/close) is called
      in parallel to insmod ocrdma.ko.
      
      A. be2net is sending administrative open/close event to ocrdma holding
         device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
         So sequence of locks is rtnl_lock---> device_list lock
      
      B.  When new ocrdma roce device gets registered, infiniband stack now
          takes rtnl_lock in ib_register_device() in GID initialization routines.
          So sequence of locks in this path is device_list lock ---> rtnl_lock.
      
      This improper locking sequence causes deadlock.
      
      In order to resolve the above deadlock condition, ocrdma introduced a
      patch to stop listening to administrative open/close events generated from
      be2net driver. It now depends on link-state-change async-event generated from
      CNA. This change leaves behind dead code which used to generate administrative
      open/close events. This patch cleans up all that dead code from be2net.
      Reported-by: Doug Ledford <dledford@redhat.com>
      CC: Sathya Perla <sathya.perla@avagotech.com>
      Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
      Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      9781808c
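The lock-order inversion described in paths A and B can be illustrated with a small userspace sketch. It is not kernel code: `lock_id`, `lock_order`, and `record_nested` are hypothetical names chosen for this illustration, and the two enum values only stand in for rtnl_lock and the device_list lock.

```c
#include <stdbool.h>

/* Illustrative lock identifiers; not the kernel's actual lock objects. */
enum lock_id { RTNL_LOCK, DEVICE_LIST_LOCK, NUM_LOCKS };

/* order_seen[a][b] is true once some path took lock b while holding a. */
struct lock_order {
    bool order_seen[NUM_LOCKS][NUM_LOCKS];
};

/*
 * Record that a code path acquires `second` while holding `first`.
 * Returns true if the opposite nesting was already recorded, meaning
 * two paths take the same pair of locks in opposite orders and can
 * deadlock against each other.
 */
static bool record_nested(struct lock_order *lo,
                          enum lock_id first, enum lock_id second)
{
    lo->order_seen[first][second] = true;
    return lo->order_seen[second][first];
}
```

Recording path A (rtnl_lock, then device_list lock) reports no conflict; recording path B (device_list lock, then rtnl_lock) afterwards reports the inversion.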
    • RDMA/ocrdma: Depend on async link events from CNA · 3b1ea430
      Devesh Sharma committed
      Recently Doug Ledford reported a deadlock happening
      between ocrdma-load sequence and NetworkManager service
      issuing "open" on be2net interface.
      
      The deadlock happens when any be2net hook (e.g. open/close) is called
      in parallel to insmod ocrdma.ko.
      
      A. be2net is sending administrative open/close event to ocrdma holding
         device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
         So sequence of locks is rtnl_lock---> device_list lock
      
      B.  When new ocrdma roce device gets registered, infiniband stack now
          takes rtnl_lock in ib_register_device() in GID initialization routines.
          So sequence of locks in this path is device_list lock ---> rtnl_lock.
      
      This improper locking sequence causes deadlock.
      
      With this patch we stop using administrative open and close events
      injected by be2net driver. These events were used to dispatch PORT_ACTIVE
      and PORT_ERROR events to the IB-stack. This patch implements logic
      to receive async-link-events generated from CNA whenever a link-state-change
      is detected. From now on, these async-events will be used to dispatch
      PORT_ACTIVE and PORT_ERROR events to IB-stack.
      
      Depending on async-events from CNA removes the need to hold device-list-mutex
      and thus breaks the busy-wait scenario.
      Reported-by: Doug Ledford <dledford@redhat.com>
      CC: Sathya Perla <sathya.perla@avagotech.com>
      Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
      Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      3b1ea430
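A minimal sketch of the dispatch idea in this commit message, assuming a link event only produces a port event when the link state actually changes. The types and the function name `on_async_link_event` are hypothetical and do not reflect the ocrdma driver's real structures.

```c
#include <stdbool.h>

/* Port events named after those in the commit message; PORT_EVENT_NONE
 * is an addition for this sketch meaning "nothing to dispatch". */
enum port_event { PORT_EVENT_NONE, PORT_ACTIVE, PORT_ERROR };

/* Last link state seen for a port (hypothetical bookkeeping). */
struct port_state {
    bool link_up;
};

/*
 * Handle an async link event reported by the adapter. No list mutex is
 * taken here; the handler only compares against the cached state and
 * reports which port event should go to the IB stack.
 */
static enum port_event on_async_link_event(struct port_state *ps, bool link_up)
{
    if (ps->link_up == link_up)
        return PORT_EVENT_NONE;     /* state unchanged: nothing to send */

    ps->link_up = link_up;
    return link_up ? PORT_ACTIVE : PORT_ERROR;
}
```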
    • RDMA/ocrdma: Dispatch only port event when port state changes · d310a344
      Devesh Sharma committed
      Dispatch only port event to IB stack when port state changes.
      Don't explicitly modify QPs to error. Let the application listen to
      port events on the async event queue or let the QP fail with a
      retry-exceeded completion error.
      Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
      Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      d310a344
    • RDMA/ocrdma: Fix vlan-id assignment in qp parameters · a2addf94
      Devesh Sharma committed
      vlan-id is wrongly set to 0 when PFC is enabled.
      Set the vlan-id configured by the user in QP parameters.
      If no vlan interface is in use, warn the user to configure
      a vlan, and fall back to vlan-id 0 in the QP parameters.
      
      Fixes: dbf727de ('IB/core: Use GID table in AH creation and dmac resolution')
      Cc: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      a2addf94
    • IB/cma: Fix RDMA port validation for iWarp · 64936773
      Matan Barak committed
      cma_validate_port wrongly assumed that Ethernet devices are RoCE
      devices and thus their ndev should be matched in the GID table.
      This broke the iWarp support. Fix this by matching the ndev only
      when working on a RoCE port.
      
      Cc: <stable@vger.kernel.org> # 4.4.x-
      Fixes: abae1b71 ('IB/cma: cma_validate_port should verify the port and netdevice')
      Reported-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Tested-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      64936773
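The distinction this fix relies on can be sketched as follows. This is a simplified model, not the kernel's cma_validate_port: it assumes a RoCE port is one with an Ethernet link layer and InfiniBand transport, while iWarp also runs over Ethernet but with its own transport. All type and function names here are hypothetical.

```c
#include <stdbool.h>

enum link_layer { LINK_INFINIBAND, LINK_ETHERNET };
enum transport  { TRANSPORT_IB, TRANSPORT_IWARP };

/* Simplified description of an RDMA port (assumption for this sketch). */
struct port_info {
    enum link_layer ll;
    enum transport  tp;
};

/* Ethernet alone is not enough: iWarp ports are Ethernet too. */
static bool port_is_roce(const struct port_info *p)
{
    return p->ll == LINK_ETHERNET && p->tp == TRANSPORT_IB;
}

/*
 * Return the net device index to match against the GID table, or -1
 * when the lookup should not be restricted by net device (non-RoCE).
 */
static int ndev_to_match(const struct port_info *p, int bound_ifindex)
{
    return port_is_roce(p) ? bound_ifindex : -1;
}
```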
    • IB/qib: fix mcast detach when qp not attached · 09dc9cd6
      Mike Marciniszyn committed
      The code produces the following trace:
      
      [1750924.419007] general protection fault: 0000 [#3] SMP
      [1750924.420364] Modules linked in: nfnetlink autofs4 rpcsec_gss_krb5 nfsv4
      dcdbas rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl dm_multipath nfs lockd
      scsi_dh sunrpc fscache radeon ttm drm_kms_helper drm serio_raw parport_pc
      ppdev i2c_algo_bit lpc_ich ipmi_si ib_mthca ib_qib dca lp parport ib_ipoib
      mac_hid ib_cm i3000_edac ib_sa ib_uverbs edac_core ib_umad ib_mad ib_core
      ib_addr tg3 ptp dm_mirror dm_region_hash dm_log psmouse pps_core
      [1750924.420364] CPU: 1 PID: 8401 Comm: python Tainted: G D
      3.13.0-39-generic #66-Ubuntu
      [1750924.420364] Hardware name: Dell Computer Corporation PowerEdge
      860/0XM089, BIOS A04 07/24/2007
      [1750924.420364] task: ffff8800366a9800 ti: ffff88007af1c000 task.ti:
      ffff88007af1c000
      [1750924.420364] RIP: 0010:[<ffffffffa0131d51>] [<ffffffffa0131d51>]
      qib_mcast_qp_free+0x11/0x50 [ib_qib]
      [1750924.420364] RSP: 0018:ffff88007af1dd70  EFLAGS: 00010246
      [1750924.420364] RAX: 0000000000000001 RBX: ffff88007b822688 RCX:
      000000000000000f
      [1750924.420364] RDX: ffff88007b822688 RSI: ffff8800366c15a0 RDI:
      6764697200000000
      [1750924.420364] RBP: ffff88007af1dd78 R08: 0000000000000001 R09:
      0000000000000000
      [1750924.420364] R10: 0000000000000011 R11: 0000000000000246 R12:
      ffff88007baa1d98
      [1750924.420364] R13: ffff88003ecab000 R14: ffff88007b822660 R15:
      0000000000000000
      [1750924.420364] FS:  00007ffff7fd8740(0000) GS:ffff88007fc80000(0000)
      knlGS:0000000000000000
      [1750924.420364] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [1750924.420364] CR2: 00007ffff597c750 CR3: 000000006860b000 CR4:
      00000000000007e0
      [1750924.420364] Stack:
      [1750924.420364]  ffff88007b822688 ffff88007af1ddf0 ffffffffa0132429
      000000007af1de20
      [1750924.420364]  ffff88007baa1dc8 ffff88007baa0000 ffff88007af1de70
      ffffffffa00cb313
      [1750924.420364]  00007fffffffde88 0000000000000000 0000000000000008
      ffff88003ecab000
      [1750924.420364] Call Trace:
      [1750924.420364]  [<ffffffffa0132429>] qib_multicast_detach+0x1e9/0x350
      [ib_qib]
      [1750924.568035]  [<ffffffffa00cb313>] ? ib_uverbs_modify_qp+0x323/0x3d0
      [ib_uverbs]
      [1750924.568035]  [<ffffffffa0092d61>] ib_detach_mcast+0x31/0x50 [ib_core]
      [1750924.568035]  [<ffffffffa00cc213>] ib_uverbs_detach_mcast+0x93/0x170
      [ib_uverbs]
      [1750924.568035]  [<ffffffffa00c61f6>] ib_uverbs_write+0xc6/0x2c0 [ib_uverbs]
      [1750924.568035]  [<ffffffff81312e68>] ? apparmor_file_permission+0x18/0x20
      [1750924.568035]  [<ffffffff812d4cd3>] ? security_file_permission+0x23/0xa0
      [1750924.568035]  [<ffffffff811bd214>] vfs_write+0xb4/0x1f0
      [1750924.568035]  [<ffffffff811bdc49>] SyS_write+0x49/0xa0
      [1750924.568035]  [<ffffffff8172f7ed>] system_call_fastpath+0x1a/0x1f
      [1750924.568035] Code: 66 2e 0f 1f 84 00 00 00 00 00 31 c0 5d c3 66 2e 0f 1f
      84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 10
      <f0> ff 8f 40 01 00 00 74 0e 48 89 df e8 8e f8 06 e1 5b 5d c3 0f
      [1750924.568035] RIP  [<ffffffffa0131d51>] qib_mcast_qp_free+0x11/0x50
      [ib_qib]
      [1750924.568035]  RSP <ffff88007af1dd70>
      [1750924.650439] ---[ end trace 73d5d4b3f8ad4851 ]
      
      The fix is to note the qib_mcast_qp that was found. If none is found, then
      return EINVAL indicating the error.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      09dc9cd6
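The pattern behind this fix, remembering which entry was found and returning an error when none was, can be sketched in plain C. This is an illustration over a simple singly linked list, not the qib driver's code; `mcast_qp`, `mcast_group`, and `mcast_detach` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* One QP attached to a multicast group (hypothetical layout). */
struct mcast_qp {
    int qpn;
    struct mcast_qp *next;
};

struct mcast_group {
    struct mcast_qp *qps;   /* head of the attached-QP list */
};

/*
 * Detach the QP numbered `qpn` from the group. On success the unlinked
 * entry is stored in *removed so the caller frees exactly what was
 * found. If the QP was never attached, *removed stays NULL and -EINVAL
 * is returned, so the caller never frees a stale or garbage pointer.
 */
static int mcast_detach(struct mcast_group *grp, int qpn,
                        struct mcast_qp **removed)
{
    struct mcast_qp **link;

    *removed = NULL;
    for (link = &grp->qps; *link != NULL; link = &(*link)->next) {
        if ((*link)->qpn == qpn) {
            *removed = *link;
            *link = (*removed)->next;   /* unlink the found entry */
            (*removed)->next = NULL;
            return 0;
        }
    }
    return -EINVAL;
}
```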
    • IB/IPoIB: Fix kernel panic on multicast flow · 50be28de
      Erez Shitrit committed
      ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
      parameter mcast->dev. That mcast is a temporary (used as an iterator)
      variable that may be uninitialized.
      There is no need to send the variable dev to the function, as each mcast
      has its dev as a member in the mcast struct.
      
      This causes the following panic:
      RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
      RSP: 0018: EFLAGS: 00010246
      RAX: f0201 RBX: 24e00 RCX: 00000
      ....
      ....
      Stack:
      Call Trace:
      	ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
      	ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
      	process_one_work+0x164/0x470
      	worker_thread+0x11d/0x420
      	...
      
      Fixes: 5a0e81f6 ('IB/IPoIB: factor out common multicast list removal code')
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Reported-by: Doron Tsur <doront@mellanox.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      50be28de
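The idea of the fix, letting the removal helper read the device from each list entry instead of taking it from a caller's iterator variable, can be sketched as below. This is a userspace illustration with hypothetical types (`net_dev`, `mcast_entry`) and a hypothetical function `mcast_remove_list`; it is not the IPoIB implementation.

```c
#include <stddef.h>

/* Stand-in for a network device; `leaves` counts leave operations. */
struct net_dev {
    int leaves;
};

/* A multicast entry that carries its own device pointer. */
struct mcast_entry {
    struct net_dev *dev;
    struct mcast_entry *next;
};

/*
 * Process every entry on the removal list. The device is read from the
 * entry itself, so the function needs no separate `dev` argument and
 * cannot be handed a pointer derived from an uninitialized iterator.
 * Returns the number of entries processed.
 */
static int mcast_remove_list(struct mcast_entry *head)
{
    struct mcast_entry *m;
    int count = 0;

    for (m = head; m != NULL; m = m->next) {
        m->dev->leaves++;   /* "leave" on the entry's own device */
        count++;
    }
    return count;
}
```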
  2. 27 Dec, 2015 1 commit
  3. 24 Dec, 2015 32 commits