提交 · c3efe7500add077f79d37b18e9c66df6621409b6 · openanolis / cloud-kernel

20 1月, 2016 3 次提交

IB/core: Use hop-limit from IP stack for RoCE · c3efe750

由 Matan Barak 提交于 1月 04, 2016

Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for
RoCE. Fixing that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as
hop limit values.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c3efe750

IB/cm: Fix a recently introduced deadlock · 4bfdf635

由 Bart Van Assche 提交于 1月 01, 2016

ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock
that can be locked from inside an interrupt handler. Hence do not
enable interrupts inside cm_enter_timewait() if called with interrupts
disabled.

This patch fixes e.g. the following deadlock:
Acked-by: NErez Shitrit <erezsh@mellanox.com>

=================================
[ INFO: inconsistent lock state ]
4.4.0-rc7+ #1 Tainted: G            E
---------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
(&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x
74/0x1b0 [ib_cm]
{HARDIRQ-ON-W} state was registered at:
  [<ffffffff810a3c11>] mark_held_locks+0x71/0x90
  [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
  [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
  [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
  [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
  [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
  [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
  [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
  [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
  [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
  [<ffffffff8107184d>] process_one_work+0x1bd/0x460
  [<ffffffff81073148>] worker_thread+0x118/0x420
  [<ffffffff81078454>] kthread+0xe4/0x100
  [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
irq event stamp: 1672286
hardirqs last  enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
softirqs last  enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cm_id_priv->lock)->rlock);
  <Interrupt>
    lock(&(&cm_id_priv->lock)->rlock);

 *** DEADLOCK ***

no locks held by swapper/8/0.

stack backtrace:
CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E   4.4.0-rc7+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
 ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
 0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
 ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff81251c1b>] dump_stack+0x4f/0x74
 [<ffffffff810a32f4>] print_usage_bug+0x184/0x190
 [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
 [<ffffffff810a3995>] mark_lock+0x115/0x1b0
 [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
 [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
 [<ffffffff810a53c2>] lock_acquire+0x62/0x80
 [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
 [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
 [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
 [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
 [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
 [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
 [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
 [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
 [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
 [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
 [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
 [<ffffffff81007f3d>] handle_irq+0x5d/0x130
 [<ffffffff810071fd>] do_IRQ+0x6d/0x130
 [<ffffffff8151d309>] common_interrupt+0x89/0x89
 <EOI>  [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
 [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
 [<ffffffff810990d6>] call_cpuidle+0x36/0x60
 [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
 [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
 [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
 [<ffffffff8103c443>] start_secondary+0x83/0x90

Fixes: commit be4b4993 ("IB/cm: Do not queue work to a device that's going away")
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4bfdf635

IB/mad: pass ib_mad_send_buf explicitly to the recv_handler · ca281265

由 Christoph Hellwig 提交于 1月 04, 2016

Stop abusing wr_id and just pass the parameter explicitly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHal Rosenstock <hal@mellanox.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ca281265

23 12月, 2015 4 次提交

IB/core: Validate route when we init ah · 20029832

由 Matan Barak 提交于 12月 23, 2015

In order to make sure API users don't try to use SGIDs which don't
conform to the routing table, validate the route before searching
the RoCE GID table.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

20029832

IB/cm: Use the source GID index type · cb57bb84

由 Matan Barak 提交于 12月 23, 2015

Previosuly, cm and cma modules supported only IB and RoCE v1 GID type.
In order to support multiple GID types, the gid_type is passed to
cm_init_av_by_path and stored in the path record.

The rdma cm client would use a default GID type that will be saved in
rdma_id_private.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

cb57bb84

IB/core: Add gid_type to gid attribute · b39ffa1d

由 Matan Barak 提交于 12月 23, 2015

In order to support multiple GID types, we need to store the gid_type
with each GID. This is also aligned with the RoCE v2 annex "RoCEv2 PORT
GID table entries shall have a "GID type" attribute that denotes the L3
Address type". The currently supported GID is IB_GID_TYPE_IB which is
also RoCE v1 GID type.

This implies that gid_type should be added to roce_gid_table meta-data.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b39ffa1d

IB/core: Avoid calling ib_query_device · 86bee4c9

由 Or Gerlitz 提交于 12月 18, 2015

Use the cached copy of the attributes present on the device, except for
the case of a query originating from user-space, where we have to invoke
the driver query_device entry, so they can fill in their udata.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

86bee4c9

22 10月, 2015 4 次提交

IB/cm: Remove the usage of smac and vid of qp_attr and cm_av · 5c266b23

由 Matan Barak 提交于 10月 15, 2015

The cm and cma don't need to explicitly handle vlan and smac,
as they are resolved from the GID index now. Removing this
portion of code.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-By: NDevesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5c266b23

IB/cm: cm_init_av_by_path should find a GID by its netdevice · c2c6ff13

由 Matan Barak 提交于 10月 15, 2015

Previously, the CM has searched the cache for any sgid_index whose
GID matches the path's GID. Since the path record stores the net
device, the CM should now search only for GIDs which originated from
this net device.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-By: NDevesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c2c6ff13

IB/core: Add netdev and gid attributes paramteres to cache · 55ee3ab2

由 Matan Barak 提交于 10月 15, 2015

Adding an ability to query the IB cache by a netdev and get the
attributes of a GID. These parameters are necessary in order to
successfully resolve the required GID (when the netdevice is known)
and get the Ethernet L2 attributes from a GID.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-By: NDevesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

55ee3ab2

IB/cm: Fix rb-tree duplicate free and use-after-free · 0ca81a28

由 Doron Tsur 提交于 10月 11, 2015

ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.

Fixes: a977049d ('[PATCH] IB: Add the kernel CM implementation')
Signed-off-by: NDoron Tsur <doront@mellanox.com>
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0ca81a28

31 8月, 2015 5 次提交

IB/cm: Remove compare_data checks · 73fec7fd

由 Haggai Eran 提交于 7月 30, 2015

Now that there are no ib_cm clients using the compare_data feature for
matching IB CM requests' private data, remove the compare_data parameter of
ib_cm_listen and remove the code implementing the feature.
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

73fec7fd

IB/cm: Expose BTH P_Key in CM and SIDR request events · 24cad9a7

由 Haggai Eran 提交于 7月 30, 2015

The rdma_cm module will later use the P_Key from the BTH to de-mux
requests.

See discussion at:
  http://www.spinics.net/lists/netdev/msg336067.html

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

24cad9a7

IB/cm: Share listening CM IDs · 067b171b

由 Haggai Eran 提交于 7月 30, 2015

Enabling network namespaces for RDMA CM will allow processes on different
namespaces to listen on the same port. In order to leave namespace support
out of the CM layer, this requires that multiple RDMA CM IDs will be able
to share a single CM ID.

This patch adds infrastructure to retrieve an existing listening ib_cm_id,
based on its device and service ID, or create a new one if one does not
already exist. It also adds a reference count for such instances
(cm_id_private.listen_sharecount), and prevents cm_destroy_id from
destroying a CM if it is still shared. See the relevant discussion [1].

[1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids
    http://www.spinics.net/lists/netdev/msg328860.htmlReviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

067b171b

IB/cm: Expose service ID in request events · 15865e7d

由 Haggai Eran 提交于 7月 30, 2015

Expose the service ID on an incoming CM or SIDR request to the event
handler. This will allow the RDMA CM module to de-multiplex connection
requests based on the information encoded in the service ID.
Acked-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

15865e7d

IB/core: lock client data with lists_rwsem · 7c1eb45a

由 Haggai Eran 提交于 7月 30, 2015

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

7c1eb45a

15 7月, 2015 1 次提交

IB/cm: Do not queue work to a device that's going away · be4b4993

由 Erez Shitrit 提交于 6月 25, 2015

Whenever ib_cm gets remove_one call, like when there is a hot-unplug
event, the driver should mark itself as going_down and confirm that no
new works are going to be queued for that device.
so, the order of the actions are:
1. mark the going_down bit.
2. flush the wq.
3. [make sure no new works for that device.]
4. unregister mad agent.

otherwise, works that are already queued can be scheduled after the mad
agent was freed.
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

be4b4993

13 6月, 2015 1 次提交

IB/mad: Support alternate Base Versions when creating MADs · da2dfaa3

由 Ira Weiny 提交于 6月 06, 2015

In preparation to support the new OPA MAD Base version, add a base version
parameter to ib_create_send_mad and set it to IB_MGMT_BASE_VERSION for current
users.

Definition of the new base version and it's processing will occur in later
patches.
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

da2dfaa3

21 5月, 2015 1 次提交

ib/cm: Change reject message type when destroying cm_id · c29ed5a4

由 Ted Kim 提交于 5月 14, 2015

Problem reported by: Ted Kim <ted.h.kim@oracle.com>:

We have a case where a Linux system and a non-Linux system are
trying to interoperate.  The Linux host is the active side and
starts the connection establishment, but later decides to not go
through with the connection setup and does rdma_destroy_id().

The rdma_destroy_id() eventually works its way down to cm_destroy_id()
in core/cm.c, where a REJ is sent. The non-Linux system
has some trouble recognizing the REJ because of:

A. CM states which can't receive the REJ
B. Some issues about REJ formatting (missing comm ID)

ISSUE A: That part of the spec says, a Consumer Reject REJ can be
sent for a connection abort, but it goes further
and says: can send a REJ message with a "Consumer Reject"
Reason code if they are in a CM state (i.e. REP
Rcvd, MRA(REP) Sent, REQ Rcvd, MRA Sent) that allows
a REJ to be sent (lines 35-38).

Of the states listed there in that sentence, it would
seem to limit the active side to using the Consumer Reject
(for the abort case) in just the REP-Rcvd and MRA-REP-Sent
states. That is basically only after the active side
sees a REP (or alternatively goes down the state transitions
to timeout in which case a Timeout REJ is sent).

As a fix, in cm-destroy-id() move the IB-CM-MRA-REQ-RCVD case
to the same as REQ-SENT.  Essentially, make a REJ sent after
getting an MRA on active side a timeout rather than Consumer-
Reject, which is arguably more correct with the CM state
diagrams previous to getting a REP.
Signed-off-by: NTed Kim <ted.h.kim@oracle.com>
Signed-off-by: NSean Hefty <sean.hefty@intel.com>

c29ed5a4

19 5月, 2015 2 次提交

IB/Verbs: Use management helper rdma_cap_ib_cm() · 72219cea

由 Michael Wang 提交于 5月 05, 2015

Introduce helper rdma_cap_ib_cm() to help us check if the port of an
IB device support Infiniband Communication Manager.
Signed-off-by: NMichael Wang <yun.wang@profitbricks.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Tested-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

72219cea

IB/Verbs: Reform IB-core cm · 091e6a4c

由 Michael Wang 提交于 5月 05, 2015

Use raw management helpers to reform IB-core cm.
Signed-off-by: NMichael Wang <yun.wang@profitbricks.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Tested-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

091e6a4c

06 5月, 2015 1 次提交

IB/core: Fix unaligned accesses · 0d0f738f

由 David Ahern 提交于 5月 03, 2015

Addresses the following kernel logs seen during boot of sparc systems:

Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Kernel unaligned access at TPC[103bce50] cm_find_listen+0x34/0xf8 [ib_cm]
Signed-off-by: NDavid Ahern <david.ahern@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0d0f738f

11 8月, 2014 1 次提交

IB/mad: add new ioctl to ABI to support new registration options · 0f29b46d

由 Ira Weiny 提交于 8月 08, 2014

Registrations options are specified through flags.  Definitions of flags will
be in subsequent patches.
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

0f29b46d

02 4月, 2014 1 次提交

IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler · b2853fd6

由 Moni Shoua 提交于 3月 27, 2014

The code that resolves the passive side source MAC within the rdma_cm
connection request handler was both redundant and buggy, so remove it.

It was redundant since later, when an RC QP is modified to RTR state,
the resolution will take place in the ib_core module.  It was buggy
because this callback also deals with UD SIDR exchange, for which we
incorrectly looked at the REQ member of the CM event and dereferenced
a random value.

Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b2853fd6

20 1月, 2014 1 次提交

IB/cm: Fix missing unlock on error in cm_init_qp_rtr_attr() · 990acea6

由 Wei Yongjun 提交于 1月 16, 2014

Add the missing unlock before return from function cm_init_qp_rtr_attr()
in the error handling case.

Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

990acea6

15 1月, 2014 1 次提交

IB/core: Ethernet L2 attributes in verbs/cm structures · dd5f03be

由 Matan Barak 提交于 12月 12, 2013

This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

dd5f03be

17 11月, 2013 1 次提交

IB/cm: Convert to using idr_alloc_cyclic() · ab626d1a

由 Zhao Hongjiang 提交于 11月 15, 2013

Commit 3e6628c4 ("idr: introduce idr_alloc_cyclic()") adds a new
idr_alloc_cyclic() routine and converts several of these users to it.
This is just a missed one - add it.
Signed-off-by: NZhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ab626d1a

28 2月, 2013 2 次提交

idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c · e8c8d1bc

由 Tejun Heo 提交于 2月 27, 2013

MAX_IDR_MASK is another weirdness in the idr interface.  As idr covers
whole positive integer range, it's defined as 0x7fffffff or INT_MAX.

Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
They basically mask off the sign bit and operate on the rest, so if
the caller, by accident, passes in a negative number, the sign bit
will be masked off and the remaining part will be used as if that was
the input, which is worse than crashing.

The constant is visible in idr.h and there are several users in the
kernel.

* drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()

  Basically used to test if adap->nr is a negative number which isn't
  -1 and returns -EINVAL if so.  idr_alloc() already has negative
  @start checking (w/ WARN_ON_ONCE), so this can go away.

* drivers/infiniband/core/cm.c:cm_alloc_id()
  drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()

  Used to wrap cyclic @start.  Can be replaced with max(next, 0).
  Note that this type of cyclic allocation using idr is buggy.  These
  are prone to spurious -ENOSPC failure after the first wraparound.

* fs/super.c:get_anon_bdev()

  The ID allocated from ida is masked off before being tested whether
  it's inside valid range.  ida allocated ID can never be a negative
  number and the masking is unnecessary.

Update idr_*() functions to fail with -EINVAL when negative @id is
specified and update other MAX_IDR_MASK users as described above.

This leaves MAX_IDR_MASK without any user, remove it and relocate
other MAX_IDR_* constants to lib/idr.c.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: NWolfram Sang <wolfram@the-dreams.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e8c8d1bc

IB/core: convert to idr_alloc() · 3b069c5d

由 Tejun Heo 提交于 2月 27, 2013

Convert to the much saner new idr interface.

v2: Mike triggered WARN_ON() in idr_preload() because send_mad(),
    which may be used from non-process context, was calling
    idr_preload() unconditionally.  Preload iff @gfp_mask has
    __GFP_WAIT.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reported-by: N"Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3b069c5d

06 10月, 2012 1 次提交

idr: rename MAX_LEVEL to MAX_IDR_LEVEL · 125c4c70

由 Fengguang Wu 提交于 10月 04, 2012

To avoid name conflicts:

  drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

While at it, also make the other names more consistent and add
parentheses.

[akpm@linux-foundation.org: repair fallout]
[sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

125c4c70

12 7月, 2012 1 次提交

IB/cm: Destroy idr as part of the module init error flow · 87d4abda

由 Dotan Barak 提交于 7月 11, 2012

Clean the idr as part of the error flow since it is a resource too.
Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

87d4abda

04 1月, 2012 1 次提交

switch device_get_devnode() and ->devnode() to umode_t * · 2c9ede55

由 Al Viro 提交于 7月 23, 2011

both callers of device_get_devnode() are only interested in lower 16bits
and nobody tries to return anything wider than 16bit anyway.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2c9ede55

01 11月, 2011 1 次提交

infiniband: Fix up module files that need to include module.h · e4dd23d7

由 Paul Gortmaker 提交于 5月 27, 2011

They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

e4dd23d7

14 10月, 2011 3 次提交

IB/cm: Do not automatically disconnect XRC TGT QPs · 2622e18e

由 Sean Hefty 提交于 5月 30, 2011

Because an XRC TGT QP can end up being shared among multiple
processes, don't have the ib_cm automatically send a DREQ when the
userspace process that owns the ib_cm_id exits.  Disconnect can be
initiated by the user directly; otherwise, the owner of the XRC INI QP
controls the connection.

Note that as a result of the process exiting, the ib_cm will stop
tracking the XRC connection on the target side.  For the purposes of
disconnecting, this isn't a big deal.  The ib_cm will respond to the
DREQ appropriately.  For other messages, mainly LAP, the CM will
reject the request, since there's no one available to route the
request to.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2622e18e

IB/cm: Update XRC support based on XRC annex errata · ef700446

由 Sean Hefty 提交于 8月 02, 2011

The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN to local EECN field.  Lookup of
SRQN is done using the REQ/REP protocol.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ef700446

IB/cm: Update protocol to support XRC · d26a360b

由 Sean Hefty 提交于 5月 13, 2011

Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex.  Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d26a360b

05 7月, 2011 1 次提交

RDMA: Check for NULL mode in .devnode methods · b2bc4782

由 Goldwyn Rodrigues 提交于 7月 04, 2011

Commits 71c29bd5 ("IB/uverbs: Add devnode method to set path/mode")
and c3af0980 ("IB: Add devnode methods to cm_class and umad_class")
added devnode methods that set the mode.

However, these methods don't check for a NULL mode, and so we get a
crash when unloading modules because devtmpfs_delete_node() calls
device_get_devnode() with mode == NULL.

Add the missing checks.
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>
[ Also fix cm.c.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2bc4782

24 5月, 2011 1 次提交

IB: Add devnode methods to cm_class and umad_class · c3af0980

由 Roland Dreier 提交于 5月 23, 2011

We want the ucmX, umadX and issmX device nodes to show up under
/dev/infiniband, and additionally ucmX should have mode 0666.  Add
appropriate devnode methods to their class structs for this.
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c3af0980

16 3月, 2011 2 次提交

IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state · 8d8ac865

由 Sean Hefty 提交于 3月 03, 2011

This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
Vadai <amirv@mellanox.com>:

	When destroying a cm_id from a context of a work queue and if
	the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
	release the reference of this id that was taken upon the send
	of the LAP message.  Otherwise, if the expected APR message
	gets lost, it is only after a long time that the reference
	will be released, while during that the work handler thread is
	not available to process other things.

It turns out that we need to cancel any pending LAP messages whenever
we transition out of the IB_CM_ESTABLISH state.  This occurs when
disconnecting - either sending or receiving a DREQ.  It can also
happen in a corner case where we receive a REJ message after sending
an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
messages in these three cases.

Canceling the LAP when sending a DREQ fixes the destroy problem
reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
state, it sends a DREQ to the remote side to notify the peer that the
connection is going away.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

8d8ac865

IB/cm: Bump reference count on cm_id before invoking callback · 29963437

由 Sean Hefty 提交于 2月 23, 2011

When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
refcount of the cm_id is initialized to 1.  However, cm_process_work
will decrement the refcount after invoking all callbacks.  The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.

If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id.  This can lead to a crash when the cm callback thread tries
to access the cm_id.

This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

29963437

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功