Commits · 109d67e4f12b828113ca8ccf4a735972dd984f40 · openeuler / Kernel

28 Apr, 2009 10 commits

RDMA/nes: Fix hang issues for large cluster dynamic connections · 109d67e4

Authored by Faisal Latif on Apr 27, 2009

On a large cluster setup, the driver hangs after many hours of
testing.  Fixing this required going over the code and making sure the
rexmit entry is properly removed based on the cm_node's state and the
packet received.  Also, when receiving a FIN packet, check the seq# and
make sure there were no errors before calling handle_fin().

Following are the changes done in nes_cm.c:

* handle_ack_pkt() needs to return error value, so in case of error,
  handle_fin() is not called. Some cleanup done while going over the code.

* handle_rst_pkt(), handling of cm_node's NES_CM_STATE_LAST_ACK is missing.

* process_packet(), in case of FIN only packet is received, call
  check_seq() before processing.

* in handle_fin_pkt(), we are calling cleanup_retrans_entry() for all
  conditions, even if the packets need to be dropped.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Increase rexmit timeout interval · 4e9c3900

Authored by Faisal Latif on Apr 27, 2009

Under heavy load with large cluster testing, it may take longer to
receive a response to MPA requests.  Change the driver to wait longer
after each rexmit to max time value.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

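The commit message above says the driver waits longer after each rexmit, up to a maximum. One common way to do that is a doubling backoff with a cap. The following is a minimal sketch under that assumption; the function name and both constants are hypothetical and are not taken from nes_cm.c.

```c
#include <stdint.h>

/* Hypothetical values; the driver's real constants are not given here. */
#define REXMIT_INITIAL_MS 1000u
#define REXMIT_MAX_MS     8000u

/* Return the timeout to use for the next retransmission: double the
 * current one, but never exceed the configured maximum. */
static uint32_t next_rexmit_timeout(uint32_t current_ms)
{
	uint32_t next = current_ms * 2;

	return next > REXMIT_MAX_MS ? REXMIT_MAX_MS : next;
}
```

Starting from REXMIT_INITIAL_MS, repeated calls climb to REXMIT_MAX_MS and then stay there, which gives a loaded peer progressively more time to answer.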

RDMA/nes: Check for sequence number wrap-around · c11470f9

Authored by Faisal Latif on Apr 27, 2009

check_seq() was not checking if the seq#s have wrapped.  Fix it.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

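For readers unfamiliar with sequence number wrap-around, the usual wrap-safe comparison is sketched below. This illustrates the general technique only and is not the driver's check_seq(); the helper names seq_before() and seq_in_window() are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a precedes b in 32-bit sequence space, even when the counter
 * has wrapped from 0xffffffff back to 0. The unsigned subtraction wraps
 * modulo 2^32; reading the result as signed gives the signed distance. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) < 0;
}

/* True if seq lies in the window [start, start + len), with wrap. */
static bool seq_in_window(uint32_t seq, uint32_t start, uint32_t len)
{
	return (uint32_t)(seq - start) < len;
}
```

A plain `a < b` would report that 0x10 precedes 0xfffffff0, which is wrong once the counter has wrapped; seq_before() gets the order right as long as the two values are less than 2^31 apart.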

RDMA/nes: Do not set apbvt entry for loopback · 53094c38

Authored by Faisal Latif on Apr 27, 2009

When a connect request comes, apbvt should only be set for
non-loopback connections.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Fix unused variable compile warning when INFINIBAND_NES_DEBUG=n · 1f0dba1e

Authored by Chien Tung on Apr 27, 2009

Remove the NES_DEBUG that is causing the compile warning about an
unused variable when INFINIBAND_NES_DEBUG is not enabled.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Fix fw_ver in /sys · 0e4562da

Authored by Chien Tung on Apr 27, 2009

/sys/class/infiniband/nes?/fw_ver is not displaying firmware version
properly (it shows 0.0.0 with the current code).  Fill in the correct
firmware version number.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Set trace length to 1 inch for SFP_D · 92322377

Authored by Chien Tung on Apr 27, 2009

With updated PHY firmware for SFP_D, setting the trace length to 1
inch for SFP_D provides a more stable link.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Enable repause timer for port 1 · e998c25b

Authored by Chien Tung on Apr 27, 2009

Enable repause timer for port 1.  Without this setting, under stress,
the chip may misbehave.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Correct CDR loop filter setting for port 1 · 366835e2

Authored by Chien Tung on Apr 27, 2009

In commit 1b949324 ("RDMA/nes: Fix SFP+ PHY initialization") there is
a mistake in the clean up code that removed port 1 CDR loop filter
settings for 10G cards other than CX4.  Put the correct setting back
for appropriate PHY types.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Modify thermo mitigation to flip SerDes1 ref clk to internal · 010db4d1

Authored by Chien Tung on Apr 27, 2009

Change thermo mitigation code to flip the SerDes1 reference clock to
internal, to match the change in commit a4849fc1 ("RDMA/nes: Add
wide_ppm_offset parm for switch compatibility").
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

22 Apr, 2009 2 commits

RDMA/nes: Fix resource issues in nes_create_cq() and nes_destroy_cq() · 5d1af5c8

Authored by Miroslaw Walukiewicz on Apr 21, 2009

In error paths where a CQ is not created, pbl is not freed properly.

In nes_destroy_cq(), add the corresponding check for nescq->mcrqf to
not call nes_free_resource() when it is already done in nes_create_cq().
Signed-off-by: Miroslaw Walukiewicz <miroslaw.walukiewicz@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Remove root_256()'s unused pbl_count_256 parameter · cc005fa2

Authored by Matt Kraai on Apr 21, 2009

Signed-off-by: Matt Kraai <kraai@ftbfs.org>
Acked-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

21 Apr, 2009 2 commits

RDMA/nes: Fix bugs in nes_reg_phys_mr() · 3f32eb11

Authored by Don Wood on Apr 20, 2009

The code incorrectly failed memory registration if the buffer was not
page aligned.  Also, the length field is mangled causing the hardware
to think the registration is much larger than it really is.

The fix is to remove the page alignment restriction as well as the
incorrect length adjustment.  Also make sure that all buffers after
the first start at a page boundary, and all buffers except the last
end on a page boundary.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

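The alignment rule stated in the commit message above (every buffer after the first starts on a page boundary, every buffer except the last ends on one) can be sketched as a standalone check. This is an illustration, not code from nes_verbs.c; struct phys_buf, phys_bufs_valid() and the 4096-byte page size are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u	/* assumed page size for this sketch */

/* Hypothetical description of one physical buffer in the list. */
struct phys_buf {
	uint64_t addr;
	uint64_t size;
};

/* Only the first buffer may start mid-page and only the last buffer may
 * end mid-page; every interior boundary must be page aligned. */
static bool phys_bufs_valid(const struct phys_buf *bufs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t start = bufs[i].addr;
		uint64_t end = bufs[i].addr + bufs[i].size;

		if (i > 0 && (start % PAGE_SIZE_BYTES) != 0)
			return false;
		if (i + 1 < n && (end % PAGE_SIZE_BYTES) != 0)
			return false;
	}
	return true;
}
```

A single buffer is always accepted regardless of alignment, which matches the removal of the page alignment restriction described above.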

RDMA/nes: Fix compiler warning at nes_verbs.c:1955 · 1af9222b

Authored by Chien Tung on Apr 20, 2009

Initialize pbl_count_256 to 0 to get rid of the warning:

drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_reg_mr':
drivers/infiniband/hw/nes/nes_verbs.c:1955: warning: 'pbl_count_256' may be used uninitialized in this function
Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

09 Apr, 2009 6 commits

RDMA/nes: Add support for new SFP+ PHY · 4303565d

Authored by Chien Tung on Apr 08, 2009

Add new register settings for new SFP+ PHY/firmware.
Add new PHY to nes_netdev_get/set_settings.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Add wide_ppm_offset parm for switch compatibility · a4849fc1

Authored by Chien Tung on Apr 08, 2009

We have observed unstable link with a new BNT switch.

Add wide_ppm_offset parameter to allow the user to control the clock
ppm offset on the CX4 interface for better compatibility.  Default is
100ppm, setting it to 1 will increase it to 300ppm.  Change default
SerDes1 reference clock to external source.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Fix SFP+ PHY initialization · 1b949324

Authored by Chien Tung on Apr 08, 2009

SFP+ PHY initialization has very long delays, incorrect settings for
direct attach copper cables, and inconsistent link detection.

Adjust delays to the minimum required by the PHY.  Worst case is now
less than 4 seconds.  Add new register settings for direct attach
cables.  Change link detection logic to use two new registers for more
consistent link state detection.  Reorganize code to shorten line
length.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Fix nes_nic_cm_xmit() error handling · 5962c2c8

Authored by Faisal Latif on Apr 08, 2009

We see crashes or hangs when running network cable pull tests
during RDMA traffic.

In schedule_nes_timer(), we return an error if nes_nic_cm_xmit()
returns failure.  This is changed to success as skb is being put on
the timer routines to be processed later.  In send_syn() case, we are
indicating connect failure once from nes_connect() and the other when
the rexmit retries expires.

The other issue is skb->users which we are incrementing before calling
nes_nic_cm_xmit() which calls dev_queue_xmit() but in case of failure
we are decrementing the skb->users at the same time putting the skb on
the rexmit path.  Even if dev_queue_xmit() fails, the skb->users is
decremented already.  We are removing the decrement of skb->users in
case of failure from both schedule_nes_timer() as well as from
nes_cm_timer_tick().

The extra check in nes_cm_timer_tick() for rexmit failure, which
breaks out of the loop, is also removed.  It caused problems because
the other nodes have their cm_node->ref_count incremented and are
never processed.
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Fix error handling issues · 79fc3d74

Authored by Faisal Latif on Apr 08, 2009

Fix issues found by static code analysis:

(1) Check if cm_node was successfully created for loopback connection.

(2) schedule_nes_timer() does not free up allocated memory after
    encountering an error.  There is a WARN_ON() for this condition.

(3) there is a cm_node->freed flag which is set but not used.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Fix incorrect casts on 32-bit architectures · 7a5efb62

Authored by Don Wood on Apr 08, 2009

There were some incorrect casts to unsigned long that caused 64-bit values
to be truncated on 32-bit architectures and made the driver pass invalid
addresses and lengths to the hardware.  The problems were primarily seen
with kernels with highmem configured but some could show up in
non-highmem kernels, too.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

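The truncation described above is easy to reproduce in isolation. In the sketch below, the typedef ulong32_t stands in for `unsigned long` on a 32-bit architecture, and both function names are hypothetical. Nothing here is taken from the driver.

```c
#include <stdint.h>

/* Stand-in for `unsigned long` on a 32-bit architecture. */
typedef uint32_t ulong32_t;

/* Buggy pattern: the intermediate cast discards the upper 32 bits, so
 * any address above 4 GiB (typical with highmem) is corrupted. */
static uint64_t addr_via_ulong(uint64_t dma_addr)
{
	return (uint64_t)(ulong32_t)dma_addr;
}

/* Fixed pattern: keep the value in a 64-bit type from end to end. */
static uint64_t addr_via_u64(uint64_t dma_addr)
{
	return dma_addr;
}
```

An address such as 0x123456000 comes back as 0x23456000 from the buggy path, while addresses below 4 GiB pass through unchanged, which is consistent with the problem being seen mostly on highmem kernels.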

07 Apr, 2009 2 commits

dma-mapping: replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32) · 284901a9

Authored by Yang Hongyang on Apr 06, 2009

Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dma-mapping: replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64) · 6a35528a

Authored by Yang Hongyang on Apr 06, 2009

Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)

Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

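The two dma-mapping commits above swap fixed-width mask macros for the parameterized DMA_BIT_MASK(n). The sketch below reproduces the macro in userspace, in the form it is believed to take in the kernel headers (worth checking against include/linux/dma-mapping.h). The helper addr_fits() is hypothetical.

```c
#include <stdint.h>

/* n == 64 is special-cased because shifting a 64-bit value by 64 bits
 * is undefined behaviour in C. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true if addr is addressable under the mask. */
static int addr_fits(uint64_t addr, uint64_t mask)
{
	return (addr & ~mask) == 0;
}
```

With this definition, DMA_BIT_MASK(32) and DMA_BIT_MASK(64) give the values the old DMA_32BIT_MASK and DMA_64BIT_MASK macros held, and other widths need no new macro.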

30 Mar, 2009 3 commits

RDMA/cxgb3: Release dependent resources only when endpoint memory is freed. · 874d8df5

Authored by Steve Wise on Mar 30, 2009

The cxgb3 l2t entry, hwtid, and dst entry were being released before
all the iwch_ep references were released.  This can cause a crash in
t3_l2t_send_slow() and other places where the l2t entry is used.

The fix is to defer releasing these resources until all endpoint
references are gone.

Details:

- move flags field to the iwch_ep_common struct.
- add a flag indicating resources are to be released.
- release resources at endpoint free time instead of close/abort time.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/cxgb3: Handle EEH events · 04b5d028

Authored by Steve Wise on Mar 30, 2009

- wrap calls into cxgb3 and fail them if we're in the middle
  of a PCI EEH event.

- correctly unwind and release endpoint and other resources when
  we are in an EEH event.

- dispatch IB_EVENT_DEVICE_FATAL event when cxgb3 notifies iw_cxgb3 of
  a fatal error.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/mlx4: Use pgprot_writecombine() for BlueFlame pages · e1d60ec6

Authored by Roland Dreier on Mar 30, 2009

The PAT work on x86 has finally made pgprot_writecombine() a usable API
for modular drivers. As the comment indicates, this is exactly what we
want to use in mlx4_ib to map BlueFlame pages up to userspace, since
using WC for these pages improves small message latency significantly.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

27 Mar, 2009 1 commit

RDMA/nes: Fix mis-merge · 7c757eb9

Authored by Roland Dreier on Mar 26, 2009

When net-next and infiniband were merged upstream, each branch deleted
one of a pair of adjacent lines from nes_nic.c, but when Linus fixed the
conflict up, he brought back both of the lines.  Fix up to the intended
final tree state.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

25 Mar, 2009 1 commit

RDMA/cxgb3: Enforce required firmware · d1fbe04e

Authored by Steve Wise on Mar 24, 2009

The cxgb3 NIC driver can handle more firmware versions than iw_cxgb3,
and since commit 8207befa ("cxgb3: untie strict FW matching") cxgb3
will load with firmware versions that iw_cxgb3 can't handle.  The FW
major number indicates a specific interface between the FW and
iw_cxgb3.  Thus if the major number of the running firmware does not
match the required version compiled into iw_cxgb3, then iw_cxgb3 must
not register that device.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

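The rule in the commit message above comes down to one comparison: register the device only if the running firmware's major number equals the one the driver was built against. The sketch below shows that check in isolation. struct fw_version, fw_major_matches() and the value 7 are all hypothetical and are not taken from iw_cxgb3.

```c
#include <stdbool.h>

/* Hypothetical representation of a firmware version. */
struct fw_version {
	unsigned int major;
	unsigned int minor;
	unsigned int micro;
};

/* Assumed value for illustration; the real one is compiled into iw_cxgb3. */
#define REQUIRED_FW_MAJOR 7u

/* Minor and micro numbers are ignored on purpose: per the commit
 * message, only the major number identifies the FW/driver interface. */
static bool fw_major_matches(const struct fw_version *running)
{
	return running->major == REQUIRED_FW_MAJOR;
}
```

A caller would skip device registration whenever fw_major_matches() returns false, leaving the NIC driver free to keep running with that firmware.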

22 Mar, 2009 2 commits

infiniband: convert nes driver to net_device_ops · d0929553

Authored by Stephen Hemminger on Mar 20, 2009

Also, removed unnecessary memset() since alloc_netdev returns
zeroed memory.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

infiniband: convert c2 to net_device_ops · 687c75dc

Authored by Stephen Hemminger on Mar 20, 2009

Convert this driver to new net_device_ops infrastructure.
Also use default net_device get-stats infrastructure
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Mar, 2009 1 commit

IB/mlx4: Unregister IB device prior to CLOSE PORT command · a6a47771

Authored by Yevgeny Petrilin on Mar 18, 2009

According to the ConnectX programmer's reference manual, all
operations should be stopped, all QPs should be torn down and all WQEs
flushed before the CLOSE_PORT command is invoked.  In some cases
reversing the order of operations (as implemented now) could cause
a loss of completions.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

13 Mar, 2009 1 commit

RDMA/nes: Don't allow userspace QPs to use STag zero · c12e56ef

Authored by Faisal Latif on Mar 12, 2009

STag zero is a special STag that allows consumers to access any bus
address without registering memory.  The nes driver unfortunately
allows STag zero to be used even with QPs created by unprivileged
userspace consumers, which means that any process with direct verbs
access to the nes device can read and write any memory accessible to
the underlying PCI device (usually any memory in the system).  Such
access is usually given for cluster software such as MPI to use, so
this is a local privilege escalation bug on most systems running this
driver.

The driver was using STag zero to receive the last streaming mode
data; to allow STag zero to be disabled for unprivileged QPs, the
driver now registers a special MR for this data.

Cc: <stable@kernel.org>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

07 Mar, 2009 8 commits

RDMA/nes: Handle MPA Reject message properly · 9d5ab133

Authored by Faisal Latif on Mar 06, 2009

While doing testing, there are failures as MPA Reject call is not
handled.  To handle MPA Reject call, following changes are done:

* Handle inbound/outbound MPA Reject response message.
	When nes_reject() is called for pending MPA request reply,
	send the MPA Reject message to its peer (active
	side) cm_node. The peer cm_node (active side) will indicate
	Reject message event for the pending Connect Request.

* Handle MPA Reject response message for loopback connections and listener.
	When MPA Request is rejected, check if it is a loopback
	connection and if it is then it will send Reject message event
	to its peer loopback node. Also when destroying listener,
	check if the cm_nodes for that listener are loopback or not.

* Add graceful connection close with the MPA Reject response message.
	Send graceful close (FIN, FIN ACK..) to terminate the cm_nodes.

* Some code re-org while making the above changes.
	Removed recv_list and recv_list_lock from the cm_node
	structure as there can be only one receive close entry on the
	timer. Also implemented handle_recv_entry() as receive close
	entry is processed from both nes_rem_ref_cm_node() as well as
	nes_cm_timer_tick().
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Improve use of PBLs · 0145f341

Authored by Don Wood on Mar 06, 2009

Two-level 256-byte PBLs were not implemented, so the driver could report
out of memory when in fact there were PBLs still available.

This solution prefers to use 4KB PBLs over two level 256B PBLs until
the number of 4KB PBLs falls below a threshold.  At this point the 4KB
PBL structure is converted to use 256B PBLs which prevents the driver
from running out of 4KB PBLs too quickly.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Remove LLTX · 2869975c

Authored by Faisal Latif on Mar 06, 2009

NETIF_F_LLTX is deprecated. Remove private TX locking from the driver
and remove the NETIF_F_LLTX feature flag.  This also fixes a warning
in some configs that comes from doing skb_linearize() call in the
hard_start_xmit method with IRQs disabled (if HIGHMEM is enabled,
skb_linearize() may end up enabling BHs, which is a no-no if hard IRQs
are disabled in that context).  By getting rid of LLTX, we do not
disable IRQs when skb_linearize() is called.

Remove the sq_lock as it is not needed for non-LLTX.  Fix ethtool not
to show the counter for sq_lock.

Reported-by: aluno3@poczta.onet.pl
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Inform hardware that asynchronous event has been handled · fd87778c

Authored by Don Wood on Mar 06, 2009

When asynchronous events are processed by software, it is necessary
to let the hardware know that software has handled the event.  This
frees up the entry in the asynchronous event queue.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Fix tmp_addr compilation warning · 7b14ab0b

Authored by Chien Tung on Mar 06, 2009

In find_node(), tmp_addr causes an "unused variable" warning when
INFINIBAND_NES_DEBUG is not defined.  It's only used in a nes_debug()
and the print does not make sense.  So take out the whole thing.
Reported-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Report correct vendor_id and vendor_part_id · b9c367e7

Authored by Chien Tung on Mar 06, 2009

ibv_devinfo displays 0 for vendor_id and vendor_part_id.  Fill in OUI
and device_id for those two fields.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Update copyright to new legal entity and year · cd6853d3

Authored by Chien Tung on Mar 06, 2009

Update copyright to the new legal entity, Intel-NE, Inc., an Intel
company.  Update copyright for the new year.
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

RDMA/nes: Account for freed PBL after HW operation · dae5d13a

Authored by Don Wood on Mar 06, 2009

Fix occurrences where the software PBL counts were changed before the
hardware was updated.  This bug allowed another thread to overallocate
the hardware resources.

Add proper PBL accounting in case nes_reg_mr() fails.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

23 Feb, 2009 1 commit

IB/ipath: Really run work in ipath_release_user_pages_on_close() · e5380527

Authored by Roland Dreier on Feb 22, 2009

ipath_release_user_pages_on_close() just allocated a structure to
schedule work with but just returned (leaking the structure) rather than
actually doing schedule_work(). Fix the logic to what was intended.

This was spotted by the Coverity checker (CID 2700).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
