提交 · e48f129c2f200dde8899f6ea5c6e7173674fc482 · openanolis / cloud-kernel

26 9月, 2011 1 次提交

[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference · e48f129c

由 Neil Horman 提交于 9月 06, 2011

This oops was reported recently:
d:mon> e
cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
    pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
    lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
    sp: c0000000fd4c73a0
   msr: 8000000000009032
   dar: 0
 dsisr: 40000000
  current = 0xc0000000fd640d40
  paca    = 0xc00000000054ff80
    pid   = 5085, comm = iscsid
d:mon> t
[c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
[c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
[c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
[scsi_transport_iscsi2]
[c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
[c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
[c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
[c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
[c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
[c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
[c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 00000080da560cfc

The root cause was an EEH error, which sent us down the offload_close path in
the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for
upper layer driver (like the cxgbi drivers) which might have execution contexts
in the middle of its use. The result is the oops above, when t3_l2t_get attempts
to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL.

The fix is to prevent the setting of the NULL pointer until after there are no
further users of it.  The t3cdev->l2opt pointer is now converted to be an rcu
pointer and the L2DATA macro is now called under the protection of the
rcu_read_lock().  When the EEH error path:
t3_adapter_error->offload_close->cxgb3_offload_deactivate
Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu
quiescence point, preventing, allowing L2DATA callers to safely check for a NULL
pointer without concern that the underlying data will be freeded before the
pointer is dereferenced.

This has been tested by the reporter and shown to fix the reproted oops

[nhorman: fix up unitinialised variable reported by Dan Carpenter]
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Reviewed-by: NKaren Xie <kxie@chelsio.com>
Cc: stable@kernel.org
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

e48f129c

27 7月, 2011 1 次提交

atomic: use <linux/atomic.h> · 60063497

由 Arun Sharma 提交于 7月 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: NArun Sharma <asharma@fb.com>
Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: NMike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

60063497

23 7月, 2011 1 次提交

IB/qib: Defer HCA error events to tasklet · e67306a3

由 Mike Marciniszyn 提交于 7月 21, 2011

With ib_qib options:

    options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4

a run of ib_write_bw -a yields the following:

    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     1048576   5000           2910.64            229.80
    ------------------------------------------------------------------

The top cpu use in a profile is:

    CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
    of 0x00 (No unit mask) count 1002300
    Counted LLC_MISSES events (Last level cache demand requests from this core that
    missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
    samples  %        samples  %        app name                 symbol name
    15237    29.2642  964      17.1195  ib_qib.ko                qib_7322intr
    12320    23.6618  1040     18.4692  ib_qib.ko                handle_7322_errors
    4106      7.8860  0              0  vmlinux                  vsnprintf


Analysis of the stats, profile, the code, and the annotated profile indicate:
 - All of the overflow interrupts (one per packet overflow) are
   serviced on CPU0 with no mitigation on the frequency.
 - All of the receive interrupts are being serviced by CPU0.  (That is
   the way truescale.cmds statically allocates the kctx IRQs to CPU)
 - The code is spending all of its time servicing QIB_I_C_ERROR
   RcvEgrFullErr interrupts on CPU0, starving the packet receive
   processing.
 - The decode_err routine is very inefficient, using a printf variant
   to format a "%s" and continues to loop when the errs mask has been
   cleared.
 - Both qib_7322intr and handle_7322_errors read pci registers, which
   is very inefficient.

The fix does the following:
 - Adds a tasklet to service QIB_I_C_ERROR
 - Replaces the very inefficient scnprintf() with a memcpy().  A field
   is added to qib_hwerror_msgs to save the sizeof("string") at
   compile time so that a strlen is not needed during err_decode().
 - The most frequent errors (Overflows) are serviced first to exit the
   loop as early as possible.
 - The loop now exits as soon as the errs mask is clear rather than
   fruitlessly looping through the msp array.

With this fix the performance changes to:

    ------------------------------------------------------------------
     #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
     1048576   5000           2990.64            2941.35
    ------------------------------------------------------------------

During testing of the error handling overflow patch, it was determined
that some CPU's were slower when servicing both overflow and receive
interrupts on CPU0 with different MSI interrupt vectors.

This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default interrupt.
For some CPUs, the cost of the interrupt enter/exit is more costly
than then the additional PCI read in the default handler.
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

e67306a3

22 7月, 2011 1 次提交

nes: do vlan cleanup · 7033c4ad

由 Jiri Pirko 提交于 7月 20, 2011

- unify vlan and nonvlan rx path
- kill nesvnic->vlan_grp and nes_netdev_vlan_rx_register
- allow to turn on/off rx/tx vlan accel via ethtool (set_features)
Signed-off-by: NJiri Pirko <jpirko@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7033c4ad

19 7月, 2011 14 次提交

RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit() · 3cbe182a

由 Manuel Zerpies 提交于 6月 16, 2011

Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited().
Signed-off-by: NManuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

3cbe182a

IB/mlx4: Support PMA counters for IBoE · c3779134

由 Or Gerlitz 提交于 6月 15, 2011

Use the per port counter attached to all QPs created on that port to
implement port level packets/bytes performance counters a la IB.
Derived from a patch by Eli Cohen <eli@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c3779134

IB/mlx4: Use flow counters on IBoE ports · cfcde11c

由 Or Gerlitz 提交于 6月 15, 2011

Allocate flow counter per Ethernet/IBoE port, and attach this counter
to all the QPs created on that port.  Based on patch by Eli Cohen
<eli@mellanox.co.il>.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

cfcde11c

IB/pma: Add include file for IBA performance counters definitions · 6aea213a

由 Or Gerlitz 提交于 7月 05, 2011

Move the various definitions and mad structures needed for software
implementation of IBA PM agent from the ipath and qib drivers into a
single include file, which in turn could be used by more consumers.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6aea213a

IB/mlx4: Generate GID change events in IBoE code · 6451c712

由 Or Gerlitz 提交于 6月 15, 2011

IBoE doesn't use LIDs.  Use the GID change event to update the IB core
cache for addition/deletion of GIDs.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6451c712

RDMA: Allow for NULL .modify_device() and .modify_port() methods · 10e1b54b

由 Bart Van Assche 提交于 6月 18, 2011

These methods don't make sense for iWARP devices, so rather than
forcing them to implement stubs, just return -ENOSYS in the core if
the hardware driver doesn't set .modify_device and/or .modify_port.
Signed-off-by: NRoland Dreier <roland@purestorage.com>

10e1b54b

IB/qib: Update active link width · e800bd03

由 Mitko Haralanov 提交于 7月 14, 2011

Update the active link width on QLE7220 chips when link goes down if
chip width does not match shadowed width.
Signed-off-by: NMitko Haralanov <mitko@qlogic.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

e800bd03

IB/qib: Fix potential deadlock with link down interrupt · 4356d0b6

由 Ram Vepa 提交于 5月 27, 2011

There is a possibility of a deadlock due to the way locks are
acquired and released in qib_set_uevent_bits(). The function
qib_set_uevent_bits() is called in process context and it uses
spin_lock() and spin_unlock().  This same lock is acquired/released
in interrupt context which can lead to a deadlock when running on
the same cpu.

The fix is to replace spin_lock() and spin_unlock() with
spin_lock_irqsave() and spin_unlock_irqrestore() respectively in
qib_set_uevent_bits().
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

4356d0b6

IB/qib: Add sysfs interface to read free contexts · 2df4f757

由 Ram Vepa 提交于 5月 31, 2011

Indicate the number of free user contexts via the sysfs file
/sys/class/infiniband/qib0/nfreectxts as required for PSM.
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2df4f757

IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP · 9b89925c

由 Jon Mason 提交于 6月 27, 2011

The PCIE capability offset is saved during PCI bus walking.  It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it.  Also, pci_is_pcie is a
better way of determining if the device is PCIE or not (as it uses the
same saved PCIE capability offset).
Signed-off-by: NJon Mason <jdmason@kudzu.us>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

9b89925c

IB/qib: Remove double define · ac0cae44

由 Edwin van Vliet 提交于 7月 10, 2011

Signed-off-by: NEdwin van Vliet <edwin@cheatah.nl>
Reviewed-by: NJesper Juhl <jj@chaosbits.net>
Acked-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ac0cae44

IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP · 7f27cda0

由 Jon Mason 提交于 6月 27, 2011

The PCIE capability offset is saved during PCI bus walking.  It will
remove an unnecessary search in the PCI configuration space if this
value is referenced instead of reacquiring it.
Signed-off-by: NJon Mason <jdmason@kudzu.us>
Acked-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7f27cda0

IB/ipath: Convert old cpumask api into new one · 57631811

由 Motohiro KOSAKI 提交于 5月 19, 2011

Adapt to new api.  We plan to remove old one later.  Almost all
changes are trivial, but there is one real fix: the following code is
unsafe:

	int ncpus = num_online_cpus()
	for (i = 0; i < ncpus; i++) {
		..
	}

because 1) we don't guarantee last bit of online cpus is equal to
num_online_cpus(). some arch assign sparse cpu number.  2) cpu
hotplugging may change cpu_online_mask at same time.  we need to pin
it by get_online_cpus().
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

57631811

IB/qib: Convert old cpumask api into new one · 0cd85e67

由 Motohiro KOSAKI 提交于 5月 19, 2011

Adapt to use new APIs.  We plan to remove old one later and plan to
change current->cpus_allowed implementation.

No functional change.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

0cd85e67

18 7月, 2011 1 次提交
- D
  net: Abstract dst->neighbour accesses behind helpers. · 69cce1d1
  由 David S. Miller 提交于 7月 17, 2011
```
dst_{get,set}_neighbour()
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  69cce1d1
16 7月, 2011 1 次提交

IB/mthca: Stop returning separate error and status from FW commands · cdb73db0

由 Goldwyn Rodrigues 提交于 7月 07, 2011

Instead of having firmware command functions return an error and also
a status, leading to code like:

err = mthca_FW_COMMAND(..., &status);
if (err)
goto out;
if (status) {
err = -E...;
goto out;
}

all over the place, just handle the FW status inside the FW command
handling code (the way mlx4 does it), so we can simply write:

err = mthca_FW_COMMAND(...);
if (err)
goto out;

In addition to simplifying the source code, this also saves a healthy
chunk of text:

add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847)
function old new delta
static.trans_table 324 584 +260
mthca_cmd_poll 352 477 +125
mthca_cmd_wait 511 567 +56
mthca_table_put 213 240 +27
mthca_cleanup_db_tab 372 387 +15
__mthca_remove_one 314 323 +9
mthca_cleanup_user_db_tab 275 283 +8
__mthca_init_one 1738 1746 +8
mthca_cleanup 20 21 +1
mthca_MAD_IFC 1081 1082 +1
mthca_MGID_HASH 43 40 -3
mthca_MAP_ICM_AUX 23 20 -3
mthca_MAP_ICM 19 16 -3
mthca_MAP_FA 23 20 -3
mthca_READ_MGM 43 38 -5
mthca_QUERY_SRQ 43 38 -5
mthca_QUERY_QP 59 54 -5
mthca_HW2SW_SRQ 43 38 -5
mthca_HW2SW_MPT 60 55 -5
mthca_HW2SW_EQ 43 38 -5
mthca_HW2SW_CQ 43 38 -5
mthca_free_icm_table 120 114 -6
mthca_query_srq 214 206 -8
mthca_free_qp 662 654 -8
mthca_cmd 38 28 -10
mthca_alloc_db 1321 1311 -10
mthca_setup_hca 1067 1055 -12
mthca_WRITE_MTT 35 22 -13
mthca_WRITE_MGM 40 27 -13
mthca_UNMAP_ICM_AUX 36 23 -13
mthca_UNMAP_FA 36 23 -13
mthca_SYS_DIS 36 23 -13
mthca_SYNC_TPT 36 23 -13
mthca_SW2HW_SRQ 35 22 -13
mthca_SW2HW_MPT 35 22 -13
mthca_SW2HW_EQ 35 22 -13
mthca_SW2HW_CQ 35 22 -13
mthca_RUN_FW 36 23 -13
mthca_DISABLE_LAM 36 23 -13
mthca_CLOSE_IB 36 23 -13
mthca_CLOSE_HCA 38 25 -13
mthca_ARM_SRQ 39 26 -13
mthca_free_icms 178 164 -14
mthca_QUERY_DDR 389 375 -14
mthca_resize_cq 1063 1048 -15
mthca_unmap_eq_icm 123 107 -16
mthca_map_eq_icm 396 380 -16
mthca_cmd_box 90 74 -16
mthca_SET_IB 433 417 -16
mthca_RESIZE_CQ 369 353 -16
mthca_MAP_ICM_page 240 224 -16
mthca_MAP_EQ 183 167 -16
mthca_INIT_IB 473 457 -16
mthca_INIT_HCA 745 729 -16
mthca_map_user_db 816 798 -18
mthca_SYS_EN 157 139 -18
mthca_cleanup_qp_table 78 59 -19
mthca_cleanup_eq_table 168 149 -19
mthca_UNMAP_ICM 143 121 -22
mthca_modify_srq 172 149 -23
mthca_unmap_fmr 198 174 -24
mthca_query_qp 814 790 -24
mthca_query_pkey 343 319 -24
mthca_SET_ICM_SIZE 34 10 -24
mthca_QUERY_DEV_LIM 1870 1846 -24
mthca_map_cmd 1130 1105 -25
mthca_ENABLE_LAM 401 375 -26
mthca_modify_port 247 220 -27
mthca_query_device 884 850 -34
mthca_NOP 75 41 -34
mthca_table_get 287 249 -38
mthca_init_qp_table 333 293 -40
mthca_MODIFY_QP 348 308 -40
mthca_close_hca 131 89 -42
mthca_free_eq 435 390 -45
mthca_query_port 755 705 -50
mthca_free_cq 581 528 -53
mthca_alloc_icm_table 578 524 -54
mthca_multicast_attach 1041 986 -55
mthca_init_hca 326 271 -55
mthca_query_gid 487 431 -56
mthca_free_srq 524 468 -56
mthca_free_mr 168 111 -57
mthca_create_eq 1560 1501 -59
mthca_multicast_detach 790 728 -62
mthca_write_mtt 918 854 -64
mthca_register_device 1406 1342 -64
mthca_fmr_alloc 947 883 -64
mthca_mr_alloc 652 582 -70
mthca_process_mad 1242 1164 -78
mthca_dev_lim 910 830 -80
find_mgm 482 400 -82
mthca_modify_qp 3852 3753 -99
mthca_init_cq 1281 1181 -100
mthca_alloc_srq 1719 1610 -109
mthca_init_eq_table 1807 1679 -128
mthca_init_tavor 761 491 -270
mthca_init_arbel 2617 2098 -519
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>

cdb73db0

18 6月, 2011 4 次提交

IB/qib: Ensure that LOS and DFE are being turned off · 31264484

由 Mitko Haralanov 提交于 6月 09, 2011

Due to timing, it is possible for the LOS and DFE to remain on. This
is due to the link progressing to LinkUP prior to the driver getting
the first Status Changed interrupt.  By expanding the conditions under
which LOS is turned off and DFE timeout is being set, timing is no
longer an issue.
Signed-off-by: NMitko Haralanov <mitko@qlogic.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

31264484

RDMA/cxgb4: Couple of abort fixes · 8da7e7a5

由 Steve Wise 提交于 6月 14, 2011

- fix a race where the driver could end up sending a close_con_req
  after an abort_rpl.  In c4iw_ep_disconnect(), send abort or close
  request with the ep mutex held.

- fix a hang where driver fails to wake up when a connection is reset
  during a normal close.  Wake up any waiters in the interrupt path,
  and correctly cleanup after rdma_fini() failures.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

8da7e7a5

RDMA/cxgb4: Don't truncate MR lengths · 301c2c3f

由 Steve Wise 提交于 6月 14, 2011

Remove left-over code from T3 that limited MR sizes to 32b.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

301c2c3f

RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQs · 2ff7d09a

由 Steve Wise 提交于 6月 01, 2011

Memory allocated for user CQs gets rounded up to the next page
boundary. And after rounding, we recalculate the resulting IQ depth
and we need to make sure we don't exceed the HW limits.

This bug can result a much smaller CQ allocated than was expected if
the HW size field is exceeded, resulting in CQ overflow failures.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2ff7d09a

07 6月, 2011 1 次提交

net: remove interrupt.h inclusion from netdevice.h · a6b7a407

由 Alexey Dobriyan 提交于 6月 06, 2011

* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6b7a407

25 5月, 2011 3 次提交

RDMA/nes: Add a check for strict_strtoul() · 52f81dba

由 Liu Yuan 提交于 4月 01, 2011

It should check if strict_strtoul() succeeds before using
'wqm_quanta_value'.
Signed-off-by: NLiu Yuan <tailai.ly@taobao.com>

[ Convert to kstrtoul() directly while we're here.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

52f81dba

RDMA/cxgb3: Don't post zero-byte read if endpoint is going away · 80783868

由 Steve Wise 提交于 5月 13, 2011

tx_ack() wasn't checking the endpoint state and consequently would
attempt to post the p2p 0B read on an endpoint/QP that is closing or
aborting.  This causes a NULL pointer dereference crash.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

80783868

RDMA/cxgb4: Use completion objects for event blocking · c337374b

由 Steve Wise 提交于 5月 20, 2011

There exists a race condition when using wait_queue_head_t objects
that are declared on the stack.  This was being done in a few places
where we are sending work requests to the FW and awaiting replies, but
we don't have an endpoint structure with an embedded c4iw_wr_wait
struct.  So the code was allocating it locally on the stack.  Bad
design.  The race is:

  1) thread on cpuX declares the wait_queue_head_t on the stack, then
     posts a firmware WR with that wait object ptr as the cookie to be
     returned in the WR reply.  This thread will proceed to block in
     wait_event_timeout() but before it does:

  2) An interrupt runs on cpuY with the WR reply.  fw6_msg() handles
     this and calls c4iw_wake_up().  c4iw_wake_up() sets the condition
     variable in the c4iw_wr_wait object to TRUE and will call
     wake_up(), but before it calls wake_up():

  3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
     wait_event_timeout().  The wait_event_timeout() macro checks the
     condition variable and returns immediately since it is TRUE.  So
     this thread never blocks/sleeps. The function then returns
     effectively deallocating the c4iw_wr_wait object that was on the
     stack.

  4) So at this point cpuY has a pointer to the c4iw_wr_wait object
     that is no longer valid.  Further its pointing to a stack frame
     that might now be in use by some other context/thread.  So cpuY
     continues execution and calls wake_up() on a ptr to a wait object
     that as been effectively deallocated.

This race, when it hits, can cause a crash in wake_up(), which I've
seen under heavy stress. It can also corrupt the referenced stack
which can cause any number of failures.

The fix:

Use struct completion, which supports on-stack declarations.
Completions use a spinlock around setting the condition to true and
the wake up so that steps 2 and 4 above are atomic and step 3 can
never happen in-between.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>

c337374b

23 5月, 2011 1 次提交

Add appropriate <linux/prefetch.h> include for prefetch users · 70c71606

由 Paul Gortmaker 提交于 5月 22, 2011

After discovering that wide use of prefetch on modern CPUs
could be a net loss instead of a win, net drivers which were
relying on the implicit inclusion of prefetch.h via the list
headers showed up in the resulting cleanup fallout.  Give
them an explicit include via the following $0.02 script.

 =========================================
 #!/bin/bash
 MANUAL=""
 for i in `git grep -l 'prefetch(.*)' .` ; do
 	grep -q '<linux/prefetch.h>' $i
 	if [ $? = 0 ] ; then
 		continue
 	fi

 	(	echo '?^#include <linux/?a'
 		echo '#include <linux/prefetch.h>'
 		echo .
 		echo w
 		echo q
 	) | ed -s $i > /dev/null 2>&1
 	if [ $? != 0 ]; then
 		echo $i needs manual fixup
 		MANUAL="$i $MANUAL"
 	fi
 done
 echo ------------------- 8\<----------------------
 echo vi $MANUAL
 =========================================
Signed-off-by: NPaul <paul.gortmaker@windriver.com>
[ Fixed up some incorrect #include placements, and added some
  non-network drivers and the fib_trie.c case    - Linus ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

70c71606

21 5月, 2011 1 次提交

RDMA: Add netlink infrastructure · b2cbae2c

由 Roland Dreier 提交于 5月 20, 2011

Add basic RDMA netlink infrastructure that allows for registration of
RDMA clients for which data is to be exported and supplies message
construction callbacks.
Signed-off-by: NNir Muchtar <nirm@voltaire.com>

[ Reorganize a few things, add CONFIG_NET dependency.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b2cbae2c

12 5月, 2011 1 次提交

IB/qib: Use pci_dev->revision · 1c653357

由 Sergei Shtylyov 提交于 5月 09, 2011

The driver reads PCI revision ID from the PCI configuration register
while it's already stored by PCI subsystem in the revision field of
struct pci_dev.
Signed-off-by: NSergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1c653357

10 5月, 2011 8 次提交

RDMA/iwcm: Get rid of enum iw_cm_event_status · d0c49bf3

由 Roland Dreier 提交于 5月 09, 2011

The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
drivers -- only nes was using the enum values (with the mild consequence
that all nes connection failures were treated as generic errors rather
than reported as timeouts or rejections).

We can fix this confusion by getting rid of enum iw_cm_event_status and
using a plain int for struct iw_cm_event.status, and converting nes to
use -Exxx as the other iWARP drivers do.

This also gets rid of the warning

drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'
Signed-off-by: NRoland Dreier <roland@purestorage.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reviewed-by: NFaisal Latif <faisal.latif@intel.com>

d0c49bf3

IB/ipath: Use pci_dev->revision, again · ec03d677

由 Sergei Shtylyov 提交于 5月 09, 2011

Commit 44c10138 ("PCI: Change all drivers to use pci_device->revision")
already converted this driver to using the revision field of struct
pci_dev but commit bb917144 ("IB/ipath: Misc changes to prepare
for IB7220 introduction") later reverted that change for some strange
reason.  Restore the change.
Signed-off-by: NSergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ec03d677

IB/qib: Prevent driver hang with unprogrammed boards · 9f5754e3

由 Mitko Haralanov 提交于 5月 09, 2011

The time limit test now correctly checks against current jiffies to
avoid the hang.
Signed-off-by: NMitko Haralanov <mitko@qlogic.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

9f5754e3

RDMA/cxgb4: EEH errors can hang the driver · 2f25e9a5

由 Steve Wise 提交于 5月 09, 2011

A few more EEH fixes:

c4iw_wait_for_reply(): detect fatal EEH condition on timeout and
return an error.

The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH
event followed by a ib_register_device() when the device was
reinitialized.  However, the RDMA core doesn't allow multiple
iterations of register/deregister by the provider. See
drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where
the kobject ref is held until the device is deallocated in
ib_deallocate_device().  Calling deregister adds this kobj reference,
and then a subsequent register call will generate a WARN_ON() from the
kobject subsystem because the kobject is being initialized but is
already initialized with the ref held.

So the provider must deregister and dealloc when resetting for an EEH
event, then alloc/register to re-initialize.  To do this, we cannot
use the device ptr as our ULD handle since it will change with each
reallocation.  This commit adds a ULD context struct which is used as
the ULD handle, and then contains the device pointer and other state
needed.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2f25e9a5

RDMA/cxgb4: Reset wait condition atomically · d9594d99

由 Steve Wise 提交于 5月 09, 2011

The driver was never really waiting for RDMA_WR/FINI completions
because the condition variable used to determine if the completion
happened was never reset, and this condition variable is reused for
both connection setup and teardown.  This causes various driver
crashes under heavy loads due to releasing resources too early.

The fix is to use atomic bits to correctly reset the condition
immediately after the completion is detected.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d9594d99

RDMA/cxgb4: Fix missing parentheses · 85d215b0

由 Roel Kluin 提交于 5月 09, 2011

Parens are missing: '|' has a higher presedence than '?'.
Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
Acked-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

85d215b0

RDMA/cxgb4: Initialization errors can cause crash · bbe9a0a2

由 Steve Wise 提交于 5月 09, 2011

c4iw_uld_add() must return ERR_PTR() values instead of NULL on failure.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

bbe9a0a2

RDMA/cxgb4: Don't change QP state outside EP lock · 30c95c2d

由 Steve Wise 提交于 5月 09, 2011

Concurrent ingress CLOSE and ULP ABORT operations causes a crash due
to a race condition where the close path releases the EP lock and then
tries to move the QP state to CLOSED.  This must be done inside the EP
lock to avoid the race.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

30c95c2d

04 5月, 2011 1 次提交
- D
  ipv4: Make caller provide on-stack flow key to ip_route_output_ports(). · 31e4543d
  由 David S. Miller 提交于 5月 03, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  31e4543d

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功