1. 11 10月, 2011 3 次提交
  2. 07 10月, 2011 1 次提交
  3. 26 9月, 2011 1 次提交
    • N
      [SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference · e48f129c
      Neil Horman 提交于
      This oops was reported recently:
      d:mon> e
      cpu 0xd: Vector: 300 (Data Access) at [c0000000fd4c7120]
          pc: d00000000076f194: .t3_l2t_get+0x44/0x524 [cxgb3]
          lr: d000000000b02108: .init_act_open+0x150/0x3d4 [cxgb3i]
          sp: c0000000fd4c73a0
         msr: 8000000000009032
         dar: 0
       dsisr: 40000000
        current = 0xc0000000fd640d40
        paca    = 0xc00000000054ff80
          pid   = 5085, comm = iscsid
      d:mon> t
      [c0000000fd4c7450] d000000000b02108 .init_act_open+0x150/0x3d4 [cxgb3i]
      [c0000000fd4c7500] d000000000e45378 .cxgbi_ep_connect+0x784/0x8e8 [libcxgbi]
      [c0000000fd4c7650] d000000000db33f0 .iscsi_if_rx+0x71c/0xb18
      [scsi_transport_iscsi2]
      [c0000000fd4c7740] c000000000370c9c .netlink_data_ready+0x40/0xa4
      [c0000000fd4c77c0] c00000000036f010 .netlink_sendskb+0x4c/0x9c
      [c0000000fd4c7850] c000000000370c18 .netlink_sendmsg+0x358/0x39c
      [c0000000fd4c7950] c00000000033be24 .sock_sendmsg+0x114/0x1b8
      [c0000000fd4c7b50] c00000000033d208 .sys_sendmsg+0x218/0x2ac
      [c0000000fd4c7d70] c00000000033f55c .sys_socketcall+0x228/0x27c
      [c0000000fd4c7e30] c0000000000086a4 syscall_exit+0x0/0x40
      --- Exception: c01 (System Call) at 00000080da560cfc
      
      The root cause was an EEH error, which sent us down the offload_close path in
      the cxgb3 driver, which in turn sets cdev->l2opt to NULL, without regard for
      upper layer driver (like the cxgbi drivers) which might have execution contexts
      in the middle of its use. The result is the oops above, when t3_l2t_get attempts
      to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH error handler sets it to NULL.
      
      The fix is to prevent the setting of the NULL pointer until after there are no
      further users of it.  The t3cdev->l2opt pointer is now converted to be an rcu
      pointer and the L2DATA macro is now called under the protection of the
      rcu_read_lock().  When the EEH error path:
      t3_adapter_error->offload_close->cxgb3_offload_deactivate
      Is exectured, setting of that l2opt pointer to NULL, is now gated on an rcu
      quiescence point, preventing, allowing L2DATA callers to safely check for a NULL
      pointer without concern that the underlying data will be freeded before the
      pointer is dereferenced.
      
      This has been tested by the reporter and shown to fix the reproted oops
      
      [nhorman: fix up unitinialised variable reported by Dan Carpenter]
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reviewed-by: NKaren Xie <kxie@chelsio.com>
      Cc: stable@kernel.org
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      e48f129c
  4. 27 7月, 2011 1 次提交
  5. 23 7月, 2011 1 次提交
    • M
      IB/qib: Defer HCA error events to tasklet · e67306a3
      Mike Marciniszyn 提交于
      With ib_qib options:
      
          options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4
      
      a run of ib_write_bw -a yields the following:
      
          ------------------------------------------------------------------
           #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
           1048576   5000           2910.64            229.80
          ------------------------------------------------------------------
      
      The top cpu use in a profile is:
      
          CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
          Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
          of 0x00 (No unit mask) count 1002300
          Counted LLC_MISSES events (Last level cache demand requests from this core that
          missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
          samples  %        samples  %        app name                 symbol name
          15237    29.2642  964      17.1195  ib_qib.ko                qib_7322intr
          12320    23.6618  1040     18.4692  ib_qib.ko                handle_7322_errors
          4106      7.8860  0              0  vmlinux                  vsnprintf
      
      
      Analysis of the stats, profile, the code, and the annotated profile indicate:
       - All of the overflow interrupts (one per packet overflow) are
         serviced on CPU0 with no mitigation on the frequency.
       - All of the receive interrupts are being serviced by CPU0.  (That is
         the way truescale.cmds statically allocates the kctx IRQs to CPU)
       - The code is spending all of its time servicing QIB_I_C_ERROR
         RcvEgrFullErr interrupts on CPU0, starving the packet receive
         processing.
       - The decode_err routine is very inefficient, using a printf variant
         to format a "%s" and continues to loop when the errs mask has been
         cleared.
       - Both qib_7322intr and handle_7322_errors read pci registers, which
         is very inefficient.
      
      The fix does the following:
       - Adds a tasklet to service QIB_I_C_ERROR
       - Replaces the very inefficient scnprintf() with a memcpy().  A field
         is added to qib_hwerror_msgs to save the sizeof("string") at
         compile time so that a strlen is not needed during err_decode().
       - The most frequent errors (Overflows) are serviced first to exit the
         loop as early as possible.
       - The loop now exits as soon as the errs mask is clear rather than
         fruitlessly looping through the msp array.
      
      With this fix the performance changes to:
      
          ------------------------------------------------------------------
           #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
           1048576   5000           2990.64            2941.35
          ------------------------------------------------------------------
      
      During testing of the error handling overflow patch, it was determined
      that some CPU's were slower when servicing both overflow and receive
      interrupts on CPU0 with different MSI interrupt vectors.
      
      This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
      interrupt for kctx's < 2 and to service them on the default interrupt.
      For some CPUs, the cost of the interrupt enter/exit is more costly
      than then the additional PCI read in the default handler.
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      e67306a3
  6. 22 7月, 2011 1 次提交
  7. 19 7月, 2011 14 次提交
  8. 18 7月, 2011 1 次提交
  9. 16 7月, 2011 1 次提交
    • G
      IB/mthca: Stop returning separate error and status from FW commands · cdb73db0
      Goldwyn Rodrigues 提交于
      Instead of having firmware command functions return an error and also
      a status, leading to code like:
      
      	err = mthca_FW_COMMAND(..., &status);
      	if (err)
      		goto out;
              if (status) {
      		err = -E...;
      		goto out;
      	}
      
      all over the place, just handle the FW status inside the FW command
      handling code (the way mlx4 does it), so we can simply write:
      
      	err = mthca_FW_COMMAND(...);
      	if (err)
      		goto out;
      
      In addition to simplifying the source code, this also saves a healthy
      chunk of text:
      
          add/remove: 0/0 grow/shrink: 10/88 up/down: 510/-3357 (-2847)
          function                                     old     new   delta
          static.trans_table                           324     584    +260
          mthca_cmd_poll                               352     477    +125
          mthca_cmd_wait                               511     567     +56
          mthca_table_put                              213     240     +27
          mthca_cleanup_db_tab                         372     387     +15
          __mthca_remove_one                           314     323      +9
          mthca_cleanup_user_db_tab                    275     283      +8
          __mthca_init_one                            1738    1746      +8
          mthca_cleanup                                 20      21      +1
          mthca_MAD_IFC                               1081    1082      +1
          mthca_MGID_HASH                               43      40      -3
          mthca_MAP_ICM_AUX                             23      20      -3
          mthca_MAP_ICM                                 19      16      -3
          mthca_MAP_FA                                  23      20      -3
          mthca_READ_MGM                                43      38      -5
          mthca_QUERY_SRQ                               43      38      -5
          mthca_QUERY_QP                                59      54      -5
          mthca_HW2SW_SRQ                               43      38      -5
          mthca_HW2SW_MPT                               60      55      -5
          mthca_HW2SW_EQ                                43      38      -5
          mthca_HW2SW_CQ                                43      38      -5
          mthca_free_icm_table                         120     114      -6
          mthca_query_srq                              214     206      -8
          mthca_free_qp                                662     654      -8
          mthca_cmd                                     38      28     -10
          mthca_alloc_db                              1321    1311     -10
          mthca_setup_hca                             1067    1055     -12
          mthca_WRITE_MTT                               35      22     -13
          mthca_WRITE_MGM                               40      27     -13
          mthca_UNMAP_ICM_AUX                           36      23     -13
          mthca_UNMAP_FA                                36      23     -13
          mthca_SYS_DIS                                 36      23     -13
          mthca_SYNC_TPT                                36      23     -13
          mthca_SW2HW_SRQ                               35      22     -13
          mthca_SW2HW_MPT                               35      22     -13
          mthca_SW2HW_EQ                                35      22     -13
          mthca_SW2HW_CQ                                35      22     -13
          mthca_RUN_FW                                  36      23     -13
          mthca_DISABLE_LAM                             36      23     -13
          mthca_CLOSE_IB                                36      23     -13
          mthca_CLOSE_HCA                               38      25     -13
          mthca_ARM_SRQ                                 39      26     -13
          mthca_free_icms                              178     164     -14
          mthca_QUERY_DDR                              389     375     -14
          mthca_resize_cq                             1063    1048     -15
          mthca_unmap_eq_icm                           123     107     -16
          mthca_map_eq_icm                             396     380     -16
          mthca_cmd_box                                 90      74     -16
          mthca_SET_IB                                 433     417     -16
          mthca_RESIZE_CQ                              369     353     -16
          mthca_MAP_ICM_page                           240     224     -16
          mthca_MAP_EQ                                 183     167     -16
          mthca_INIT_IB                                473     457     -16
          mthca_INIT_HCA                               745     729     -16
          mthca_map_user_db                            816     798     -18
          mthca_SYS_EN                                 157     139     -18
          mthca_cleanup_qp_table                        78      59     -19
          mthca_cleanup_eq_table                       168     149     -19
          mthca_UNMAP_ICM                              143     121     -22
          mthca_modify_srq                             172     149     -23
          mthca_unmap_fmr                              198     174     -24
          mthca_query_qp                               814     790     -24
          mthca_query_pkey                             343     319     -24
          mthca_SET_ICM_SIZE                            34      10     -24
          mthca_QUERY_DEV_LIM                         1870    1846     -24
          mthca_map_cmd                               1130    1105     -25
          mthca_ENABLE_LAM                             401     375     -26
          mthca_modify_port                            247     220     -27
          mthca_query_device                           884     850     -34
          mthca_NOP                                     75      41     -34
          mthca_table_get                              287     249     -38
          mthca_init_qp_table                          333     293     -40
          mthca_MODIFY_QP                              348     308     -40
          mthca_close_hca                              131      89     -42
          mthca_free_eq                                435     390     -45
          mthca_query_port                             755     705     -50
          mthca_free_cq                                581     528     -53
          mthca_alloc_icm_table                        578     524     -54
          mthca_multicast_attach                      1041     986     -55
          mthca_init_hca                               326     271     -55
          mthca_query_gid                              487     431     -56
          mthca_free_srq                               524     468     -56
          mthca_free_mr                                168     111     -57
          mthca_create_eq                             1560    1501     -59
          mthca_multicast_detach                       790     728     -62
          mthca_write_mtt                              918     854     -64
          mthca_register_device                       1406    1342     -64
          mthca_fmr_alloc                              947     883     -64
          mthca_mr_alloc                               652     582     -70
          mthca_process_mad                           1242    1164     -78
          mthca_dev_lim                                910     830     -80
          find_mgm                                     482     400     -82
          mthca_modify_qp                             3852    3753     -99
          mthca_init_cq                               1281    1181    -100
          mthca_alloc_srq                             1719    1610    -109
          mthca_init_eq_table                         1807    1679    -128
          mthca_init_tavor                             761     491    -270
          mthca_init_arbel                            2617    2098    -519
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>
      cdb73db0
  10. 18 6月, 2011 4 次提交
  11. 07 6月, 2011 1 次提交
  12. 25 5月, 2011 3 次提交
    • L
      RDMA/nes: Add a check for strict_strtoul() · 52f81dba
      Liu Yuan 提交于
      It should check if strict_strtoul() succeeds before using
      'wqm_quanta_value'.
      Signed-off-by: NLiu Yuan <tailai.ly@taobao.com>
      
      [ Convert to kstrtoul() directly while we're here.  - Roland ]
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      52f81dba
    • S
      RDMA/cxgb3: Don't post zero-byte read if endpoint is going away · 80783868
      Steve Wise 提交于
      tx_ack() wasn't checking the endpoint state and consequently would
      attempt to post the p2p 0B read on an endpoint/QP that is closing or
      aborting.  This causes a NULL pointer dereference crash.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      80783868
    • S
      RDMA/cxgb4: Use completion objects for event blocking · c337374b
      Steve Wise 提交于
      There exists a race condition when using wait_queue_head_t objects
      that are declared on the stack.  This was being done in a few places
      where we are sending work requests to the FW and awaiting replies, but
      we don't have an endpoint structure with an embedded c4iw_wr_wait
      struct.  So the code was allocating it locally on the stack.  Bad
      design.  The race is:
      
        1) thread on cpuX declares the wait_queue_head_t on the stack, then
           posts a firmware WR with that wait object ptr as the cookie to be
           returned in the WR reply.  This thread will proceed to block in
           wait_event_timeout() but before it does:
      
        2) An interrupt runs on cpuY with the WR reply.  fw6_msg() handles
           this and calls c4iw_wake_up().  c4iw_wake_up() sets the condition
           variable in the c4iw_wr_wait object to TRUE and will call
           wake_up(), but before it calls wake_up():
      
        3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
           wait_event_timeout().  The wait_event_timeout() macro checks the
           condition variable and returns immediately since it is TRUE.  So
           this thread never blocks/sleeps. The function then returns
           effectively deallocating the c4iw_wr_wait object that was on the
           stack.
      
        4) So at this point cpuY has a pointer to the c4iw_wr_wait object
           that is no longer valid.  Further its pointing to a stack frame
           that might now be in use by some other context/thread.  So cpuY
           continues execution and calls wake_up() on a ptr to a wait object
           that as been effectively deallocated.
      
      This race, when it hits, can cause a crash in wake_up(), which I've
      seen under heavy stress. It can also corrupt the referenced stack
      which can cause any number of failures.
      
      The fix:
      
      Use struct completion, which supports on-stack declarations.
      Completions use a spinlock around setting the condition to true and
      the wake up so that steps 2 and 4 above are atomic and step 3 can
      never happen in-between.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      c337374b
  13. 23 5月, 2011 1 次提交
    • P
      Add appropriate <linux/prefetch.h> include for prefetch users · 70c71606
      Paul Gortmaker 提交于
      After discovering that wide use of prefetch on modern CPUs
      could be a net loss instead of a win, net drivers which were
      relying on the implicit inclusion of prefetch.h via the list
      headers showed up in the resulting cleanup fallout.  Give
      them an explicit include via the following $0.02 script.
      
       =========================================
       #!/bin/bash
       MANUAL=""
       for i in `git grep -l 'prefetch(.*)' .` ; do
       	grep -q '<linux/prefetch.h>' $i
       	if [ $? = 0 ] ; then
       		continue
       	fi
      
       	(	echo '?^#include <linux/?a'
       		echo '#include <linux/prefetch.h>'
       		echo .
       		echo w
       		echo q
       	) | ed -s $i > /dev/null 2>&1
       	if [ $? != 0 ]; then
       		echo $i needs manual fixup
       		MANUAL="$i $MANUAL"
       	fi
       done
       echo ------------------- 8\<----------------------
       echo vi $MANUAL
       =========================================
      Signed-off-by: NPaul <paul.gortmaker@windriver.com>
      [ Fixed up some incorrect #include placements, and added some
        non-network drivers and the fib_trie.c case    - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70c71606
  14. 21 5月, 2011 1 次提交
  15. 12 5月, 2011 1 次提交
  16. 10 5月, 2011 5 次提交
    • R
      RDMA/iwcm: Get rid of enum iw_cm_event_status · d0c49bf3
      Roland Dreier 提交于
      The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
      cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
      drivers -- only nes was using the enum values (with the mild consequence
      that all nes connection failures were treated as generic errors rather
      than reported as timeouts or rejections).
      
      We can fix this confusion by getting rid of enum iw_cm_event_status and
      using a plain int for struct iw_cm_event.status, and converting nes to
      use -Exxx as the other iWARP drivers do.
      
      This also gets rid of the warning
      
          drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
          drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
          drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
          drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
      Reviewed-by: NSean Hefty <sean.hefty@intel.com>
      Reviewed-by: NFaisal Latif <faisal.latif@intel.com>
      d0c49bf3
    • S
      IB/ipath: Use pci_dev->revision, again · ec03d677
      Sergei Shtylyov 提交于
      Commit 44c10138 ("PCI: Change all drivers to use pci_device->revision")
      already converted this driver to using the revision field of struct
      pci_dev but commit bb917144 ("IB/ipath: Misc changes to prepare
      for IB7220 introduction") later reverted that change for some strange
      reason.  Restore the change.
      Signed-off-by: NSergei Shtylyov <sshtylyov@ru.mvista.com>
      Acked-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      ec03d677
    • M
      IB/qib: Prevent driver hang with unprogrammed boards · 9f5754e3
      Mitko Haralanov 提交于
      The time limit test now correctly checks against current jiffies to
      avoid the hang.
      Signed-off-by: NMitko Haralanov <mitko@qlogic.com>
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      9f5754e3
    • S
      RDMA/cxgb4: EEH errors can hang the driver · 2f25e9a5
      Steve Wise 提交于
      A few more EEH fixes:
      
      c4iw_wait_for_reply(): detect fatal EEH condition on timeout and
      return an error.
      
      The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH
      event followed by a ib_register_device() when the device was
      reinitialized.  However, the RDMA core doesn't allow multiple
      iterations of register/deregister by the provider. See
      drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where
      the kobject ref is held until the device is deallocated in
      ib_deallocate_device().  Calling deregister adds this kobj reference,
      and then a subsequent register call will generate a WARN_ON() from the
      kobject subsystem because the kobject is being initialized but is
      already initialized with the ref held.
      
      So the provider must deregister and dealloc when resetting for an EEH
      event, then alloc/register to re-initialize.  To do this, we cannot
      use the device ptr as our ULD handle since it will change with each
      reallocation.  This commit adds a ULD context struct which is used as
      the ULD handle, and then contains the device pointer and other state
      needed.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      2f25e9a5
    • S
      RDMA/cxgb4: Reset wait condition atomically · d9594d99
      Steve Wise 提交于
      The driver was never really waiting for RDMA_WR/FINI completions
      because the condition variable used to determine if the completion
      happened was never reset, and this condition variable is reused for
      both connection setup and teardown.  This causes various driver
      crashes under heavy loads due to releasing resources too early.
      
      The fix is to use atomic bits to correctly reset the condition
      immediately after the completion is detected.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      d9594d99