1. 05 Jul, 2011 1 commit
  2. 24 May, 2011 1 commit
  3. 16 Mar, 2011 2 commits
    • IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state · 8d8ac865
      Sean Hefty committed
      This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
      Vadai <amirv@mellanox.com>:
      
      	When destroying a cm_id from a context of a work queue and if
      	the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
      	release the reference of this id that was taken upon the send
      	of the LAP message.  Otherwise, if the expected APR message
      	gets lost, the reference is released only after a long time,
      	during which the work handler thread is not available to
      	process other things.
      
      It turns out that we need to cancel any pending LAP messages whenever
      we transition out of the IB_CM_ESTABLISHED state.  This occurs when
      disconnecting - either sending or receiving a DREQ.  It can also
      happen in a corner case where we receive a REJ message after sending
      an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
      messages in these three cases.
      
      Canceling the LAP when sending a DREQ fixes the destroy problem
      reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
      state, it sends a DREQ to the remote side to notify the peer that the
      connection is going away.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
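The rule described in this commit (cancel a pending LAP, and drop the reference it holds, whenever the connection leaves the established state) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct conn`, `send_lap`, and `leave_established` are hypothetical names, not ib_cm functions, and the reference count is a plain integer rather than an atomic.

```c
/* Hypothetical states; a simplified model, not the kernel's enums. */
enum conn_state { ESTABLISHED, DREQ_SENT, DREQ_RCVD, IDLE };
enum lap_state { LAP_IDLE, LAP_SENT };

struct conn {
    enum conn_state state;
    enum lap_state lap;
    int refcount;           /* references held on this connection */
};

/* Sending a LAP takes a reference that is held until the APR arrives. */
void send_lap(struct conn *c)
{
    c->lap = LAP_SENT;
    c->refcount++;
}

/*
 * Every transition out of ESTABLISHED goes through here, so a pending
 * LAP is cancelled and its reference released in one place.
 */
void leave_established(struct conn *c, enum conn_state next)
{
    if (c->state == ESTABLISHED && c->lap == LAP_SENT) {
        c->lap = LAP_IDLE;  /* cancel the outstanding LAP */
        c->refcount--;      /* release the reference taken by send_lap() */
    }
    c->state = next;
}
```

In this model, the three cases named in the commit message (sending a DREQ, receiving a DREQ, and receiving a REJ) would each call `leave_established()` with the appropriate next state.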
    • IB/cm: Bump reference count on cm_id before invoking callback · 29963437
      Sean Hefty committed
      When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
      refcount of the cm_id is initialized to 1.  However, cm_process_work
      will decrement the refcount after invoking all callbacks.  The result
      is that the cm_id will end up with refcount set to 0 by the end of the
      sidr req handler.
      
      If a user tries to destroy the cm_id, the destruction will proceed,
      under the incorrect assumption that no other threads are referencing
      the cm_id.  This can lead to a crash when the cm callback thread tries
      to access the cm_id.
      
      This problem was noticed as part of a larger investigation with kernel
      crashes in the rdma_cm when running on a real time OS.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
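The reference-count imbalance described in this commit can be reproduced with a small userspace model. Everything here is a hypothetical stand-in (`struct obj`, `obj_get`, `obj_put`, `process_work`), not the ib_cm code, and the counter is a plain integer rather than an atomic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cm_id; not the kernel structure. */
struct obj {
    int refcount;       /* references currently held */
    bool released;      /* set when the last reference is dropped */
};

void obj_init(struct obj *o) { o->refcount = 1; o->released = false; }
void obj_get(struct obj *o)  { o->refcount++; }
void obj_put(struct obj *o)  { if (--o->refcount == 0) o->released = true; }

/* Models a work handler that drops one reference after the callback. */
void process_work(struct obj *o, void (*cb)(struct obj *))
{
    if (cb)
        cb(o);
    obj_put(o);
}

/*
 * Returns the reference count left after the handler has run.
 * With bump == false the creator's only reference is consumed by
 * process_work(), which is the situation the commit describes.
 */
int refs_after_handler(bool bump)
{
    struct obj o;

    obj_init(&o);
    if (bump)
        obj_get(&o);    /* extra reference held across the callback */
    process_work(&o, NULL);
    return o.refcount;
}
```

Without the extra `obj_get()`, the count reaches zero while the owner still believes it holds the object, so a later destroy would not wait for anyone.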
  4. 29 Jul, 2010 1 commit
    • IB/cm: Check LAP state before sending an MRA · 50a025c6
      Sean Hefty committed
      NULL pointer dereferences in ib_cm_init_qp_attr() were seen by some
      users.  From a crash dump, I determined that we died in
      cm_init_qp_rts_attr() (it's inlined, so it doesn't show up in the
      traceback) on the line labeled below:
      
      static int cm_init_qp_rts_attr(struct cm_id_private *cm_id_priv,
                                     struct ib_qp_attr *qp_attr,
                                     int *qp_attr_mask)
      {
              ........
              if (cm_id_priv->id.lap_state == IB_CM_LAP_UNINIT) {
                      .....
              } else {
                      *qp_attr_mask = IB_QP_ALT_PATH | IB_QP_PATH_MIG_STATE;
                      qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num; /* <- died here */
      
      
      The problem is that the rdma_cm can call ib_send_cm_mra() after a
      connection has been established.  The ib_cm incorrectly assumes that
      the MRA is in response to a LAP (load alternate path) message, even
      though no LAP message has been received.  The ib_cm needs to check the
      lap_state before sending an MRA if the cm_id state is established.
      Reported-by: Arthur Kepner <akepner@sgi.com>
      Reported-by: Josh England <jjengla@gmail.com>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
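The check this commit calls for can be pictured as a small predicate: an MRA is acceptable while a REQ or REP is being processed, but on an established connection only if a LAP has actually been received. The enum and function names below are hypothetical and simplified; this is a sketch of the idea, not the code in ib_send_cm_mra().

```c
#include <stdbool.h>

/* Hypothetical, simplified states; not the kernel's enums. */
enum conn_state { CONN_IDLE, CONN_REQ_RCVD, CONN_REP_RCVD, CONN_ESTABLISHED };
enum lap_state { LAP_UNINIT, LAP_IDLE, LAP_RCVD };

/* Returns true if sending an MRA makes sense in the given state. */
bool mra_allowed(enum conn_state state, enum lap_state lap)
{
    switch (state) {
    case CONN_REQ_RCVD:
    case CONN_REP_RCVD:
        return true;                /* MRA answers the REQ or REP */
    case CONN_ESTABLISHED:
        return lap == LAP_RCVD;     /* MRA must answer a received LAP */
    default:
        return false;
    }
}
```

Rejecting the request when no LAP was received keeps the alternate-path state untouched, so later code never follows an alternate path that was never set up.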
  5. 01 Apr, 2010 1 commit
  6. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo committed
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to place the new include so that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  7. 08 Mar, 2010 1 commit
  8. 18 Jan, 2009 1 commit
  9. 20 Oct, 2008 1 commit
    • x86: sysfs: kill owner field from attribute · 01e8ef11
      Parag Warudkar committed
      Tejun's commit 7b595756 made sysfs
      attribute->owner unnecessary.  But the field was left in the structure to
      ease the merge.  It's been over a year since that change and it is now
      time to start killing attribute->owner along with its users - one arch at
      a time!
      
      This patch is attempt #1 to get rid of attribute->owner only for
      CONFIG_X86_64 or CONFIG_X86_32.  We will deal with other arches later on
      as and when possible - avr32 will be the next since that is something I
      can test.  Compile (make allyesconfig / make allmodconfig / custom config)
      and boot tested.
      
      akpm: the idea is that we put the declaration of attribute.owner inside
      `#ifndef CONFIG_X86'.  But that proved to be too ambitious for now because
      new usages kept on turning up in subsystem trees.
      
      [akpm: remove the ifdef for now]
      Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 17 Oct, 2008 1 commit
  11. 01 Oct, 2008 1 commit
  12. 22 Jul, 2008 2 commits
  13. 15 Jul, 2008 1 commit
  14. 17 Apr, 2008 1 commit
  15. 31 Mar, 2008 1 commit
  16. 01 Mar, 2008 1 commit
    • IB/cm: Flush workqueue when removing device · 84ba284c
      Sean Hefty committed
      When a CM MAD is received, it is queued to a CM workqueue for
      processing.  The queued work item references the port and device on
      which the MAD was received.  If that device is removed from the system
      before the work item can execute, the work item will reference freed
      memory.
      
      To fix this, flush the workqueue after unregistering from MAD
      reception and before the device is freed.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  17. 13 Feb, 2008 2 commits
  18. 05 Feb, 2008 1 commit
    • IB/cm: Add interim support for routed paths · 3971c9f6
      Sean Hefty committed
      Paths with hop_limit > 1 indicate that the connection will be routed
      between IB subnets.  Update the subnet local field in the CM REQ based
      on the hop_limit value.  In addition, if the path is routed, then set
      the LIDs in the REQ to the permissive LIDs.  This is used to indicate
      to the passive side that it should use the LIDs in the received local
      route header (LRH) associated with the REQ when programming the QP.
      
      This is a temporary work-around to the IB CM to support IB router
      development until the IB router specification is completed.  It is not
      anticipated that this work-around will cause any interoperability
      issues with existing stacks or future stacks that will properly
      support IB routers when defined.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
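The REQ field selection described in this commit can be sketched as follows. The struct and function names are hypothetical, byte-order handling is omitted, and the layout is not the wire format of a CM REQ; the only value taken from the InfiniBand architecture is the permissive LID, 0xFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERMISSIVE_LID 0xFFFF   /* permissive LID per the InfiniBand architecture */

/* Hypothetical subset of the REQ fields touched by this change. */
struct req_fields {
    uint16_t local_lid;
    uint16_t remote_lid;
    bool subnet_local;
};

/*
 * A path with hop_limit > 1 is treated as routed: the subnet-local
 * flag is cleared and both LIDs are set to the permissive LID so the
 * passive side takes the LIDs from the received LRH instead.
 */
struct req_fields fill_req(uint16_t slid, uint16_t dlid, uint8_t hop_limit)
{
    struct req_fields r;
    bool routed = hop_limit > 1;

    r.subnet_local = !routed;
    r.local_lid = routed ? PERMISSIVE_LID : slid;
    r.remote_lid = routed ? PERMISSIVE_LID : dlid;
    return r;
}
```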
  19. 26 Jan, 2008 1 commit
  20. 10 Oct, 2007 1 commit
    • IB/cm: Modify interface to send MRAs in response to duplicate messages · de98b693
      Sean Hefty committed
      The IB CM provides a message receipt acknowledgement (MRA) message that
      can be sent to indicate that a REQ or REP message has been received, but
      will require more time to process than the timeout specified by those
      messages.  In many cases, the application may not know how long it will
      take to respond to a CM message, but most of the time it will respond
      before a retry has been sent.  Rather than sending an MRA in response
      to all messages just to handle the case where a longer timeout is
      needed, it is more efficient to queue the MRA for sending in case a
      duplicate message is received.
      
      This avoids sending an MRA when it is not needed, but limits the number
      of times that a REQ or REP will be resent.  It also provides for a
      simpler implementation than generating the MRA based on a timer event.
      (That is, trying to send the MRA after receiving the first REQ or REP if
      a response has not been generated, so that it is received at the remote
      side before a duplicate REQ or REP has been received.)
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  21. 18 Jul, 2007 1 commit
  22. 11 Jul, 2007 4 commits
  23. 30 May, 2007 1 commit
  24. 22 May, 2007 1 commit
    • IB/cm: Improve local id allocation · 9f81036c
      Michael S. Tsirkin committed
      The IB CM uses an idr for local id allocations, with a running counter
      as start_id.  This fails to generate distinct ids if
      
      1. An id is constantly created and destroyed
      2. A chunk of ids just beyond the current next_id value is occupied
      
      This in turn leads to an increased chance of a connection request being
      mis-detected as a duplicate, sometimes for several retries, until
      next_id gets past the block of allocated ids. This has been observed
      in practice.
      
      As a fix, remember the last id allocated and start immediately above it.
      This also fixes a problem with the old code, where next_id might
      overflow and become negative.
      Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
      Acked-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
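The fix described in this commit, "remember the last id allocated and start immediately above it", can be illustrated with a toy allocator. This is a sketch under stated assumptions: it uses a tiny fixed-size table instead of the kernel idr, and `struct id_table`, `id_alloc`, and `id_free` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_IDS 8   /* tiny id space so the wrap-around is easy to see */

struct id_table {
    bool used[MAX_IDS];
    int last_id;        /* most recently allocated id, or -1 if none yet */
};

void id_table_init(struct id_table *t)
{
    for (int i = 0; i < MAX_IDS; i++)
        t->used[i] = false;
    t->last_id = -1;
}

/*
 * Search upward from last_id + 1, wrapping at MAX_IDS, so the result
 * is never negative. Returns the new id, or -1 if the table is full.
 */
int id_alloc(struct id_table *t)
{
    for (int n = 1; n <= MAX_IDS; n++) {
        int id = (t->last_id + n) % MAX_IDS;

        if (!t->used[id]) {
            t->used[id] = true;
            t->last_id = id;
            return id;
        }
    }
    return -1;
}

void id_free(struct id_table *t, int id)
{
    t->used[id] = false;
}
```

Because the search starts above the last id handed out rather than at the lowest free slot, an id that is created and destroyed in a loop gets a fresh value each time, which is the property that helps a peer tell a new connection request from a duplicate.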
  25. 03 May, 2007 1 commit
    • PCI: Cleanup the includes of <linux/pci.h> · 6473d160
      Jean Delvare committed
      I noticed that many source files include <linux/pci.h> while they do
      not appear to need it. Here is an attempt to clean it all up.
      
      In order to find all possibly affected files, I searched for all
      files including <linux/pci.h> but without any other occurrence of "pci"
      or "PCI". I removed the include statement from all of these, then I
      compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
      false positives manually.
      
      My tests covered 66% of the affected files, so there could be false
      positives remaining. Untested files are:
      
      arch/alpha/kernel/err_common.c
      arch/alpha/kernel/err_ev6.c
      arch/alpha/kernel/err_ev7.c
      arch/ia64/sn/kernel/huberror.c
      arch/ia64/sn/kernel/xpnet.c
      arch/m68knommu/kernel/dma.c
      arch/mips/lib/iomap.c
      arch/powerpc/platforms/pseries/ras.c
      arch/ppc/8260_io/enet.c
      arch/ppc/8260_io/fcc_enet.c
      arch/ppc/8xx_io/enet.c
      arch/ppc/syslib/ppc4xx_sgdma.c
      arch/sh64/mach-cayman/iomap.c
      arch/xtensa/kernel/xtensa_ksyms.c
      arch/xtensa/platform-iss/setup.c
      drivers/i2c/busses/i2c-at91.c
      drivers/i2c/busses/i2c-mpc.c
      drivers/media/video/saa711x.c
      drivers/misc/hdpuftrs/hdpu_cpustate.c
      drivers/misc/hdpuftrs/hdpu_nexus.c
      drivers/net/au1000_eth.c
      drivers/net/fec_8xx/fec_main.c
      drivers/net/fec_8xx/fec_mii.c
      drivers/net/fs_enet/fs_enet-main.c
      drivers/net/fs_enet/mac-fcc.c
      drivers/net/fs_enet/mac-fec.c
      drivers/net/fs_enet/mac-scc.c
      drivers/net/fs_enet/mii-bitbang.c
      drivers/net/fs_enet/mii-fec.c
      drivers/net/ibm_emac/ibm_emac_core.c
      drivers/net/lasi_82596.c
      drivers/parisc/hppb.c
      drivers/sbus/sbus.c
      drivers/video/g364fb.c
      drivers/video/platinumfb.c
      drivers/video/stifb.c
      drivers/video/valkyriefb.c
      include/asm-arm/arch-ixp4xx/dma.h
      sound/oss/au1550_ac97.c
      
      I would welcome test reports for these files. I am fine with removing
      the untested files from the patch if the general opinion is that these
      changes aren't safe. The tested part would still be nice to have.
      
      Note that this patch depends on another header fixup patch I submitted
      to LKML yesterday:
        [PATCH] scatterlist.h needs types.h
        http://lkml.org/lkml/2007/3/01/141
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  26. 23 Feb, 2007 1 commit
  27. 13 Dec, 2006 1 commit
  28. 30 Nov, 2006 3 commits
  29. 22 Nov, 2006 1 commit
  30. 11 Oct, 2006 2 commits
    • IB/cm: Send DREP in response to unmatched DREQ · 82a9c16a
      Sean Hefty committed
      Currently a DREP is only sent in response to a DREQ if a connection
      has been found matching the DREQ, and it is in the proper state.  Once
      a DREP is sent, the local connection moves into timewait.  Duplicate
      DREQs received while in this state result in re-sending the DREP.
      
      However, it's likely that the local connection will enter and exit
      timewait before the remote side times out a lost DREP and resends a DREQ.
      To handle this, we send a DREP in response to a DREQ, even if a local
      connection is not found.  This avoids maintaining disconnected
      id's in timewait states for excessively long times, just to handle a
      lost DREP.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
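The DREQ handling described in this commit amounts to a small decision table, sketched here in userspace C. The enums and `handle_dreq` are hypothetical names for illustration only; a NULL pointer stands for "no matching local connection was found".

```c
#include <stddef.h>

/* Hypothetical, simplified connection states. */
enum conn_state { CONN_ESTABLISHED, CONN_TIMEWAIT, CONN_OTHER };

enum dreq_action {
    SEND_DREP_ENTER_TIMEWAIT,   /* normal disconnect */
    RESEND_DREP,                /* duplicate DREQ while in timewait */
    SEND_DREP_STATELESS,        /* no matching connection: reply anyway */
    DROP                        /* connection found, but in the wrong state */
};

struct conn {
    enum conn_state state;
};

/* Decide how to answer a received DREQ; c is NULL if nothing matched. */
enum dreq_action handle_dreq(struct conn *c)
{
    if (c == NULL)
        return SEND_DREP_STATELESS;

    switch (c->state) {
    case CONN_ESTABLISHED:
        c->state = CONN_TIMEWAIT;
        return SEND_DREP_ENTER_TIMEWAIT;
    case CONN_TIMEWAIT:
        return RESEND_DREP;
    default:
        return DROP;
    }
}
```

The stateless reply is what allows the local side to leave timewait on its normal schedule: a retried DREQ that arrives later still gets a DREP even though the connection record is gone.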
    • IB/cm: Fix timewait crash after module unload · 8575329d
      Sean Hefty committed
      If the ib_cm module is unloaded while id's are still in timewait, the
      CM will destroy the work queue used to process timewait.  Once the
      id's exit timewait, their timers will fire, leading to a crash trying
      to access the destroyed work queue.
      
      We need to track id's that are in timewait, and cancel their deferred
      work on module unload.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  31. 23 Sep, 2006 1 commit