1. 10 Dec, 2009: 6 commits
    • RDMA/nes: Fix Xansation test crash on cm_node ref_count · 886f98a3
      Faisal Latif committed
      While running a Xansation test, an active-side node crashed.  The
      problem started on the passive side, which generated an STag that was
      0.  The passive side sent a TERMINATE instead of an MPA REJECT message.
      The active side receives the TERMINATE, calls connect_err(), and sets
      the cm_node state to CLOSED.  The passive side then sends FIN + ACK
      after the TERMINATE, and the active side ends up in handle_ack_pkt()
      and send_reset().  send_reset() consumes one reference from the
      cm_node's ref_count.  Because the cm_node is in the CLOSED state, it
      will be destroyed once the connect_err() indication completes, so the
      CM crashes after send_reset().
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
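The reference-count problem above can be illustrated with a minimal user-space C sketch. `struct cm_node`, `rem_ref()`, `send_reset()` and `handle_ack_on_closed()` are simplified, hypothetical stand-ins, not the driver's code; the sketch assumes the fix is to take an extra reference before send_reset() when the node is already CLOSED, so the reference held for the pending connect_err() indication survives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver's cm_node. */
enum cm_state { CM_STATE_ESTABLISHED, CM_STATE_CLOSED };

struct cm_node {
	int ref_count;
	enum cm_state state;
	bool freed;   /* set instead of calling free() so the sketch can inspect it */
};

/* Drop one reference; the node is "freed" when the count reaches zero. */
static void rem_ref(struct cm_node *node)
{
	assert(!node->freed && node->ref_count > 0);   /* catches use-after-free */
	if (--node->ref_count == 0)
		node->freed = true;
}

/* Like send_reset(): sending the RST consumes one reference. */
static void send_reset(struct cm_node *node)
{
	rem_ref(node);
}

/*
 * handle_ack_pkt() path for a node already in CLOSED state: the reference
 * held for the pending connect_err() indication must survive, so take an
 * extra reference before send_reset() consumes one.
 */
static void handle_ack_on_closed(struct cm_node *node)
{
	if (node->state == CM_STATE_CLOSED)
		node->ref_count++;   /* balances the reference send_reset() drops */
	send_reset(node);
}
```

Without the extra reference, the node would be freed inside send_reset() and the later release from the connect_err() path would touch freed memory.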
    • RDMA/nes: Abnormal listener exit causes loopback node crash · f9f3f1e0
      Faisal Latif committed
      When the listener is destroyed for a loopback connection, the listener
      node gets a reset event.  This causes a crash as the listener is not
      expecting a reset event.  Code review of cm_event_reset() during
      debugging showed that the cm_id ref count was incremented after
      calling its event handler rather than before.
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
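A user-space sketch of the ordering issue in cm_event_reset(), assuming an event handler that may drop a reference on the cm_id. `struct cm_id`, the helpers and `handler_drops_ref()` are hypothetical; leaving the reference taken before the handler for a later teardown path to release is an assumption about the driver's lifetime rules.

```c
#include <assert.h>

/* Hypothetical stand-in for an iw_cm_id with a reference count. */
struct cm_id {
	int refcount;
	int freed;
};

static void cm_id_add_ref(struct cm_id *id)
{
	assert(!id->freed);   /* taking a reference on freed memory is the bug */
	id->refcount++;
}

static void cm_id_rem_ref(struct cm_id *id)
{
	assert(!id->freed && id->refcount > 0);
	if (--id->refcount == 0)
		id->freed = 1;
}

typedef void (*event_handler_fn)(struct cm_id *id);

/* Example handler that releases a reference, as listener teardown might. */
static void handler_drops_ref(struct cm_id *id)
{
	cm_id_rem_ref(id);
}

/*
 * Take the reference *before* invoking the handler.  Taking it afterwards
 * would run cm_id_add_ref() on an id the handler may already have freed.
 * The reference taken here is released later by the teardown path.
 */
static void cm_event_reset_sketch(struct cm_id *id, event_handler_fn handler)
{
	cm_id_add_ref(id);
	handler(id);
}
```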
    • RDMA/nes: Fix crash in nes_accept() · c5a7d489
      Faisal Latif committed
      While running IMP_EXT's window test, we saw a crash in nes_accept().
      Here is the sequence of what happened:
      
      (1) In MVAPICH2, a connect request is received for port #0.
      
      FIX:  Add a nes_connect() check to make sure local or remote tcp port
            is not 0.
      
      (2) The remote (passive) node's TCP stack sends a reset when it gets a
          connect request for port 0.  The active side set the connect
          error to IW_CM_EVENT_STATUS_REJECTED when it received the RST from
          the remote node.
      
      FIX: The correct error code is -ECONNRESET.
      
      (3) The wrong error code, IW_CM_EVENT_STATUS_REJECTED, causes the core
          to destroy its listener ports.  At this point there may be
          connections that have sent an MPA request up and are waiting for
          an accept or reject, but the listener and its cm_nodes have
          already been freed, causing the crash we saw.
      
      FIX: The cm_node is freed only if its state is not
           NES_CM_STATE_MPAREQ_RCVD.  If cm_node's state is
           NES_CM_STATE_MPAREQ_RCVD then its new state is set to
           NES_CM_STATE_LISTENER_DESTROYED and it is not freed.  When
           nes_accept() or nes_reject() is called, its state is checked
           for NES_CM_STATE_LISTENER_DESTROYED; in that case the cm_node
           is freed and an error is returned.
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
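Fixes (1) and (3) can be sketched in user-space C as follows. The state names mirror the ones in the commit message, but `connect_check_ports()`, `listener_destroy_node()`, `accept_sketch()` and the `-EINVAL` return values are assumptions made for illustration. Fix (2), reporting -ECONNRESET instead of a rejection, is a status-code change and is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the CM states named in the commit message. */
enum cm_state {
	CM_STATE_MPAREQ_RCVD,
	CM_STATE_LISTENER_DESTROYED,
	CM_STATE_OTHER,
};

struct cm_node {
	enum cm_state state;
	bool freed;
};

/* FIX (1): refuse to connect when either TCP port is 0. */
static int connect_check_ports(uint16_t loc_port, uint16_t rem_port)
{
	if (loc_port == 0 || rem_port == 0)
		return -EINVAL;
	return 0;
}

/* FIX (3), listener teardown: keep nodes still waiting for accept/reject. */
static void listener_destroy_node(struct cm_node *node)
{
	if (node->state == CM_STATE_MPAREQ_RCVD)
		node->state = CM_STATE_LISTENER_DESTROYED;
	else
		node->freed = true;
}

/* FIX (3), nes_accept()/nes_reject() side: free the orphaned node and fail. */
static int accept_sketch(struct cm_node *node)
{
	assert(!node->freed);
	if (node->state == CM_STATE_LISTENER_DESTROYED) {
		node->freed = true;
		return -EINVAL;
	}
	return 0;
}
```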
    • RDMA/nes: Resource not freed for REJECTed connections · 69524e1a
      Faisal Latif committed
      During testing of REJECT connection error handling, we saw that the
      cm_id resources are not released.  When the retransmit timer expires,
      we need to send a reset message to the remote node before issuing the
      ABORTED event.
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
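A small sketch of the ordering on retransmit-timer expiry: reset first, then the ABORTED event. The action log and `rexmit_expired()` are hypothetical scaffolding that only records the order of the steps.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record of what the timer path emits, in order. */
struct action_log {
	const char *actions[4];
	int n;
};

static void log_action(struct action_log *al, const char *what)
{
	assert(al->n < 4);
	al->actions[al->n++] = what;
}

/*
 * Retransmit-timer expiry for a connection being rejected: while retries
 * remain, retransmit; once they are exhausted, notify the peer with a
 * reset first and only then report ABORTED so cm_id resources are released.
 */
static void rexmit_expired(struct action_log *al, int retries_left)
{
	if (retries_left > 0) {
		log_action(al, "retransmit");
		return;
	}
	log_action(al, "send_reset");
	log_action(al, "event_aborted");
}
```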
    • RDMA/nes: MPA request/response error checking · 1cf078c9
      Faisal Latif committed
      During Xansation testing, we saw that errors in MPA request/response
      frames were not handled properly.
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
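The commit message does not list the individual checks. As a hedged illustration only, the sketch below validates an MPA request/reply header using the layout from RFC 5044 (16-byte key, a flags byte carrying the M/C/R bits, revision 1, a 16-bit private-data length capped at 512 bytes). `check_mpa_frame()` and its result codes are hypothetical, and treating a frame whose private data has not fully arrived as an error is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout and limits assumed from RFC 5044 (MPA revision 1). */
#define MPA_KEY_LEN     16
#define MPA_HDR_LEN     20     /* key + flags + rev + 16-bit PD_Length */
#define MPA_MAX_PD_LEN  512
#define MPA_FLAG_REJECT 0x20   /* R bit: third most significant bit of flags */

enum mpa_result {
	MPA_OK,
	MPA_ERR_SHORT,
	MPA_ERR_KEY,
	MPA_ERR_REV,
	MPA_ERR_PD_LEN,
	MPA_REJECTED,
};

static enum mpa_result check_mpa_frame(const uint8_t *buf, size_t len, int is_reply)
{
	const char *key = is_reply ? "MPA ID Rep Frame" : "MPA ID Req Frame";

	if (len < MPA_HDR_LEN)
		return MPA_ERR_SHORT;
	if (memcmp(buf, key, MPA_KEY_LEN) != 0)
		return MPA_ERR_KEY;
	if (buf[17] != 1)   /* Rev field */
		return MPA_ERR_REV;

	/* PD_Length is carried in network byte order. */
	size_t pd_len = ((size_t)buf[18] << 8) | buf[19];
	if (pd_len > MPA_MAX_PD_LEN || pd_len > len - MPA_HDR_LEN)
		return MPA_ERR_PD_LEN;

	/* Only a reply can carry a rejection. */
	if (is_reply && (buf[16] & MPA_FLAG_REJECT))
		return MPA_REJECTED;
	return MPA_OK;
}
```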
    • RDMA/nes: Update copyright and branding string · fa6c87d5
      Chien Tung committed
      Update copyright from Intel-NE, Inc. to Intel Corporation.  Use proper
      branding string in Kconfig and simplify description.
      Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  2. 06 Sep, 2009: 3 commits
  3. 23 Jun, 2009: 1 commit
  4. 28 Apr, 2009: 6 commits
  5. 09 Apr, 2009: 3 commits
    • RDMA/nes: Fix nes_nic_cm_xmit() error handling · 5962c2c8
      Faisal Latif committed
      We were seeing crashes or hangs when running network cable pull tests
      during RDMA traffic.

      In schedule_nes_timer(), we returned an error if nes_nic_cm_xmit()
      failed.  This is changed to return success, since the skb has been put
      on the timer routines to be processed later.  In the send_syn() case,
      we were indicating connect failure twice: once from nes_connect() and
      again when the rexmit retries expired.

      The other issue is skb->users, which we increment before calling
      nes_nic_cm_xmit() (which calls dev_queue_xmit()).  On failure we were
      decrementing skb->users while at the same time putting the skb on the
      rexmit path, but even if dev_queue_xmit() fails, skb->users has
      already been decremented.  We remove the decrement of skb->users on
      failure from both schedule_nes_timer() and nes_cm_timer_tick().

      An extra check in nes_cm_timer_tick() for rexmit failure, which breaks
      out of the loop, is also removed.  It caused problems because the
      other nodes have had their cm_node->ref_count incremented and were
      never processed.
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
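The two rules described above (keep the skb reference on a failed transmit, and do not break out of the timer loop) can be sketched as follows. `struct rexmit_entry`, `fake_xmit()` and `timer_tick_sketch()` are hypothetical; `fake_xmit()` assumes, as the commit message states for dev_queue_xmit(), that one reference is consumed whether or not the transmit succeeds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified retransmit entry. */
struct rexmit_entry {
	int skb_users;      /* stands in for skb->users */
	int retries_left;
	bool xmit_ok;       /* whether the simulated transmit succeeds */
	bool processed;
};

/* Stand-in for nes_nic_cm_xmit()/dev_queue_xmit(): it consumes one skb
 * reference whether or not the transmit succeeds. */
static bool fake_xmit(struct rexmit_entry *e)
{
	e->skb_users--;
	return e->xmit_ok;
}

/*
 * Timer-tick loop after the fix: no extra decrement on failure (the skb
 * stays queued for the next retry), and no break, so every entry is
 * processed.
 */
static int timer_tick_sketch(struct rexmit_entry *entries, int n)
{
	int failures = 0;

	for (int i = 0; i < n; i++) {
		struct rexmit_entry *e = &entries[i];

		e->skb_users++;   /* keep the skb alive across the transmit */
		if (!fake_xmit(e)) {
			/* Before the fix: e->skb_users--; break; */
			failures++;
			e->retries_left--;
		}
		e->processed = true;
	}
	return failures;
}
```

Each skb ends the tick with the same reference count it started with, and a failure in the middle of the list no longer leaves the later entries unprocessed.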
    • RDMA/nes: Fix error handling issues · 79fc3d74
      Faisal Latif committed
      Fix issues found by static code analysis:
      
      (1) Check if cm_node was successfully created for loopback connection.
      
      (2) schedule_nes_timer() does not free up allocated memory after
          encountering an error.  There is a WARN_ON() for this condition.
      
      (3) There is a cm_node->freed flag that is set but never used.
      Reported-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
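A user-space sketch of item (2): every error path after the allocation must either hand the entry off or free it. `schedule_timer_sketch()`, the `queue_ok` flag, the allocation counter and the `-EINVAL` code are hypothetical; the counter exists only so the sketch can check for leaks.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical timer entry standing in for the driver's. */
struct timer_entry {
	int type;
};

static int allocations_outstanding;   /* lets the sketch detect leaks */

static struct timer_entry *entry_alloc(void)
{
	struct timer_entry *e = malloc(sizeof(*e));

	if (e)
		allocations_outstanding++;
	return e;
}

static void entry_free(struct timer_entry *e)
{
	free(e);
	allocations_outstanding--;
}

/*
 * On success, ownership of the entry passes to the caller (the timer
 * queue); on any error after allocation, the entry is freed here.
 * 'queue_ok' simulates whether queuing succeeds.
 */
static int schedule_timer_sketch(int type, int queue_ok, struct timer_entry **out)
{
	struct timer_entry *e = entry_alloc();

	if (!e)   /* like item (1): check that the allocation succeeded */
		return -ENOMEM;
	e->type = type;
	if (!queue_ok) {
		entry_free(e);   /* the leak item (2) describes */
		return -EINVAL;
	}
	*out = e;
	return 0;
}
```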
    • RDMA/nes: Fix incorrect casts on 32-bit architectures · 7a5efb62
      Don Wood committed
      There were some incorrect casts to unsigned long that caused 64-bit values
      to be truncated on 32-bit architectures and made the driver pass invalid
      addresses and lengths to the hardware.  The problems were primarily seen
      with kernels with highmem configured but some could show up in
      non-highmem kernels, too.
      Signed-off-by: Don Wood <donald.e.wood@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
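The truncation can be reproduced on any host by using `uint32_t` as a stand-in for a 32-bit `unsigned long`. The helper names and the low/high split are illustrative assumptions, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* On a 32-bit kernel, unsigned long is 32 bits wide; uint32_t models it here. */
typedef uint32_t ulong32;

/* Buggy pattern: casting through a 32-bit unsigned long drops the upper
 * half of any address above 4 GiB. */
static uint64_t addr_via_ulong32(uint64_t dma_addr)
{
	return (ulong32)dma_addr;
}

/* Fixed pattern: keep the value in a 64-bit type end to end. */
static uint64_t addr_via_u64(uint64_t dma_addr)
{
	return dma_addr;
}

/* Split a 64-bit value into the low/high 32-bit halves a hardware
 * descriptor might hold (hypothetical layout). */
static void split_addr(uint64_t addr, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)addr;
	*hi = (uint32_t)(addr >> 32);
}
```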
  6. 13 Mar, 2009: 1 commit
    • RDMA/nes: Don't allow userspace QPs to use STag zero · c12e56ef
      Faisal Latif committed
      STag zero is a special STag that allows consumers to access any bus
      address without registering memory.  The nes driver unfortunately
      allows STag zero to be used even with QPs created by unprivileged
      userspace consumers, which means that any process with direct verbs
      access to the nes device can read and write any memory accessible to
      the underlying PCI device (usually any memory in the system).  Such
      access is usually given for cluster software such as MPI to use, so
      this is a local privilege escalation bug on most systems running this
      driver.
      
      The driver was using STag zero to receive the last streaming mode
      data; to allow STag zero to be disabled for unprivileged QPs, the
      driver now registers a special MR for this data.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
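A sketch of the access rule only: unprivileged QPs must not reference STag zero. The commit disables STag zero for unprivileged QPs and registers a dedicated MR for the streaming-mode data; how the hardware enforces this is not modeled. `struct qp_sketch`, `check_stag()` and the `-EINVAL` code are hypothetical and express the policy as a software check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical QP descriptor: only the privilege bit matters here. */
struct qp_sketch {
	bool privileged;   /* created by a kernel (privileged) consumer */
};

/*
 * STag 0 bypasses memory registration and can reach any bus address,
 * so only privileged QPs may use it.
 */
static int check_stag(const struct qp_sketch *qp, uint32_t stag)
{
	if (stag == 0 && !qp->privileged)
		return -EINVAL;
	return 0;
}
```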
  7. 07 Mar, 2009: 3 commits
    • RDMA/nes: Handle MPA Reject message properly · 9d5ab133
      Faisal Latif committed
      During testing, we saw failures because the MPA Reject call is not
      handled.  To handle MPA Reject, the following changes are made:
      
      *Handle inbound/outbound MPA Reject response message.
      	When nes_reject() is called for pending MPA request reply,
      	send the MPA Reject message to its peer (active
      	side) cm_node. The peer cm_node (active side) will indicate
      	Reject message event for the pending Connect Request.
      
      *Handle MPA Reject response message for loopback connections and listener.
      	When MPA Request is rejected, check if it is a loopback
      	connection and if it is then it will send Reject message event
      	to its peer loopback node. Also when destroying listener,
      	check if the cm_nodes for that listener are loopback or not.
      
      *Add graceful connection close with the MPA Reject response message.
      	Send graceful close (FIN, FIN ACK..) to terminate the cm_nodes.
      
      *Some code re-org while making the above changes.
      	Removed recv_list and recv_list_lock from the cm_node
      	structure as there can be only one receive close entry on the
      	timer. Also implemented handle_recv_entry(), since the receive close
      	entry is processed from both nes_rem_ref_cm_node() as well as
      	nes_cm_timer_tick().
      Signed-off-by: Faisal Latif <faisal.latif@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
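The reject paths described above can be sketched as follows. The struct fields (including `loopbackpartner`) and the function names are simplified, hypothetical stand-ins; the flags only record which action each path takes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cm_node with just what the reject path needs. */
struct cm_node {
	bool loopback;
	struct cm_node *loopbackpartner;   /* peer node of a loopback connection */
	bool reject_event_seen;            /* a Reject event was indicated here */
	bool mpa_reject_sent;              /* MPA Reject frame sent on the wire */
	bool fin_sent;                     /* graceful close started */
};

/*
 * Passive side rejecting a pending MPA request:
 *  - loopback: deliver the Reject event straight to the peer node;
 *  - otherwise: send an MPA Reject frame, then close gracefully (FIN).
 */
static void reject_sketch(struct cm_node *node)
{
	if (node->loopback && node->loopbackpartner) {
		node->loopbackpartner->reject_event_seen = true;
		return;
	}
	node->mpa_reject_sent = true;
	node->fin_sent = true;
}

/* Active side: an inbound MPA Reject completes the pending connect request. */
static void recv_mpa_reject_sketch(struct cm_node *active)
{
	active->reject_event_seen = true;
}
```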
    • RDMA/nes: Fix tmp_addr compilation warning · 7b14ab0b
      Chien Tung committed
      In find_node(), tmp_addr causes an "unused variable" warning when
      INFINIBAND_NES_DEBUG is not defined.  It's only used in a nes_debug()
      and the print does not make sense.  So take out the whole thing.
      Reported-by: Manish Katiyar <mkatiyar@gmail.com>
      Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/nes: Update copyright to new legal entity and year · cd6853d3
      Chien Tung committed
      Update copyright to the new legal entity, Intel-NE, Inc., an Intel
      company.  Update copyright for the new year.
      Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  8. 11 Jan, 2009: 1 commit
  9. 09 Jan, 2009: 1 commit
  10. 25 Dec, 2008: 1 commit
  11. 06 Dec, 2008: 7 commits
  12. 31 Oct, 2008: 1 commit
  13. 28 Oct, 2008: 1 commit
  14. 10 Oct, 2008: 1 commit
  15. 04 Oct, 2008: 1 commit
  16. 01 Oct, 2008: 1 commit
  17. 17 Sep, 2008: 1 commit
  18. 25 Jul, 2008: 1 commit
    • RDMA/nes: CM connection setup/teardown rework · 6492cdf3
      Faisal Latif committed
      Major rework of CM connection setup/teardown.  We had a number of issues
      with MPI applications not starting/terminating properly over time.
      With these changes we were able to run longer on larger clusters.
      
      * Remove memory allocation from nes_connect() and nes_cm_connect().
      * Fix mini_cm_dec_refcnt_listen() when destroying listener.
      * Remove unnecessary code from schedule_nes_timer() and nes_cm_timer_tick().
      * Functionalize mini_cm_recv_pkt() and process_packet().
      * Clean up cm_node->ref_count usage.
      * Reuse skbs if available.
      Signed-off-by: Faisal Latif <flatif@neteffect.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
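"Reuse skbs if available" can be sketched as a helper that builds into a caller-supplied buffer when there is one (for example, a received packet being answered) and allocates only when there is not. `struct pkt_buf`, `get_tx_buf()` and the allocation counter are hypothetical stand-ins for sk_buff handling, not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical packet buffer standing in for an sk_buff. */
struct pkt_buf {
	size_t len;
	unsigned char data[64];
};

static int buffers_allocated;   /* counts fresh allocations for the sketch */

/*
 * Return a buffer to build a control packet in: reuse the caller's buffer
 * if one was passed, otherwise allocate a new one.  Either way the
 * payload starts out empty.
 */
static struct pkt_buf *get_tx_buf(struct pkt_buf *reuse)
{
	struct pkt_buf *b = reuse;

	if (!b) {
		b = malloc(sizeof(*b));
		if (!b)
			return NULL;
		buffers_allocated++;
	}
	b->len = 0;
	return b;
}
```

Reusing the buffer avoids an allocation on paths such as answering a received packet, which is also one less failure point in the connection setup/teardown code.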