1. 02 6月, 2016 2 次提交
    • M
      xhci: Fix handling timeouted commands on hosts in weird states. · 3425aa03
      Mathias Nyman 提交于
      If commands timeout we mark them for abortion, then stop the command
      ring, and turn the commands to no-ops and finally restart the command
      ring.
      
      If the host is working properly the no-op commands will finish and
      pending completions are called.
      If we notice the host is failing, driver clears the command ring and
      completes, deletes and frees all pending commands.
      
      There are two separate cases reported where host is believed to work
      properly but is not. In the first case we successfully stop the ring
      but no abort or stop command ring event is ever sent and host locks up.
      
      The second case is if a host is removed, command times out and driver
      believes the ring is stopped, and assumes it will be restarted, but
      actually ends up timing out on the same command forever.
      If one of the pending commands has the xhci->mutex held it will block
      xhci_stop() in the remove codepath which otherwise would cleanup pending
      commands.
      
      Add a check that clears all pending commands in case host is removed,
      or we are stuck timing out on the same command. Also restart the
      command timeout timer when stopping the command ring to ensure we
      recive an ring stop/abort event.
      
      Cc: stable <stable@vger.kernel.org>
      Tested-by: NJoe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3425aa03
    • G
      xhci: Cleanup only when releasing primary hcd · 27a41a83
      Gabriel Krisman Bertazi 提交于
      Under stress occasions some TI devices might not return early when
      reading the status register during the quirk invocation of xhci_irq made
      by usb_hcd_pci_remove.  This means that instead of returning, we end up
      handling this interruption in the middle of a shutdown.  Since
      xhci->event_ring has already been freed in xhci_mem_cleanup, we end up
      accessing freed memory, causing the Oops below.
      
      commit 8c24d6d7 ("usb: xhci: stop everything on the first call to
      xhci_stop") is the one that changed the instant in which we clean up the
      event queue when stopping a device.  Before, we didn't call
      xhci_mem_cleanup at the first time xhci_stop is executed (for the shared
      HCD), instead, we only did it after the invocation for the primary HCD,
      much later at the removal path.  The code flow for this oops looks like
      this:
      
      xhci_pci_remove()
      	usb_remove_hcd(xhci->shared)
      	        xhci_stop(xhci->shared)
       			xhci_halt()
      			xhci_mem_cleanup(xhci);  // Free the event_queue
      	usb_hcd_pci_remove(primary)
      		xhci_irq()  // Access the event_queue if STS_EINT is set. Crash.
      		xhci_stop()
      			xhci_halt()
      			// return early
      
      The fix modifies xhci_stop to only cleanup the xhci data when releasing
      the primary HCD.  This way, we still have the event_queue configured
      when invoking xhci_irq.  We still halt the device on the first call to
      xhci_stop, though.
      
      I could reproduce this issue several times on the mainline kernel by
      doing a bind-unbind stress test with a specific storage gadget attached.
      I also ran the same test over-night with my patch applied and didn't
      observe the issue anymore.
      
      [  113.334124] Unable to handle kernel paging request for data at address 0x00000028
      [  113.335514] Faulting instruction address: 0xd00000000d4f767c
      [  113.336839] Oops: Kernel access of bad area, sig: 11 [#1]
      [  113.338214] SMP NR_CPUS=1024 NUMA PowerNV
      
      [c000000efe47ba90] c000000000720850 usb_hcd_irq+0x50/0x80
      [c000000efe47bac0] c00000000073d328 usb_hcd_pci_remove+0x68/0x1f0
      [c000000efe47bb00] d00000000daf0128 xhci_pci_remove+0x78/0xb0
      [xhci_pci]
      [c000000efe47bb30] c00000000055cf70 pci_device_remove+0x70/0x110
      [c000000efe47bb70] c00000000061c6bc __device_release_driver+0xbc/0x190
      [c000000efe47bba0] c00000000061c7d0 device_release_driver+0x40/0x70
      [c000000efe47bbd0] c000000000619510 unbind_store+0x120/0x150
      [c000000efe47bc20] c0000000006183c4 drv_attr_store+0x64/0xa0
      [c000000efe47bc60] c00000000039f1d0 sysfs_kf_write+0x80/0xb0
      [c000000efe47bca0] c00000000039e14c kernfs_fop_write+0x18c/0x1f0
      [c000000efe47bcf0] c0000000002e962c __vfs_write+0x6c/0x190
      [c000000efe47bd90] c0000000002eab40 vfs_write+0xc0/0x200
      [c000000efe47bde0] c0000000002ec85c SyS_write+0x6c/0x110
      [c000000efe47be30] c000000000009260 system_call+0x38/0x108
      Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Roger Quadros <rogerq@ti.com>
      Cc: joel@jms.id.au
      Cc: stable@vger.kernel.org
      Reviewed-by: NRoger Quadros <rogerq@ti.com>
      Cc: <stable@vger.kernel.org> #v4.3+
      Tested-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27a41a83
  2. 01 6月, 2016 1 次提交
  3. 31 5月, 2016 16 次提交
  4. 14 5月, 2016 1 次提交
  5. 11 5月, 2016 1 次提交
  6. 10 5月, 2016 10 次提交
  7. 09 5月, 2016 7 次提交
  8. 05 5月, 2016 2 次提交
反馈
建议
客服 返回
顶部