1. 14 Nov, 2016 12 commits
  2. 08 Sep, 2016 1 commit
  3. 16 Aug, 2016 2 commits
    • xhci: really enqueue zero length TRBs. · 0d2daade
      Alban Browaeys committed
      Enqueue the first TRB even if full_len is zero.
      Without this "adb install <apk>" freezes the system.
      Signed-off-by: Alban Browaeys <alban.browaeys@gmail.com>
      Fixes: 86065c27 ("xhci: don't rely on precalculated value of needed trbs in the enqueue loop")
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d2daade
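The zero-length case above can be sketched as a loop whose condition forces the first iteration. This is a simplified model, not the driver code: `SKETCH_TRB_MAX_LEN` and both function names are hypothetical, and the functions only count how many TRBs would be queued.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-TRB length limit for this sketch; not the driver's value. */
#define SKETCH_TRB_MAX_LEN 65536u

/* Problem shape: the loop body never runs when full_len is zero,
 * so a zero-length transfer queues no TRB at all. */
unsigned int sketch_count_trbs_skipping_zero(size_t full_len)
{
	unsigned int queued = 0;
	size_t enqd_len;

	for (enqd_len = 0; enqd_len < full_len; enqd_len += SKETCH_TRB_MAX_LEN)
		queued++;
	return queued;
}

/* Fixed shape: the first TRB is always queued, even if full_len is zero. */
unsigned int sketch_count_trbs(size_t full_len)
{
	unsigned int queued = 0;
	bool first_trb = true;
	size_t enqd_len;

	for (enqd_len = 0; first_trb || enqd_len < full_len;
	     enqd_len += SKETCH_TRB_MAX_LEN) {
		first_trb = false;
		queued++;
	}
	return queued;
}
```

For non-zero lengths both variants queue the same number of TRBs; they differ only for a zero-length transfer.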
    • xhci: always handle "Command Ring Stopped" events · 33be1265
      Mathias Nyman committed
      Fix "Command completion event does not match command" errors by always
      handling the command ring stopped events.
      
      The command ring stopped event is generated as a result of aborting
      or stopping the command ring with a register write. It is not caused
      by a command in the command queue, and thus won't have a matching command
      in the command list.
      
      Solve it by handling the command ring stopped event before checking for a
      matching command.
      
      In most command time out cases we abort the command ring, and get
      a command ring stopped event. The event's command pointer will point at
      the current command ring dequeue, which in most cases matches the timed
      out command in the command list, and no error messages are seen.
      
      If we instead get a command aborted event before the command ring stopped
      event, the abort event will increase the command ring dequeue pointer, and
      the following command ring stopped event's command pointer will point at the
      next, not yet queued command. This case triggered the error message.
      
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      33be1265
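A minimal sketch of the ordering described above, assuming a simplified event with a completion code and a command pointer. All type, enum, and function names here are hypothetical and do not come from the driver; only the order of the two checks matters.

```c
#include <stdint.h>

/* Hypothetical completion codes; the real numeric values are defined by the xHCI spec. */
enum sketch_comp_code {
	SKETCH_COMP_SUCCESS,
	SKETCH_COMP_CMD_ABORTED,
	SKETCH_COMP_CMD_RING_STOPPED
};

enum sketch_result {
	SKETCH_HANDLED_COMMAND,
	SKETCH_HANDLED_RING_STOP,
	SKETCH_ERR_MISMATCH
};

struct sketch_event {
	enum sketch_comp_code code;
	uint64_t cmd_ptr;	/* command TRB address reported by the event */
};

enum sketch_result sketch_handle_cmd_event(const struct sketch_event *ev,
					   uint64_t first_pending_cmd)
{
	/* A ring stopped event is caused by a register write, not by a queued
	 * command, so handle it before looking for a matching command. */
	if (ev->code == SKETCH_COMP_CMD_RING_STOPPED)
		return SKETCH_HANDLED_RING_STOP;

	/* Every other event must match the first pending command. */
	if (ev->cmd_ptr != first_pending_cmd)
		return SKETCH_ERR_MISMATCH;

	return SKETCH_HANDLED_COMMAND;
}
```

With the checks in the opposite order, a ring stopped event whose pointer is past the timed-out command would be reported as a mismatch.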
  4. 01 Jul, 2016 1 commit
    • xhci: free the correct ring · f76a28a6
      Arnd Bergmann committed
      gcc warns about what first looks like a reference to an uninitialized
      variable:
      
      drivers/usb/host/xhci-ring.c: In function 'handle_cmd_completion':
      drivers/usb/host/xhci-ring.c:753:4: error: 'ep_ring' may be used uninitialized in this function [-Werror=maybe-uninitialized]
          xhci_unmap_td_bounce_buffer(xhci, ep_ring, cur_td);
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/usb/host/xhci-ring.c:647:20: note: 'ep_ring' was declared here
        struct xhci_ring *ep_ring;
                          ^~~~~~~
      
      It's clear to see that the list_empty() check means it can never be
      uninitialized, however it still looks wrong:
      
      When ep->cancelled_td_list contains more than one entry, the
      ep_ring variable will point to the ring that was retrieved
      from the last urb, and we have to look it up again in the
      second loop instead, which fixes the behavior and gets rid of the
      warning too.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: f9c589e1 ("xhci: TD-fragment, align the unsplittable case with a bounce buffer")
      Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f76a28a6
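The two-loop problem described above can be illustrated with a small model in which each cancelled TD knows its own ring. The structures and function names are hypothetical stand-ins, and an array replaces the driver's linked list.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a transfer ring and a cancelled TD. */
struct sketch_ring { int unmapped; };
struct sketch_td { struct sketch_ring *ring; };

/* Problem shape: the second loop reuses whatever ring the first loop saw last. */
void sketch_unmap_stale(struct sketch_td *tds, size_t n)
{
	struct sketch_ring *ring = NULL;
	size_t i;

	if (n == 0)		/* mirrors the list_empty() check */
		return;
	for (i = 0; i < n; i++)
		ring = tds[i].ring;	/* ends up as the last TD's ring */
	for (i = 0; i < n; i++)
		ring->unmapped++;	/* wrong ring for all but the last TD */
}

/* Fixed shape: look the ring up again for every TD in the second loop. */
void sketch_unmap_per_td(struct sketch_td *tds, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct sketch_ring *ring = tds[i].ring;

		ring->unmapped++;
	}
}
```

With two TDs on different rings, the first variant touches the last ring twice and the other ring never; the second touches each ring once.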
  5. 27 Jun, 2016 11 commits
  6. 02 Jun, 2016 2 commits
    • xhci: Fix handling timeouted commands on hosts in weird states. · 3425aa03
      Mathias Nyman committed
      If commands time out we mark them for abortion, then stop the command
      ring, turn the commands into no-ops, and finally restart the command
      ring.
      
      If the host is working properly the no-op commands will finish and
      pending completions are called.
      If we notice the host is failing, driver clears the command ring and
      completes, deletes and frees all pending commands.
      
      There are two separate cases reported where host is believed to work
      properly but is not. In the first case we successfully stop the ring
      but no abort or stop command ring event is ever sent and host locks up.
      
      The second case is if a host is removed, command times out and driver
      believes the ring is stopped, and assumes it will be restarted, but
      actually ends up timing out on the same command forever.
      If one of the pending commands has the xhci->mutex held it will block
      xhci_stop() in the remove codepath which otherwise would cleanup pending
      commands.
      
      Add a check that clears all pending commands in case the host is removed,
      or we are stuck timing out on the same command. Also restart the
      command timeout timer when stopping the command ring to ensure we
      receive a ring stop/abort event.
      
      Cc: stable <stable@vger.kernel.org>
      Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3425aa03
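The added check can be pictured as a small decision function: clean up everything if the host is being removed or the same command times out twice in a row, otherwise take the normal abort path. This is a sketch under those assumptions; the struct, its fields, and the function name are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_timeout_action {
	SKETCH_ABORT_AND_RESTART,	/* normal path: stop ring, no-op, restart */
	SKETCH_CLEANUP_ALL_COMMANDS	/* give up: complete and free everything */
};

/* Hypothetical host state tracked for this sketch. */
struct sketch_cmd_host {
	bool removing;			/* host is being removed */
	const void *last_timed_out;	/* command that timed out previously */
};

enum sketch_timeout_action sketch_on_cmd_timeout(struct sketch_cmd_host *host,
						 const void *cmd)
{
	/* Host gone, or stuck timing out on the same command: clear all
	 * pending commands instead of waiting for the ring again. */
	if (host->removing || host->last_timed_out == cmd)
		return SKETCH_CLEANUP_ALL_COMMANDS;

	host->last_timed_out = cmd;
	return SKETCH_ABORT_AND_RESTART;
}
```

The first timeout of a command takes the normal path; a second timeout of that same command, or any timeout during removal, triggers the cleanup.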
    • xhci: Cleanup only when releasing primary hcd · 27a41a83
      Gabriel Krisman Bertazi committed
      Under stress, some TI devices might not return early when
      reading the status register during the quirk invocation of xhci_irq made
      by usb_hcd_pci_remove.  This means that instead of returning, we end up
      handling this interrupt in the middle of a shutdown.  Since
      xhci->event_ring has already been freed in xhci_mem_cleanup, we end up
      accessing freed memory, causing the Oops below.
      
      commit 8c24d6d7 ("usb: xhci: stop everything on the first call to
      xhci_stop") is the one that changed the point at which we clean up the
      event queue when stopping a device.  Before, we didn't call
      xhci_mem_cleanup the first time xhci_stop is executed (for the shared
      HCD); instead, we only did it after the invocation for the primary HCD,
      much later in the removal path.  The code flow for this oops looks like
      this:
      
      xhci_pci_remove()
      	usb_remove_hcd(xhci->shared)
      		xhci_stop(xhci->shared)
      			xhci_halt()
      			xhci_mem_cleanup(xhci);  // Free the event_queue
      	usb_hcd_pci_remove(primary)
      		xhci_irq()  // Access the event_queue if STS_EINT is set. Crash.
      		xhci_stop()
      			xhci_halt()
      			// return early
      
      The fix modifies xhci_stop to only cleanup the xhci data when releasing
      the primary HCD.  This way, we still have the event_queue configured
      when invoking xhci_irq.  We still halt the device on the first call to
      xhci_stop, though.
      
      I could reproduce this issue several times on the mainline kernel by
      doing a bind-unbind stress test with a specific storage gadget attached.
      I also ran the same test over-night with my patch applied and didn't
      observe the issue anymore.
      
      [  113.334124] Unable to handle kernel paging request for data at address 0x00000028
      [  113.335514] Faulting instruction address: 0xd00000000d4f767c
      [  113.336839] Oops: Kernel access of bad area, sig: 11 [#1]
      [  113.338214] SMP NR_CPUS=1024 NUMA PowerNV
      
      [c000000efe47ba90] c000000000720850 usb_hcd_irq+0x50/0x80
      [c000000efe47bac0] c00000000073d328 usb_hcd_pci_remove+0x68/0x1f0
      [c000000efe47bb00] d00000000daf0128 xhci_pci_remove+0x78/0xb0 [xhci_pci]
      [c000000efe47bb30] c00000000055cf70 pci_device_remove+0x70/0x110
      [c000000efe47bb70] c00000000061c6bc __device_release_driver+0xbc/0x190
      [c000000efe47bba0] c00000000061c7d0 device_release_driver+0x40/0x70
      [c000000efe47bbd0] c000000000619510 unbind_store+0x120/0x150
      [c000000efe47bc20] c0000000006183c4 drv_attr_store+0x64/0xa0
      [c000000efe47bc60] c00000000039f1d0 sysfs_kf_write+0x80/0xb0
      [c000000efe47bca0] c00000000039e14c kernfs_fop_write+0x18c/0x1f0
      [c000000efe47bcf0] c0000000002e962c __vfs_write+0x6c/0x190
      [c000000efe47bd90] c0000000002eab40 vfs_write+0xc0/0x200
      [c000000efe47bde0] c0000000002ec85c SyS_write+0x6c/0x110
      [c000000efe47be30] c000000000009260 system_call+0x38/0x108
      Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Roger Quadros <rogerq@ti.com>
      Cc: joel@jms.id.au
      Cc: stable@vger.kernel.org
      Reviewed-by: Roger Quadros <rogerq@ti.com>
      Cc: <stable@vger.kernel.org> #v4.3+
      Tested-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      27a41a83
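The stop ordering described in this commit can be modelled with two flags: the host is halted on the first stop call, but the memory the interrupt handler needs is only released when the primary HCD is stopped. A minimal sketch; the struct and function names are hypothetical and no real memory is managed.

```c
#include <stdbool.h>

/* Hypothetical model of the state shared by the primary and shared HCD. */
struct sketch_xhci {
	bool halted;
	bool event_ring_allocated;
};

/* Halt on every call, but only clean up when releasing the primary HCD. */
void sketch_stop(struct sketch_xhci *xhci, bool is_primary_hcd)
{
	xhci->halted = true;

	if (!is_primary_hcd)
		return;		/* shared HCD: keep the event ring for a later irq */

	xhci->event_ring_allocated = false;	/* stands in for the memory cleanup */
}

/* An interrupt handler may only touch the event ring while it still exists. */
bool sketch_irq_is_safe(const struct sketch_xhci *xhci)
{
	return xhci->event_ring_allocated;
}
```

After stopping only the shared HCD, the host is halted yet an interrupt arriving during removal still finds the event ring in place.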
  7. 27 Apr, 2016 3 commits
  8. 19 Apr, 2016 1 commit
  9. 14 Apr, 2016 1 commit
    • xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers · 98d74f9c
      Mathias Nyman committed
      PCI hotpluggable xhci controllers such as some Alpine Ridge solutions will
      remove the xhci controller from the PCI bus when the last USB device is
      disconnected.
      
      Add a flag to indicate that the host is being removed to avoid queueing
      configure_endpoint commands for the dropped endpoints.
      For PCI hotplugged controllers this will prevent 5 second command timeouts.
      For static xhci controllers the configure_endpoint command is not needed
      in the removal case as everything will be returned, freed, and the
      controller is reset.
      
      For now the flag is only set for PCI connected host controllers.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      98d74f9c
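A sketch of the removal flag described above, assuming a single state bit that is checked before a configure_endpoint command would be queued. The bit name, struct, and function are hypothetical; a counter stands in for the command queue.

```c
#include <stdbool.h>

/* Hypothetical state bit meaning "host is being removed". */
#define SKETCH_STATE_REMOVING (1u << 0)

struct sketch_removal_host {
	unsigned int state;
	unsigned int queued_cmds;	/* commands queued to the controller */
};

/* Returns true if a configure_endpoint command was queued for dropped endpoints. */
bool sketch_drop_endpoints(struct sketch_removal_host *host)
{
	/* During removal everything is freed and the controller is reset,
	 * so skip the command rather than wait for it to time out. */
	if (host->state & SKETCH_STATE_REMOVING)
		return false;

	host->queued_cmds++;
	return true;
}
```

Once the flag is set, dropping endpoints queues nothing, so there is no command left to time out against a controller that has already left the bus.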
  10. 15 Feb, 2016 2 commits
  11. 04 Feb, 2016 2 commits
  12. 12 Dec, 2015 1 commit
    • xhci: fix usb2 resume timing and races. · f69115fd
      Mathias Nyman committed
      According to USB 2 specs ports need to signal resume for at least 20ms,
      in practice even longer, before moving to U0 state.
      Both host and devices can initiate resume.
      
      On device initiated resume, a port status interrupt with the port in resume
      state is issued. The interrupt handler tags a resume_done[port]
      timestamp with current time + USB_RESUME_TIMEOUT, and kicks the root hub
      timer. The root hub timer requests port status, finds the port in resume
      state, checks if the resume_done[port] timestamp has passed, and sets the
      port to U0 state.
      
      On host initiated resume, current code sets the port to resume state,
      sleeps 20ms, and finally sets the port to U0 state. This should also
      be changed to work in a similar way as the device initiated resume, with
      timestamp tagging, but that is not yet tested and will be a separate
      fix later.
      
      There are a few issues with this approach:
      
      1. A host initiated resume will also generate a resume event. The event
         handler will find the port in resume state, believe it's a device
         initiated resume, and act accordingly.
      
      2. A port status request might cut the resume signalling short if a
         get_port_status request is handled during the host resume signalling.
         The port will be found in resume state. The timestamp is not set leading
         to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
         get_port_status will proceed with moving the port to U0.
      
      3. If an error, or anything else happens to the port during device
         initiated resume signalling it will leave all the device resume
         parameters hanging uncleared, preventing further suspend, returning
         -EBUSY, and cause the pm thread to busyloop trying to enter suspend.
      
      Fix this by using the existing resuming_ports bitfield to indicate that
      resume signalling timing is taken care of.
      Check if the resume_done[port] is set before using it for timestamp
      comparison, and also clear out any resume signalling related variables
      if port is not in U0 or Resume state.
      
      This issue was discovered when a PM thread busylooped, trying to runtime
      suspend the xhci USB 2 roothub on a Dell XPS.
      
      Cc: stable <stable@vger.kernel.org>
      Reported-by: Daniel J Blueman <daniel@quora.org>
      Tested-by: Daniel J Blueman <daniel@quora.org>
      Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f69115fd
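Issue 2 above comes down to comparing against a timestamp that was never set. The sketch below uses a wrap-safe comparison in the spirit of time_after_eq() and shows why an unset timestamp of 0 must be checked first. The function names are hypothetical, the tick values in the usage note are arbitrary, and the signed cast assumes the usual two's complement conversion.

```c
#include <stdbool.h>

/* Wrap-safe "a is at or after b" on a free-running tick counter. */
bool sketch_time_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Problem shape: an unset timestamp (0) looks like a deadline that has passed. */
bool sketch_resume_done_unchecked(unsigned long now, unsigned long resume_done)
{
	return sketch_time_after_eq(now, resume_done);
}

/* Fixed shape: only compare once a timestamp has actually been set. */
bool sketch_resume_done(unsigned long now, unsigned long resume_done)
{
	return resume_done != 0 && sketch_time_after_eq(now, resume_done);
}
```

With a current tick of 1000 and no timestamp set, the unchecked variant reports the resume as finished while the checked one does not; with a deadline of 1020 the checked variant only reports completion once the tick count reaches 1020.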
  13. 02 Dec, 2015 1 commit