1. 27 6月, 2016 3 次提交
  2. 02 6月, 2016 2 次提交
    • M
      xhci: Fix handling timeouted commands on hosts in weird states. · 3425aa03
      Mathias Nyman 提交于
      If commands timeout we mark them for abortion, then stop the command
      ring, and turn the commands to no-ops and finally restart the command
      ring.
      
      If the host is working properly the no-op commands will finish and
      pending completions are called.
      If we notice the host is failing, driver clears the command ring and
      completes, deletes and frees all pending commands.
      
      There are two separate cases reported where host is believed to work
      properly but is not. In the first case we successfully stop the ring
      but no abort or stop command ring event is ever sent and host locks up.
      
      The second case is if a host is removed, command times out and driver
      believes the ring is stopped, and assumes it will be restarted, but
      actually ends up timing out on the same command forever.
      If one of the pending commands has the xhci->mutex held it will block
      xhci_stop() in the remove codepath which otherwise would cleanup pending
      commands.
      
      Add a check that clears all pending commands in case host is removed,
      or we are stuck timing out on the same command. Also restart the
      command timeout timer when stopping the command ring to ensure we
      recive an ring stop/abort event.
      
      Cc: stable <stable@vger.kernel.org>
      Tested-by: NJoe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3425aa03
    • G
      xhci: Cleanup only when releasing primary hcd · 27a41a83
      Gabriel Krisman Bertazi 提交于
      Under stress occasions some TI devices might not return early when
      reading the status register during the quirk invocation of xhci_irq made
      by usb_hcd_pci_remove.  This means that instead of returning, we end up
      handling this interruption in the middle of a shutdown.  Since
      xhci->event_ring has already been freed in xhci_mem_cleanup, we end up
      accessing freed memory, causing the Oops below.
      
      commit 8c24d6d7 ("usb: xhci: stop everything on the first call to
      xhci_stop") is the one that changed the instant in which we clean up the
      event queue when stopping a device.  Before, we didn't call
      xhci_mem_cleanup at the first time xhci_stop is executed (for the shared
      HCD), instead, we only did it after the invocation for the primary HCD,
      much later at the removal path.  The code flow for this oops looks like
      this:
      
      xhci_pci_remove()
      	usb_remove_hcd(xhci->shared)
      	        xhci_stop(xhci->shared)
       			xhci_halt()
      			xhci_mem_cleanup(xhci);  // Free the event_queue
      	usb_hcd_pci_remove(primary)
      		xhci_irq()  // Access the event_queue if STS_EINT is set. Crash.
      		xhci_stop()
      			xhci_halt()
      			// return early
      
      The fix modifies xhci_stop to only cleanup the xhci data when releasing
      the primary HCD.  This way, we still have the event_queue configured
      when invoking xhci_irq.  We still halt the device on the first call to
      xhci_stop, though.
      
      I could reproduce this issue several times on the mainline kernel by
      doing a bind-unbind stress test with a specific storage gadget attached.
      I also ran the same test over-night with my patch applied and didn't
      observe the issue anymore.
      
      [  113.334124] Unable to handle kernel paging request for data at address 0x00000028
      [  113.335514] Faulting instruction address: 0xd00000000d4f767c
      [  113.336839] Oops: Kernel access of bad area, sig: 11 [#1]
      [  113.338214] SMP NR_CPUS=1024 NUMA PowerNV
      
      [c000000efe47ba90] c000000000720850 usb_hcd_irq+0x50/0x80
      [c000000efe47bac0] c00000000073d328 usb_hcd_pci_remove+0x68/0x1f0
      [c000000efe47bb00] d00000000daf0128 xhci_pci_remove+0x78/0xb0
      [xhci_pci]
      [c000000efe47bb30] c00000000055cf70 pci_device_remove+0x70/0x110
      [c000000efe47bb70] c00000000061c6bc __device_release_driver+0xbc/0x190
      [c000000efe47bba0] c00000000061c7d0 device_release_driver+0x40/0x70
      [c000000efe47bbd0] c000000000619510 unbind_store+0x120/0x150
      [c000000efe47bc20] c0000000006183c4 drv_attr_store+0x64/0xa0
      [c000000efe47bc60] c00000000039f1d0 sysfs_kf_write+0x80/0xb0
      [c000000efe47bca0] c00000000039e14c kernfs_fop_write+0x18c/0x1f0
      [c000000efe47bcf0] c0000000002e962c __vfs_write+0x6c/0x190
      [c000000efe47bd90] c0000000002eab40 vfs_write+0xc0/0x200
      [c000000efe47bde0] c0000000002ec85c SyS_write+0x6c/0x110
      [c000000efe47be30] c000000000009260 system_call+0x38/0x108
      Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Roger Quadros <rogerq@ti.com>
      Cc: joel@jms.id.au
      Cc: stable@vger.kernel.org
      Reviewed-by: NRoger Quadros <rogerq@ti.com>
      Cc: <stable@vger.kernel.org> #v4.3+
      Tested-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27a41a83
  3. 27 4月, 2016 3 次提交
  4. 19 4月, 2016 1 次提交
  5. 14 4月, 2016 1 次提交
    • M
      xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers · 98d74f9c
      Mathias Nyman 提交于
      PCI hotpluggable xhci controllers such as some Alpine Ridge solutions will
      remove the xhci controller from the PCI bus when the last USB device is
      disconnected.
      
      Add a flag to indicate that the host is being removed to avoid queueing
      configure_endpoint commands for the dropped endpoints.
      For PCI hotplugged controllers this will prevent 5 second command timeouts
      For static xhci controllers the configure_endpoint command is not needed
      in the removal case as everything will be returned, freed, and the
      controller is reset.
      
      For now the flag is only set for PCI connected host controllers.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98d74f9c
  6. 15 2月, 2016 2 次提交
  7. 04 2月, 2016 2 次提交
  8. 12 12月, 2015 1 次提交
    • M
      xhci: fix usb2 resume timing and races. · f69115fd
      Mathias Nyman 提交于
      According to USB 2 specs ports need to signal resume for at least 20ms,
      in practice even longer, before moving to U0 state.
      Both host and devices can initiate resume.
      
      On device initiated resume, a port status interrupt with the port in resume
      state in issued. The interrupt handler tags a resume_done[port]
      timestamp with current time + USB_RESUME_TIMEOUT, and kick roothub timer.
      Root hub timer requests for port status, finds the port in resume state,
      checks if resume_done[port] timestamp passed, and set port to U0 state.
      
      On host initiated resume, current code sets the port to resume state,
      sleep 20ms, and finally sets the port to U0 state. This should also
      be changed to work in a similar way as the device initiated resume, with
      timestamp tagging, but that is not yet tested and will be a separate
      fix later.
      
      There are a few issues with this approach
      
      1. A host initiated resume will also generate a resume event. The event
         handler will find the port in resume state, believe it's a device
         initiated resume, and act accordingly.
      
      2. A port status request might cut the resume signalling short if a
         get_port_status request is handled during the host resume signalling.
         The port will be found in resume state. The timestamp is not set leading
         to time_after_eq(jiffies, timestamp) returning true, as timestamp = 0.
         get_port_status will proceed with moving the port to U0.
      
      3. If an error, or anything else happens to the port during device
         initiated resume signalling it will leave all the device resume
         parameters hanging uncleared, preventing further suspend, returning
         -EBUSY, and cause the pm thread to busyloop trying to enter suspend.
      
      Fix this by using the existing resuming_ports bitfield to indicate that
      resume signalling timing is taken care of.
      Check if the resume_done[port] is set before using it for timestamp
      comparison, and also clear out any resume signalling related variables
      if port is not in U0 or Resume state
      
      This issue was discovered when a PM thread busylooped, trying to runtime
      suspend the xhci USB 2 roothub on a Dell XPS
      
      Cc: stable <stable@vger.kernel.org>
      Reported-by: NDaniel J Blueman <daniel@quora.org>
      Tested-by: NDaniel J Blueman <daniel@quora.org>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f69115fd
  9. 02 12月, 2015 1 次提交
  10. 19 11月, 2015 1 次提交
  11. 17 10月, 2015 3 次提交
  12. 04 10月, 2015 2 次提交
  13. 22 9月, 2015 2 次提交
  14. 09 8月, 2015 3 次提交
  15. 04 8月, 2015 1 次提交
    • M
      xhci: fix off by one error in TRB DMA address boundary check · 7895086a
      Mathias Nyman 提交于
      We need to check that a TRB is part of the current segment
      before calculating its DMA address.
      
      Previously a ring segment didn't use a full memory page, and every
      new ring segment got a new memory page, so the off by one
      error in checking the upper bound was never seen.
      
      Now that we use a full memory page, 256 TRBs (4096 bytes), the off by one
      didn't catch the case when a TRB was the first element of the next segment.
      
      This is triggered if the virtual memory pages for a ring segment are
      next to each in increasing order where the ring buffer wraps around and
      causes errors like:
      
      [  106.398223] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 0 comp_code 1
      [  106.398230] xhci_hcd 0000:00:14.0: Looking for event-dma fffd3000 trb-start fffd4fd0 trb-end fffd5000 seg-start fffd4000 seg-end fffd4ff0
      
      The trb-end address is one outside the end-seg address.
      
      Cc: <stable@vger.kernel.org>
      Tested-by: NArkadiusz Miśkiewicz <arekm@maven.pl>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7895086a
  16. 23 7月, 2015 1 次提交
  17. 31 5月, 2015 2 次提交
    • M
      xhci: Return correct number of tranferred bytes for stalled control endpoints · 22ae47e6
      Mathias Nyman 提交于
      Fix the xhci driver from bluntly setting the transferred length to 0 if
      we get a STALL on anything else than the data stage of a control transfer.
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22ae47e6
    • R
      usb: xhci: fix xhci locking up during hcd remove · ad6b1d91
      Roger Quadros 提交于
      The problem seems to be that if a new device is detected
      while we have already removed the shared HCD, then many of the
      xhci operations (e.g.  xhci_alloc_dev(), xhci_setup_device())
      hang as command never completes.
      
      I don't think XHCI can operate without the shared HCD as we've
      already called xhci_halt() in xhci_only_stop_hcd() when shared HCD
      goes away. We need to prevent new commands from being queued
      not only when HCD is dying but also when HCD is halted.
      
      The following lockup was detected while testing the otg state
      machine.
      
      [  178.199951] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
      [  178.205799] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 1
      [  178.214458] xhci-hcd xhci-hcd.0.auto: hcc params 0x0220f04c hci version 0x100 quirks 0x00010010
      [  178.223619] xhci-hcd xhci-hcd.0.auto: irq 400, io mem 0x48890000
      [  178.230677] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
      [  178.237796] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
      [  178.245358] usb usb1: Product: xHCI Host Controller
      [  178.250483] usb usb1: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
      [  178.257783] usb usb1: SerialNumber: xhci-hcd.0.auto
      [  178.267014] hub 1-0:1.0: USB hub found
      [  178.272108] hub 1-0:1.0: 1 port detected
      [  178.278371] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
      [  178.284171] xhci-hcd xhci-hcd.0.auto: new USB bus registered, assigned bus number 2
      [  178.294038] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
      [  178.301183] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
      [  178.308776] usb usb2: Product: xHCI Host Controller
      [  178.313902] usb usb2: Manufacturer: Linux 4.0.0-rc1-00024-g6111320 xhci-hcd
      [  178.321222] usb usb2: SerialNumber: xhci-hcd.0.auto
      [  178.329061] hub 2-0:1.0: USB hub found
      [  178.333126] hub 2-0:1.0: 1 port detected
      [  178.567585] dwc3 48890000.usb: usb_otg_start_host 0
      [  178.572707] xhci-hcd xhci-hcd.0.auto: remove, state 4
      [  178.578064] usb usb2: USB disconnect, device number 1
      [  178.586565] xhci-hcd xhci-hcd.0.auto: USB bus 2 deregistered
      [  178.592585] xhci-hcd xhci-hcd.0.auto: remove, state 1
      [  178.597924] usb usb1: USB disconnect, device number 1
      [  178.603248] usb 1-1: new high-speed USB device number 2 using xhci-hcd
      [  190.597337] INFO: task kworker/u4:0:6 blocked for more than 10 seconds.
      [  190.604273]       Not tainted 4.0.0-rc1-00024-g6111320 #1058
      [  190.610228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  190.618443] kworker/u4:0    D c05c0ac0     0     6      2 0x00000000
      [  190.625120] Workqueue: usb_otg usb_otg_work
      [  190.629533] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
      [  190.636915] [<c05c10ac>] (schedule) from [<c05c1318>] (schedule_preempt_disabled+0xc/0x10)
      [  190.645591] [<c05c1318>] (schedule_preempt_disabled) from [<c05c23d0>] (mutex_lock_nested+0x1ac/0x3fc)
      [  190.655353] [<c05c23d0>] (mutex_lock_nested) from [<c046cf8c>] (usb_disconnect+0x3c/0x208)
      [  190.664043] [<c046cf8c>] (usb_disconnect) from [<c0470cf0>] (_usb_remove_hcd+0x98/0x1d8)
      [  190.672535] [<c0470cf0>] (_usb_remove_hcd) from [<c0485da8>] (usb_otg_start_host+0x50/0xf4)
      [  190.681299] [<c0485da8>] (usb_otg_start_host) from [<c04849a4>] (otg_set_protocol+0x5c/0xd0)
      [  190.690153] [<c04849a4>] (otg_set_protocol) from [<c0484b88>] (otg_set_state+0x170/0xbfc)
      [  190.698735] [<c0484b88>] (otg_set_state) from [<c0485740>] (otg_statemachine+0x12c/0x470)
      [  190.707326] [<c0485740>] (otg_statemachine) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
      [  190.716162] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
      [  190.724742] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
      [  190.732328] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
      [  190.739898] 5 locks held by kworker/u4:0/6:
      [  190.744274]  #0:  ("%s""usb_otg"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
      [  190.752799]  #1:  ((&otgd->work)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
      [  190.761326]  #2:  (&otgd->fsm.lock){+.+.+.}, at: [<c048562c>] otg_statemachine+0x18/0x470
      [  190.769934]  #3:  (usb_bus_list_lock){+.+.+.}, at: [<c0470ce8>] _usb_remove_hcd+0x90/0x1d8
      [  190.778635]  #4:  (&dev->mutex){......}, at: [<c046cf8c>] usb_disconnect+0x3c/0x208
      [  190.786700] INFO: task kworker/1:0:14 blocked for more than 10 seconds.
      [  190.793633]       Not tainted 4.0.0-rc1-00024-g6111320 #1058
      [  190.799567] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  190.807783] kworker/1:0     D c05c0ac0     0    14      2 0x00000000
      [  190.814457] Workqueue: usb_hub_wq hub_event
      [  190.818866] [<c05c0ac0>] (__schedule) from [<c05c10ac>] (schedule+0x34/0x98)
      [  190.826252] [<c05c10ac>] (schedule) from [<c05c4e40>] (schedule_timeout+0x13c/0x1ec)
      [  190.834377] [<c05c4e40>] (schedule_timeout) from [<c05c19f0>] (wait_for_common+0xbc/0x150)
      [  190.843062] [<c05c19f0>] (wait_for_common) from [<bf068a3c>] (xhci_setup_device+0x164/0x5cc [xhci_hcd])
      [  190.852986] [<bf068a3c>] (xhci_setup_device [xhci_hcd]) from [<c046b7f4>] (hub_port_init+0x3f4/0xb10)
      [  190.862667] [<c046b7f4>] (hub_port_init) from [<c046eb64>] (hub_event+0x704/0x1018)
      [  190.870704] [<c046eb64>] (hub_event) from [<c0053c84>] (process_one_work+0x1b4/0x4a0)
      [  190.878919] [<c0053c84>] (process_one_work) from [<c00540f8>] (worker_thread+0x154/0x44c)
      [  190.887503] [<c00540f8>] (worker_thread) from [<c0058f88>] (kthread+0xd4/0xf0)
      [  190.895076] [<c0058f88>] (kthread) from [<c000e810>] (ret_from_fork+0x14/0x24)
      [  190.902650] 5 locks held by kworker/1:0/14:
      [  190.907023]  #0:  ("usb_hub_wq"){.+.+.+}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
      [  190.915454]  #1:  ((&hub->events)){+.+.+.}, at: [<c0053bf4>] process_one_work+0x124/0x4a0
      [  190.924070]  #2:  (&dev->mutex){......}, at: [<c046e490>] hub_event+0x30/0x1018
      [  190.931768]  #3:  (&port_dev->status_lock){+.+.+.}, at: [<c046eb50>] hub_event+0x6f0/0x1018
      [  190.940558]  #4:  (&bus->usb_address0_mutex){+.+.+.}, at: [<c046b458>] hub_port_init+0x58/0xb10
      Signed-off-by: NRoger Quadros <rogerq@ti.com>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad6b1d91
  18. 10 5月, 2015 2 次提交
  19. 08 4月, 2015 1 次提交
  20. 18 3月, 2015 1 次提交
  21. 11 3月, 2015 1 次提交
  22. 07 3月, 2015 1 次提交
    • A
      xhci: fix reporting of 0-sized URBs in control endpoint · 45ba2154
      Aleksander Morgado 提交于
      When a control transfer has a short data stage, the xHCI controller generates
      two transfer events: a COMP_SHORT_TX event that specifies the untransferred
      amount, and a COMP_SUCCESS event. But when the data stage is not short, only the
      COMP_SUCCESS event occurs. Therefore, xhci-hcd must set urb->actual_length to
      urb->transfer_buffer_length while processing the COMP_SUCCESS event, unless
      urb->actual_length was set already by a previous COMP_SHORT_TX event.
      
      The driver checks this by seeing whether urb->actual_length == 0, but this alone
      is the wrong test, as it is entirely possible for a short transfer to have an
      urb->actual_length = 0.
      
      This patch changes the xhci driver to rely on a new td->urb_length_set flag,
      which is set to true when a COMP_SHORT_TX event is received and the URB length
      updated at that stage.
      
      This fixes a bug which affected the HSO plugin, which relies on URBs with
      urb->actual_length == 0 to halt re-submitting the RX URB in the control
      endpoint.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: NMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      45ba2154
  23. 25 2月, 2015 1 次提交
  24. 10 1月, 2015 2 次提交