1. 07 Sep, 2018 12 commits
    • drm/nouveau/drm/nouveau: Don't forget to cancel hpd_work on suspend/unload · 2f7ca781
      Lyude Paul committed
      Currently, there's nothing in nouveau that actually cancels this work
      struct. So, cancel it on suspend/unload. Otherwise, if we're unlucky
      enough, hpd_work might keep running right up until the system is
      suspended.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early · 79e765ad
      Lyude Paul committed
      On most systems with ACPI hotplugging support, it seems that we always
      receive a hotplug event once we re-enable EC interrupts even if the GPU
      hasn't even been resumed yet.
      
      This can cause problems since even though we schedule hpd_work to handle
      connector reprobing for us, hpd_work synchronizes on
      pm_runtime_get_sync() to wait until the device is ready to perform
      reprobing. Since runtime suspend/resume callbacks are disabled before
      the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during
      this period will grab a runtime PM ref and return immediately with
      -EACCES. Because we schedule hpd_work from our ACPI HPD handler, and
      hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch
      a connector reprobe immediately even if the GPU isn't actually resumed
      just yet. This causes various warnings in dmesg and occasionally also
      prevents some displays connected to the dedicated GPU from coming back
      up after suspend. Example:
      
      usb 1-4: USB disconnect, device number 14
      usb 1-4.1: USB disconnect, device number 15
      WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau]
      CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1
      Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018
      Workqueue: events nouveau_display_hpd_work [nouveau]
      RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau]
      RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000
      RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002
      RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000
      R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418
      FS:  0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ? _cond_resched+0x15/0x30
       nouveau_connector_detect+0x2ce/0x520 [nouveau]
       ? _cond_resched+0x15/0x30
       ? ww_mutex_lock+0x12/0x40
       drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper]
       drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper]
       nouveau_display_hpd_work+0x2a/0x60 [nouveau]
       process_one_work+0x187/0x340
       worker_thread+0x2e/0x380
       ? pwq_unbound_release_workfn+0xd0/0xd0
       kthread+0x112/0x130
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x35/0x40
      Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff
      ---[ end trace 55d811b38fc8e71a ]---
      
      So, to fix this we attempt to grab a runtime PM reference in the ACPI
      handler itself asynchronously. If the GPU is already awake (it will have
      normal hotplugging at this point) or runtime PM callbacks are currently
      disabled on the device, we drop our reference without updating the
      autosuspend delay. We only schedule connector reprobes when we
      successfully managed to queue up a resume request with our asynchronous
      PM ref.
      
      This also has the added benefit of preventing redundant connector
      reprobes from ACPI while the GPU is runtime resumed!
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Karol Herbst <kherbst@redhat.com>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
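The decision described in the commit above (79e765ad) can be sketched as a small user-space model. This is only an illustration, not the driver's code: `acpi_hpd_decide()` and the `ACPI_HPD_*` names are hypothetical, `rpm_ret` stands in for the value returned by `pm_runtime_get()`, and the handling of other error codes is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Possible outcomes for one ACPI hotplug notification (hypothetical names). */
enum acpi_hpd_action {
	ACPI_HPD_SKIP,     /* GPU awake or runtime PM disabled: drop ref, no reprobe */
	ACPI_HPD_REPROBE,  /* resume request queued: schedule hpd_work */
	ACPI_HPD_ERROR     /* assumption: any other error drops the event */
};

/*
 * rpm_ret models the return value of pm_runtime_get():
 *   1        the device was already active
 *   0        an asynchronous resume request was queued
 *   -EACCES  runtime PM callbacks are currently disabled
 */
enum acpi_hpd_action acpi_hpd_decide(int rpm_ret)
{
	/* pm_runtime_get() returns 1, 0, or a negative errno. */
	assert(rpm_ret <= 1);

	if (rpm_ret == 1 || rpm_ret == -EACCES)
		return ACPI_HPD_SKIP;
	if (rpm_ret == 0)
		return ACPI_HPD_REPROBE;
	return ACPI_HPD_ERROR;
}
```

Only the "resume queued" outcome schedules a reprobe, which is also why redundant reprobes disappear while the GPU is already runtime resumed.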
    • drm/nouveau: Reset MST branching unit before enabling · fa3cdf8d
      Lyude Paul committed
      When probing a new MST device, it's not safe to make any assumptions
      about its current state. While most well-mannered MST hubs will just
      disable the branching unit on hotplug disconnects, this isn't enough to
      save us from various other scenarios that might have resulted in
      something writing to the MST branching unit before we got control of it.
      This could happen if a previous probe we tried failed, if we're booting
      in kexec context and the hub is still in the state the last kernel put
      it in, etc.
      
      Luckily, there is no reason we can't just reset the branching unit
      every time we enable a new topology. So, fix this by resetting it on
      enabling new topologies to ensure that we always start off with a clean,
      unmodified topology state on MST sinks.
      
      This fixes occasional hard-lockups on my P50's laptop dock (e.g. AUX
      times out all DPCD transactions) observed after multiple docks, undocks,
      and module reloads.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Karol Herbst <karolherbst@gmail.com>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
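The reset-before-enable idea from the commit above (fa3cdf8d) can be modelled roughly as follows. This is a sketch under stated assumptions: `struct fake_hub`, `fake_dpcd_writeb()` and `fake_mst_enable()` are hypothetical stand-ins, and the rule that clearing `DP_MST_EN` wipes leftover sink-side state is a simplification of how a hub behaves. Only the `DP_MSTM_CTRL` offset and the `DP_MST_EN` bit follow the DRM DisplayPort definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DP_MSTM_CTRL 0x111        /* DPCD offset of the MST control register */
#define DP_MST_EN    (1 << 0)     /* MST enable bit in DP_MSTM_CTRL */

/* Hypothetical stand-in for an MST hub; not a real DRM structure. */
struct fake_hub {
	uint8_t dpcd[0x200];
	bool stale_topology;   /* state left behind by an earlier owner */
};

/* Models a DPCD byte write; clearing DP_MST_EN resets the branching unit. */
void fake_dpcd_writeb(struct fake_hub *hub, unsigned int offset, uint8_t value)
{
	assert(offset < sizeof(hub->dpcd));
	if (offset == DP_MSTM_CTRL && !(value & DP_MST_EN))
		hub->stale_topology = false;
	hub->dpcd[offset] = value;
}

/* Reset first, then enable, so bring-up never inherits old state. */
void fake_mst_enable(struct fake_hub *hub)
{
	fake_dpcd_writeb(hub, DP_MSTM_CTRL, 0);
	fake_dpcd_writeb(hub, DP_MSTM_CTRL, DP_MST_EN);
}
```

In this model a hub that was left enabled by a failed probe or a previous kernel ends up in the same state as a freshly plugged one.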
    • drm/nouveau: Only write DP_MSTM_CTRL when needed · b26b4590
      Lyude Paul committed
      Currently, nouveau will re-write the DP_MSTM_CTRL register for an MST
      hub every time it receives a long HPD pulse on DP. This isn't actually
      necessary and additionally, has some unintended side effects.
      
      With the P50 I've got here, rewriting DP_MSTM_CTRL constantly seems to
      make it rather likely (1 out of 5 times usually) that bringing up MST
      with its ThinkPad dock will fail and result in sideband messages timing
      out in the middle. Afterwards, successive probes don't manage to get the
      dock to communicate properly over MST sideband.
      
      Many times sideband message timeouts from MST hubs are indicative of
      either the source or the sink dropping an ESI event, which can cause
      DRM's perspective of the topology's current state to go out of sync with
      reality. While it's tough to really know for sure what's happening to
      the dock, using userspace tools to write to DP_MSTM_CTRL in the middle
      of the MST link probing process does appear to make things flaky. It's
      possible that when we write to DP_MSTM_CTRL, the function that gets
      triggered to respond in the dock's firmware temporarily puts it in a
      state where it might end up not reporting an ESI to the source, or ends
      up dropping a sideband message we sent it.
      
      So, to fix this we make it so that when probing an MST topology, we
      respect its current state. If the dock's already enabled, we simply
      read DP_MSTM_CTRL and disable the topology if its value is not what we
      expected. Otherwise, we perform the normal MST probing dance. We avoid
      taking any action except if the state of the MST topology actually
      changes.
      
      This fixes MST sideband message timeouts and detection failures on my
      P50 with its ThinkPad dock.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Karol Herbst <karolherbst@gmail.com>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
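A minimal model of the "respect the current state" probing described in the commit above (b26b4590) might look like this. Everything here is hypothetical: `struct fake_mstm` and `fake_mstm_probe()` are illustration-only names, and treating "DP_MST_EN still set" as the expected register value is an assumption, not something the commit message spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DP_MST_EN (1 << 0)   /* MST enable bit in DP_MSTM_CTRL */

/* Hypothetical driver-side view of one MST-capable connector. */
struct fake_mstm {
	uint8_t sink_ctrl;   /* what the sink's DP_MSTM_CTRL currently holds */
	bool is_mst;         /* what the driver believes it has enabled */
	int ctrl_writes;     /* number of DP_MSTM_CTRL writes issued */
};

/* Called for every long HPD pulse in this model. */
void fake_mstm_probe(struct fake_mstm *m)
{
	assert(m);

	if (m->is_mst) {
		/* Already enabled: read back and verify, never rewrite. */
		if (!(m->sink_ctrl & DP_MST_EN))
			m->is_mst = false;   /* unexpected value: disable topology */
		return;
	}

	/* Not enabled yet: the normal bring-up path, one write. */
	m->sink_ctrl = DP_MST_EN;
	m->ctrl_writes++;
	m->is_mst = true;
}
```

Repeated pulses on a healthy dock then cost one register read each instead of a rewrite, which is the property the commit relies on to avoid disturbing the dock mid-probe.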
    • drm/nouveau: Remove useless poll_enable() call in drm_load() · 7326ead9
      Lyude Paul committed
      Again, this doesn't do anything. drm_kms_helper_poll_enable() will have
      already been called in nouveau_display_init()
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau: Remove useless poll_disable() call in switcheroo_set_state() · 0d7b2d4d
      Lyude Paul committed
      This won't do anything but potentially make us miss hotplugs. We already
      call drm_kms_helper_poll_disable() in
      nouveau_pmops_suspend()->nouveau_display_suspend()->nouveau_display_fini()
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau: Remove useless poll_enable() call in switcheroo_set_state() · 0445f753
      Lyude Paul committed
      This doesn't do anything, drm_kms_helper_poll_enable() gets called in
      nouveau_pmops_resume()->nouveau_display_resume()->nouveau_display_init()
      already.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau: Fix deadlocks in nouveau_connector_detect() · 3e1a1275
      Lyude Paul committed
      When we disable hotplugging on the GPU, we need to be able to
      synchronize with each connector's hotplug interrupt handler before the
      interrupt is finally disabled. This can be a problem however, since
      nouveau_connector_detect() currently grabs a runtime power reference
      when handling connector probing. This will deadlock the runtime suspend
      handler like so:
      
      [  861.480896] INFO: task kworker/0:2:61 blocked for more than 120 seconds.
      [  861.483290]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
      [  861.485158] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  861.486332] kworker/0:2     D    0    61      2 0x80000000
      [  861.487044] Workqueue: events nouveau_display_hpd_work [nouveau]
      [  861.487737] Call Trace:
      [  861.488394]  __schedule+0x322/0xaf0
      [  861.489070]  schedule+0x33/0x90
      [  861.489744]  rpm_resume+0x19c/0x850
      [  861.490392]  ? finish_wait+0x90/0x90
      [  861.491068]  __pm_runtime_resume+0x4e/0x90
      [  861.491753]  nouveau_display_hpd_work+0x22/0x60 [nouveau]
      [  861.492416]  process_one_work+0x231/0x620
      [  861.493068]  worker_thread+0x44/0x3a0
      [  861.493722]  kthread+0x12b/0x150
      [  861.494342]  ? wq_pool_ids_show+0x140/0x140
      [  861.494991]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  861.495648]  ret_from_fork+0x3a/0x50
      [  861.496304] INFO: task kworker/6:2:320 blocked for more than 120 seconds.
      [  861.496968]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
      [  861.497654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  861.498341] kworker/6:2     D    0   320      2 0x80000080
      [  861.499045] Workqueue: pm pm_runtime_work
      [  861.499739] Call Trace:
      [  861.500428]  __schedule+0x322/0xaf0
      [  861.501134]  ? wait_for_completion+0x104/0x190
      [  861.501851]  schedule+0x33/0x90
      [  861.502564]  schedule_timeout+0x3a5/0x590
      [  861.503284]  ? mark_held_locks+0x58/0x80
      [  861.503988]  ? _raw_spin_unlock_irq+0x2c/0x40
      [  861.504710]  ? wait_for_completion+0x104/0x190
      [  861.505417]  ? trace_hardirqs_on_caller+0xf4/0x190
      [  861.506136]  ? wait_for_completion+0x104/0x190
      [  861.506845]  wait_for_completion+0x12c/0x190
      [  861.507555]  ? wake_up_q+0x80/0x80
      [  861.508268]  flush_work+0x1c9/0x280
      [  861.508990]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
      [  861.509735]  nvif_notify_put+0xb1/0xc0 [nouveau]
      [  861.510482]  nouveau_display_fini+0xbd/0x170 [nouveau]
      [  861.511241]  nouveau_display_suspend+0x67/0x120 [nouveau]
      [  861.511969]  nouveau_do_suspend+0x5e/0x2d0 [nouveau]
      [  861.512715]  nouveau_pmops_runtime_suspend+0x47/0xb0 [nouveau]
      [  861.513435]  pci_pm_runtime_suspend+0x6b/0x180
      [  861.514165]  ? pci_has_legacy_pm_support+0x70/0x70
      [  861.514897]  __rpm_callback+0x7a/0x1d0
      [  861.515618]  ? pci_has_legacy_pm_support+0x70/0x70
      [  861.516313]  rpm_callback+0x24/0x80
      [  861.517027]  ? pci_has_legacy_pm_support+0x70/0x70
      [  861.517741]  rpm_suspend+0x142/0x6b0
      [  861.518449]  pm_runtime_work+0x97/0xc0
      [  861.519144]  process_one_work+0x231/0x620
      [  861.519831]  worker_thread+0x44/0x3a0
      [  861.520522]  kthread+0x12b/0x150
      [  861.521220]  ? wq_pool_ids_show+0x140/0x140
      [  861.521925]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  861.522622]  ret_from_fork+0x3a/0x50
      [  861.523299] INFO: task kworker/6:0:1329 blocked for more than 120 seconds.
      [  861.523977]       Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
      [  861.524644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  861.525349] kworker/6:0     D    0  1329      2 0x80000000
      [  861.526073] Workqueue: events nvif_notify_work [nouveau]
      [  861.526751] Call Trace:
      [  861.527411]  __schedule+0x322/0xaf0
      [  861.528089]  schedule+0x33/0x90
      [  861.528758]  rpm_resume+0x19c/0x850
      [  861.529399]  ? finish_wait+0x90/0x90
      [  861.530073]  __pm_runtime_resume+0x4e/0x90
      [  861.530798]  nouveau_connector_detect+0x7e/0x510 [nouveau]
      [  861.531459]  ? ww_mutex_lock+0x47/0x80
      [  861.532097]  ? ww_mutex_lock+0x47/0x80
      [  861.532819]  ? drm_modeset_lock+0x88/0x130 [drm]
      [  861.533481]  drm_helper_probe_detect_ctx+0xa0/0x100 [drm_kms_helper]
      [  861.534127]  drm_helper_hpd_irq_event+0xa4/0x120 [drm_kms_helper]
      [  861.534940]  nouveau_connector_hotplug+0x98/0x120 [nouveau]
      [  861.535556]  nvif_notify_work+0x2d/0xb0 [nouveau]
      [  861.536221]  process_one_work+0x231/0x620
      [  861.536994]  worker_thread+0x44/0x3a0
      [  861.537757]  kthread+0x12b/0x150
      [  861.538463]  ? wq_pool_ids_show+0x140/0x140
      [  861.539102]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  861.539815]  ret_from_fork+0x3a/0x50
      [  861.540521]
                     Showing all locks held in the system:
      [  861.541696] 2 locks held by kworker/0:2/61:
      [  861.542406]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
      [  861.543071]  #1: 0000000076868126 ((work_completion)(&drm->hpd_work)){+.+.}, at: process_one_work+0x1b3/0x620
      [  861.543814] 1 lock held by khungtaskd/64:
      [  861.544535]  #0: 0000000059db4b53 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
      [  861.545160] 3 locks held by kworker/6:2/320:
      [  861.545896]  #0: 00000000d9e1bc59 ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
      [  861.546702]  #1: 00000000c9f92d84 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
      [  861.547443]  #2: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: nouveau_display_fini+0x96/0x170 [nouveau]
      [  861.548146] 1 lock held by dmesg/983:
      [  861.548889] 2 locks held by zsh/1250:
      [  861.549605]  #0: 00000000348e3cf6 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
      [  861.550393]  #1: 000000007009a7a8 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
      [  861.551122] 6 locks held by kworker/6:0/1329:
      [  861.551957]  #0: 000000002dbf8af5 ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
      [  861.552765]  #1: 00000000ddb499ad ((work_completion)(&notify->work)#2){+.+.}, at: process_one_work+0x1b3/0x620
      [  861.553582]  #2: 000000006e013cbe (&dev->mode_config.mutex){+.+.}, at: drm_helper_hpd_irq_event+0x6c/0x120 [drm_kms_helper]
      [  861.554357]  #3: 000000004afc5de1 (drm_connector_list_iter){.+.+}, at: drm_helper_hpd_irq_event+0x78/0x120 [drm_kms_helper]
      [  861.555227]  #4: 0000000044f294d9 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_detect_ctx+0x3d/0x100 [drm_kms_helper]
      [  861.556133]  #5: 00000000db193642 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x4b/0x130 [drm]
      
      [  861.557864] =============================================
      
      [  861.559507] NMI backtrace for cpu 2
      [  861.560363] CPU: 2 PID: 64 Comm: khungtaskd Tainted: G           O      4.18.0-rc6Lyude-Test+ #1
      [  861.561197] Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET78W (1.51 ) 05/18/2018
      [  861.561948] Call Trace:
      [  861.562757]  dump_stack+0x8e/0xd3
      [  861.563516]  nmi_cpu_backtrace.cold.3+0x14/0x5a
      [  861.564269]  ? lapic_can_unplug_cpu.cold.27+0x42/0x42
      [  861.565029]  nmi_trigger_cpumask_backtrace+0xa1/0xae
      [  861.565789]  arch_trigger_cpumask_backtrace+0x19/0x20
      [  861.566558]  watchdog+0x316/0x580
      [  861.567355]  kthread+0x12b/0x150
      [  861.568114]  ? reset_hung_task_detector+0x20/0x20
      [  861.568863]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  861.569598]  ret_from_fork+0x3a/0x50
      [  861.570370] Sending NMI from CPU 2 to CPUs 0-1,3-7:
      [  861.571426] NMI backtrace for cpu 6 skipped: idling at intel_idle+0x7f/0x120
      [  861.571429] NMI backtrace for cpu 7 skipped: idling at intel_idle+0x7f/0x120
      [  861.571432] NMI backtrace for cpu 3 skipped: idling at intel_idle+0x7f/0x120
      [  861.571464] NMI backtrace for cpu 5 skipped: idling at intel_idle+0x7f/0x120
      [  861.571467] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x7f/0x120
      [  861.571469] NMI backtrace for cpu 4 skipped: idling at intel_idle+0x7f/0x120
      [  861.571472] NMI backtrace for cpu 1 skipped: idling at intel_idle+0x7f/0x120
      [  861.572428] Kernel panic - not syncing: hung_task: blocked tasks
      
      So, fix this by making it so that normal hotplug handling /only/ happens
      so long as the GPU is currently awake without any pending runtime PM
      requests. In the event that a hotplug occurs while the device is
      suspending or resuming, we can simply defer our response until the GPU
      is fully runtime resumed again.
      
      Changes since v4:
      - Use a new trick I came up with using pm_runtime_get() instead of the
        hackish junk we had before
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: stable@vger.kernel.org
      Cc: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau/drm/nouveau: Use pm_runtime_get_noresume() in connector_detect() · 6833fb1e
      Lyude Paul committed
      It's true we can't resume the device from poll workers in
      nouveau_connector_detect(). We can, however, prevent the autosuspend
      timer from elapsing immediately, if it hasn't already, without risking
      any sort of deadlock with the runtime suspend/resume operations. So do
      that instead of entirely avoiding grabbing a power reference.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: stable@vger.kernel.org
      Cc: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
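The difference between taking a reference with and without resuming, as used in the commit above (6833fb1e), can be illustrated with a toy reference counter. The `fake_*` functions and `struct fake_rpm` are hypothetical and only loosely mirror the semantics of `pm_runtime_get_noresume()` and `pm_runtime_get_sync()`; the `from_poll_worker` flag is an assumed way of telling the two callers apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a device's runtime PM state (hypothetical). */
struct fake_rpm {
	int usage_count;    /* outstanding power references */
	bool active;        /* is the device currently resumed? */
	int resume_calls;   /* how often a synchronous resume ran */
};

/* Take a reference without touching the power state. */
void fake_get_noresume(struct fake_rpm *d)
{
	d->usage_count++;
}

/* Take a reference and resume the device if it is suspended. */
void fake_get_sync(struct fake_rpm *d)
{
	d->usage_count++;
	if (!d->active) {
		d->resume_calls++;
		d->active = true;
	}
}

void fake_put(struct fake_rpm *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* Autosuspend may only fire while nobody holds a reference. */
bool fake_may_autosuspend(const struct fake_rpm *d)
{
	return d->usage_count == 0;
}

/* Poll workers must not resume the device; other callers may. */
void fake_connector_detect(struct fake_rpm *d, bool from_poll_worker)
{
	if (from_poll_worker)
		fake_get_noresume(d);
	else
		fake_get_sync(d);
	/* ... connector probing would happen here ... */
	fake_put(d);
}
```

While the poll worker holds its reference, `fake_may_autosuspend()` is false, yet no resume is ever triggered from that path, so there is nothing for the suspend handler to wait on.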
    • drm/nouveau/drm/nouveau: Fix deadlock with fb_helper with async RPM requests · 7fec8f53
      Lyude Paul committed
      Currently, nouveau uses the generic drm_fb_helper_output_poll_changed()
      function provided by DRM as its output_poll_changed callback.
      Unfortunately however, this function doesn't grab runtime PM references
      early enough and even if it did, we can't block waiting for the device to
      resume in output_poll_changed() since it's very likely that we'll need
      to grab the fb_helper lock at some point during the runtime resume
      process. This currently results in deadlocking like so:
      
      [  246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
      [  246.673398]       Not tainted 4.18.0-rc5Lyude-Test+ #2
      [  246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  246.676527] kworker/4:0     D    0    37      2 0x80000000
      [  246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
      [  246.678704] Call Trace:
      [  246.679753]  __schedule+0x322/0xaf0
      [  246.680916]  schedule+0x33/0x90
      [  246.681924]  schedule_preempt_disabled+0x15/0x20
      [  246.683023]  __mutex_lock+0x569/0x9a0
      [  246.684035]  ? kobject_uevent_env+0x117/0x7b0
      [  246.685132]  ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
      [  246.686179]  mutex_lock_nested+0x1b/0x20
      [  246.687278]  ? mutex_lock_nested+0x1b/0x20
      [  246.688307]  drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
      [  246.689420]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
      [  246.690462]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
      [  246.691570]  output_poll_execute+0x198/0x1c0 [drm_kms_helper]
      [  246.692611]  process_one_work+0x231/0x620
      [  246.693725]  worker_thread+0x214/0x3a0
      [  246.694756]  kthread+0x12b/0x150
      [  246.695856]  ? wq_pool_ids_show+0x140/0x140
      [  246.696888]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  246.697998]  ret_from_fork+0x3a/0x50
      [  246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
      [  246.700153]       Not tainted 4.18.0-rc5Lyude-Test+ #2
      [  246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  246.702278] kworker/0:1     D    0    60      2 0x80000000
      [  246.703293] Workqueue: pm pm_runtime_work
      [  246.704393] Call Trace:
      [  246.705403]  __schedule+0x322/0xaf0
      [  246.706439]  ? wait_for_completion+0x104/0x190
      [  246.707393]  schedule+0x33/0x90
      [  246.708375]  schedule_timeout+0x3a5/0x590
      [  246.709289]  ? mark_held_locks+0x58/0x80
      [  246.710208]  ? _raw_spin_unlock_irq+0x2c/0x40
      [  246.711222]  ? wait_for_completion+0x104/0x190
      [  246.712134]  ? trace_hardirqs_on_caller+0xf4/0x190
      [  246.713094]  ? wait_for_completion+0x104/0x190
      [  246.713964]  wait_for_completion+0x12c/0x190
      [  246.714895]  ? wake_up_q+0x80/0x80
      [  246.715727]  ? get_work_pool+0x90/0x90
      [  246.716649]  flush_work+0x1c9/0x280
      [  246.717483]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
      [  246.718442]  __cancel_work_timer+0x146/0x1d0
      [  246.719247]  cancel_delayed_work_sync+0x13/0x20
      [  246.720043]  drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
      [  246.721123]  nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
      [  246.721897]  pci_pm_runtime_suspend+0x6b/0x190
      [  246.722825]  ? pci_has_legacy_pm_support+0x70/0x70
      [  246.723737]  __rpm_callback+0x7a/0x1d0
      [  246.724721]  ? pci_has_legacy_pm_support+0x70/0x70
      [  246.725607]  rpm_callback+0x24/0x80
      [  246.726553]  ? pci_has_legacy_pm_support+0x70/0x70
      [  246.727376]  rpm_suspend+0x142/0x6b0
      [  246.728185]  pm_runtime_work+0x97/0xc0
      [  246.728938]  process_one_work+0x231/0x620
      [  246.729796]  worker_thread+0x44/0x3a0
      [  246.730614]  kthread+0x12b/0x150
      [  246.731395]  ? wq_pool_ids_show+0x140/0x140
      [  246.732202]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  246.732878]  ret_from_fork+0x3a/0x50
      [  246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
      [  246.734587]       Not tainted 4.18.0-rc5Lyude-Test+ #2
      [  246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  246.736113] kworker/4:2     D    0   422      2 0x80000080
      [  246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
      [  246.737665] Call Trace:
      [  246.738490]  __schedule+0x322/0xaf0
      [  246.739250]  schedule+0x33/0x90
      [  246.739908]  rpm_resume+0x19c/0x850
      [  246.740750]  ? finish_wait+0x90/0x90
      [  246.741541]  __pm_runtime_resume+0x4e/0x90
      [  246.742370]  nv50_disp_atomic_commit+0x31/0x210 [nouveau]
      [  246.743124]  drm_atomic_commit+0x4a/0x50 [drm]
      [  246.743775]  restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
      [  246.744603]  restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
      [  246.745373]  drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
      [  246.746220]  drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
      [  246.746884]  drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
      [  246.747675]  drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
      [  246.748544]  drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
      [  246.749439]  nv50_mstm_hotplug+0x15/0x20 [nouveau]
      [  246.750111]  drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
      [  246.750764]  drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
      [  246.751602]  drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
      [  246.752314]  process_one_work+0x231/0x620
      [  246.752979]  worker_thread+0x44/0x3a0
      [  246.753838]  kthread+0x12b/0x150
      [  246.754619]  ? wq_pool_ids_show+0x140/0x140
      [  246.755386]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  246.756162]  ret_from_fork+0x3a/0x50
      [  246.756847]
                 Showing all locks held in the system:
      [  246.758261] 3 locks held by kworker/4:0/37:
      [  246.759016]  #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
      [  246.759856]  #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
      [  246.760670]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
      [  246.761516] 2 locks held by kworker/0:1/60:
      [  246.762274]  #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
      [  246.762982]  #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
      [  246.763890] 1 lock held by khungtaskd/64:
      [  246.764664]  #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
      [  246.765588] 5 locks held by kworker/4:2/422:
      [  246.766440]  #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
      [  246.767390]  #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
      [  246.768154]  #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
      [  246.768966]  #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
      [  246.769921]  #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
      [  246.770839] 1 lock held by dmesg/1038:
      [  246.771739] 2 locks held by zsh/1172:
      [  246.772650]  #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
      [  246.773680]  #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
      
      [  246.775522] =============================================
      
      After trying dozens of different solutions, I found one very simple one
      that should also have the benefit of preventing us from having to fight
      locking for the rest of our lives. So, we work around these deadlocks by
      deferring all fbcon hotplug events that happen after the runtime suspend
      process starts until after the device is resumed again.
      
      Changes since v7:
       - Fixup commit message - Daniel Vetter
      
      Changes since v6:
       - Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia
      
      Changes since v5:
       - Come up with the (hopefully final) solution for solving this dumb
         problem, one that is a lot less likely to cause issues with locking in
         the future. This should work around all deadlock conditions with fbcon
         brought up thus far.
      
      Changes since v4:
       - Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock
         condition that Lukas described
       - Just move all of this out of drm_fb_helper. It seems that other DRM
         drivers have already figured out other workarounds for this. If other
         drivers do end up needing this in the future, we can just move this
         back into drm_fb_helper again.
      
      Changes since v3:
      - Actually check if fb_helper is NULL in both new helpers
      - Actually check drm_fbdev_emulation in both new helpers
      - Don't fire off a fb_helper hotplug unconditionally; only do it if
        the following conditions are true (as otherwise, calling this in the
        wrong spot will cause Bad Things to happen):
        - fb_helper hotplug handling was actually inhibited previously
        - fb_helper actually has a delayed hotplug pending
        - fb_helper is actually bound
        - fb_helper is actually initialized
      - Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
        situation where a driver would actually want to use this without
        checking the return value, so enforce that
      - Rewrite and clarify the documentation for both helpers.
      - Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
        that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
        isn't enabled
      - Actually grab the toplevel fb_helper lock in
        drm_fb_helper_resume_hotplug(), since it's possible other activity
        (such as a hotplug) could be going on at the same time the driver
        calls drm_fb_helper_resume_hotplug(). We need this to check whether or
        not drm_fb_helper_hotplug_event() needs to be called anyway
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: stable@vger.kernel.org
      Cc: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
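The deferral strategy from the commit above (7fec8f53) boils down to "remember the event, replay it after resume". The sketch below models that with a single flag. All names (`struct fake_fbcon`, `fake_output_poll_changed()`, and so on) are hypothetical, and locking is deliberately left out, so this shows the control flow only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fbcon state; the real driver also needs locking. */
struct fake_fbcon {
	bool suspending;       /* runtime suspend has started */
	bool hotplug_waiting;  /* an event arrived while suspending */
	int hotplugs_handled;  /* events actually delivered to fbcon */
};

/* Called whenever the output configuration may have changed. */
void fake_output_poll_changed(struct fake_fbcon *f)
{
	if (f->suspending) {
		/* Never block here: just note that work is pending. */
		f->hotplug_waiting = true;
		return;
	}
	f->hotplugs_handled++;
}

void fake_runtime_suspend(struct fake_fbcon *f)
{
	f->suspending = true;
}

/* On resume, replay at most one deferred event. */
void fake_runtime_resume(struct fake_fbcon *f)
{
	assert(f->suspending);
	f->suspending = false;
	if (f->hotplug_waiting) {
		f->hotplug_waiting = false;
		f->hotplugs_handled++;
	}
}
```

Several events that arrive during one suspend collapse into a single replay here, which seems reasonable since a hotplug event only says "something changed, go look".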
    • drm/nouveau: Remove duplicate poll_enable() in pmops_runtime_suspend() · 611ce855
      Lyude Paul committed
      Since actual hotplug notifications don't get disabled until
      nouveau_display_fini() is called, all this will do is cause any hotplugs
      that happen between this drm_kms_helper_poll_disable() call and the
      actual hotplug disablement to potentially be dropped if ACPI isn't
      around to help us.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Acked-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: stable@vger.kernel.org
      Cc: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • L
      drm/nouveau/drm/nouveau: Fix bogus drm_kms_helper_poll_enable() placement · d77ef138
      Lyude Paul authored
      Turns out this part is my fault for not noticing when reviewing
      9a2eba33 ("drm/nouveau: Fix drm poll_helper handling"). Currently
      we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work().
      This makes basically no sense however, because that means we're calling
      drm_kms_helper_poll_enable() every time we schedule the hotplug
      detection work. This is also against the advice mentioned in
      drm_kms_helper_poll_enable()'s documentation:
      
       Note that calls to enable and disable polling must be strictly ordered,
       which is automatically the case when they're only call from
       suspend/resume callbacks.
      
      Of course, hotplugs can't really be ordered. They could even happen
      immediately after we called drm_kms_helper_poll_disable() in
      nouveau_display_fini(), which can lead to all sorts of issues.
      
      Additionally, enabling polling /after/ we call
      drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug
      event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to
      probe connectors so long as polling is disabled.
      
      So, simply move this back into nouveau_display_init() again. The race
      condition that both of these patches attempted to work around has
      already been fixed properly in
      
        d61a5c10 ("drm/nouveau: Fix deadlock on runtime suspend")
      
      Fixes: 9a2eba33 ("drm/nouveau: Fix drm poll_helper handling")
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Acked-by: Karol Herbst <kherbst@redhat.com>
      Acked-by: Daniel Vetter <daniel@ffwll.ch>
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      d77ef138
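
      The ordering problem described in this commit can be shown with a toy model. It assumes only what the message states, namely that the hotplug IRQ helper does not probe connectors while polling is disabled. The `model_*` names are hypothetical and none of this is the DRM code.

```c
#include <stdbool.h>

struct model_dev {
	bool poll_enabled;
	int probes; /* number of connector reprobes performed */
};

static void model_poll_enable(struct model_dev *d)
{
	d->poll_enabled = true;
}

static void model_poll_disable(struct model_dev *d)
{
	d->poll_enabled = false;
}

/* Returns true if the event led to a connector reprobe. */
static bool model_hpd_irq_event(struct model_dev *d)
{
	if (!d->poll_enabled)
		return false; /* event is dropped */
	d->probes++;
	return true;
}
```

      Calling `model_hpd_irq_event()` before `model_poll_enable()` drops the event, which is why enabling polling once during display init, rather than from the hotplug work itself, keeps the two calls ordered.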
  2. 24 Aug, 2018 1 commit
  3. 23 Aug, 2018 3 commits
    • K
      drm/edid: Add 6 bpc quirk for SDC panel in Lenovo B50-80 · 25da7504
      Kai-Heng Feng authored
      Another panel that reports "DFP 1.x compliant TMDS" but supports 6 bpc
      instead of 8 bpc.
      
      Apply 6 bpc quirk for the panel to fix it.
      
      BugLink: https://bugs.launchpad.net/bugs/1788308
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180823055332.7723-1-kai.heng.feng@canonical.com
      25da7504
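
      A quirk of this kind is typically a table lookup keyed on the panel's vendor and product ID. The following is a minimal sketch of that idea, not the DRM EDID code: the `model_*` names are hypothetical and the product ID `0x1234` is a placeholder, since the commit message does not give the real one.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_QUIRK_FORCE_6BPC (1u << 0)

struct model_edid_quirk {
	const char *vendor;      /* three-letter EDID vendor code */
	unsigned int product_id;
	unsigned int quirks;
};

static const struct model_edid_quirk model_quirk_list[] = {
	/* Placeholder product ID, for illustration only. */
	{ "SDC", 0x1234, MODEL_QUIRK_FORCE_6BPC },
};

static unsigned int model_get_quirks(const char *vendor, unsigned int product_id)
{
	size_t n = sizeof(model_quirk_list) / sizeof(model_quirk_list[0]);

	for (size_t i = 0; i < n; i++) {
		const struct model_edid_quirk *q = &model_quirk_list[i];

		if (strcmp(q->vendor, vendor) == 0 && q->product_id == product_id)
			return q->quirks;
	}
	return 0;
}

/* Override the reported depth when the panel is known to misreport it. */
static unsigned int model_panel_bpc(const char *vendor, unsigned int product_id,
				    unsigned int reported_bpc)
{
	if (model_get_quirks(vendor, product_id) & MODEL_QUIRK_FORCE_6BPC)
		return 6;
	return reported_bpc;
}
```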
    • N
      include/linux/compiler*.h: make compiler-*.h mutually exclusive · 815f0ddb
      Nick Desaulniers authored
      Commit cafa0010 ("Raise the minimum required gcc version to 4.6")
      recently exposed a brittle part of the build for supporting non-gcc
      compilers.
      
      Both Clang and ICC define __GNUC__, __GNUC_MINOR__, and
      __GNUC_PATCHLEVEL__ for quick compatibility with code bases that haven't
      added compiler specific checks for __clang__ or __INTEL_COMPILER.
      
      This is brittle, as they happened to get compatibility by posing as a
      certain version of GCC.  This broke when upgrading the minimal version
      of GCC required to build the kernel, to a version above what ICC and
      Clang claim to be.
      
      Rather than always including compiler-gcc.h then undefining or
      redefining macros in compiler-intel.h or compiler-clang.h, let's
      separate out the compiler specific macro definitions into mutually
      exclusive headers, do more proper compiler detection, and keep shared
      definitions in compiler_types.h.
      
      Fixes: cafa0010 ("Raise the minimum required gcc version to 4.6")
      Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Suggested-by: Eli Friedman <efriedma@codeaurora.org>
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      815f0ddb
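
      Because Clang and ICC also define `__GNUC__`, mutually exclusive detection depends on testing the more specific macros first. A minimal sketch of that ordering follows. The `MODEL_COMPILER_NAME` macro and the "unknown" fallback are assumptions for illustration; a real build might prefer to fail with `#error` instead.

```c
/*
 * Test the most specific macros first: Clang and ICC both define
 * __GNUC__, so checking it first would misidentify them as GCC.
 * Exactly one branch is selected.
 */
#if defined(__clang__)
#define MODEL_COMPILER_NAME "clang"
#elif defined(__INTEL_COMPILER)
#define MODEL_COMPILER_NAME "icc"
#elif defined(__GNUC__)
#define MODEL_COMPILER_NAME "gcc"
#else
#define MODEL_COMPILER_NAME "unknown"
#endif

static const char *model_compiler_name(void)
{
	return MODEL_COMPILER_NAME;
}
```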
    • M
      mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Michal Hocko authored
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start, and that is a problem for the
      oom_reaper because it needs to guarantee forward progress, so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover,
      the majority of notifiers only care about a portion of the address space and
      there is absolutely zero reason to fail when we are unmapping an unrelated
      range.  Many notifiers do really block and wait for HW, which is harder to
      handle, so there we still have to bail out.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continuing as long as we do not block down the call chain.
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      On the oom_reaper side, we then simply retry if there is at least one
      notifier which couldn't make any progress in !blockable mode.  A retry
      loop is already implemented to wait for the mmap_sem and this is basically
      the same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g. after the test faults in all
      the mmu notifier managed memory, by setting the hard limit to something
      really small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93065ac7
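
      The blockable/non-blockable split and the reaper's retry can be sketched in userspace with a C11 atomic flag standing in for the notifier's lock. This is a model under stated assumptions rather than the kernel code: the `model_*` names are hypothetical, the busy-wait only stands in for a sleeping lock, and returning `-EAGAIN` on contention is an assumed convention.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_notifier {
	atomic_flag lock;          /* stands in for a sleepable lock */
	unsigned long invalidated; /* ranges successfully invalidated */
};

static bool model_trylock(struct model_notifier *n)
{
	return !atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire);
}

static void model_unlock(struct model_notifier *n)
{
	atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/* Returns 0 on success, -EAGAIN if it would have had to block. */
static int model_invalidate_range_start(struct model_notifier *n, bool blockable)
{
	if (blockable) {
		while (!model_trylock(n))
			; /* a real callback would sleep here */
	} else if (!model_trylock(n)) {
		return -EAGAIN;
	}
	n->invalidated++;
	model_unlock(n);
	return 0;
}

/* Reaper side: retry in non-blockable mode a bounded number of times. */
static bool model_reap(struct model_notifier *n, int max_attempts)
{
	for (int i = 0; i < max_attempts; i++) {
		if (model_invalidate_range_start(n, false) == 0)
			return true;
	}
	return false;
}
```

      The reaper never passes `blockable = true`, so it cannot get stuck behind a lock holder; at worst it gives up after `max_attempts` and can be retried later.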
  4. 22 Aug, 2018 14 commits
  5. 17 Aug, 2018 1 commit
  6. 16 Aug, 2018 4 commits
  7. 15 Aug, 2018 1 commit
  8. 14 Aug, 2018 4 commits