• L
    PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary · e0547c81
    Lyude Paul 提交于
    On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
    variant, the BIOS does not always reset the secondary Nvidia GPU during
    reboot if the laptop is configured in Hybrid Graphics mode.  The reason is
    unknown, but the following steps and possibly a good bit of patience will
    reproduce the issue:
    
      1. Boot up the laptop normally in Hybrid Graphics mode
      2. Make sure nouveau is loaded and that the GPU is awake
      3. Allow the Nvidia GPU to runtime suspend itself after being idle
      4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
      5. If nouveau loads up properly, reboot the machine again and go back to
         step 2 until you reproduce the issue
    
    This results in some very strange behavior: the GPU will be left in exactly
    the same state it was in when the previously booted kernel started the
    reboot.  This has all sorts of bad side effects: for starters, this
    completely breaks nouveau starting with a mysterious EVO channel failure
    that happens well before we've actually used the EVO channel for anything:
    
      nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
    
    This causes a timeout trying to bring up the GR ctx:
    
      nouveau 0000:01:00.0: timeout
      WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
      Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
      Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
      ...
      nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
      nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
      nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]
    
    The GPU never manages to recover.  Booting without loading nouveau causes
    issues as well, since the GPU starts sending spurious interrupts that cause
    other device's IRQs to get disabled by the kernel:
    
      irq 16: nobody cared (try booting with the "irqpoll" option)
      ...
      handlers:
      [<000000007faa9e99>] i801_isr [i2c_i801]
      Disabling IRQ #16
      ...
      serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
      i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
      i801_smbus 0000:00:1f.4: Transaction timeout
      rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
      i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
      i801_smbus 0000:00:1f.4: Transaction timeout
      rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
    
    This causes the touchpad and sometimes other things to get disabled.
    
    Since this happens without nouveau, we can't fix this problem from nouveau
    itself.
    
    Add a PCI quirk for the specific P50 variant of this GPU.  Make sure the
    GPU is advertising NoReset- so we don't reset the GPU when the machine is
    in Dedicated graphics mode (where the GPU being initialized by the BIOS is
    normal and expected).  Map the GPU MMIO space and read the magic 0x2240c
    register, which will have bit 1 set if the device was POSTed during a
    previous boot.  Once we've confirmed all of this, reset the GPU and
    re-disable it - bringing it back to a healthy state.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003
    Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.comSigned-off-by: NLyude Paul <lyude@redhat.com>
    Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
    Cc: nouveau@lists.freedesktop.org
    Cc: dri-devel@lists.freedesktop.org
    Cc: Karol Herbst <kherbst@redhat.com>
    Cc: Ben Skeggs <skeggsb@gmail.com>
    Cc: stable@vger.kernel.org
    e0547c81
quirks.c 181.9 KB