1. 14 10月, 2016 22 次提交
  2. 13 10月, 2016 10 次提交
  3. 12 10月, 2016 8 次提交
    • C
      drm/i915: Update debugfs describe_obj() to show fault-mappable · 8baa1f04
      Chris Wilson 提交于
      The current meaning of whether an object has a GGTT vma is very
      ill-defined (and note we don't check for any partials either), it just
      means that at some point it was in the GGTT but it may not be now. The
      information we really care about here is whether it is taking up
      precious mappable aperture space. This is the obj->fault_mappable flag.
      We have a redundant long form reprinting of this information, so remove
      that in favour of the compact flag.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012114827.17031-2-chris@chris-wilson.co.uk
      8baa1f04
    • C
      drm/i915: Use fence_write() from rpm resume · 4676dc83
      Chris Wilson 提交于
      During rpm resume we restore the fences, but we do not have the
      protection of struct_mutex. This rules out updating the activity
      tracking on the fences, and requires us to rely on the rpm as the
      serialisation barrier instead.
      
      [  350.298052] [drm:intel_runtime_resume [i915]] Resuming device
      [  350.308606]
      [  350.310520] ===============================
      [  350.315560] [ INFO: suspicious RCU usage. ]
      [  350.320554] 4.8.0-rc8-bsw-rapl+ #3133 Tainted: G     U  W
      [  350.327208] -------------------------------
      [  350.331977] ../drivers/gpu/drm/i915/i915_gem_request.h:371 suspicious rcu_dereference_protected() usage!
      [  350.342619]
      [  350.342619] other info that might help us debug this:
      [  350.342619]
      [  350.351593]
      [  350.351593] rcu_scheduler_active = 1, debug_locks = 0
      [  350.358952] 3 locks held by Xorg/320:
      [  350.363077]  #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa030589c>] drm_modeset_lock_all+0x3c/0xd0 [drm]
      [  350.375162]  #1:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa03058a6>] drm_modeset_lock_all+0x46/0xd0 [drm]
      [  350.387022]  #2:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffa0305056>] drm_modeset_lock+0x36/0x110 [drm]
      [  350.398236]
      [  350.398236] stack backtrace:
      [  350.403196] CPU: 1 PID: 320 Comm: Xorg Tainted: G     U  W       4.8.0-rc8-bsw-rapl+ #3133
      [  350.412457] Hardware name: Intel Corporation CHERRYVIEW C0 PLATFORM/Braswell CRB, BIOS BRAS.X64.X088.R00.1510270350 10/27/2015
      [  350.425212]  0000000000000000 ffff8801680a78c8 ffffffff81332187 ffff88016c5c5000
      [  350.433611]  0000000000000001 ffff8801680a78f8 ffffffff810ca6da ffff88016cc8b0f0
      [  350.442012]  ffff88016cc80000 ffff88016cc80000 ffff880177ad0000 ffff8801680a7948
      [  350.450409] Call Trace:
      [  350.453165]  [<ffffffff81332187>] dump_stack+0x67/0x90
      [  350.458931]  [<ffffffff810ca6da>] lockdep_rcu_suspicious+0xea/0x120
      [  350.466002]  [<ffffffffa039e8dd>] fence_update+0xbd/0x670 [i915]
      [  350.472766]  [<ffffffffa039efe2>] i915_gem_restore_fences+0x52/0x70 [i915]
      [  350.480496]  [<ffffffffa0368f42>] vlv_resume_prepare+0x72/0x570 [i915]
      [  350.487839]  [<ffffffffa0369802>] intel_runtime_resume+0x102/0x210 [i915]
      [  350.495442]  [<ffffffff8137f26f>] pci_pm_runtime_resume+0x7f/0xb0
      [  350.502274]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
      [  350.509883]  [<ffffffff814401c5>] __rpm_callback+0x35/0x70
      [  350.516037]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
      [  350.523646]  [<ffffffff81440224>] rpm_callback+0x24/0x80
      [  350.529604]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
      [  350.537212]  [<ffffffff814417bd>] rpm_resume+0x4ad/0x740
      [  350.543161]  [<ffffffff81441aa1>] __pm_runtime_resume+0x51/0x80
      [  350.549824]  [<ffffffffa03889c8>] intel_runtime_pm_get+0x28/0x90 [i915]
      [  350.557265]  [<ffffffffa0388a53>] intel_display_power_get+0x23/0x50 [i915]
      [  350.565001]  [<ffffffffa03ef23d>] intel_atomic_commit_tail+0xdfd/0x10b0 [i915]
      [  350.573106]  [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper]
      [  350.582659]  [<ffffffff81615091>] ? _raw_spin_unlock+0x31/0x50
      [  350.589205]  [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper]
      [  350.598787]  [<ffffffffa03ef8a5>] intel_atomic_commit+0x3b5/0x500 [i915]
      [  350.606319]  [<ffffffffa03061dc>] ? drm_atomic_set_crtc_for_connector+0xcc/0x100 [drm]
      [  350.615209]  [<ffffffffa0306b49>] drm_atomic_commit+0x49/0x50 [drm]
      [  350.622242]  [<ffffffffa034dee8>] drm_atomic_helper_set_config+0x88/0xc0 [drm_kms_helper]
      [  350.631419]  [<ffffffffa02f94ac>] drm_mode_set_config_internal+0x6c/0x120 [drm]
      [  350.639623]  [<ffffffffa02fa94c>] drm_mode_setcrtc+0x22c/0x4d0 [drm]
      [  350.646760]  [<ffffffffa02f0f19>] drm_ioctl+0x209/0x460 [drm]
      [  350.653217]  [<ffffffffa02fa720>] ? drm_mode_getcrtc+0x150/0x150 [drm]
      [  350.660536]  [<ffffffff810c984a>] ? __lock_is_held+0x4a/0x70
      [  350.666885]  [<ffffffff81202303>] do_vfs_ioctl+0x93/0x6b0
      [  350.672939]  [<ffffffff8120f843>] ? __fget+0x113/0x200
      [  350.678797]  [<ffffffff8120f735>] ? __fget+0x5/0x200
      [  350.684361]  [<ffffffff81202964>] SyS_ioctl+0x44/0x80
      [  350.690030]  [<ffffffff81001deb>] do_syscall_64+0x5b/0x120
      [  350.696184]  [<ffffffff81615ada>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Note we also have to remember the lesson from commit 4fc788f5
      ("drm/i915: Flush delayed fence releases after reset") where we have to
      flush any changes to the fence on restore.
      
      v2: Replace call to release user mmaps with an assertion that they have
      already been zapped.
      
      Fixes: 49ef5294 ("drm/i915: Move fence tracking from object to vma")
      Reported-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012114827.17031-1-chris@chris-wilson.co.uk
      4676dc83
    • C
      drm/i915: Compress GPU objects in error state · 0a97015d
      Chris Wilson 提交于
      Our error states are quickly growing, pinning kernel memory with them.
      The majority of the space is taken up by the error objects. These
      compress well using zlib and without decode are mostly meaningless, so
      encoding them does not hinder quickly parsing the error state for
      familiarity.
      
      v2: Make the zlib dependency optional
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-6-chris@chris-wilson.co.uk
      0a97015d
    • C
      drm/i915: Consolidate error object printing · fc4c79c3
      Chris Wilson 提交于
      Leave all the pretty printing to userspace and simplify the error
      capture to only have a single common object printer. It makes the kernel
      code more compact, and the refactoring allows us to apply more complex
      transformations like compressing the output.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-5-chris@chris-wilson.co.uk
      fc4c79c3
    • C
      drm/i915: Always use the GTT for error capture · 95374d75
      Chris Wilson 提交于
      Since the GTT provides universal access to any GPU page, we can use it
      to reduce our plethora of read methods to just one. It also has the
      important characteristic of being exactly what the GPU sees - if there
      are incoherency problems, seeing the batch as executed (rather than as
      trapped inside the cpu cache) is important.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-4-chris@chris-wilson.co.uk
      95374d75
    • C
      drm/i915: Stop the machine whilst capturing the GPU crash dump · 9f267eb8
      Chris Wilson 提交于
      The error state is purposefully racy as we expect it to be called at any
      time and so have avoided any locking whilst capturing the crash dump.
      However, with multi-engine GPUs and multiple CPUs, those races can
      manifest into OOPSes as we attempt to chase dangling pointers freed on
      other CPUs. Under discussion are lots of ways to slow down normal
      operation in order to protect the post-mortem error capture, but what it
      we take the opposite approach and freeze the machine whilst the error
      capture runs (note the GPU may still running, but as long as we don't
      process any of the results the driver's bookkeeping will be static).
      
      Note that by of itself, this is not a complete fix. It also depends on
      the compiler barriers in list_add/list_del to prevent traversing the
      lists into the void. We also depend that we only require state from
      carefully controlled sources - i.e. all the state we require for
      post-mortem debugging should be reachable from the request itself so
      that we only have to worry about retrieving the request carefully. Once
      we have the request, we know that all pointers from it are intact.
      
      v2: Avoid drm_clflush_pages() inside stop_machine() as it may use
      stop_machine() itself for its wbinvd fallback.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-3-chris@chris-wilson.co.uk
      9f267eb8
    • C
      drm/i915: Allow disabling error capture · 98a2f411
      Chris Wilson 提交于
      We currently capture the GPU state after we detect a hang. This is vital
      for us to both triage and debug hangs in the wild (post-mortem
      debugging). However, it comes at the cost of running some potentially
      dangerous code (since it has to make very few assumption about the state
      of the driver) that is quite resource intensive.
      
      This patch introduces both a method to disable error capture at runtime
      (for users who hit bugs at runtime and need a workaround) and to disable
      error capture at compiletime (for realtime users who want to minimise
      any possible latency, and never require error capture, saving ~30k of
      code). The cost is that we now have to be wary of (and test!) a kconfig
      flag and a module parameter. The effect of the module parameter is easy
      to verify through code inspection and runtime testing, but a kconfig flag
      needs regular compile checking.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Acked-by: NJani Nikula <jani.nikula@linux.intel.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-2-chris@chris-wilson.co.uk
      98a2f411
    • C
      drm/i915: Move common code out of i915_gpu_error.c · 0e704476
      Chris Wilson 提交于
      In the next patch, I want to conditionally compile i915_gpu_error.c and
      that requires moving the functions used by debug out of
      i915_gpu_error.c!
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161012090522.367-1-chris@chris-wilson.co.uk
      0e704476