1. 16 11月, 2017 1 次提交
  2. 15 11月, 2017 1 次提交
  3. 14 11月, 2017 2 次提交
  4. 12 11月, 2017 1 次提交
  5. 11 11月, 2017 6 次提交
  6. 09 11月, 2017 1 次提交
    • C
      drm/i915: Lock llist_del_first() vs llist_del_all() · 0f763ff3
      Chris Wilson 提交于
      An oversight in commit 87701b4b ("drm/i915: Only free the oldest
      stale object before a fresh allocation") was that not only do we have to
      serialise concurrent users of llist_del_first(), but we also have to
      lock llist_del_first() vs llist_del_all().
      
      From llist.h,
      
       * This can be summarized as follows:
       *
       *           |   add    | del_first |  del_all
       * add       |    -     |     -     |     -
       * del_first |          |     L     |     L
       * del_all   |          |           |     -
       *
       * Where, a particular row's operation can happen concurrently with a column's
       * operation, with "-" being no lock needed, while "L" being lock is needed.
      
      This should hopefully explain:
      
      <4>[   89.287106] general protection fault: 0000 [#1] PREEMPT SMP
      <4>[   89.287126] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me mei snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
      <4>[   89.287226] CPU: 2 PID: 23 Comm: ksoftirqd/2 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3315+ #1
      <4>[   89.287247] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
      <4>[   89.287270] task: ffff88017ab34ec0 task.stack: ffffc90000128000
      <4>[   89.287290] RIP: 0010:llist_add_batch+0x4/0x20
      <4>[   89.287301] RSP: 0018:ffffc9000012bdb8 EFLAGS: 00010296
      <4>[   89.287314] RAX: ffffffff811017ad RBX: 6e468801a1560000 RCX: ef3e53fceecdeb81
      <4>[   89.287330] RDX: 6e468801a1566130 RSI: ffff880103d73d98 RDI: ffff880103d73d98
      <4>[   89.287346] RBP: ffffc9000012bdb8 R08: ffff88017ab35780 R09: 0000000000000000
      <4>[   89.287361] R10: ffffc9000012bd68 R11: 00000000abb18c3d R12: ffffffffa01369e0
      <4>[   89.287377] R13: ffff88017fd1b8f8 R14: ffff88017ab34ec0 R15: 000000000000000a
      <4>[   89.287393] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
      <4>[   89.287411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[   89.287424] CR2: 00007ff0c0755018 CR3: 000000016df9b000 CR4: 00000000003406e0
      <4>[   89.287440] Call Trace:
      <4>[   89.287511]  __i915_gem_free_object_rcu+0x20/0x40 [i915]
      <4>[   89.287527]  rcu_process_callbacks+0x27a/0x730
      <4>[   89.287544]  __do_softirq+0xc0/0x4ae
      <4>[   89.287559]  ? smpboot_thread_fn+0x2d/0x280
      <4>[   89.287571]  run_ksoftirqd+0x1f/0x70
      <4>[   89.287582]  smpboot_thread_fn+0x18a/0x280
      <4>[   89.287595]  kthread+0x114/0x150
      <4>[   89.287605]  ? sort_range+0x30/0x30
      <4>[   89.287615]  ? kthread_create_on_node+0x40/0x40
      <4>[   89.287628]  ret_from_fork+0x27/0x40
      <4>[   89.287641] Code: 0d 48 83 ea 01 4c 89 c1 48 83 fa ff 74 12 48 23 0c d7 74 ed 48 c1 e2 06 48 0f bd c9 48 8d 04 0a 5d c3 90 90 90 90 90 55 48 89 e5 <48> 8b 0a 48 89 0e 48 89 c8 f0 48 0f b1 3a 48 39 c1 75 ed 48 85
      <1>[   89.287774] RIP: llist_add_batch+0x4/0x20 RSP: ffffc9000012bdb8
      <4>[   89.287826] ---[ end trace e775d15174d8ae02 ]---
      
      (Lockless lists are only easy (and lockless) when only using
      llist_add/llist_del_all!)
      
      Fixes: 87701b4b ("drm/i915: Only free the oldest stale object before
      a fresh allocation")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171106111508.11941-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      (cherry picked from commit f991c492)
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      0f763ff3
  7. 06 11月, 2017 1 次提交
    • C
      drm/i915: Lock llist_del_first() vs llist_del_all() · f991c492
      Chris Wilson 提交于
      An oversight in commit 87701b4b ("drm/i915: Only free the oldest
      stale object before a fresh allocation") was that not only do we have to
      serialise concurrent users of llist_del_first(), but we also have to
      lock llist_del_first() vs llist_del_all().
      
      From llist.h,
      
       * This can be summarized as follows:
       *
       *           |   add    | del_first |  del_all
       * add       |    -     |     -     |     -
       * del_first |          |     L     |     L
       * del_all   |          |           |     -
       *
       * Where, a particular row's operation can happen concurrently with a column's
       * operation, with "-" being no lock needed, while "L" being lock is needed.
      
      This should hopefully explain:
      
      <4>[   89.287106] general protection fault: 0000 [#1] PREEMPT SMP
      <4>[   89.287126] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me mei snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
      <4>[   89.287226] CPU: 2 PID: 23 Comm: ksoftirqd/2 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3315+ #1
      <4>[   89.287247] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
      <4>[   89.287270] task: ffff88017ab34ec0 task.stack: ffffc90000128000
      <4>[   89.287290] RIP: 0010:llist_add_batch+0x4/0x20
      <4>[   89.287301] RSP: 0018:ffffc9000012bdb8 EFLAGS: 00010296
      <4>[   89.287314] RAX: ffffffff811017ad RBX: 6e468801a1560000 RCX: ef3e53fceecdeb81
      <4>[   89.287330] RDX: 6e468801a1566130 RSI: ffff880103d73d98 RDI: ffff880103d73d98
      <4>[   89.287346] RBP: ffffc9000012bdb8 R08: ffff88017ab35780 R09: 0000000000000000
      <4>[   89.287361] R10: ffffc9000012bd68 R11: 00000000abb18c3d R12: ffffffffa01369e0
      <4>[   89.287377] R13: ffff88017fd1b8f8 R14: ffff88017ab34ec0 R15: 000000000000000a
      <4>[   89.287393] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
      <4>[   89.287411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[   89.287424] CR2: 00007ff0c0755018 CR3: 000000016df9b000 CR4: 00000000003406e0
      <4>[   89.287440] Call Trace:
      <4>[   89.287511]  __i915_gem_free_object_rcu+0x20/0x40 [i915]
      <4>[   89.287527]  rcu_process_callbacks+0x27a/0x730
      <4>[   89.287544]  __do_softirq+0xc0/0x4ae
      <4>[   89.287559]  ? smpboot_thread_fn+0x2d/0x280
      <4>[   89.287571]  run_ksoftirqd+0x1f/0x70
      <4>[   89.287582]  smpboot_thread_fn+0x18a/0x280
      <4>[   89.287595]  kthread+0x114/0x150
      <4>[   89.287605]  ? sort_range+0x30/0x30
      <4>[   89.287615]  ? kthread_create_on_node+0x40/0x40
      <4>[   89.287628]  ret_from_fork+0x27/0x40
      <4>[   89.287641] Code: 0d 48 83 ea 01 4c 89 c1 48 83 fa ff 74 12 48 23 0c d7 74 ed 48 c1 e2 06 48 0f bd c9 48 8d 04 0a 5d c3 90 90 90 90 90 55 48 89 e5 <48> 8b 0a 48 89 0e 48 89 c8 f0 48 0f b1 3a 48 39 c1 75 ed 48 85
      <1>[   89.287774] RIP: llist_add_batch+0x4/0x20 RSP: ffffc9000012bdb8
      <4>[   89.287826] ---[ end trace e775d15174d8ae02 ]---
      
      (Lockless lists are only easy (and lockless) when only using
      llist_add/llist_del_all!)
      
      Fixes: 87701b4b ("drm/i915: Only free the oldest stale object before
      a fresh allocation")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171106111508.11941-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      f991c492
  8. 03 11月, 2017 1 次提交
  9. 01 11月, 2017 1 次提交
  10. 27 10月, 2017 2 次提交
    • C
      drm/i915: Hold rcu_read_lock when iterating over the radixtree (objects) · bea6e987
      Chris Wilson 提交于
      Kasan spotted
      
          [IGT] gem_tiled_pread_pwrite: exiting, ret=0
          ==================================================================
          BUG: KASAN: use-after-free in __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
          Read of size 8 at addr ffff8801359da310 by task kworker/3:2/182
      
          CPU: 3 PID: 182 Comm: kworker/3:2 Tainted: G     U          4.14.0-rc6-CI-Custom_3340+ #1
          Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
          Workqueue: events __i915_gem_free_work [i915]
          Call Trace:
           dump_stack+0x68/0xa0
           print_address_description+0x78/0x290
           ? __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
           kasan_report+0x23d/0x350
           __asan_report_load8_noabort+0x19/0x20
           __i915_gem_object_reset_page_iter+0x15c/0x170 [i915]
           ? i915_gem_object_truncate+0x100/0x100 [i915]
           ? lock_acquire+0x380/0x380
           __i915_gem_object_put_pages+0x30d/0x530 [i915]
           __i915_gem_free_objects+0x551/0xbd0 [i915]
           ? lock_acquire+0x13e/0x380
           __i915_gem_free_work+0x4e/0x70 [i915]
           process_one_work+0x6f6/0x1590
           ? pwq_dec_nr_in_flight+0x2b0/0x2b0
           worker_thread+0xe6/0xe90
           ? pci_mmcfg_check_reserved+0x110/0x110
           kthread+0x309/0x410
           ? process_one_work+0x1590/0x1590
           ? kthread_create_on_node+0xb0/0xb0
           ret_from_fork+0x27/0x40
      
          Allocated by task 1801:
           save_stack_trace+0x1b/0x20
           kasan_kmalloc+0xee/0x190
           kasan_slab_alloc+0x12/0x20
           kmem_cache_alloc+0xdc/0x2e0
           radix_tree_node_alloc.constprop.12+0x48/0x330
           __radix_tree_create+0x274/0x480
           __radix_tree_insert+0xa2/0x610
           i915_gem_object_get_sg+0x224/0x670 [i915]
           i915_gem_object_get_page+0xb5/0x1c0 [i915]
           i915_gem_pread_ioctl+0x822/0xf60 [i915]
           drm_ioctl_kernel+0x13f/0x1c0
           drm_ioctl+0x6cf/0x980
           do_vfs_ioctl+0x184/0xf30
           SyS_ioctl+0x41/0x70
           entry_SYSCALL_64_fastpath+0x1c/0xb1
      
          Freed by task 37:
           save_stack_trace+0x1b/0x20
           kasan_slab_free+0xaf/0x190
           kmem_cache_free+0xbf/0x340
           radix_tree_node_rcu_free+0x79/0x90
           rcu_process_callbacks+0x46d/0xf40
           __do_softirq+0x21c/0x8d3
      
          The buggy address belongs to the object at ffff8801359da0f0
          which belongs to the cache radix_tree_node of size 576
          The buggy address is located 544 bytes inside of
          576-byte region [ffff8801359da0f0, ffff8801359da330)
          The buggy address belongs to the page:
          page:ffffea0004d67600 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
          flags: 0x8000000000008100(slab|head)
          raw: 8000000000008100 0000000000000000 0000000000000000 0000000100110011
          raw: ffffea0004b52920 ffffea0004b38020 ffff88015b416a80 0000000000000000
          page dumped because: kasan: bad access detected
      
          Memory state around the buggy address:
           ffff8801359da200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
           ffff8801359da280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
          >ffff8801359da300: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      			     ^
           ffff8801359da380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
           ffff8801359da400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ==================================================================
          Disabling lock debugging due to kernel taint
      
      which looks like the slab containing the radixtree iter was freed as we
      traversed the tree, taking the rcu read lock across the loop should
      prevent that (deferring all the frees until the end).
      Reported-by: NTomi Sarvela <tomi.p.sarvela@intel.com>
      Fixes: 96d77634 ("drm/i915: Use a radixtree for random access to the object's backing storage")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171026130032.10677-1-chris@chris-wilson.co.ukReviewed-by: NMatthew Auld <matthew.william.auld@gmail.com>
      bea6e987
    • M
      drm/i915/guc: Preemption! With GuC · c41937fd
      Michał Winiarski 提交于
      Pretty similar to what we have on execlists.
      We're reusing most of the GEM code, however, due to GuC quirks we need a
      couple of extra bits.
      Preemption is implemented as GuC action, and actions can be pretty slow.
      Because of that, we're using a mutex to serialize them. Since we're
      requesting preemption from the tasklet, the task of creating a workitem
      and wrapping it in GuC action is delegated to a worker.
      
      To distinguish that preemption has finished, we're using additional
      piece of HWSP, and since we're not getting context switch interrupts,
      we're also adding a user interrupt.
      
      The fact that our special preempt context has completed unfortunately
      doesn't mean that we're ready to submit new work. We also need to wait
      for GuC to finish its own processing.
      
      v2: Don't compile out the wait for GuC, handle workqueue flush on reset,
      no need for ordered workqueue, put on a reviewer hat when looking at my own
      patches (Chris)
      Move struct work around in intel_guc, move user interruput outside of
      conditional (Michał)
      Keep ring around rather than chase though intel_context
      
      v3: Extract WA for flushing ggtt writes to a helper (Chris)
      Keep work_struct in intel_guc rather than engine (Michał)
      Use ordered workqueue for inject_preempt worker to avoid GuC quirks.
      
      v4: Drop now unused INTEL_GUC_PREEMPT_OPTION_IMMEDIATE (Daniele)
      Drop stray newlines, use container_of for intel_guc in worker,
      check for presence of workqueue when flushing it, rather than
      enable_guc_submission modparam, reorder preempt postprocessing (Chris)
      
      v5: Make wq NULL after destroying it
      
      v6: Swap struct guc_preempt_work members (Michał)
      Signed-off-by: NMichał Winiarski <michal.winiarski@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jeff McGee <jeff.mcgee@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171026133558.19580-1-michal.winiarski@intel.com
      c41937fd
  11. 26 10月, 2017 2 次提交
  12. 24 10月, 2017 2 次提交
  13. 19 10月, 2017 1 次提交
  14. 18 10月, 2017 1 次提交
  15. 17 10月, 2017 7 次提交
  16. 14 10月, 2017 1 次提交
  17. 11 10月, 2017 2 次提交
    • D
      drm/i915: Use rcu instead of stop_machine in set_wedged · af7a8ffa
      Daniel Vetter 提交于
      stop_machine is not really a locking primitive we should use, except
      when the hw folks tell us the hw is broken and that's the only way to
      work around it.
      
      This patch tries to address the locking abuse of stop_machine() from
      
      commit 20e4933c
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Tue Nov 22 14:41:21 2016 +0000
      
          drm/i915: Stop the machine as we install the wedged submit_request handler
      
      Chris said parts of the reasons for going with stop_machine() was that
      it's no overhead for the fast-path. But these callbacks use irqsave
      spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
      
      To stay as close as possible to the stop_machine semantics we first
      update all the submit function pointers to the nop handler, then call
      synchronize_rcu() to make sure no new requests can be submitted. This
      should give us exactly the huge barrier we want.
      
      I pondered whether we should annotate engine->submit_request as __rcu
      and use rcu_assign_pointer and rcu_dereference on it. But the reason
      behind those is to make sure the compiler/cpu barriers are there for
      when you have an actual data structure you point at, to make sure all
      the writes are seen correctly on the read side. But we just have a
      function pointer, and .text isn't changed, so no need for these
      barriers and hence no need for annotations.
      
      Unfortunately there's a complication with the call to
      intel_engine_init_global_seqno:
      
      - Without stop_machine we must hold the corresponding spinlock.
      
      - Without stop_machine we must ensure that all requests are marked as
        having failed with dma_fence_set_error() before we call it. That
        means we need to split the nop request submission into two phases,
        both synchronized with rcu:
      
        1. Only stop submitting the requests to hw and mark them as failed.
      
        2. After all pending requests in the scheduler/ring are suitably
        marked up as failed and we can force complete them all, also force
        complete by calling intel_engine_init_global_seqno().
      
      This should fix the followwing lockdep splat:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.14.0-rc3-CI-CI_DRM_3179+ #1 Tainted: G     U
      ------------------------------------------------------
      kworker/3:4/562 is trying to acquire lock:
       (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8113d4bc>] stop_machine+0x1c/0x40
      
      but task is already holding lock:
       (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #6 (&dev->struct_mutex){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             __mutex_lock+0x86/0x9b0
             mutex_lock_interruptible_nested+0x1b/0x20
             i915_mutex_lock_interruptible+0x51/0x130 [i915]
             i915_gem_fault+0x209/0x650 [i915]
             __do_fault+0x1e/0x80
             __handle_mm_fault+0xa08/0xed0
             handle_mm_fault+0x156/0x300
             __do_page_fault+0x2c5/0x570
             do_page_fault+0x28/0x250
             page_fault+0x22/0x30
      
      -> #5 (&mm->mmap_sem){++++}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             __might_fault+0x68/0x90
             _copy_to_user+0x23/0x70
             filldir+0xa5/0x120
             dcache_readdir+0xf9/0x170
             iterate_dir+0x69/0x1a0
             SyS_getdents+0xa5/0x140
             entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      -> #4 (&sb->s_type->i_mutex_key#5){++++}:
             down_write+0x3b/0x70
             handle_create+0xcb/0x1e0
             devtmpfsd+0x139/0x180
             kthread+0x152/0x190
             ret_from_fork+0x27/0x40
      
      -> #3 ((complete)&req.done){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             wait_for_common+0x58/0x210
             wait_for_completion+0x1d/0x20
             devtmpfs_create_node+0x13d/0x160
             device_add+0x5eb/0x620
             device_create_groups_vargs+0xe0/0xf0
             device_create+0x3a/0x40
             msr_device_create+0x2b/0x40
             cpuhp_invoke_callback+0xc9/0xbf0
             cpuhp_thread_fun+0x17b/0x240
             smpboot_thread_fn+0x18a/0x280
             kthread+0x152/0x190
             ret_from_fork+0x27/0x40
      
      -> #2 (cpuhp_state-up){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             cpuhp_issue_call+0x133/0x1c0
             __cpuhp_setup_state_cpuslocked+0x139/0x2a0
             __cpuhp_setup_state+0x46/0x60
             page_writeback_init+0x43/0x67
             pagecache_init+0x3d/0x42
             start_kernel+0x3a8/0x3fc
             x86_64_start_reservations+0x2a/0x2c
             x86_64_start_kernel+0x6d/0x70
             verify_cpu+0x0/0xfb
      
      -> #1 (cpuhp_state_mutex){+.+.}:
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             __mutex_lock+0x86/0x9b0
             mutex_lock_nested+0x1b/0x20
             __cpuhp_setup_state_cpuslocked+0x53/0x2a0
             __cpuhp_setup_state+0x46/0x60
             page_alloc_init+0x28/0x30
             start_kernel+0x145/0x3fc
             x86_64_start_reservations+0x2a/0x2c
             x86_64_start_kernel+0x6d/0x70
             verify_cpu+0x0/0xfb
      
      -> #0 (cpu_hotplug_lock.rw_sem){++++}:
             check_prev_add+0x430/0x840
             __lock_acquire+0x1420/0x15e0
             lock_acquire+0xb0/0x200
             cpus_read_lock+0x3d/0xb0
             stop_machine+0x1c/0x40
             i915_gem_set_wedged+0x1a/0x20 [i915]
             i915_reset+0xb9/0x230 [i915]
             i915_reset_device+0x1f6/0x260 [i915]
             i915_handle_error+0x2d8/0x430 [i915]
             hangcheck_declare_hang+0xd3/0xf0 [i915]
             i915_hangcheck_elapsed+0x262/0x2d0 [i915]
             process_one_work+0x233/0x660
             worker_thread+0x4e/0x3b0
             kthread+0x152/0x190
             ret_from_fork+0x27/0x40
      
      other info that might help us debug this:
      
      Chain exists of:
        cpu_hotplug_lock.rw_sem --> &mm->mmap_sem --> &dev->struct_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&dev->struct_mutex);
                                     lock(&mm->mmap_sem);
                                     lock(&dev->struct_mutex);
        lock(cpu_hotplug_lock.rw_sem);
      
       *** DEADLOCK ***
      
      3 locks held by kworker/3:4/562:
       #0:  ("events_long"){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
       #1:  ((&(&i915->gpu_error.hangcheck_work)->work)){+.+.}, at: [<ffffffff8109c64a>] process_one_work+0x1aa/0x660
       #2:  (&dev->struct_mutex){+.+.}, at: [<ffffffffa0136588>] i915_reset_device+0x1e8/0x260 [i915]
      
      stack backtrace:
      CPU: 3 PID: 562 Comm: kworker/3:4 Tainted: G     U          4.14.0-rc3-CI-CI_DRM_3179+ #1
      Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
      Workqueue: events_long i915_hangcheck_elapsed [i915]
      Call Trace:
       dump_stack+0x68/0x9f
       print_circular_bug+0x235/0x3c0
       ? lockdep_init_map_crosslock+0x20/0x20
       check_prev_add+0x430/0x840
       ? irq_work_queue+0x86/0xe0
       ? wake_up_klogd+0x53/0x70
       __lock_acquire+0x1420/0x15e0
       ? __lock_acquire+0x1420/0x15e0
       ? lockdep_init_map_crosslock+0x20/0x20
       lock_acquire+0xb0/0x200
       ? stop_machine+0x1c/0x40
       ? i915_gem_object_truncate+0x50/0x50 [i915]
       cpus_read_lock+0x3d/0xb0
       ? stop_machine+0x1c/0x40
       stop_machine+0x1c/0x40
       i915_gem_set_wedged+0x1a/0x20 [i915]
       i915_reset+0xb9/0x230 [i915]
       i915_reset_device+0x1f6/0x260 [i915]
       ? gen8_gt_irq_ack+0x170/0x170 [i915]
       ? work_on_cpu_safe+0x60/0x60
       i915_handle_error+0x2d8/0x430 [i915]
       ? vsnprintf+0xd1/0x4b0
       ? scnprintf+0x3a/0x70
       hangcheck_declare_hang+0xd3/0xf0 [i915]
       ? intel_runtime_pm_put+0x56/0xa0 [i915]
       i915_hangcheck_elapsed+0x262/0x2d0 [i915]
       process_one_work+0x233/0x660
       worker_thread+0x4e/0x3b0
       kthread+0x152/0x190
       ? process_one_work+0x660/0x660
       ? kthread_create_on_node+0x40/0x40
       ret_from_fork+0x27/0x40
      Setting dangerous option reset - tainting kernel
      i915 0000:00:02.0: Resetting chip after gpu hang
      Setting dangerous option reset - tainting kernel
      i915 0000:00:02.0: Resetting chip after gpu hang
      
      v2: Have 1 global synchronize_rcu() barrier across all engines, and
      improve commit message.
      
      v3: We need to protect the seqno update with the timeline spinlock (in
      set_wedged) to avoid racing with other updates of the seqno, like we
      already do in nop_submit_request (Chris).
      
      v4: Use two-phase sequence to plug the race Chris spotted where we can
      complete requests before they're marked up with -EIO.
      
      v5: Review from Chris:
      - simplify nop_submit_request.
      - Add comment to rcu_read_lock section.
      - Align comments with the new style.
      
      v6: Remove unused variable to appease CI.
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102886
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103096
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marta Lofstedt <marta.lofstedt@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171011091019.1425-1-daniel.vetter@ffwll.ch
      af7a8ffa
    • S
      drm/i915: Name structure in dev_priv that contains RPS/RC6 state as "gt_pm" · 562d9bae
      Sagar Arun Kamble 提交于
      Prepared substructure rps for RPS related state. autoenable_work is
      used for RC6 too hence it is defined outside rps structure. As we do
      this lot many functions are refactored to use intel_rps *rps to access
      rps related members. Hence renamed intel_rps_client pointer variables
      to rps_client in various functions.
      
      v2: Rebase.
      
      v3: s/pm/gt_pm (Chris)
      Refactored access to rps structure by declaring struct intel_rps * in
      many functions.
      Signed-off-by: NSagar Arun Kamble <sagar.a.kamble@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com> #1
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/1507360055-19948-9-git-send-email-sagar.a.kamble@intel.comAcked-by: NImre Deak <imre.deak@intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171010213010.7415-8-chris@chris-wilson.co.uk
      562d9bae
  18. 10 10月, 2017 7 次提交