1. 21 Jun 2017, 3 commits
  2. 16 Jun 2017, 1 commit
    • drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Chris Wilson committed
      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered, but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
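
The fast path described above can be pictured as a chained hash table from handle to vma. This is a minimal userspace sketch under stated assumptions: `struct vma`, `struct lut` and the `lut_*` helpers are hypothetical names, not the i915 code, and the table grows synchronously on insert rather than in a background task as the commit describes. Only the first handle->vma link is recorded, matching the note about shared buffers in non-full-ppgtt modes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-context binding; the real i915 struct vma differs. */
struct vma {
    uint32_t handle;          /* user-visible object handle */
    int id;                   /* stand-in for the binding details */
    struct vma *hash_next;    /* chain link within one bucket */
};

/* Hypothetical handle -> vma lookup table with 1u << bits buckets. */
struct lut {
    struct vma **buckets;
    unsigned int bits;        /* kept in [1, 31] so the hash shift is valid */
    unsigned int count;       /* number of entries stored */
};

static unsigned int lut_hash(uint32_t handle, unsigned int bits)
{
    /* Multiplicative hashing: keep the top `bits` bits of the product. */
    return (uint32_t)(handle * 0x9e3779b9u) >> (32 - bits);
}

static int lut_init(struct lut *lut, unsigned int bits)
{
    assert(bits >= 1 && bits <= 31);
    lut->buckets = calloc(1u << bits, sizeof(*lut->buckets));
    lut->bits = bits;
    lut->count = 0;
    return lut->buckets ? 0 : -1;
}

static struct vma *lut_lookup(const struct lut *lut, uint32_t handle)
{
    struct vma *v = lut->buckets[lut_hash(handle, lut->bits)];

    for (; v; v = v->hash_next)
        if (v->handle == handle)
            return v;
    return NULL;
}

/* Grow by allocating a table twice as large and rehashing every entry.
 * (The commit defers reallocation to a background task; here it is inline.) */
static int lut_grow(struct lut *lut)
{
    unsigned int new_bits = lut->bits + 1;
    struct vma **nb = calloc(1u << new_bits, sizeof(*nb));

    if (!nb)
        return -1;
    for (unsigned int i = 0; i < (1u << lut->bits); i++) {
        struct vma *v = lut->buckets[i];
        while (v) {
            struct vma *next = v->hash_next;
            unsigned int h = lut_hash(v->handle, new_bits);
            v->hash_next = nb[h];
            nb[h] = v;
            v = next;
        }
    }
    free(lut->buckets);
    lut->buckets = nb;
    lut->bits = new_bits;
    return 0;
}

/* Record only the first handle -> vma link; later (shared) links are
 * ignored and left to the slow path. */
static void lut_add(struct lut *lut, struct vma *vma)
{
    if (lut_lookup(lut, vma->handle))
        return;
    if (lut->count >= (1u << lut->bits) && lut->bits < 31)
        (void)lut_grow(lut);    /* on failure, keep using the smaller table */

    unsigned int h = lut_hash(vma->handle, lut->bits);
    vma->hash_next = lut->buckets[h];
    lut->buckets[h] = vma;
    lut->count++;
}
```

Keeping the chain link inside `struct vma` avoids a separate allocation per entry, so inserting can only fail (harmlessly) when growing the bucket array.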
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
  3. 07 Jun 2017, 2 commits
  4. 30 May 2017, 1 commit
  5. 26 May 2017, 1 commit
  6. 22 May 2017, 1 commit
  7. 19 May 2017, 3 commits
  8. 18 May 2017, 1 commit
  9. 17 May 2017, 2 commits
    • drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
      Chris Wilson committed
      All the requests at the same priority are executed in FIFO order. They
      do not need to be stored in the rbtree themselves, as they are a simple
      list within a level. If we move the requests at one priority into a list,
      we can then reduce the rbtree to the set of priorities. This should keep
      the height of the rbtree small, as the number of active priorities cannot
      exceed the number of active requests and should typically be only a few.
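
The structure described above can be sketched as follows. To keep the example short, a sorted singly linked list of priority levels stands in for the rbtree (a deliberate swap, not what the commit uses), and `struct request`, `struct priolist` and the helper functions are hypothetical. Each level keeps its requests in FIFO order, a level is allocated on first use and freed once empty, and unwinding puts a request back at the head of its level, as the commit describes further down.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical request: only what the sketch needs. */
struct request {
    int id;
    struct request *next;     /* FIFO link within one priority level */
};

/* One node per *active* priority, holding a FIFO of its requests. */
struct priolist {
    int priority;
    struct request *head, *tail;
    struct priolist *next;    /* levels kept sorted, highest priority first */
};

struct queue {
    struct priolist *levels;
};

/* Find the level for `prio`, allocating it on first use. */
static struct priolist *lookup_priolist(struct queue *q, int prio)
{
    struct priolist **link = &q->levels;

    while (*link && (*link)->priority > prio)
        link = &(*link)->next;
    if (*link && (*link)->priority == prio)
        return *link;

    struct priolist *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->priority = prio;
    p->next = *link;
    *link = p;
    return p;
}

/* Normal submission: append to the tail of the level (FIFO). */
static int queue_request(struct queue *q, struct request *rq, int prio)
{
    struct priolist *p = lookup_priolist(q, prio);

    if (!p)
        return -1;
    rq->next = NULL;
    if (p->tail)
        p->tail->next = rq;
    else
        p->head = rq;
    p->tail = rq;
    return 0;
}

/* Unwinding: put a request back at the *head* of its level. */
static int requeue_request(struct queue *q, struct request *rq, int prio)
{
    struct priolist *p = lookup_priolist(q, prio);

    if (!p)
        return -1;
    rq->next = p->head;
    p->head = rq;
    if (!p->tail)
        p->tail = rq;
    return 0;
}

/* Take the oldest request of the highest level; free the level once empty. */
static struct request *dequeue(struct queue *q)
{
    struct priolist *p = q->levels;

    if (!p)
        return NULL;
    struct request *rq = p->head;
    p->head = rq->next;
    if (!p->head) {
        q->levels = p->next;
        free(p);
    }
    return rq;
}
```

Ordering is only ever decided between levels; within a level, dequeueing is a plain list pop.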
      
      Currently, we have ~2k possible priority levels, and that number may
      increase to allow even finer-grained selection. Allocating those in
      advance seems a waste (and may be impossible), so we opt for allocating
      a level upon first use and freeing it once its requests are depleted. To
      avoid the possibility of an allocation failure causing us to lose a
      request, we preallocate the default priority (0) and bump any request to
      that priority if we fail to allocate the appropriate plist for it. Having a
      request (that is ready to run, so not leading to corruption) execute
      out-of-order is better than leaking the request (and its dependency
      tree) entirely.
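
The allocation-failure fallback can be sketched in isolation. All names are hypothetical, each level's request list is reduced to a counter, and a `fail_alloc` flag exists purely to simulate `calloc()` failing. The default level is embedded in the queue so that it never needs allocating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical priority level; its request list is reduced to a count. */
struct priolist {
    int priority;
    int nr_requests;
    struct priolist *next;          /* sorted, highest priority first */
};

struct queue {
    struct priolist default_level;  /* priority 0, embedded: never allocated */
    struct priolist *levels;
    bool fail_alloc;                /* test hook: pretend calloc() failed */
};

static void queue_init(struct queue *q)
{
    q->default_level = (struct priolist){ .priority = 0 };
    q->levels = &q->default_level;  /* stays linked permanently in this sketch */
    q->fail_alloc = false;
}

/* Find or allocate the level for `prio`. On allocation failure, fall back
 * to the preallocated default level: the request may then run out of
 * order, but it is never lost. */
static struct priolist *lookup_priolist(struct queue *q, int prio)
{
    struct priolist **link = &q->levels;

    while (*link && (*link)->priority > prio)
        link = &(*link)->next;
    if (*link && (*link)->priority == prio)
        return *link;

    struct priolist *p = q->fail_alloc ? NULL : calloc(1, sizeof(*p));
    if (!p)
        return &q->default_level;   /* bump the request to priority 0 */
    p->priority = prio;
    p->next = *link;
    *link = p;
    return p;
}

/* Queue one request; returns the priority it will actually run at. */
static int queue_request(struct queue *q, int prio)
{
    struct priolist *p = lookup_priolist(q, prio);

    p->nr_requests++;
    return p->priority;
}
```

An existing level is still found normally after a failure; only a level that would need a fresh allocation is redirected to priority 0.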
      
      There should be a benefit to reducing execlists_dequeue() to principally
      using a simple list (and reducing the frequency of both rbtree iteration
      and balancing on erase) but for typical workloads, request coalescing
      should be small enough that we don't notice any change. The main gain is
      from improving PI calls to schedule, and the explicit list within a
      level should make request unwinding simpler (we just need to insert at
      the head of the list rather than the tail and not have to make the
      rbtree search more complicated).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
      v3: Michał found the solution to handling the allocation failure
      gracefully. If we disable all priority scheduling following the
      allocation failure, those requests will be executed in FIFO order and we will
      ensure that this request and its dependencies are in strict FIFO order (even
      when it doesn't realise it is only a single list). Normal scheduling is
      restored once we know the device is idle, until the next failure!
      Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
    • drm/i915/execlists: Pack the count into the low bits of the port.request · 77f0d0e9
      Chris Wilson committed
      add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
      function                                     old     new   delta
      execlists_submit_ports                       262     471    +209
      port_assign.isra                               -     136    +136
      capture                                     6344    6359     +15
      reset_common_ring                            438     452     +14
      execlists_submit_request                     228     238     +10
      gen8_init_common_ring                        334     341      +7
      intel_engine_is_idle                         106     105      -1
      i915_engine_info                            2314    2290     -24
      __i915_gem_set_wedged_BKL                    485     411     -74
      intel_lrc_irq_handler                       1789    1604    -185
      execlists_update_context                     294       -    -294
      
      The most important changes there are the improvements to
      intel_lrc_irq_handler and execlists_submit_ports (a net improvement,
      since execlists_update_context is now inlined).
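
The packing can be pictured as a tagged pointer: if a request is aligned to at least 4 bytes, the low 2 bits of its address are always zero and can carry a small submission count, so each port shrinks to a single word. The 2-bit width and the `port_*` names below are assumptions for illustration and may differ from the helpers in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define COUNT_BITS 2u                                  /* assumed width */
#define COUNT_MASK ((uintptr_t)((1u << COUNT_BITS) - 1))

/* Hypothetical request; alignment >= 4 keeps the low 2 address bits zero. */
struct request {
    _Alignas(4) int id;
};

/* One submission port: a single tagged word instead of {pointer, count}. */
struct port {
    uintptr_t packed;
};

static void port_set(struct port *port, struct request *rq, unsigned int count)
{
    assert(((uintptr_t)rq & COUNT_MASK) == 0);   /* pointer must be aligned */
    assert(count <= COUNT_MASK);                 /* count must fit the bits */
    port->packed = (uintptr_t)rq | count;
}

static struct request *port_request(const struct port *port)
{
    return (struct request *)(port->packed & ~COUNT_MASK);
}

static unsigned int port_count(const struct port *port)
{
    return (unsigned int)(port->packed & COUNT_MASK);
}

/* Bump the count in place; the pointer bits are untouched. */
static void port_inc_count(struct port *port)
{
    assert(port_count(port) < COUNT_MASK);
    port->packed++;
}
```

Because the count never overflows its bits, incrementing the whole word only changes the tag, and the pointer is recovered by masking.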
      
      v2: Use the port_api() for guc as well (even though we do not yet pack
      any counters in there) and hide all port->request_count inside the
      helpers.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
  10. 16 May 2017, 1 commit
  11. 11 May 2017, 2 commits
  12. 10 May 2017, 1 commit
    • drm/i915: Two stage watermarks for g4x · 04548cba
      Ville Syrjälä committed
      Implement proper two stage watermark programming for g4x. As with
      other pre-SKL platforms, the watermark registers aren't double
      buffered on g4x. Hence we must sequence the watermark update
      carefully around plane updates.
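
One way to picture the two-stage sequencing is the sketch below. It assumes, as a simplification, that the intermediate watermarks are the per-plane maximum of the old and new values, with CxSR allowed only if both states allow it, and that a larger value is the safer one (consistent with g4x not using inverted values). All names are hypothetical, and hardware access is replaced by an event log.

```c
#include <assert.h>
#include <stdbool.h>

enum { PLANE_PRIMARY, PLANE_SPRITE, PLANE_CURSOR, NUM_PLANES };

/* Hypothetical per-pipe watermark state; not the i915 structure. */
struct wm_state {
    unsigned int plane[NUM_PLANES]; /* assumed: larger = more conservative */
    bool cxsr;                      /* memory self refresh allowed */
};

/* Intermediate watermarks must be safe for BOTH the old and the new plane
 * configuration, since the watermark registers are not double buffered
 * and take effect at a different moment than the plane registers. */
static struct wm_state intermediate_wm(const struct wm_state *old_wm,
                                       const struct wm_state *new_wm)
{
    struct wm_state wm;

    for (int i = 0; i < NUM_PLANES; i++)
        wm.plane[i] = old_wm->plane[i] > new_wm->plane[i] ?
                      old_wm->plane[i] : new_wm->plane[i];
    wm.cxsr = old_wm->cxsr && new_wm->cxsr;
    return wm;
}

/* Stand-ins for hardware access: they only record the order of events. */
enum event { EV_WM_INTERMEDIATE, EV_PLANES, EV_VBLANK, EV_WM_FINAL };
static enum event hw_log[16];
static int hw_log_len;
static struct wm_state hw_wm;       /* what the (fake) registers hold now */

static void log_event(enum event ev)
{
    assert(hw_log_len < 16);
    hw_log[hw_log_len++] = ev;
}

static void update_pipe(const struct wm_state *old_wm,
                        const struct wm_state *new_wm)
{
    struct wm_state inter = intermediate_wm(old_wm, new_wm);

    hw_wm = inter;                  /* stage 1: before touching the planes */
    log_event(EV_WM_INTERMEDIATE);
    log_event(EV_PLANES);           /* plane update, latched at the next vblank */
    log_event(EV_VBLANK);           /* new plane configuration is now live */
    hw_wm = *new_wm;                /* stage 2: optimal values for the new config */
    log_event(EV_WM_FINAL);
}
```

The point of the first stage is that whichever plane configuration the hardware happens to be scanning out between the two register writes, the programmed watermarks are sufficient for it.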
      
      The code is quite heavily modelled on the VLV/CHV code, with some
      fairly significant differences due to the different hardware
      architecture:
      * g4x doesn't use inverted watermark values
      * CxSR actually affects the watermarks since it controls memory self
        refresh in addition to the max FIFO mode
      * A further HPLL SR mode is possible with higher memory wakeup
        latency
      * g4x has FBC2 and so it also has FBC watermarks
      * max FIFO mode for primary plane only (cursor is allowed, sprite is not)
      * g4x has no manual FIFO repartitioning
      * some TLB miss related workarounds are needed for the watermarks
      
      Actually the hardware is quite similar to ILK+ in many ways. The
      most visible differences are in the actual watermark register
      layout. ILK revamped that part quite heavily whereas g4x is still
      using the layout inherited from earlier platforms.
      
      Note that we didn't previously enable the HPLL SR on g4x. So in order
      to not introduce too many functional changes in this patch I've not
      actually enabled it here either, even though the code is now fully
      ready for it. We'll enable it separately later on.
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170421181432.15216-13-ville.syrjala@linux.intel.com
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
  13. 09 May 2017, 1 commit
  14. 08 May 2017, 1 commit
  15. 10 Apr 2017, 1 commit
  16. 31 Mar 2017, 1 commit
  17. 28 Mar 2017, 1 commit
  18. 27 Mar 2017, 1 commit
  19. 23 Mar 2017, 4 commits
  20. 21 Mar 2017, 1 commit
  21. 17 Mar 2017, 3 commits
  22. 16 Mar 2017, 1 commit
  23. 14 Mar 2017, 1 commit
    • drm/i915: annote drop_caches debugfs interface with lockdep · 05df49e7
      Daniel Vetter committed
      The trouble we have is that we can't really test all the shrinker
      recursion stuff exhaustively in BAT because any kind of thrashing
      stress test just takes too long.
      
      But that leaves a really big gap open, since shrinker recursions are
      one of the most annoying bugs. Now lockdep already has support for
      checking allocation deadlocks:
      
      - Direct reclaim paths are marked up with
        lockdep_set_current_reclaim_state() and
        lockdep_clear_current_reclaim_state().
      
      - All allocation paths are marked with lockdep_trace_alloc().
      
      If we simply mark up our debugfs with the reclaim annotations, any
      code and locks taken in there will automatically complete the picture
      with any allocation paths we already have, as long as we have a simple
      testcase in BAT which throws out a few objects using this interface.
      No stress test or thrashing is needed at all.
      
      v2: Need to EXPORT_SYMBOL_GPL to make it compile as a module.
      
      v3: Fixup rebase fail (spotted by Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170312205340.16202-1-daniel.vetter@ffwll.ch
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
  24. 13 Mar 2017, 1 commit
    • drm/i915: Extend rpm wakelock for debugfs/i915_drpc_info · cf632bd6
      Chris Wilson committed
      i915_drpc_info missed covering a few register reads with the runtime pm
      wakelock. Be simple and cover the entire function with a single wakelock
      so that new additions are not similarly missed in the future.
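
The fix amounts to taking one wakeref for the whole function instead of around individual reads. A userspace model of that pattern, assuming hypothetical `rpm_get()`/`rpm_put()` helpers and a `read32()` that asserts where the driver would warn, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a runtime-PM wakeref count. */
static int wakeref_count;

static void rpm_get(void) { wakeref_count++; }
static void rpm_put(void) { assert(wakeref_count > 0); wakeref_count--; }

/* Register read that refuses to run without a wakeref
 * (the driver emits a WARN instead of asserting). */
static uint32_t read32(uint32_t reg)
{
    assert(wakeref_count > 0 && "RPM wakelock ref not held during HW access");
    return reg ^ 0xffffffffu;   /* fake register contents */
}

/* One wakeref spans the whole function, so register reads added
 * here later are covered automatically. */
static uint32_t drpc_info(void)
{
    uint32_t sum = 0;

    rpm_get();
    sum += read32(0x1);
    sum += read32(0x2);
    /* ...any future register read lands inside the wakeref too... */
    rpm_put();

    return sum;
}
```

Holding the wakeref slightly longer than strictly necessary costs little in a debugfs path, and it removes the need to audit each individual read.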
      
        WARNING: CPU: 2 PID: 1334 at drivers/gpu/drm/i915/intel_drv.h:1743 gen6_read32+0x192/0x1e0 [i915]
        RPM wakelock ref not held during HW access
        Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver netconsole nfsd auth_rpcgss ipmi_watchdog ipmi_poweroff ipmi_devintf ipmi_msghandler overlay btrfs xor raid6_pq dm_mod sg sd_mod snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ata_generic pata_acpi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel kvm_intel snd_hda_codec kvm eeepc_wmi irqbypass snd_hda_core crct10dif_pclmul crc32_pclmul crc32c_intel asus_wmi sparse_keymap ghash_clmulni_intel snd_hwdep i915 rfkill ppdev pcbc aesni_intel ata_piix crypto_simd glue_helper snd_pcm pata_via cryptd pcspkr snd_timer drm_kms_helper syscopyarea snd sysfillrect libata sysimgblt fb_sys_fops soundcore shpchp drm wmi parport_pc parport tpm_infineon video
        CPU: 2 PID: 1334 Comm: php5 Not tainted 4.10.0-rc8-01615-g1f58c8e7 #1
        Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 1002 04/01/2011
        Call Trace:
         dump_stack+0x63/0x8a
         __warn+0xcb/0xf0
         warn_slowpath_fmt+0x4f/0x60
         ? seq_vprintf+0x35/0x50
         gen6_read32+0x192/0x1e0 [i915]
         i915_drpc_info+0x55d/0x990 [i915]
         seq_read+0xf2/0x3b0
         full_proxy_read+0x51/0x80
         __vfs_read+0x28/0x130
         ? security_file_permission+0x9b/0xc0
         ? rw_verify_area+0x4e/0xb0
         vfs_read+0xa8/0x170
         SyS_read+0x46/0xa0
         entry_SYSCALL_64_fastpath+0x1a/0xa9
        RIP: 0033:0x7fd97bf175a0
        RSP: 002b:00007ffdf730db68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
        RAX: ffffffffffffffda RBX: 00007fd978028738 RCX: 00007fd97bf175a0
        RDX: 0000000000002000 RSI: 00007fd97740e0d8 RDI: 0000000000000005
        RBP: 0000000000000001 R08: 0000000000e97840 R09: 00007fd977ef8d58
        R10: 0000000000000027 R11: 0000000000000246 R12: 00007fd977ef8d58
        R13: 0000000000000000 R14: 0000000000eb4640 R15: 0000000000000000
      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170313095617.29010-1-chris@chris-wilson.co.uk
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
  25. 12 Mar 2017, 1 commit
  26. 10 Mar 2017, 2 commits
  27. 09 Mar 2017, 1 commit