1. 06 Mar, 2020 1 commit
  2. 04 Mar, 2020 1 commit
    • drm/i915: Force DPCD backlight mode on X1 Extreme 2nd Gen 4K AMOLED panel · 17f5d579
      Authored by Lyude Paul
      The X1 Extreme is one of the systems that lies about which backlight
      interface it uses in its VBIOS, as PWM backlight controls don't work
      at all on this machine. It's possible that this panel could be one of
      the infamous ones that can switch between PWM mode and DPCD backlight
      control mode, but we haven't gotten any more details on this from Lenovo
      just yet. For the time being though, making sure the backlight 'just
      works' is a bit more important.
      
      So, add a quirk to force DPCD backlight controls on for these systems
      based on EDID (since this panel doesn't appear to fill in the device ID).
      Hopefully in the future we'll figure out a better way of probing this.
      
      Changes since v2:
      * The bugzilla URL is deprecated; bug reporting happens on gitlab now.
        Update the messages we print to reflect this
      * Also, take the opportunity to move FDO_BUG_URL out of i915_utils.c and
        into i915_utils.h so that other places which print things that aren't
        traditional errors but are worth filing bugs about, can actually use
        it.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Adam Jackson <ajax@redhat.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200303215320.93491-1-lyude@redhat.com
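      For illustration, a minimal sketch of how an EDID-keyed quirk table can
      work; the struct, helper names, and the vendor/product bytes here are
      hypothetical, not the exact ones from this patch:

      #include <stdint.h>
      #include <stddef.h>
      #include <string.h>

      #define QUIRK_FORCE_DPCD_BACKLIGHT (1u << 0)

      struct edid_quirk {
              uint8_t mfg_id[2];   /* packed PNP vendor ID, EDID bytes 8-9 */
              uint16_t prod_code;  /* product code, EDID bytes 10-11 (LE) */
              uint32_t quirks;
      };

      static const struct edid_quirk edid_quirk_list[] = {
              /* hypothetical entry for the offending AMOLED panel */
              { { 0x4c, 0x83 }, 0x4141, QUIRK_FORCE_DPCD_BACKLIGHT },
      };

      static uint32_t edid_get_quirks(const uint8_t *edid)
      {
              uint16_t prod = edid[10] | (edid[11] << 8);
              size_t i;

              for (i = 0; i < sizeof(edid_quirk_list) / sizeof(edid_quirk_list[0]); i++) {
                      const struct edid_quirk *q = &edid_quirk_list[i];

                      if (!memcmp(q->mfg_id, &edid[8], 2) && q->prod_code == prod)
                              return q->quirks;
              }
              return 0;
      }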
  3. 12 Dec, 2019 1 commit
  4. 29 Oct, 2019 1 commit
  5. 26 Oct, 2019 1 commit
  6. 24 Oct, 2019 1 commit
    • drm/i915/execlists: Force preemption · 3a7a92ab
      Authored by Chris Wilson
      If the preempted context takes too long to relinquish control, e.g. it
      is stuck inside a shader with arbitration disabled, evict that context
      with an engine reset. This ensures that preemptions are reasonably
      responsive, providing a tighter QoS for the more important context at
      the cost of flagging unresponsive contexts more frequently (i.e. instead
      of using an ~10s hangcheck, we now evict at ~100ms).  The challenge
      lies in picking a timeout that can be reasonably serviced by HW for
      typical workloads, balancing the existing clients against the needs for
      responsiveness.
      
      Note that coupled with timeslicing, this will lead to rapid GPU "hang"
      detection with multiple active contexts vying for GPU time.
      
      The forced preemption mechanism can be compiled out with
      
      	./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-2-chris@chris-wilson.co.uk
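      A toy sketch of the mechanism described above, with illustrative names:
      a timer armed when the preemption is submitted resets the engine if the
      hardware never acknowledges it. The real driver plumbs this through its
      GT and reset infrastructure:

      struct engine { int preempt_pending; };
      void engine_reset(struct engine *engine); /* provided elsewhere */

      #define PREEMPT_TIMEOUT_MS 100 /* illustrative; 0 compiles the check away */

      /* armed when a preemption is submitted to the hardware */
      static void preempt_timeout_fired(struct engine *engine)
      {
              if (PREEMPT_TIMEOUT_MS && engine->preempt_pending)
                      engine_reset(engine); /* evict the unresponsive context */
      }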
  7. 17 Aug, 2019 1 commit
  8. 09 Aug, 2019 2 commits
  9. 20 Jun, 2019 1 commit
    • drm/i915/execlists: Preempt-to-busy · 22b7a426
      Authored by Chris Wilson
      When using a global seqno, we required a precise stop-the-world event to
      handle preemption and unwind the global seqno counter. To accomplish
      this, we would preempt to a special out-of-band context and wait for the
      machine to report that it was idle. Given an idle machine, we could very
      precisely see which requests had completed and which we needed to feed
      back into the run queue.
      
      However, now that we have scrapped the global seqno, we no longer need
      to precisely unwind the global counter and only track requests by their
      per-context seqno. This allows us to loosely unwind inflight requests
      while scheduling a preemption, with the enormous caveat that the
      requests we put back on the run queue are still _inflight_ (until the
      preemption request is complete). This makes request tracking much more
      messy, as at any point we can see a completed request that we
      believe is not currently scheduled for execution. We also have to be
      careful not to rewind RING_TAIL past RING_HEAD on preempting to the
      running context, and for this we use a semaphore to prevent completion
      of the request before continuing.
      
      To accomplish this feat, we change how we track requests scheduled to
      the HW. Instead of appending our requests onto a single list as we
      submit, we track each submission to ELSP as its own block. Then upon
      receiving the CS preemption event, we promote the pending block to the
      inflight block (discarding what was previously being tracked). As normal
      CS completion events arrive, we then remove stale entries from the
      inflight tracker.
      
      v2: Be a tinge paranoid and ensure we flush the write into the HWS page
      for the GPU semaphore to pick up in a timely fashion.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190620142052.19311-1-chris@chris-wilson.co.uk
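      A simplified sketch of the pending/inflight split (names and port count
      illustrative; the real tracker lives in the execlists state and handles
      far more bookkeeping):

      #include <string.h>

      #define EXECLIST_PORTS 2 /* illustrative port count */

      struct request;

      struct execlists {
              /* NULL-terminated: what the HW is running vs what we last wrote */
              struct request *inflight[EXECLIST_PORTS + 1];
              struct request *pending[EXECLIST_PORTS + 1];
      };

      /* on the CS preemption event, the pending block becomes the
       * inflight one and the stale inflight entries are discarded */
      static void promote_pending(struct execlists *el)
      {
              memcpy(el->inflight, el->pending, sizeof(el->pending));
              memset(el->pending, 0, sizeof(el->pending));
      }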
  10. 28 May, 2019 1 commit
  11. 22 May, 2019 1 commit
    • drm/i915: Allow a context to define its set of engines · 976b55f0
      Authored by Chris Wilson
      Over the last few years, we have debated how to extend the user API to
      support an increase in the number of engines that may be sparse and
      even be heterogeneous within a class (not all video decoders created
      equal). We settled on using (class, instance) tuples to identify a
      specific engine, with an API for the user to construct a map of engines
      to capabilities. Into this picture, we then add a challenge of virtual
      engines; one user engine that maps behind the scenes to any number of
      physical engines. To keep it general, we want the user to have full
      control over that mapping. To that end, we allow the user to constrain a
      context to define the set of engines that it can access, order fully
      controlled by the user via (class, instance). With such precise control
      in context setup, we can continue to use the existing execbuf uABI of
      specifying a single index; only now it doesn't automagically map onto
      the engines, it uses the user defined engine map from the context.
      
      v2: Fixup freeing of local on success of get_engines()
      v3: Allow empty engines[]
      v4: s/nengine/num_engines/
      v5: Replace 64 limit on num_engines with a note that execbuf is
      currently limited to only using the first 64 engines.
      v6: Actually use the engines_mutex to guard the ctx->engines.
      
      Testcase: igt/gem_ctx_engines
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk
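      A hedged userspace-side sketch of such an engine map; the struct layouts
      only approximate the i915 uAPI from memory, so treat the exact names and
      fields as assumptions:

      #include <stdint.h>

      struct engine_class_instance {
              uint16_t engine_class;    /* e.g. render = 0, copy = 1, video = 2 */
              uint16_t engine_instance; /* which engine within that class */
      };

      struct context_param_engines {
              uint64_t extensions; /* chained extension structs; 0 if none */
              struct engine_class_instance engines[2];
      };

      static const struct context_param_engines map = {
              .engines = {
                      { .engine_class = 0, .engine_instance = 0 }, /* execbuf index 0 */
                      { .engine_class = 2, .engine_instance = 1 }, /* execbuf index 1 */
              },
      };
      /* execbuf then selects by index into this per-context map rather
       * than a global ring id */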
  12. 14 May, 2019 1 commit
    • drm/i915: Add support for tracking wakerefs w/o power-on guarantee · 4547c255
      Authored by Imre Deak
      It's useful to track runtime PM refs that don't guarantee a device
      power-on state to the rest of the driver. One such case is holding a
      reference that will be put asynchronously, during which normal users
      without their own reference shouldn't access the HW. A follow-up patch
      will add support for disabling display power domains asynchronously
      which needs this.
      
      For this we can split wakeref_count into a low half-word tracking
      all references (raw-wakerefs) and a high half-word tracking
      references guaranteeing a power-on state (wakelocks).
      
      Follow-up patches will make use of the API added here.
      
      While at it add the missing docbook header for the unchecked
      display-power and runtime_pm put functions.
      
      No functional changes, except for printing leaked raw-wakerefs
      and wakelocks separately in intel_runtime_pm_cleanup().
      
      v2:
      - Track raw wakerefs/wakelocks in the low/high half-word of
        wakeref_count, instead of adding a new counter. (Chris)
      v3:
      - Add a struct_member(T, m) helper instead of open-coding it. (Chris)
      - Checkpatch indentation formatting fix.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-2-imre.deak@intel.com
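      A minimal sketch of the half-word split, assuming a 32-bit counter;
      widths and helper names are illustrative:

      #include <stdint.h>

      #define WAKEREF_BITS  16 /* illustrative: half of a 32-bit counter */
      #define WAKEREF_MASK  ((1u << WAKEREF_BITS) - 1)
      #define WAKELOCK_UNIT (1u << WAKEREF_BITS)

      /* low half counts every reference; high half only those that
       * guarantee the device is powered on */
      static inline uint32_t raw_wakerefs(uint32_t count) { return count & WAKEREF_MASK; }
      static inline uint32_t wakelocks(uint32_t count) { return count >> WAKEREF_BITS; }

      /* a wakelock is also a reference, so taking one bumps both halves */
      static inline uint32_t take_wakelock(uint32_t count) { return count + WAKELOCK_UNIT + 1; }
      static inline uint32_t take_raw_wakeref(uint32_t count) { return count + 1; }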
  13. 03 May, 2019 2 commits
  14. 22 Mar, 2019 1 commit
    • drm/i915: Introduce the i915_user_extension_method · 9d1305ef
      Authored by Chris Wilson
      An idea for extending uABI inspired by Vulkan's extension chains.
      Instead of expanding the data struct for each ioctl every time we need
      to add a new feature, define an extension chain instead. As we add
      optional interfaces to control the ioctl, we define a new extension
      struct that can be linked into the ioctl data only when required by the
      user. The key advantage is being able to ignore large control structs
      for optional interfaces/extensions, while still being able to process
      them in a consistent manner.
      
      In comparison to other extensible ioctls, the key difference is the
      use of a linked chain of extension structs vs an array of tagged
      pointers. For example,
      
      struct drm_amdgpu_cs_chunk {
              __u32           chunk_id;
              __u32           length_dw;
              __u64           chunk_data;
      };
      
      struct drm_amdgpu_cs_in {
              __u32           ctx_id;
              __u32           bo_list_handle;
              __u32           num_chunks;
              __u32           _pad;
              __u64           chunks;
      };
      
      allows userspace to pass in an array of pointers to extension structs, but
      must therefore keep constructing that array alongside the command stream.
      In dynamic situations like that, a linked list is preferred and does not
      suffer from extra cache line misses, as the extension structs themselves
      must still be loaded separately from the chunks array.
      
      v2: Apply the tail call optimisation directly to nip the worry of stack
      overflow in the bud.
      v3: Defend against recursion.
      v4: Fixup local types to match new uabi
      
      Opens:
      - do we include the result as an out-field in each chain?
      struct i915_user_extension {
      	__u64 next_extension;
      	__u64 name;
      	__s32 result;
      	__u32 mbz; /* reserved for future use */
      };
      * Undecided, so provision some room for future expansion.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-1-chris@chris-wilson.co.uk
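      A simplified, userspace-style sketch of walking such a chain (the kernel
      version copies each node with copy_from_user and returns proper errno
      values; this only illustrates the shape):

      #include <stdint.h>
      #include <stddef.h>

      struct user_extension {
              uint64_t next_extension; /* user pointer to the next node; 0 ends */
              uint64_t name;           /* identifies which extension this is */
      };

      typedef int (*ext_handler)(struct user_extension *ext, void *data);

      static int walk_extensions(uint64_t head, const ext_handler *handlers,
                                 size_t count, void *data)
      {
              /* iterate rather than recurse: the chain is user-controlled */
              while (head) {
                      struct user_extension *ext = (void *)(uintptr_t)head;
                      int err;

                      if (ext->name >= count || !handlers[ext->name])
                              return -1; /* unknown extension name */
                      err = handlers[ext->name](ext, data);
                      if (err)
                              return err;
                      head = ext->next_extension;
              }
              return 0;
      }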
  15. 06 Mar, 2019 1 commit
    • mm, compaction: use free lists to quickly locate a migration source · 70b44595
      Authored by Mel Gorman
      The migration scanner is a linear scan of a zone with a potentially large
      search space.  Furthermore, many pageblocks are unusable such as those
      filled with reserved pages or partially filled with pages that cannot
      migrate.  These still get scanned in the common case of allocating a THP
      and the cost accumulates.
      
      The patch uses a partial search of the free lists to locate a migration
      source candidate that is marked as MOVABLE when allocating a THP.  It
      prefers picking a block with a larger number of free pages already on
      the basis that there are fewer pages to migrate to free the entire
      block.  The lowest PFN found during searches is tracked as the basis of
      the start for the linear search after the first search of the free list
      fails.  After the search, the free list is shuffled so that the next
      search will not encounter the same page.  If the search fails then the
      subsequent searches will be shorter and the linear scanner is used.
      
      If this search fails, or if the request is for a small or
      unmovable/reclaimable allocation, then the linear scanner is still used.
      It is somewhat pointless to use the list search in those cases: small
      free pages must be used for the search, and there is no guarantee that
      contiguous movable pages are located within such a block.
      
                                           5.0.0-rc1              5.0.0-rc1
                                       noboost-v3r10          findmig-v3r15
      Amean     fault-both-3      3771.41 (   0.00%)     3390.40 (  10.10%)
      Amean     fault-both-5      5409.05 (   0.00%)     5082.28 (   6.04%)
      Amean     fault-both-7      7040.74 (   0.00%)     7012.51 (   0.40%)
      Amean     fault-both-12    11887.35 (   0.00%)    11346.63 (   4.55%)
      Amean     fault-both-18    16718.19 (   0.00%)    15324.19 (   8.34%)
      Amean     fault-both-24    21157.19 (   0.00%)    16088.50 *  23.96%*
      Amean     fault-both-30    21175.92 (   0.00%)    18723.42 *  11.58%*
      Amean     fault-both-32    21339.03 (   0.00%)    18612.01 *  12.78%*
      
                                      5.0.0-rc1              5.0.0-rc1
                                  noboost-v3r10          findmig-v3r15
      Percentage huge-3        86.50 (   0.00%)       89.83 (   3.85%)
      Percentage huge-5        92.52 (   0.00%)       91.96 (  -0.61%)
      Percentage huge-7        92.44 (   0.00%)       92.85 (   0.44%)
      Percentage huge-12       92.98 (   0.00%)       92.74 (  -0.25%)
      Percentage huge-18       91.70 (   0.00%)       91.71 (   0.02%)
      Percentage huge-24       91.59 (   0.00%)       92.13 (   0.60%)
      Percentage huge-30       90.14 (   0.00%)       93.79 (   4.04%)
      Percentage huge-32       90.03 (   0.00%)       91.27 (   1.37%)
      
      This shows an improvement in allocation latencies with similar
      allocation success rates.  While not presented, there was a 31%
      reduction in migration scanning and an 8% reduction in system CPU usage.
      A 2-socket machine showed similar benefits.
      
      [mgorman@techsingularity.net: several fixes]
        Link: http://lkml.kernel.org/r/20190204120111.GL9565@techsingularity.net
      [vbabka@suse.cz: migrate block that was found-fast, some optimisations]
      Link: http://lkml.kernel.org/r/20190118175136.31341-10-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
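      A hedged sketch of the list-based candidate search; all types and
      helpers below are illustrative stand-ins for the mm internals:

      struct page;
      struct free_area;

      /* illustrative helpers assumed for this sketch */
      struct page *first_movable_free_page(struct free_area *area);
      unsigned long page_to_pfn(struct page *page);
      unsigned long block_start_pfn(unsigned long pfn);

      static unsigned long find_migrate_source(struct free_area *areas,
                                               int max_order)
      {
              int order;

              /* prefer larger free blocks: fewer remaining pages to migrate */
              for (order = max_order - 1; order >= 0; order--) {
                      struct page *page = first_movable_free_page(&areas[order]);

                      if (page)
                              return block_start_pfn(page_to_pfn(page));
              }
              return 0; /* no candidate: fall back to the linear scanner */
      }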
  16. 30 Nov, 2018 1 commit
  17. 26 Sep, 2018 1 commit
  18. 23 Aug, 2018 1 commit
  19. 30 Apr, 2018 1 commit
    • drm/i915: Retire requests along rings · b887d615
      Authored by Chris Wilson
      In the next patch, rings are the central timeline as requests may jump
      between engines. Therefore in the future as we retire in order along the
      engine timeline, we may retire out-of-order within a ring (as the ring now
      occurs along multiple engines), leading to much hilarity in miscomputing
      the position of ring->head.
      
      As an added bonus, retiring along the ring reduces the penalty of having
      one execlists client do cleanup for another (old legacy submission
      shares a ring between all clients). The downside is that the slow and
      irregular (off the critical path) process of cleaning up stale requests
      after userspace becomes a modicum less efficient.
      
      In the long run, it will become apparent that the ordered
      ring->request_list matches the ring->timeline, a fun challenge for the
      future will be unifying the two lists to avoid duplication!
      
      v2: We need both engine-order and ring-order processing to maintain our
      knowledge of where individual rings have completed up to, as well as
      knowing what was last executing on any engine. And finally by decoupling
      retiring the contexts on the engine and the timelines along the rings,
      we do have to keep a reference to the context on each request
      (previously it was guaranteed by the context being pinned).
      
      v3: Not just a reference to the context, but we need to keep it pinned
      as we manipulate the rings; i.e. we need a pin for both the manipulation
      of the engine state during its retirements, and a separate pin for the
      manipulation of the ring state.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
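      A minimal sketch of retiring in ring order, with illustrative types; the
      kernel uses its list.h machinery rather than the hand-rolled list here:

      struct request {
              struct request *next; /* link in ring order, oldest first */
              int completed;
      };

      struct ring {
              struct request *request_list;
      };

      void free_request(struct ring *ring, struct request *rq); /* illustrative */

      static void ring_retire_requests(struct ring *ring)
      {
              struct request *rq;

              /* requests sit in execution order along this ring, so stop
               * at the first incomplete one; the ring's head position can
               * then only ever move forward */
              while ((rq = ring->request_list) && rq->completed) {
                      ring->request_list = rq->next;
                      free_request(ring, rq);
              }
      }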
  20. 28 Mar, 2018 1 commit
  21. 22 Dec, 2017 1 commit
  22. 03 Nov, 2017 1 commit
  23. 07 Oct, 2017 1 commit
  24. 06 Oct, 2017 1 commit
  25. 16 Jun, 2017 1 commit
    • drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Authored by Chris Wilson
      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
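      A hedged sketch of the fast path, using an illustrative chained table;
      the multiplier is the kernel's 32-bit golden-ratio hash constant:

      #include <stdint.h>
      #include <stddef.h>

      struct vma;

      struct ht_entry {
              uint32_t handle;
              struct vma *vma;
              struct ht_entry *next; /* collision chain */
      };

      static inline uint32_t hash_32(uint32_t val, unsigned int bits)
      {
              return (val * 0x61C88647u) >> (32 - bits); /* golden ratio */
      }

      static struct vma *lookup_vma_fast(struct ht_entry **table,
                                         unsigned int bits, uint32_t handle)
      {
              struct ht_entry *he;

              for (he = table[hash_32(handle, bits)]; he; he = he->next)
                      if (he->handle == handle)
                              return he->vma;

              return NULL; /* miss: take the slow object -> vma lookup */
      }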
  26. 17 May, 2017 3 commits
    • drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
      Authored by Chris Wilson
      All the requests at the same priority are executed in FIFO order. They
      do not need to be stored in the rbtree themselves, as they are a simple
      list within a level. If we move the requests at one priority into a list,
      we can then reduce the rbtree to the set of priorities. This should keep
      the height of the rbtree small, as the number of active priorities cannot
      exceed the number of active requests and should typically be only a few.
      
      Currently, we have ~2k possible different priority levels, that may
      increase to allow even more fine grained selection. Allocating those in
      advance seems a waste (and may be impossible), so we opt for allocating
      upon first use, and freeing after its requests are depleted. To avoid
      the possibility of an allocation failure causing us to lose a request,
      we preallocate the default priority (0) and bump any request to that
      priority if we fail to allocate the appropriate plist for it. Having a
      request (that is ready to run, so not leading to corruption) execute
      out-of-order is better than leaking the request (and its dependency
      tree) entirely.
      
      There should be a benefit to reducing execlists_dequeue() to principally
      using a simple list (and reducing the frequency of both rbtree iteration
      and balancing on erase) but for typical workloads, request coalescing
      should be small enough that we don't notice any change. The main gain is
      from improving PI calls to schedule, and the explicit list within a
      level should make request unwinding simpler (we just need to insert at
      the head of the list rather than the tail and not have to make the
      rbtree search more complicated).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
      v3: Michał found the solution to handling the allocation failure
      gracefully. If we disable all priority scheduling following the
      allocation failure, those requests will be executed in fifo and we will
      ensure that this request and its dependencies are in strict fifo (even
      when it doesn't realise it is only a single list). Normal scheduling is
      restored once we know the device is idle, until the next failure!
      Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
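      A minimal sketch of one priority level, omitting the rbtree plumbing
      that links levels together; types are illustrative:

      struct request {
              struct request *next;
      };

      struct priolist {
              int priority;                /* rbtree key; node omitted here */
              struct request *head, *tail; /* FIFO within this level */
      };

      static void priolist_add(struct priolist *pl, struct request *rq)
      {
              rq->next = NULL; /* append: same-priority requests stay FIFO */
              if (pl->tail)
                      pl->tail->next = rq;
              else
                      pl->head = rq;
              pl->tail = rq;
      }

      static struct request *priolist_pop(struct priolist *pl)
      {
              struct request *rq = pl->head;

              if (rq && !(pl->head = rq->next))
                      pl->tail = NULL; /* level depleted; free its node */
              return rq;
      }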
    • drm/i915: Redefine ptr_pack_bits() and friends · 0ce81788
      Authored by Chris Wilson
      Rebrand the current (pointer | bits) pack/unpack utility macros as
      explicit bit twiddling for PAGE_SIZE so that we can use the more
      flexible underlying macros for different bits.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
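      A plausible reconstruction of the generalised macros (the exact kernel
      bodies may differ): the low bits of a sufficiently aligned pointer
      carry the packed value.

      #include <stdint.h>

      #define ptr_mask(n) (((uintptr_t)1 << (n)) - 1)

      /* stash n low bits of flags in an aligned pointer */
      #define ptr_pack_bits(ptr, bits, n) \
              ((void *)((uintptr_t)(ptr) | ((uintptr_t)(bits) & ptr_mask(n))))

      /* recover the pointer, writing the bits through the out-param */
      #define ptr_unpack_bits(ptr, bits, n) ({                \
              uintptr_t __v = (uintptr_t)(ptr);               \
              *(bits) = __v & ptr_mask(n);                    \
              (void *)(__v & ~ptr_mask(n));                   \
      })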
    • drm/i915: Make ptr_unpack_bits() more function-like · 991bfc64
      Authored by Chris Wilson
      ptr_unpack_bits() is a function-like macro, as such it is meant to be
      replaceable by a function. In this case, we should be passing in the
      out-param as a pointer.
      
      Bizarrely this does affect code generation:
      
      function                                     old     new   delta
      i915_gem_object_pin_map                      409     389     -20
      
      An improvement(?) in this case, but one can't help wondering what
      strict-aliasing optimisations we are preventing.
      
      The generated code looks identical in using ptr_unpack_bits (no extra
      motions to stack, the pointer and bits appear to be kept in registers),
      the difference appears to be code ordering and with a reorder it is able
      to use smaller forward jumps.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk
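      A hedged usage sketch of the function-like form, reusing the pack/unpack
      macros from the sketch above; page_address and the two-bit width are
      hypothetical. The out-param goes in by pointer, exactly as a real
      function would take it:

      static void *example(void *page_address)
      {
              unsigned int flags;
              void *packed = ptr_pack_bits(page_address, 0x2, 2); /* 2 flag bits */
              void *addr = ptr_unpack_bits(packed, &flags, 2);    /* flags == 0x2 */

              return addr;
      }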
  27. 09 May, 2017 1 commit
  28. 29 Mar, 2017 1 commit
  29. 15 Mar, 2017 1 commit
  30. 31 Jan, 2017 1 commit
  31. 06 Jan, 2017 1 commit
  32. 05 Jan, 2017 1 commit
  33. 17 Dec, 2016 1 commit
  34. 06 Dec, 2016 1 commit
  35. 09 Nov, 2016 1 commit
  36. 29 Oct, 2016 1 commit