1. 14 Oct 2016 (1 commit)
    • drm/i915: Allocate intel_engine_cs structure only for the enabled engines · 3b3f1650
      Authored by Akash Goel
      With the possibility of many more rings being added in the future,
      the drm_i915_private structure could bloat, as an array of type
      intel_engine_cs is embedded inside it:
      	struct intel_engine_cs engine[I915_NUM_ENGINES];
      This is still tolerable, as generally only a single instance of the
      drm_i915_private structure is used, but not all of the possible rings
      are enabled or active on most platforms. Some memory can be saved by
      allocating the intel_engine_cs structure only for the enabled/active
      engines. Currently the engine/ring IDs are static and dev_priv->engine[]
      is simply indexed using the enums defined in enum intel_engine_id.
      To save memory while continuing to use the static engine/ring IDs,
      'engine' is redefined as an array of pointers:
      	struct intel_engine_cs *engine[I915_NUM_ENGINES];
      dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
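
      As a rough sketch of the resulting setup code (simplified here;
      HAS_ENGINE() stands in for the per-platform engine-mask check, and
      the real allocation site is split across setup/init helpers):
      	for (id = 0; id < I915_NUM_ENGINES; id++) {
      		struct intel_engine_cs *engine;

      		if (!HAS_ENGINE(dev_priv, id))
      			continue; /* dev_priv->engine[id] stays NULL */

      		engine = kzalloc(sizeof(*engine), GFP_KERNEL);
      		if (!engine)
      			return -ENOMEM;

      		engine->id = id;
      		dev_priv->engine[id] = engine;
      	}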
      
      There is a text size reduction of 928 bytes, from 1028200 to 1027272,
      for the i915.o file (for the i915.ko file the text size remains the
      same, at 1193131 bytes).
      
      v2:
      - Remove the engine iterator field added to the drm_i915_private
        structure; instead pass a local iterator variable to the
        for_each_engine** macros, as sketched below. (Chris)
      - Do away with intel_engine_initialized() and instead directly use a
        NULL check on the engine pointer. (Chris)
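
      A sketch of the updated iterator, which skips the NULL slots of the
      disabled engines (modelled on the i915 macros of this period; the
      exact form may differ):
      	/* for_each_if() is the usual drm helper: if (!(cond)) {} else */
      	#define for_each_engine(engine__, dev_priv__, id__) \
      		for ((id__) = 0; (id__) < I915_NUM_ENGINES; (id__)++) \
      			for_each_if((engine__) = (dev_priv__)->engine[(id__)])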
      
      v3:
      - Remove the for_each_engine_id() macro, as the updated for_each_engine()
        macro can be used in its place. (Chris)
      - Protect the access to the render engine fault register with a NULL
        check, as the engine-specific init is done later in the driver load
        sequence.
      
      v4:
      - Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
      - Kill the superfluous init_engine_lists().
      
      v5:
      - Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
        allocation of intel_engine_cs structure. (Chris)
      
      v6:
      - Rebase.
      
      v7:
      - Optimize the for_each_engine_masked() macro. (Chris)
      - Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
      - Rebase.
      
      v8: Rebase.
      
      v9: Rebase.
      
      v10:
      - For the index calculation, use the engine ID instead of pointer-based
        arithmetic in intel_engine_sync_index(), as the engine pointers are no
        longer contiguous (see the sketch below). (Chris)
      - Rename the local enum variable 'iter' to the more appropriate 'id'. (Joonas)
      - Use the for_each_engine macro for cleanup in intel_engines_init() and
        remove the NULL engine pointer check in the cleanup() routines. (Joonas)
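
      A sketch of that index calculation after the change (the pre-patch
      version computed 'other - engine - 1' directly on the embedded array;
      this reconstruction is illustrative, not verbatim):
      	static inline int
      	intel_engine_sync_index(struct intel_engine_cs *engine,
      				struct intel_engine_cs *other)
      	{
      		int idx;

      		/* Engine IDs are stable even though the engines are now
      		 * individually allocated, so pointer differences are no
      		 * longer meaningful.
      		 */
      		idx = (other->id - engine->id) - 1;
      		if (idx < 0)
      			idx += I915_NUM_ENGINES;

      		return idx;
      	}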
      
      v11: Rebase.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
  2. 10 Oct 2016 (1 commit)
  3. 07 Oct 2016 (1 commit)
  4. 09 Sep 2016 (9 commits)
  5. 27 Aug 2016 (2 commits)
  6. 15 Aug 2016 (4 commits)
  7. 10 Aug 2016 (2 commits)
  8. 09 Aug 2016 (1 commit)
    • drm/i915: Do not overwrite the request with zero on reallocation · 5a198b8c
      Authored by Chris Wilson
      When using RCU lookup for the request, as introduced in commit
      0eafec6d ("drm/i915: Enable lockless lookup of request tracking via
      RCU"), we acknowledge that we may race with another thread that could
      have reallocated the request. In order for the first thread not to
      blow up, the second thread must not clear the request before
      overwriting it. In the RCU lookup, we allow for the engine/seqno to
      be replaced, but we do not allow for it to be zeroed.
      
      The choice is either to add extra checking to the RCU lookup, or to
      embrace the inherent races (as intended). It is more complicated, as
      we need to manually clear everything we depend upon being
      zero-initialised, but we benefit from not emitting the memset() to
      clear the entire, frequently allocated structure (that memset turns
      up in throughput profiles). At the same time, the lookup remains
      flexible for future adjustments.
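
      A sketch of the allocation path this implies (the field names are
      drawn from the changelog below and the driver of this era; the exact
      set of manually cleared fields is illustrative):
      	rq = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
      	if (!rq)
      		return ERR_PTR(-ENOMEM);

      	/* Deliberately no memset(rq, 0, sizeof(*rq)): a concurrent RCU
      	 * reader may still be inspecting the old request and must keep
      	 * seeing a consistent engine/seqno. Clear only the fields we
      	 * depend upon being zero-initialised.
      	 */
      	rq->batch = NULL;
      	rq->file_priv = NULL;
      	rq->pid = NULL;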
      
      v2: Old-style LRC requires another variable to be initialized. (The
      danger inherent in not zeroing everything.)
      v3: request->batch also needs to be cleared.
      v4: signaling.tsk is no longer used unset, but pid still exists.
      
      Fixes: 0eafec6d ("drm/i915: Enable lockless lookup of request tracking via RCU")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: "Goel, Akash" <akash.goel@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-2-git-send-email-chris@chris-wilson.co.uk
  9. 06 Aug 2016 (1 commit)
  10. 05 Aug 2016 (4 commits)
    • drm/i915: Assert that the request hasn't been retired · 209b3f7e
      Authored by Chris Wilson
      Now that no callers play tricks with dropping the struct_mutex
      between waiting and retiring, we can assert that the request is ready
      to be retired.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-19-git-send-email-chris@chris-wilson.co.uk
    • drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex · dcff85c8
      Authored by Chris Wilson
      The principal motivation for this was to try to eliminate the
      struct_mutex from i915_gem_suspend - but for now we still need to
      hold the mutex for i915_gem_context_lost(). (The issue there is that
      there may be an indirect lockdep cycle between cpu_hotplug (i.e.
      suspend) and struct_mutex via stop_machine().) For the moment,
      enabling last-request tracking for the engine allows us to do
      busyness checking and waiting without requiring the struct_mutex -
      which is useful in its own right.
      
      As a side-effect of having a robust means for tracking engine busyness,
      we can replace our other busyness heuristic, that of comparing against
      the last submitted seqno. For paranoid reasons, we have a semi-ordered
      check of that seqno inside the hangchecker, which we can now improve to
      an ordered check of the engine's busyness (removing a locked xchg in the
      process).
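
      As a sketch of the kind of check this enables (the last-request
      tracking field and the helper names here are assumptions for the
      example, not the exact API of the patch):
      	static bool engine_busy(struct intel_engine_cs *engine)
      	{
      		struct drm_i915_gem_request *rq;
      		bool busy = false;

      		/* No struct_mutex: the tracked last request is
      		 * protected by RCU, so busyness can be sampled
      		 * locklessly.
      		 */
      		rcu_read_lock();
      		rq = rcu_dereference(engine->last_request);
      		if (rq)
      			busy = !i915_gem_request_completed(rq);
      		rcu_read_unlock();

      		return busy;
      	}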
      
      v2: Pass along "bool interruptible", as being unlocked we cannot rely
      on i915->mm.interruptible being stable or even under our control.
      v3: Replace the Ironlake i915_gpu_busy() check with the common
      precalculated value.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
    • drm/i915: Enable lockless lookup of request tracking via RCU · 0eafec6d
      Authored by Chris Wilson
      If we enable RCU for the requests (providing a grace period where we can
      inspect a "dead" request before it is freed), we can allow callers to
      carefully perform lockless lookup of an active request.
      
      However, by enabling deferred freeing of requests, we can potentially
      hog a lot of memory when dealing with tens of thousands of requests per
      second - with a quick insertion of a synchronize_rcu() inside our
      shrinker callback, that issue disappears.
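
      The lookup pattern this enables looks roughly like the following
      (paraphrasing the i915_gem_active_get_rcu() helper mentioned in v4
      below; this is a sketch, not the verbatim code):
      	struct drm_i915_gem_request *rq;

      	rcu_read_lock();
      	do {
      		rq = rcu_dereference(active->request);
      		if (!rq)
      			break;	/* idle */

      		/* The request may be freed and reused under us: take a
      		 * reference, then confirm it is still the one we saw.
      		 */
      		rq = i915_gem_request_get_rcu(rq);
      		if (!rq || rq == rcu_access_pointer(active->request))
      			break;

      		i915_gem_request_put(rq);	/* lost the race, retry */
      	} while (1);
      	rcu_read_unlock();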
      
      v2: Currently it is our responsibility to handle reclaim, i.e. to avoid
      hogging memory with the delayed slab frees. At the moment, we wait for a
      grace period in the shrinker, and block for all RCU callbacks on oom.
      Suggested alternatives focus on flushing our RCU callback when we have a
      certain number of outstanding request frees, and blocking on that flush
      after a second high watermark. (So rather than wait for the system to
      run out of memory, we stop issuing requests - both are nondeterministic.)
      
      Paul E. McKenney wrote:
      
      Another approach is synchronize_rcu() after some largish number of
      requests.  The advantage of this approach is that it throttles the
      production of callbacks at the source.  The corresponding disadvantage
      is that it slows things up.
      
      Another approach is to use call_rcu(), but if the previous call_rcu()
      is still in flight, block waiting for it.  Yet another approach is
      the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
      idea is to do something like this:
      
              cond_synchronize_rcu(cookie);
              cookie = get_state_synchronize_rcu();
      
      You would of course do an initial get_state_synchronize_rcu() to
      get things going.  This would not block unless there was less than
      one grace period's worth of time between invocations.  But this
      assumes a busy system, where there is almost always a grace period
      in flight.  But you can make that happen as follows:
      
              cond_synchronize_rcu(cookie);
              cookie = get_state_synchronize_rcu();
              call_rcu(&my_rcu_head, noop_function);
      
      Note that you need additional code to make sure that the old callback
      has completed before doing a new one.  Setting and clearing a flag
      with appropriate memory ordering control suffices (e.g., smp_load_acquire()
      and smp_store_release()).
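
      A sketch of that last suggestion, with an illustrative flag (the
      names here are made up for the example):
      	static struct rcu_head my_rcu_head;
      	static int cb_in_flight;

      	static void noop_function(struct rcu_head *head)
      	{
      		/* Pairs with the smp_load_acquire() below: my_rcu_head
      		 * may be reused only once this callback has run.
      		 */
      		smp_store_release(&cb_in_flight, 0);
      	}

      	/* On the request-allocation side: */
      	if (!smp_load_acquire(&cb_in_flight)) {
      		cond_synchronize_rcu(cookie);
      		cookie = get_state_synchronize_rcu();
      		cb_in_flight = 1;
      		call_rcu(&my_rcu_head, noop_function);
      	}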
      
      v3: More comments on compiler and processor ordering of operations
      within the RCU lookup, and the discovery that we can use
      rcu_access_pointer() here instead.
      
      v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Goel, Akash" <akash.goel@intel.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
    • drm/i915: Remove request retirement before each batch · 0340d9fd
      Authored by Chris Wilson
      This reimplements the denial-of-service protection against igt from
      commit 227f782e ("drm/i915: Retire requests before creating a new
      one") and transfers the stall from before each batch into get_pages().
      The issue is that the stall increases latency between batches, which
      in some cases (especially coupled with execlists) is detrimental to
      keeping the GPU well fed. We have also observed that retiring requests
      can itself free objects (and requests), and therefore makes a good
      first step when shrinking.
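
      As a sketch of where the stall moves to (helper names simplified;
      scan_object_lists() is a stand-in for the real bound/unbound list
      walk, not a function in the driver):
      	unsigned long i915_gem_shrink(struct drm_i915_private *dev_priv,
      				      unsigned long target, unsigned flags)
      	{
      		/* Retiring completed requests can itself free objects
      		 * (and requests), so it is the cheapest first step of
      		 * reclaim - done here rather than before every batch.
      		 */
      		i915_gem_retire_requests(dev_priv);

      		return scan_object_lists(dev_priv, target, flags);
      	}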
      
      v2: Recycle objects prior to i915_gem_object_get_pages()
      v3: Remove the reference to the ring from i915_gem_requests_ring() as it
      operates on an intel_engine_cs.
      v4: Since commit 9b5f4e5e ("drm/i915: Retire oldest completed request
      before allocating next") we no longer need the safeguard to retire
      requests before get_pages(). We no longer see the huge latencies when
      hitting the shrinker between allocations.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-4-git-send-email-chris@chris-wilson.co.uk
  11. 04 Aug 2016 (6 commits)
  12. 03 Aug 2016 (8 commits)