1. 20 11月, 2014 3 次提交
    • T
      drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand · 7ba717cf
      Thomas Daniel 提交于
      Same as with the context, pinning to GGTT regardless is harmful (it
      badly fragments the GGTT and can even exhaust it).
      
      Unfortunately, this case is also more complex than the previous one
      because we need to map and access the ringbuffer in several places
      along the execbuffer path (and we cannot make do by leaving the
      default ringbuffer pinned, as before). Also, the context object
      itself contains a pointer to the ringbuffer address that we have to
      keep updated if we are going to allow the ringbuffer to move around.
      
      v2: Same as with the context pinning, we cannot really do it during
      an interrupt. Also, pin the default ringbuffers objects regardless
      (makes error capture a lot easier).
      
      v3: Rebased. Take a pin reference of the ringbuffer for each item
      in the execlist request queue because the hardware may still be using
      the ringbuffer after the MI_USER_INTERRUPT to notify the seqno update
      is executed.  The ringbuffer must remain pinned until the context save
      is complete.  No longer pin and unpin ringbuffer in
      populate_lr_context() - this transient address is meaningless and the
      pinning can cause a sleep while atomic.
      
      v4: Moved ringbuffer pin and unpin into the lr_context_pin functions.
      Downgraded pinning check BUG_ONs to WARN_ONs.
      
      v5: Reinstated WARN_ONs for unexpected execlist states.  Removed unused
      variable.
      
      Issue: VIZ-4277
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
      Reviewed-by: NAkash Goel <akash.goels@gmail.com>
      Reviewed-by: Deepak S<deepak.s@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      7ba717cf
    • O
      drm/i915/bdw: Pin the context backing objects to GGTT on-demand · dcb4c12a
      Oscar Mateo 提交于
      Up until now, we have pinned every logical ring context backing object
      during creation, and left it pinned until destruction. This made my life
      easier, but it's a harmful thing to do, because we cause fragmentation
      of the GGTT (and, eventually, we would run out of space).
      
      This patch makes the pinning on-demand: the backing objects of the two
      contexts that are written to the ELSP are pinned right before submission
      and unpinned once the hardware is done with them. The only context that
      is still pinned regardless is the global default one, so that the HWS can
      still be accessed in the same way (ring->status_page).
      
      v2: In the early version of this patch, we were pinning the context as
      we put it into the ELSP: on the one hand, this is very efficient because
      only a maximum two contexts are pinned at any given time, but on the other
      hand, we cannot really pin in interrupt time :(
      
      v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
      Do not unpin default context in free_request.
      
      v4: Break out pin and unpin into functions.  Fix style problems reported
      by checkpatch
      
      v5: Remove unpin_lock as all pinning and unpinning is done with the struct
      mutex already locked.  Add WARN_ONs to make sure this is the case in future.
      
      Issue: VIZ-4277
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
      Reviewed-by: NAkash Goel <akash.goels@gmail.com>
      Reviewed-by: Deepak S<deepak.s@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      dcb4c12a
    • T
      drm/i915/bdw: Clean up execlist queue items in retire_work · c86ee3a9
      Thomas Daniel 提交于
      No longer create a work item to clean each execlist queue item.
      Instead, move retired execlist requests to a queue and clean up the
      items during retire_requests.
      
      v2: Fix legacy ring path broken during overzealous cleanup
      
      v3: Update idle detection to take execlists queue into account
      
      v4: Grab execlist lock when checking queue state
      
      v5: Fix leaking requests by freeing in execlists_retire_requests.
      
      Issue: VIZ-4274
      Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
      Reviewed-by: NDeepak S <deepak.s@linux.intel.com>
      Reviewed-by: NAkash Goel <akash.goels@gmail.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      c86ee3a9
  2. 18 11月, 2014 1 次提交
  3. 15 11月, 2014 1 次提交
  4. 14 11月, 2014 3 次提交
  5. 08 11月, 2014 1 次提交
  6. 05 11月, 2014 2 次提交
  7. 21 10月, 2014 1 次提交
  8. 19 9月, 2014 2 次提交
  9. 03 9月, 2014 2 次提交
  10. 20 8月, 2014 2 次提交
  11. 15 8月, 2014 8 次提交
    • O
      drm/i915/bdw: Help out the ctx switch interrupt handler · f1ad5a1f
      Oscar Mateo 提交于
      If we receive a storm of requests for the same context (see gem_storedw_loop_*)
      we might end up iterating over too many elements in interrupt time, looking for
      contexts to squash together. Instead, share the burden by giving more
      intelligence to the queue function. At most, the interrupt will iterate over
      three elements.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      [danvet: Checkpatch.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      f1ad5a1f
    • O
      drm/i915/bdw: Avoid non-lite-restore preemptions · e1fee72c
      Oscar Mateo 提交于
      In the current Execlists feeding mechanism, full preemption is not
      supported yet: only lite-restores are allowed (this is: the GPU
      simply samples a new tail pointer for the context currently in
      execution).
      
      But we have identified an scenario in which a full preemption occurs:
      1) We submit two contexts for execution (A & B).
      2) The GPU finishes with the first one (A), switches to the second one
      (B) and informs us.
      3) We submit B again (hoping to cause a lite restore) together with C,
      but in the time we spend writing to the ELSP, the GPU finishes B.
      4) The GPU start executing B again (since we told it so).
      5) We receive a B finished interrupt and, mistakenly, we submit C (again)
      and D, causing a full preemption of B.
      
      The race is avoided by keeping track of how many times a context has been
      submitted to the hardware and by better discriminating the received context
      switch interrupts: in the example, when we have submitted B twice, we won´t
      submit C and D as soon as we receive the notification that B is completed
      because we were expecting to get a LITE_RESTORE and we didn´t, so we know a
      second completion will be received shortly.
      
      Without this explicit checking, somehow, the batch buffer execution order
      gets messed with. This can be verified with the IGT test I sent together with
      the series. I don´t know the exact mechanism by which the pre-emption messes
      with the execution order but, since other people is working on the Scheduler
      + Preemption on Execlists, I didn´t try to fix it. In these series, only Lite
      Restores are supported (other kind of preemptions WARN).
      
      v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
      rebase changes.
      
      v3: Clarify how the race is avoided, as requested by Daniel.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      [danvet: Align function parameters ...]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      e1fee72c
    • T
      drm/i915/bdw: Handle context switch events · e981e7b1
      Thomas Daniel 提交于
      Handle all context status events in the context status buffer on every
      context switch interrupt. We only remove work from the execlist queue
      after a context status buffer reports that it has completed and we only
      attempt to schedule new contexts on interrupt when a previously submitted
      context completes (unless no contexts are queued, which means the GPU is
      free).
      
      We canot call intel_runtime_pm_get() in an interrupt (or with a spinlock
      grabbed, FWIW), because it might sleep, which is not a nice thing to do.
      Instead, do the runtime_pm get/put together with the create/destroy request,
      and handle the forcewake get/put directly.
      Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
      
      v2: Unreferencing the context when we are freeing the request might free
      the backing bo, which requires the struct_mutex to be grabbed, so defer
      unreferencing and freeing to a bottom half.
      
      v3:
      - Ack the interrupt inmediately, before trying to handle it (fix for
      missing interrupts by Bob Beckett <robert.beckett@intel.com>).
      - Update the Context Status Buffer Read Pointer, just in case (spotted
      by Damien Lespiau).
      
      v4: New namespace and multiple rebase changes.
      
      v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
      interrupt", as suggested by Daniel.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      [danvet: Checkpatch ...]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      e981e7b1
    • M
      drm/i915/bdw: Two-stage execlist submit process · acdd884a
      Michel Thierry 提交于
      Context switch (and execlist submission) should happen only when
      other contexts are not active, otherwise pre-emption occurs.
      
      To assure this, we place context switch requests in a queue and those
      request are later consumed when the right context switch interrupt is
      received (still TODO).
      
      v2: Use a spinlock, do not remove the requests on unqueue (wait for
      context switch completion).
      Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
      
      v3: Several rebases and code changes. Use unique ID.
      
      v4:
      - Move the queue/lock init to the late ring initialization.
      - Damien's kmalloc review comments: check return, use sizeof(*req),
      do not cast.
      
      v5:
      - Do not reuse drm_i915_gem_request. Instead, create our own.
      - New namespace.
      
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      [davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      acdd884a
    • O
      drm/i915/bdw: Write the tail pointer, LRC style · ae1250b9
      Oscar Mateo 提交于
      Each logical ring context has the tail pointer in the context object,
      so update it before submission.
      
      v2: New namespace.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      ae1250b9
    • B
      drm/i915/bdw: Implement context switching (somewhat) · 84b790f8
      Ben Widawsky 提交于
      A context switch occurs by submitting a context descriptor to the
      ExecList Submission Port. Given that we can now initialize a context,
      it's possible to begin implementing the context switch by creating the
      descriptor and submitting it to ELSP (actually two, since the ELSP
      has two ports).
      
      The context object must be mapped in the GGTT, which means it must exist
      in the 0-4GB graphics VA range.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      
      v2: This code has changed quite a lot in various rebases. Of particular
      importance is that now we use the globally unique Submission ID to send
      to the hardware. Also, context pages are now pinned unconditionally to
      GGTT, so there is no need to bind them.
      
      v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
      ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
      of the software use-only bits of the Context ID in the Context Descriptor
      Format) without the hassle of the previous submission Id construction.
      Also, re-add the ELSP porting read (it was dropped somewhere during the
      rebases).
      
      v4:
      - Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
        says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
        ELSP writes are in progress") as noted by Thomas Daniel
        (thomas.daniel@intel.com).
      - Rename functions and use an execlists/intel_execlists_ namespace.
      - The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
        sure that it was properly aligned. Spotted by Alistair Mcaulay
        <alistair.mcaulay@intel.com>.
      
      v5:
      - Improved source code comments as suggested by Chris Wilson.
      - No need to abstract submit_ctx away, as pointed by Brad Volkin.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      [danvet: Checkpatch. Sigh.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      84b790f8
    • O
      drm/i915/bdw: Emission of requests with logical rings · 48e29f55
      Oscar Mateo 提交于
      On a previous iteration of this patch, I created an Execlists
      version of __i915_add_request and asbtracted it away as a
      vfunc. Daniel Vetter wondered then why that was needed:
      
      "with the clean split in command submission I expect every
      function to know wether it'll submit to an lrc (everything in
      intel_lrc.c) or wether it'll submit to a legacy ring (existing
      code), so I don't see a need for an add_request vfunc."
      
      The honest, hairy truth is that this patch is the glue keeping
      the whole logical ring puzzle together:
      
      - i915_add_request is used by intel_ring_idle, which in turn is
        used by i915_gpu_idle, which in turn is used in several places
        inside the eviction and gtt codes.
      - Also, it is used by i915_gem_check_olr, which is littered all
        over i915_gem.c
      - ...
      
      If I were to duplicate all the code that directly or indirectly
      uses __i915_add_request, I'll end up creating a separate driver.
      
      To show the differences between the existing legacy version and
      the new Execlists one, this time I have special-cased
      __i915_add_request instead of adding an add_request vfunc. I
      hope this helps to untangle this Gordian knot.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      [danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with
      Thomas Daniel.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      48e29f55
    • O
      drm/i915: Add temporary ring->ctx backpointer · 582d67f0
      Oscar Mateo 提交于
      The execlist patches have a bit a convoluted and long history and due
      to that have the actual submission still misplaced deeply burried in
      the low-level ringbuffer handling code. This design goes back to the
      legacy ringbuffer code with its tricky lazy request and simple work
      submissiion using ring tail writes. For that reason they need a
      ring->ctx backpointer.
      
      The goal is to unburry that code and move it up into a level where the
      full execlist context is available so that we can ditch this
      backpointer. Until that's done make it really obvious that there's
      work still to be done.
      
      Cc: Oscar Mateo <oscar.mateo@intel.com>
      Cc: Thomas Daniel <thomas.daniel@intel.com>
      Acked-by: NThomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      582d67f0
  12. 13 8月, 2014 1 次提交
    • D
      drm/i915: Only track real ppgtt for a context · ae6c4806
      Daniel Vetter 提交于
      There's a bit a confusion since we track the global gtt,
      the aliasing and real ppgtt in the ctx->vm pointer. And not
      all callers really bother to check for the different cases and just
      presume that it points to a real ppgtt.
      
      Now looking closely we don't actually need ->vm to always point at an
      address space - the only place that cares actually has fixup code
      already to decide whether to look at the per-proces or the global
      address space.
      
      So switch to just tracking the ppgtt directly and ditch all the
      extraneous code.
      
      v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
      Also drop the early exit - without aliasing ppgtt we want to dump all
      the ppgtts of the contexts if we have full ppgtt.
      
      v3: Actually git add the compile fix.
      Reviewed-by: NMichel Thierry <michel.thierry@intel.com>
      Cc: "Thierry, Michel" <michel.thierry@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      OTC-Jira: VIZ-3724
      [danvet: Resolve conflicts with execlist patches while applying.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      ae6c4806
  13. 12 8月, 2014 8 次提交
  14. 11 8月, 2014 5 次提交
    • O
      drm/i915/bdw: GEN-specific logical ring set/get seqno · e94e37ad
      Oscar Mateo 提交于
      No mistery here: the seqno is still retrieved from the engine's
      HW status page (the one in the default context. For the moment,
      I see no reason to worry about other context's HWS page).
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      e94e37ad
    • O
      drm/i915/bdw: GEN-specific logical ring init · 9b1136d5
      Oscar Mateo 提交于
      Logical rings do not need most of the initialization their
      legacy ringbuffer counterparts do: we just need the pipe
      control object for the render ring, enable Execlists on the
      hardware and a few workarounds.
      
      v2: Squash with: "drm/i915: Extract pipe control fini & make
      init outside accesible".
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      [danvet: Make checkpatch happy.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      9b1136d5
    • O
      drm/i915/bdw: Generic logical ring init and cleanup · 48d82387
      Oscar Mateo 提交于
      Allocate and populate the default LRC for every ring, call
      gen-specific init/cleanup, init/fini the command parser and
      set the status page (now inside the LRC object). These are
      things all engines/rings have in common.
      
      Stopping the ring before cleanup and initializing the seqnos
      is left as a TODO task (we need more infrastructure in place
      before we can achieve this).
      
      v2: Check the ringbuffer backing obj for ring_is_initialized,
      instead of the context backing obj (similar, but not exactly
      the same).
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      48d82387
    • O
      drm/i915/bdw: Skeleton for the new logical rings submission path · 454afebd
      Oscar Mateo 提交于
      Execlists are indeed a brave new world with respect to workload
      submission to the GPU.
      
      In previous version of these series, I have tried to impact the
      legacy ringbuffer submission path as little as possible (mostly,
      passing the context around and using the correct ringbuffer when I
      needed one) but Daniel is afraid (probably with a reason) that
      these changes and, especially, future ones, will end up breaking
      older gens.
      
      This commit and some others coming next will try to limit the
      damage by creating an alternative path for workload submission.
      The first step is here: laying out a new ring init/fini.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      454afebd
    • O
      drm/i915/bdw: Populate LR contexts (somewhat) · 8670d6f9
      Oscar Mateo 提交于
      For the most part, logical ring context objects are similar to hardware
      contexts in that the backing object is meant to be opaque. There are
      some exceptions where we need to poke certain offsets of the object for
      initialization, updating the tail pointer or updating the PDPs.
      
      For our basic execlist implementation we'll only need our PPGTT PDs,
      and ringbuffer addresses in order to set up the context. With previous
      patches, we have both, so start prepping the context to be load.
      
      Before running a context for the first time you must populate some
      fields in the context object. These fields begin 1 PAGE + LRCA, ie. the
      first page (in 0 based counting) of the context  image. These same
      fields will be read and written to as contexts are saved and restored
      once the system is up and running.
      
      Many of these fields are completely reused from previous global
      registers: ringbuffer head/tail/control, context control matches some
      previous MI_SET_CONTEXT flags, and page directories. There are other
      fields which we don't touch which we may want in the future.
      
      v2: CTX_LRI_HEADER_0 is MI_LOAD_REGISTER_IMM(14) for render and (11)
      for other engines.
      
      v3: Several rebases and general changes to the code.
      
      v4: Squash with "Extract LR context object populating"
      Also, Damien's review comments:
      - Set the Force Posted bit on the LRI header, as the BSpec suggest we do.
      - Prevent warning when compiling a 32-bits kernel without HIGHMEM64.
      - Add a clarifying comment to the context population code.
      
      v5: Damien's review comments:
      - The third MI_LOAD_REGISTER_IMM in the context does not set Force Posted.
      - Remove dead code.
      
      v6: Add a note about the (presumed) differences between BDW and CHV state
      contexts. Also, Brad's review comments:
      - Use the _MASKED_BIT_ENABLE, upper_32_bits and lower_32_bits macros.
      - Be less magical about how we set the ring size in the context.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com> (v2)
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      8670d6f9