  1. 28 Apr, 2016 1 commit
  2. 14 Apr, 2016 1 commit
  3. 13 Apr, 2016 1 commit
  4. 12 Apr, 2016 1 commit
  5. 04 Apr, 2016 1 commit
    • drm/i915: Move execlists irq handler to a bottom half · 27af5eea
      Committed by Tvrtko Ursulin
      Doing a lot of work in the interrupt handler introduces huge
      latencies to the system as a whole.
      
      The most dramatic effect can be seen by running an all-engine
      stress test like igt/gem_exec_nop/all where, when the kernel
      config is lean enough, the whole system can be brought into
      multi-second periods of complete non-interactivity. That can
      look, for example, like this:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
       Modules linked in: [redacted for brevity]
       CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
       Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
       Workqueue: i915 gen6_pm_rps_work [i915]
       task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
       RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
       RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
       RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
       RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
       RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
       R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
       R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
       FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
        042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
        0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
        0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
       Call Trace:
        <IRQ>
        [<ffffffff8104a716>] irq_exit+0x86/0x90
        [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
        [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
        <EOI>
        [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
        [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
        [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
        [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
        [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
        [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
        [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
        [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
        [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
        [<ffffffff8105ab29>] process_one_work+0x139/0x350
        [<ffffffff8105b186>] worker_thread+0x126/0x490
        [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
        [<ffffffff8105fa64>] kthread+0xc4/0xe0
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
        [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
      
      I could not find a code path which would explain a +20 second
      lockup, but from some instrumentation it was apparent that the
      proportion of time spent with interrupts off was between
      10-25% under heavy load, which is quite bad.

      When an interrupt "cliff" is reached, which was >~320k irq/s on
      my machine, the whole system goes into a terrible state of the
      multi-second lockups described above.
      
      By moving the GT interrupt handling to a tasklet in the
      simplest possible way, the problem above disappears completely.

      Testing the effect on system-wide latencies using
      igt/gem_syslatency shows the following before this patch:
      
      gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
      gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
      gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
      gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us
      
      This shows that the unrelated process is experiencing huge
      delays in its wake-up latency. After the patch the results
      look like this:
      
      gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
      gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
      gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
      gem_syslatency: cycles=841683, latency mean=56.914us max=1667us
      
      Showing a huge improvement in the unrelated process wake-up
      latency. It also shows an approximate halving in the number
      of total empty batches submitted during the test. This may
      not be worrying since the test puts the driver under
      a very unrealistic load with ncpu threads doing empty batch
      submission to all GPU engines each.
      
      Another benefit compared to the hard-irq handling is that now
      work on all engines can be dispatched in parallel, since up to
      as many tasklets as there are CPUs can be active at once.
      (Previously a single hard-irq would serially dispatch on one
      engine after another.)

      A more interesting scenario with regard to throughput is
      "gem_latency -n 100", which shows 25% better throughput and
      CPU usage, and 14% better dispatch latencies.
      
      I did not find any gains or regressions with Synmark2 or
      GLbench under light testing. More benchmarking is certainly
      required.
      
      v2:
         * execlists_lock should be taken as spin_lock_bh when
           queuing work from userspace now. (Chris Wilson)
         * uncore.lock must be taken with spin_lock_irq when
           submitting requests since that now runs from either
           softirq or process context.
      
      v3:
         * Expanded commit message with more testing data;
         * converted missed locking sites to _bh;
         * added execlist_lock comment. (Chris Wilson)
      
      v4:
         * Mention dispatch parallelism in commit. (Chris Wilson)
         * Do not hold uncore.lock over MMIO reads since the block
           is already serialised per-engine via the tasklet itself.
           (Chris Wilson)
         * intel_lrc_irq_handler should be static. (Chris Wilson)
         * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
         * Document and WARN that tasklet cannot be active/pending
           on engine cleanup. (Chris Wilson/Imre Deak)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Testcase: igt/gem_exec_nop/all
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
  6. 16 Mar, 2016 1 commit
  7. 29 Jan, 2016 1 commit
  8. 18 Jan, 2016 1 commit
    • drm/i915: Do not call API requiring struct_mutex where it is not available · ca82580c
      Committed by Tvrtko Ursulin
      LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
      places where the struct_mutex cannot be grabbed (irq handlers).
      
      To avoid that this patch caches some interesting bits and values
      in the engine and context structures.
      
      Some usages are also removed where they are not needed like a
      few asserts which are either impossible or have been checked
      already during engine initialization.
      
      A side benefit is that interrupt handlers and command
      submission stop evaluating invariant conditionals, like what
      Gen we are running on, on every interrupt and every command
      submitted.
      
      This patch deals with logical ring context id and descriptors
      while subsequent patches will deal with the remaining issues.
      
      v2:
       * Cache the VMA instead of the address. (Chris Wilson)
       * Incorporate Dave Gordon's good comments and function name.
      
      v3:
       * Extract ctx descriptor template to a function and group
         functions dealing with ctx descriptor & co together near
         top of the file. (Dave Gordon)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com
  9. 07 Jan, 2016 1 commit
  10. 05 Jan, 2016 1 commit
  11. 05 Dec, 2015 1 commit
  12. 03 Dec, 2015 1 commit
    • drm/i915: Extend LRC pinning to cover GPU context writeback · 6d65ba94
      Committed by Nick Hoath
      Use the first retired request on a new context to unpin
      the old context. This ensures that the hw context remains
      bound until it has been written back to by the GPU.
      Now that the context is pinned until later in the request/context
      lifecycle, it no longer needs to be pinned from context_queue to
      retire_requests.
      This fixes an issue with GuC submission where the GPU might not
      have finished writing back the context before it is unpinned. This
      results in a GPU hang.
      
      v2: Moved the new pin to cover GuC submission (Alex Dai)
          Moved the new unpin to request_retire to fix coverage leak
      v3: Added switch to default context if freeing a still pinned
          context just in case the hw was actually still using it
      v4: Unwrapped context unpin to allow calling without a request
      v5: Only create a switch to idle context if the ring doesn't
          already have a request pending on it (Alex Dai)
          Rename unsaved to dirty to avoid double negatives (Dave Gordon)
          Changed _no_req postfix to __ prefix for consistency (Dave Gordon)
          Split out per engine cleanup from context_free as it
          was getting unwieldy
          Corrected locking (Dave Gordon)
      v6: Removed some bikeshedding (Mika Kuoppala)
          Added explanation of the GuC hang that this fixes (Daniel Vetter)
      v7: Removed extra per request pinning from ring reset code (Alex Dai)
          Added forced ring unpin/clean in error case in context free (Alex Dai)
      Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
      Issue: VIZ-4277
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Gordon <david.s.gordon@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Alex Dai <yu.dai@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Alex Dai <yu.dai@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  13. 18 Nov, 2015 2 commits
  14. 28 Sep, 2015 1 commit
    • drm/i915: Consider HW CSB write pointer before resetting the sw read pointer · dfc53c5e
      Committed by Michel Thierry
      A previous commit resets the Context Status Buffer (CSB) read pointer in
      ring init
          commit c0a03a2e ("drm/i915: Reset CSB read pointer in ring init")
      
      This is generally correct, but this pointer is not reset after
      suspend/resume in some platforms (cht). In this case, the driver should
      read the register value instead of resetting the sw read counter to 0.
      Otherwise we process old events, leading to unwanted pre-emptions or
      something worse.
      
      But in other platforms (bdw) and also during GPU reset or power up, the
      CSBWP is reset to 0x7 (an invalid number), and in this case the read
      pointer should be set to 5 (the interrupt code will increment this
      counter one more time, and will start reading from CSB[0]).
      
      v2: When the CSB registers are reset, the read pointer needs to be set
      to 5, otherwise the first write (CSB[0]) won't be read (Mika).
      Replace magic numbers with GEN8_CSB_ENTRIES (6) and GEN8_CSB_PTR_MASK
      (0x07).
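
The pointer selection described above can be sketched in plain C. This is a minimal userspace illustration, not the driver code: the helper names `csb_initial_read_ptr` and `csb_next_entry` are hypothetical, and it assumes the hardware write pointer occupies the low three bits of the status pointer register value, as the 0x07 mask suggests.

```c
#include <assert.h>
#include <stdint.h>

#define GEN8_CSB_ENTRIES  6     /* number of CSB slots, per the commit text */
#define GEN8_CSB_PTR_MASK 0x07  /* pointer field mask, per the commit text */

/*
 * Hypothetical helper: choose the software read pointer at ring init
 * from a raw status-pointer register value.
 */
unsigned int csb_initial_read_ptr(uint32_t status_ptr_reg)
{
	unsigned int hw_write_ptr = status_ptr_reg & GEN8_CSB_PTR_MASK;

	/* 0x7 is the invalid post-reset value: start one slot before CSB[0]. */
	if (hw_write_ptr == GEN8_CSB_PTR_MASK)
		return GEN8_CSB_ENTRIES - 1;

	/* Otherwise keep the hardware value so old events are not replayed. */
	return hw_write_ptr;
}

/* Hypothetical helper: the increment applied before reading the next entry. */
unsigned int csb_next_entry(unsigned int read_ptr)
{
	assert(read_ptr < GEN8_CSB_ENTRIES);
	return (read_ptr + 1) % GEN8_CSB_ENTRIES;
}
```

With a post-reset register value of 0x7 the read pointer becomes 5, and the first increment wraps it to 0, so processing starts at CSB[0] as the message describes.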
      
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: Lei Shen <lei.shen@intel.com>
      Signed-off-by: Deepak S <deepak.s@intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
  15. 23 Sep, 2015 1 commit
  16. 14 Sep, 2015 1 commit
    • drm/i915: Split alloc from init for lrc · e84fe803
      Committed by Nick Hoath
      Extend init/init_hw split to context init.
         - Move context initialisation in to i915_gem_init_hw
         - Move one off initialisation for render ring to
              i915_gem_validate_context
         - Move default context initialisation to logical_ring_init
      
      Rename intel_lr_context_deferred_create to
      intel_lr_context_deferred_alloc, to reflect reduced functionality &
      alloc/init split.
      
      This patch is intended to split out the allocation of resources &
      initialisation to allow easier reuse of code for resume/gpu reset.
      
      v2: Removed function ptr wrapping of do_switch_context (Daniel Vetter)
          Left ->init_context in intel_lr_context_deferred_alloc
          (Daniel Vetter)
          Remove unnecessary init flag & ring type test. (Daniel Vetter)
          Improve commit message (Daniel Vetter)
      v3: On init/reinit, set the hw next sequence number to the sw next
          sequence number. This is set to 1 at driver load time. This prevents
          the seqno being reset on reinit (Chris Wilson)
      v4: Set seqno back to ~0 - 0x1000 at start-of-day, and increment by 0x100
          on reset.
          This makes it obvious which bbs are which after a reset. (David Gordon
          & John Harrison)
          Rebase.
      v5: Rebase. Fixed rebase breakage. Put context pinning in separate
          function. Removed code churn. (Thomas Daniel)
      v6: Clean up issues introduced in v2 & v5 (Thomas Daniel)
      
      Issue: VIZ-4798
      Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: John Harrison <john.c.harrison@intel.com>
      Cc: David Gordon <david.s.gordon@intel.com>
      Cc: Thomas Daniel <thomas.daniel@intel.com>
      Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  17. 15 Aug, 2015 2 commits
    • drm/i915: Integrate GuC-based command submission · d1675198
      Committed by Alex Dai
      GuC-based submission is mostly the same as execlist mode, up to
      intel_logical_ring_advance_and_submit(), where the context being
      dispatched would be added to the execlist queue; at this point
      we submit the context to the GuC backend instead.
      
      There are, however, a few other changes also required, notably:
      1.  Contexts must be pinned at GGTT addresses accessible by the GuC
          i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
          PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
      
      2.  The GuC's TLB must be invalidated after a context is pinned at
          a new GGTT address.
      
      3.  GuC firmware uses the one page before Ring Context as shared data.
          Therefore, whenever driver wants to get base address of LRC, we
          will offset one page for it. LRC_PPHWSP_PN is defined as the page
          number of LRCA.
      
      4.  In the work queue used to pass requests to the GuC, the GuC
          firmware requires the ring-tail-offset to be represented as an
          11-bit value, expressed in QWords. Therefore, the ringbuffer
          size must be reduced to the representable range (4 pages).
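
The arithmetic behind point 4 can be checked with a short C sketch. It is illustrative only: the macro and function names are hypothetical, and a 4 KiB page size is assumed. An 11-bit field counting QWords (8 bytes each) can address 2^11 * 8 = 16384 bytes, which is 4 such pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GUC_TAIL_BITS 11    /* width of the tail field, per the commit text */
#define QWORD_BYTES   8     /* one QWord is 8 bytes */
#define PAGE_BYTES    4096  /* assumption: 4 KiB pages */

/* Largest ring size whose tail offsets all fit in the 11-bit QWord field. */
size_t guc_max_ring_bytes(void)
{
	return ((size_t)1 << GUC_TAIL_BITS) * QWORD_BYTES;
}

/* Hypothetical helper: convert a byte tail offset to the QWord encoding. */
uint32_t guc_tail_in_qwords(size_t tail_bytes)
{
	assert(tail_bytes % QWORD_BYTES == 0);
	assert(tail_bytes < guc_max_ring_bytes());
	return (uint32_t)(tail_bytes / QWORD_BYTES);
}
```

The last QWord-aligned offset in a 4-page ring, 16376 bytes, encodes as 2047, the largest value an 11-bit field can hold.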
      
      v2:
          Defer adding #defines until needed [Chris Wilson]
          Rationalise type declarations [Chris Wilson]
      
      v4:
          Squashed kerneldoc patch into here [Daniel Vetter]
      
      v5:
          Update request->tail in code common to both GuC and execlist modes.
          Add a private version of lr_context_update(), as sharing the
              execlist version leads to race conditions when the CPU and
              the GuC both update TAIL in the context image.
          Conversion of error-captured HWS page to string must account
              for offset from start of object to actual HWS (LRC_PPHWSP_PN).
      
      Issue: VIZ-4884
      Signed-off-by: Alex Dai <yu.dai@intel.com>
      Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
      Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Expose one LRC function for GuC submission mode · 919f1f55
      Committed by Dave Gordon
      GuC submission is basically execlist submission, but with the GuC
      handling the actual writes to the ELSP and the resulting context
      switch interrupts.  So to describe a context for submission via
      the GuC, we need one of the same functions used in execlist mode.
      This commit exposes one such function, changing its name to better
      describe what it does (it's related to logical ring contexts rather
      than to execlists per se).
      
      v2:
          Replaces previous "drm/i915: Move execlists defines from .c to .h"
      
      v3:
          Incorporates a change to one of the functions exposed here that was
              previously part of an internal patch, but which was omitted from
              the version recently committed to drm-intel-nightly:
              7a01a0a2 drm/i915/lrc: Update PDPx registers with lri commands
              So we reinstate this change here.
      
      v4:
          Drop v3 change, update function parameters due to collision with
              8ee36152 drm/i915: Convert execlists_ctx_descriptor() for requests
      
      v5:
          Don't expose execlists_update_context() after all. The current
              version is no longer compatible with GuC submission; trying to
              share the execlist version of this function results in both GuC
              and CPU updating TAIL in the context image, with bad results when
              they get out of step. The GuC submission path now has its own
              private version that just updates the ringbuffer start address,
              and not TAIL or PDPx.
      
      v6:
          Rebased
      
      Issue: VIZ-4884
      Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
      Reviewed-by: Tom O'Rourke <Tom.O'Rourke@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  18. 14 Jul, 2015 1 commit
    • drm/i915: Added Programming of the MOCS · 3bbaba0c
      Committed by Peter Antoine
      This change adds the programming of the MOCS registers to the gen 9+
      platforms. The set of MOCS configuration entries introduced by this
      patch is intended to be minimal but sufficient to cover the needs of
      current userspace - i.e. a good set of defaults. It is expected to be
      extended in the future to provide further default values or to allow
      userspace to redefine its private MOCS tables based on its demand for
      additional caching configurations. In this setup, userspace should
      only utilize the first N entries, higher entries are reserved for
      future use.
      
      It creates a fixed register set that is programmed across the different
      engines so that all engines have the same table. This is done as the
      main RCS context only holds the registers for itself and the shared
      L3 values. By trying to keep the registers consistent across the
      different engines it should make the programming for the registers
      consistent.
      
      v2:
      -'static const' for private data structures and style changes.(Matt Turner)
      v3:
      - Make the tables "slightly" more readable. (Damien Lespiau)
      - Updated tables fix performance regression.
      v4:
      - Code formatting. (Chris Wilson)
      - re-privatised mocs code. (Daniel Vetter)
      v5:
      - Changed the name of a function. (Chris Wilson)
      v6:
      - re-based
      - Added Mesa table entry (skylake & broxton) (Francisco Jerez)
      - Tidied up the readability defines (Francisco Jerez)
      - NUMBER of entries defines wrong. (Jim Bish)
      - Added comments to clear up the meaning of the tables (Jim Bish)
      Signed-off-by: Peter Antoine <peter.antoine@intel.com>
      
      v7 (Francisco Jerez):
      - Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control
        tables.  Prefix L3-specific defines consistently with L3_ and
        e/LLC-specific defines with LE_ to avoid this kind of confusion in
        the future.
      - Change L3CC WT define back to RESERVED (matches my hardware
        documentation and the original patch, probably a misunderstanding
        of my own previous comment).
      - Drop Android tables, define new minimal tables more suitable for the
        open source stack.
      - Add comment that the MOCS tables are part of the kernel ABI.
      - Move intel_logical_ring_begin() and _advance() calls one level down
        (Chris Wilson).
      - Minor formatting and style fixes.
      v8 (Francisco Jerez):
      - Add table size sanity check to emit_mocs_control/l3cc_table() (Chris
        Wilson).
      - Add comment about undefined entries being implicitly set to uncached
        for forwards compatibility.
      v9 (Francisco Jerez):
      - Minor style fixes.
      Signed-off-by: Francisco Jerez <currojerez@riseup.net>
      Acked-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  19. 06 Jul, 2015 2 commits
  20. 23 Jun, 2015 4 commits
    • drm/i915: Add *_ring_begin() to request allocation · ccd98fe4
      Committed by John Harrison
      Now that the *_ring_begin() functions no longer call the request allocation
      code, it is finally safe for the request allocation code to call *_ring_begin().
      This is important to guarantee that the space reserved for the subsequent
      i915_add_request() call does actually get reserved.
      
      v2: Renamed functions according to review feedback (Tomas Elf).
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Update flush_all_caches() to take request structures · 4866d729
      Committed by John Harrison
      Updated the *_ring_flush_all_caches() functions to take requests instead of
      rings or ringbuf/context pairs.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Merged the many do_execbuf() parameters into a structure · 5f19e2bf
      Committed by John Harrison
      The do_execbuf() function takes quite a few parameters. The actual set of
      parameters is going to change with the conversion to passing requests around.
      Further, it is due to grow massively with the arrival of the GPU scheduler.
      
      This patch simplifies the prototype by passing a parameter structure instead.
      Changing the parameter set in the future is then simply a matter of
      adding/removing items to the structure.
      
      Note that the structure does not contain absolutely everything that is passed
      in. This is because the intention is to use this structure more extensively
      later in this patch series and more especially in the GPU scheduler that is
      coming soon. The latter requires hanging on to the structure as the final
      hardware submission can be delayed until long after the execbuf IOCTL has
      returned to user land. Thus it is unsafe to put anything in the structure that
      is local to the IOCTL call itself - such as the 'args' parameter. All entries
      must be copies of data or pointers to structures that are reference counted in
      some way and guaranteed to exist for the duration of the batch buffer's life.
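
The constraint in the paragraph above (only copies of data or pointers to reference-counted objects) can be sketched as follows. This is a hedged userspace illustration, not the i915 structure: every type, field, and function name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference-counted object standing in for a context. */
struct sketch_ctx {
	int refcount;
};

/* Hypothetical parameter block; the field names are illustrative only. */
struct sketch_execbuf_params {
	struct sketch_ctx *ctx;   /* pointer to a reference-counted object */
	uint64_t batch_start;     /* copied value, not a pointer into the ioctl */
	uint32_t dispatch_flags;  /* copied value */
};

/* Fill the block, taking a reference so it can outlive the caller's frame. */
void sketch_params_init(struct sketch_execbuf_params *p, struct sketch_ctx *ctx,
			uint64_t batch_start, uint32_t dispatch_flags)
{
	assert(p != NULL && ctx != NULL);
	ctx->refcount++;
	p->ctx = ctx;
	p->batch_start = batch_start;
	p->dispatch_flags = dispatch_flags;
}

/* Drop the reference once the (possibly deferred) submission is finished. */
void sketch_params_fini(struct sketch_execbuf_params *p)
{
	assert(p->ctx != NULL && p->ctx->refcount > 0);
	p->ctx->refcount--;
	p->ctx = NULL;
}
```

Because the block holds its own reference and its own copies, it stays valid even if submission happens after the ioctl has returned, which is the situation the message anticipates for the scheduler.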
      
      v2: Rebased to newer tree and updated for changes to the command parser.
      Specifically, a code shuffle has required saving the batch start address in the
      params structure.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Set context in request from creation even in legacy mode · 40e895ce
      Committed by John Harrison
      In execlist mode, the context object pointer is written in to the request
      structure (and reference counted) at the point of request creation. In legacy
      mode, this only happens inside i915_add_request().
      
      This patch updates the legacy code path to match the execlist version. This
      allows all the intermediate code between request creation and request submission
      to get at the context object given only a request structure. Thus negating the
      need to pass context pointers here, there and everywhere.
      
      v2: Moved the context reference so it does not need to be undone if the
      get_seqno() fails.
      
      v3: Fixed execlist mode always hitting a warning about invalid last_contexts
      (which don't exist in execlist mode).
      
      v4: Updated for new i915_gem_request_alloc() scheme.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  21. 01 Apr, 2015 2 commits
  22. 26 Feb, 2015 1 commit
    • drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading · 8e004efc
      Committed by John Harrison
      There is a flags word that is passed through the execbuffer code path all the
      way from initial decoding of the user parameters down to the very final dispatch
      buffer call. It is simply called 'flags'. Unfortunately, there are many other
      flags words floating around in the same blocks of code. Even more once the GPU
      scheduler arrives.
      
      This patch makes it more obvious exactly which flags word is which by renaming
      'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
      already have an 'I915_DISPATCH_' prefix on them and so are not quite so
      ambiguous.
      
      OTC-Jira: VIZ-1587
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      [danvet: Resolve conflict with Chris' rework of the bb parsing.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  23. 24 Feb, 2015 1 commit
  24. 14 Feb, 2015 3 commits
  25. 27 Jan, 2015 4 commits
  26. 15 Dec, 2014 1 commit
  27. 20 Nov, 2014 2 commits
    • drm/i915/bdw: Pin the context backing objects to GGTT on-demand · dcb4c12a
      Committed by Oscar Mateo
      Up until now, we have pinned every logical ring context backing object
      during creation, and left it pinned until destruction. This made my life
      easier, but it's a harmful thing to do, because we cause fragmentation
      of the GGTT (and, eventually, we would run out of space).
      
      This patch makes the pinning on-demand: the backing objects of the two
      contexts that are written to the ELSP are pinned right before submission
      and unpinned once the hardware is done with them. The only context that
      is still pinned regardless is the global default one, so that the HWS can
      still be accessed in the same way (ring->status_page).
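
The on-demand scheme can be pictured as a pin count where the first pin binds the backing object and the last unpin releases it. The sketch below is a simplified userspace model with hypothetical names (`sketch_lrc`, `sketch_lrc_pin`, `sketch_lrc_unpin`); it assumes the caller already holds whatever lock serialises pinning and does not model the GGTT itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a logical ring context backing object. */
struct sketch_lrc {
	int pin_count;  /* number of outstanding pins */
	bool bound;     /* true while the object occupies address space */
};

/* Pin before submission: only the first pin has to bind the object. */
void sketch_lrc_pin(struct sketch_lrc *ctx)
{
	if (ctx->pin_count++ == 0)
		ctx->bound = true;
}

/* Unpin once the hardware is done: the last unpin releases the binding. */
void sketch_lrc_unpin(struct sketch_lrc *ctx)
{
	assert(ctx->pin_count > 0);
	if (--ctx->pin_count == 0)
		ctx->bound = false;
}
```

A context submitted twice stays bound until both submissions have been unpinned, so address space is held only while the hardware may still need it.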
      
      v2: In the early version of this patch, we were pinning the context as
      we put it into the ELSP: on the one hand, this is very efficient because
      only a maximum two contexts are pinned at any given time, but on the other
      hand, we cannot really pin in interrupt time :(
      
      v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
      Do not unpin default context in free_request.
      
      v4: Break out pin and unpin into functions.  Fix style problems reported
      by checkpatch
      
      v5: Remove unpin_lock as all pinning and unpinning is done with the struct
      mutex already locked.  Add WARN_ONs to make sure this is the case in future.
      
      Issue: VIZ-4277
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
      Reviewed-by: Akash Goel <akash.goels@gmail.com>
      Reviewed-by: Deepak S <deepak.s@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Clean up execlist queue items in retire_work · c86ee3a9
      Committed by Thomas Daniel
      No longer create a work item to clean each execlist queue item.
      Instead, move retired execlist requests to a queue and clean up the
      items during retire_requests.
      
      v2: Fix legacy ring path broken during overzealous cleanup
      
      v3: Update idle detection to take execlists queue into account
      
      v4: Grab execlist lock when checking queue state
      
      v5: Fix leaking requests by freeing in execlists_retire_requests.
      
      Issue: VIZ-4274
      Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
      Reviewed-by: Deepak S <deepak.s@linux.intel.com>
      Reviewed-by: Akash Goel <akash.goels@gmail.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>