1. 03 8月, 2016 4 次提交
  2. 21 7月, 2016 1 次提交
  3. 20 7月, 2016 1 次提交
  4. 14 7月, 2016 2 次提交
  5. 18 6月, 2016 1 次提交
    • Z
      drm/i915: Introduce execlist context status change notification · 3c7ba635
      Zhi Wang 提交于
      This patch introduces an approach to track the execlist context status
      change.
      
      GVT-g uses GVT context as the "shadow context". The content inside GVT
      context will be copied back to guest after the context is idle. And GVT-g
      has to know the status of the execlist context.
      
      This function is configurable when creating a new GEM context. Currently,
      Only GVT-g will create the "status-change-notification" enabled GEM
      context.
      
      v10:
      
      - Fix the identation. (Joonas)
      
      v8:
      
      - Remove the boolean flag in struct i915_gem_context. (Joonas)
      
      v7:
      
      - Remove per-engine ctx status notifiers. Use one status notifier for all
      engines. (Joonas)
      - Add prefix "INTEL_" for related definitions. (Joonas)
      - Refine the comments in execlists_context_status_change(). (Joonas)
      
      v6:
      
      - When !CONFIG_DRM_I915_GVT, make GVT code as dead code then compiler
      could automatically eliminate them for us. (Chris)
      - Always initialize the notifier header, so it could be switched on/off
      at runtime. (Chris)
      
      v5:
      
      - Only compile this feature when CONFIG_DRM_I915_GVT is enabled.(Tvrtko)
      
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v8)
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Signed-off-by: NZhi Wang <zhi.a.wang@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1466078825-6662-8-git-send-email-zhi.a.wang@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      3c7ba635
  6. 24 5月, 2016 2 次提交
  7. 23 5月, 2016 1 次提交
  8. 09 5月, 2016 1 次提交
  9. 28 4月, 2016 4 次提交
  10. 14 4月, 2016 1 次提交
  11. 13 4月, 2016 1 次提交
  12. 12 4月, 2016 1 次提交
  13. 04 4月, 2016 1 次提交
    • T
      drm/i915: Move execlists irq handler to a bottom half · 27af5eea
      Tvrtko Ursulin 提交于
      Doing a lot of work in the interrupt handler introduces huge
      latencies to the system as a whole.
      
      Most dramatic effect can be seen by running an all engine
      stress test like igt/gem_exec_nop/all where, when the kernel
      config is lean enough, the whole system can be brought into
      multi-second periods of complete non-interactivty. That can
      look for example like this:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
       Modules linked in: [redacted for brevity]
       CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
       Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
       Workqueue: i915 gen6_pm_rps_work [i915]
       task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
       RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
       RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
       RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
       RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
       RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
       R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
       R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
       FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
        042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
        0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
        0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
       Call Trace:
        <IRQ>
        [<ffffffff8104a716>] irq_exit+0x86/0x90
        [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
        [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
        <EOI>
        [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
        [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
        [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
        [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
        [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
        [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
        [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
        [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
        [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
        [<ffffffff8105ab29>] process_one_work+0x139/0x350
        [<ffffffff8105b186>] worker_thread+0x126/0x490
        [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
        [<ffffffff8105fa64>] kthread+0xc4/0xe0
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
        [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
      
      I could not explain, or find a code path, which would explain
      a +20 second lockup, but from some instrumentation it was
      apparent the interrupts off proportion of time was between
      10-25% under heavy load which is quite bad.
      
      When a interrupt "cliff" is reached, which was >~320k irq/s on
      my machine, the whole system goes into a terrible state of the
      above described multi-second lockups.
      
      By moving the GT interrupt handling to a tasklet in a most
      simple way, the problem above disappears completely.
      
      Testing the effect on sytem-wide latencies using
      igt/gem_syslatency shows the following before this patch:
      
      gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
      gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
      gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
      gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us
      
      This shows that the unrelated process is experiencing huge
      delays in its wake-up latency. After the patch the results
      look like this:
      
      gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
      gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
      gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
      gem_syslatency: cycles=841683, latency mean=56.914us max=1667us
      
      Showing a huge improvement in the unrelated process wake-up
      latency. It also shows an approximate halving in the number
      of total empty batches submitted during the test. This may
      not be worrying since the test puts the driver under
      a very unrealistic load with ncpu threads doing empty batch
      submission to all GPU engines each.
      
      Another benefit compared to the hard-irq handling is that now
      work on all engines can be dispatched in parallel since we can
      have up to number of CPUs active tasklets. (While previously
      a single hard-irq would serially dispatch on one engine after
      another.)
      
      More interesting scenario with regards to throughput is
      "gem_latency -n 100" which  shows 25% better throughput and
      CPU usage, and 14% better dispatch latencies.
      
      I did not find any gains or regressions with Synmark2 or
      GLbench under light testing. More benchmarking is certainly
      required.
      
      v2:
         * execlists_lock should be taken as spin_lock_bh when
           queuing work from userspace now. (Chris Wilson)
         * uncore.lock must be taken with spin_lock_irq when
           submitting requests since that now runs from either
           softirq or process context.
      
      v3:
         * Expanded commit message with more testing data;
         * converted missed locking sites to _bh;
         * added execlist_lock comment. (Chris Wilson)
      
      v4:
         * Mention dispatch parallelism in commit. (Chris Wilson)
         * Do not hold uncore.lock over MMIO reads since the block
           is already serialised per-engine via the tasklet itself.
           (Chris Wilson)
         * intel_lrc_irq_handler should be static. (Chris Wilson)
         * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
         * Document and WARN that tasklet cannot be active/pending
           on engine cleanup. (Chris Wilson/Imre Deak)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Testcase: igt/gem_exec_nop/all
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
      27af5eea
  14. 16 3月, 2016 1 次提交
  15. 29 1月, 2016 1 次提交
  16. 18 1月, 2016 1 次提交
    • T
      drm/i915: Do not call API requiring struct_mutex where it is not available · ca82580c
      Tvrtko Ursulin 提交于
      LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
      places where the struct_mutex cannot be grabbed (irq handlers).
      
      To avoid that this patch caches some interesting bits and values
      in the engine and context structures.
      
      Some usages are also removed where they are not needed like a
      few asserts which are either impossible or have been checked
      already during engine initialization.
      
      Side benefit is also that interrupt handlers and command
      submission stop evaluating invariant conditionals, like what
      Gen we are running on, on every interrupt and every command
      submitted.
      
      This patch deals with logical ring context id and descriptors
      while subsequent patches will deal with the remaining issues.
      
      v2:
       * Cache the VMA instead of the address. (Chris Wilson)
       * Incorporate Dave Gordon's good comments and function name.
      
      v3:
       * Extract ctx descriptor template to a function and group
         functions dealing with ctx descriptor & co together near
         top of the file. (Dave Gordon)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com
      ca82580c
  17. 07 1月, 2016 1 次提交
  18. 05 1月, 2016 1 次提交
  19. 05 12月, 2015 1 次提交
  20. 03 12月, 2015 1 次提交
    • N
      drm/i915: Extend LRC pinning to cover GPU context writeback · 6d65ba94
      Nick Hoath 提交于
      Use the first retired request on a new context to unpin
      the old context. This ensures that the hw context remains
      bound until it has been written back to by the GPU.
      Now that the context is pinned until later in the request/context
      lifecycle, it no longer needs to be pinned from context_queue to
      retire_requests.
      This fixes an issue with GuC submission where the GPU might not
      have finished writing back the context before it is unpinned. This
      results in a GPU hang.
      
      v2: Moved the new pin to cover GuC submission (Alex Dai)
          Moved the new unpin to request_retire to fix coverage leak
      v3: Added switch to default context if freeing a still pinned
          context just in case the hw was actually still using it
      v4: Unwrapped context unpin to allow calling without a request
      v5: Only create a switch to idle context if the ring doesn't
          already have a request pending on it (Alex Dai)
          Rename unsaved to dirty to avoid double negatives (Dave Gordon)
          Changed _no_req postfix to __ prefix for consistency (Dave Gordon)
          Split out per engine cleanup from context_free as it
          was getting unwieldy
          Corrected locking (Dave Gordon)
      v6: Removed some bikeshedding (Mika Kuoppala)
          Added explanation of the GuC hang that this fixes (Daniel Vetter)
      v7: Removed extra per request pinning from ring reset code (Alex Dai)
          Added forced ring unpin/clean in error case in context free (Alex Dai)
      Signed-off-by: NNick Hoath <nicholas.hoath@intel.com>
      Issue: VIZ-4277
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Gordon <david.s.gordon@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Alex Dai <yu.dai@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: NAlex Dai <yu.dai@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      6d65ba94
  21. 18 11月, 2015 2 次提交
  22. 28 9月, 2015 1 次提交
    • M
      drm/i915: Consider HW CSB write pointer before resetting the sw read pointer · dfc53c5e
      Michel Thierry 提交于
      A previous commit resets the Context Status Buffer (CSB) read pointer in
      ring init
          commit c0a03a2e ("drm/i915: Reset CSB read pointer in ring init")
      
      This is generally correct, but this pointer is not reset after
      suspend/resume in some platforms (cht). In this case, the driver should
      read the register value instead of resetting the sw read counter to 0.
      Otherwise we process old events, leading to unwanted pre-emptions or
      something worse.
      
      But in other platforms (bdw) and also during GPU reset or power up, the
      CSBWP is reset to 0x7 (an invalid number), and in this case the read
      pointer should be set to 5 (the interrupt code will increment this
      counter one more time, and will start reading from CSB[0]).
      
      v2: When the CSB registers are reset, the read pointer needs to be set
      to 5, otherwise the first write (CSB[0]) won't be read (Mika).
      Replace magic numbers with GEN8_CSB_ENTRIES (6) and GEN8_CSB_PTR_MASK
      (0x07).
      
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: NLei Shen <lei.shen@intel.com>
      Signed-off-by: NDeepak S <deepak.s@intel.com>
      Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      dfc53c5e
  23. 23 9月, 2015 1 次提交
  24. 14 9月, 2015 1 次提交
    • N
      drm/i915: Split alloc from init for lrc · e84fe803
      Nick Hoath 提交于
      Extend init/init_hw split to context init.
         - Move context initialisation in to i915_gem_init_hw
         - Move one off initialisation for render ring to
              i915_gem_validate_context
         - Move default context initialisation to logical_ring_init
      
      Rename intel_lr_context_deferred_create to
      intel_lr_context_deferred_alloc, to reflect reduced functionality &
      alloc/init split.
      
      This patch is intended to split out the allocation of resources &
      initialisation to allow easier reuse of code for resume/gpu reset.
      
      v2: Removed function ptr wrapping of do_switch_context (Daniel Vetter)
          Left ->init_context int intel_lr_context_deferred_alloc
          (Daniel Vetter)
          Remove unnecessary init flag & ring type test. (Daniel Vetter)
          Improve commit message (Daniel Vetter)
      v3: On init/reinit, set the hw next sequence number to the sw next
          sequence number. This is set to 1 at driver load time. This prevents
          the seqno being reset on reinit (Chris Wilson)
      v4: Set seqno back to ~0 - 0x1000 at start-of-day, and increment by 0x100
          on reset.
          This makes it obvious which bbs are which after a reset. (David Gordon
          & John Harrison)
          Rebase.
      v5: Rebase. Fixed rebase breakage. Put context pinning in separate
          function. Removed code churn. (Thomas Daniel)
      v6: Cleanup up issues introduced in v2 & v5 (Thomas Daniel)
      
      Issue: VIZ-4798
      Signed-off-by: NNick Hoath <nicholas.hoath@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: John Harrison <john.c.harrison@intel.com>
      Cc: David Gordon <david.s.gordon@intel.com>
      Cc: Thomas Daniel <thomas.daniel@intel.com>
      Reviewed-by: NThomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      e84fe803
  25. 15 8月, 2015 2 次提交
    • A
      drm/i915: Integrate GuC-based command submission · d1675198
      Alex Dai 提交于
      GuC-based submission is mostly the same as execlist mode, up to
      intel_logical_ring_advance_and_submit(), where the context being
      dispatched would be added to the execlist queue; at this point
      we submit the context to the GuC backend instead.
      
      There are, however, a few other changes also required, notably:
      1.  Contexts must be pinned at GGTT addresses accessible by the GuC
          i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
          PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
      
      2.  The GuC's TLB must be invalidated after a context is pinned at
          a new GGTT address.
      
      3.  GuC firmware uses the one page before Ring Context as shared data.
          Therefore, whenever driver wants to get base address of LRC, we
          will offset one page for it. LRC_PPHWSP_PN is defined as the page
          number of LRCA.
      
      4.  In the work queue used to pass requests to the GuC, the GuC
          firmware requires the ring-tail-offset to be represented as an
          11-bit value, expressed in QWords. Therefore, the ringbuffer
          size must be reduced to the representable range (4 pages).
      
      v2:
          Defer adding #defines until needed [Chris Wilson]
          Rationalise type declarations [Chris Wilson]
      
      v4:
          Squashed kerneldoc patch into here [Daniel Vetter]
      
      v5:
          Update request->tail in code common to both GuC and execlist modes.
          Add a private version of lr_context_update(), as sharing the
              execlist version leads to race conditions when the CPU and
              the GuC both update TAIL in the context image.
          Conversion of error-captured HWS page to string must account
              for offset from start of object to actual HWS (LRC_PPHWSP_PN).
      
      Issue: VIZ-4884
      Signed-off-by: NAlex Dai <yu.dai@intel.com>
      Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
      Reviewed-by: NTom O'Rourke <Tom.O'Rourke@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d1675198
    • D
      drm/i915: Expose one LRC function for GuC submission mode · 919f1f55
      Dave Gordon 提交于
      GuC submission is basically execlist submission, but with the GuC
      handling the actual writes to the ELSP and the resulting context
      switch interrupts.  So to describe a context for submission via
      the GuC, we need one of the same functions used in execlist mode.
      This commit exposes one such function, changing its name to better
      describe what it does (it's related to logical ring contexts rather
      than to execlists per se).
      
      v2:
          Replaces previous "drm/i915: Move execlists defines from .c to .h"
      
      v3:
          Incorporates a change to one of the functions exposed here that was
              previously part of an internal patch, but which was omitted from
              the version recently committed to drm-intel-nightly:
      	    7a01a0a2 drm/i915/lrc: Update PDPx registers with lri commands
              So we reinstate this change here.
      
      v4:
          Drop v3 change, update function parameters due to collision with
              8ee36152 drm/i915: Convert execlists_ctx_descriptor() for requests
      
      v5:
          Don't expose execlists_update_context() after all. The current
              version is no longer compatible with GuC submission; trying to
              share the execlist version of this function results in both GuC
              and CPU updating TAIL in the context image, with bad results when
              they get out of step. The GuC submission path now has its own
              private version that just updates the ringbuffer start address,
              and not TAIL or PDPx.
      
      v6:
          Rebased
      
      Issue: VIZ-4884
      Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
      Reviewed-by: NTom O'Rourke <Tom.O'Rourke@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      919f1f55
  26. 14 7月, 2015 1 次提交
    • P
      drm/i915: Added Programming of the MOCS · 3bbaba0c
      Peter Antoine 提交于
      This change adds the programming of the MOCS registers to the gen 9+
      platforms. The set of MOCS configuration entries introduced by this
      patch is intended to be minimal but sufficient to cover the needs of
      current userspace - i.e. a good set of defaults. It is expected to be
      extended in the future to provide further default values or to allow
      userspace to redefine its private MOCS tables based on its demand for
      additional caching configurations. In this setup, userspace should
      only utilize the first N entries, higher entries are reserved for
      future use.
      
      It creates a fixed register set that is programmed across the different
      engines so that all engines have the same table. This is done as the
      main RCS context only holds the registers for itself and the shared
      L3 values. By trying to keep the registers consistent across the
      different engines it should make the programming for the registers
      consistent.
      
      v2:
      -'static const' for private data structures and style changes.(Matt Turner)
      v3:
      - Make the tables "slightly" more readable. (Damien Lespiau)
      - Updated tables fix performance regression.
      v4:
      - Code formatting. (Chris Wilson)
      - re-privatised mocs code. (Daniel Vetter)
      v5:
      - Changed the name of a function. (Chris Wilson)
      v6:
      - re-based
      - Added Mesa table entry (skylake & broxton) (Francisco Jerez)
      - Tidied up the readability defines (Francisco Jerez)
      - NUMBER of entries defines wrong. (Jim Bish)
      - Added comments to clear up the meaning of the tables (Jim Bish)
      Signed-off-by: NPeter Antoine <peter.antoine@intel.com>
      
      v7 (Francisco Jerez):
      - Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control
        tables.  Prefix L3-specific defines consistently with L3_ and
        e/LLC-specific defines with LE_ to avoid this kind of confusion in
        the future.
      - Change L3CC WT define back to RESERVED (matches my hardware
        documentation and the original patch, probably a misunderstanding
        of my own previous comment).
      - Drop Android tables, define new minimal tables more suitable for the
        open source stack.
      - Add comment that the MOCS tables are part of the kernel ABI.
      - Move intel_logical_ring_begin() and _advance() calls one level down
        (Chris Wilson).
      - Minor formatting and style fixes.
      v8 (Francisco Jerez):
      - Add table size sanity check to emit_mocs_control/l3cc_table() (Chris
        Wilson).
      - Add comment about undefined entries being implicitly set to uncached
        for forwards compatibility.
      v9 (Francisco Jerez):
      - Minor style fixes.
      Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
      Acked-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      3bbaba0c
  27. 06 7月, 2015 2 次提交
  28. 23 6月, 2015 2 次提交