1. 14 4月, 2016 3 次提交
  2. 12 4月, 2016 1 次提交
    • T
      drm/i915: Simplify for_each_fw_domain iterators · 33c582c1
      Tvrtko Ursulin 提交于
      As the vast majority of users do not use the domain id variable,
      we can eliminate it from the iterator and also change the latter
      using the same principle as was recently done for for_each_engine.
      
      For a couple of callers which do need the domain mask, store it
      in the domain array (which already has the domain id), then both
      can be retrieved thence.
      
      Result is clearer code and smaller generated binary, especially
      in the tight fw get/put loops. Also, relationship between domain
      id and mask is no longer assumed in the macro.
      
      v2: Improve grammar in the commit message and rename the
          iterator to for_each_fw_domain_masked for consistency.
          (Dave Gordon)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
      33c582c1
  3. 09 4月, 2016 2 次提交
  4. 08 4月, 2016 1 次提交
  5. 07 4月, 2016 1 次提交
  6. 04 4月, 2016 1 次提交
    • T
      drm/i915: Move execlists irq handler to a bottom half · 27af5eea
      Tvrtko Ursulin 提交于
      Doing a lot of work in the interrupt handler introduces huge
      latencies to the system as a whole.
      
      Most dramatic effect can be seen by running an all engine
      stress test like igt/gem_exec_nop/all where, when the kernel
      config is lean enough, the whole system can be brought into
      multi-second periods of complete non-interactivty. That can
      look for example like this:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
       Modules linked in: [redacted for brevity]
       CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
       Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
       Workqueue: i915 gen6_pm_rps_work [i915]
       task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
       RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
       RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
       RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
       RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
       RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
       R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
       R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
       FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
        042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
        0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
        0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
       Call Trace:
        <IRQ>
        [<ffffffff8104a716>] irq_exit+0x86/0x90
        [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
        [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
        <EOI>
        [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
        [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
        [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
        [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
        [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
        [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
        [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
        [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
        [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
        [<ffffffff8105ab29>] process_one_work+0x139/0x350
        [<ffffffff8105b186>] worker_thread+0x126/0x490
        [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
        [<ffffffff8105fa64>] kthread+0xc4/0xe0
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
        [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
      
      I could not explain, or find a code path, which would explain
      a +20 second lockup, but from some instrumentation it was
      apparent the interrupts off proportion of time was between
      10-25% under heavy load which is quite bad.
      
      When a interrupt "cliff" is reached, which was >~320k irq/s on
      my machine, the whole system goes into a terrible state of the
      above described multi-second lockups.
      
      By moving the GT interrupt handling to a tasklet in a most
      simple way, the problem above disappears completely.
      
      Testing the effect on sytem-wide latencies using
      igt/gem_syslatency shows the following before this patch:
      
      gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
      gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
      gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
      gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us
      
      This shows that the unrelated process is experiencing huge
      delays in its wake-up latency. After the patch the results
      look like this:
      
      gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
      gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
      gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
      gem_syslatency: cycles=841683, latency mean=56.914us max=1667us
      
      Showing a huge improvement in the unrelated process wake-up
      latency. It also shows an approximate halving in the number
      of total empty batches submitted during the test. This may
      not be worrying since the test puts the driver under
      a very unrealistic load with ncpu threads doing empty batch
      submission to all GPU engines each.
      
      Another benefit compared to the hard-irq handling is that now
      work on all engines can be dispatched in parallel since we can
      have up to number of CPUs active tasklets. (While previously
      a single hard-irq would serially dispatch on one engine after
      another.)
      
      More interesting scenario with regards to throughput is
      "gem_latency -n 100" which  shows 25% better throughput and
      CPU usage, and 14% better dispatch latencies.
      
      I did not find any gains or regressions with Synmark2 or
      GLbench under light testing. More benchmarking is certainly
      required.
      
      v2:
         * execlists_lock should be taken as spin_lock_bh when
           queuing work from userspace now. (Chris Wilson)
         * uncore.lock must be taken with spin_lock_irq when
           submitting requests since that now runs from either
           softirq or process context.
      
      v3:
         * Expanded commit message with more testing data;
         * converted missed locking sites to _bh;
         * added execlist_lock comment. (Chris Wilson)
      
      v4:
         * Mention dispatch parallelism in commit. (Chris Wilson)
         * Do not hold uncore.lock over MMIO reads since the block
           is already serialised per-engine via the tasklet itself.
           (Chris Wilson)
         * intel_lrc_irq_handler should be static. (Chris Wilson)
         * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
         * Document and WARN that tasklet cannot be active/pending
           on engine cleanup. (Chris Wilson/Imre Deak)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Testcase: igt/gem_exec_nop/all
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
      27af5eea
  7. 03 4月, 2016 2 次提交
  8. 31 3月, 2016 1 次提交
    • J
      drm/i915: Refer to GGTT {,VM} consistently · 72e96d64
      Joonas Lahtinen 提交于
      Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
      "vm" or indirectly through other variables like "dev_priv->ggtt.base"
      to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
      
      Refer to the GGTT as "ggtt" instead of indirectly through chaining.
      
      As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
      i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
      
      v2:
      - Added some more after grepping sources with Chris
      
      v3:
      - Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
        (Chris)
      
      v4:
      - Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
      
      v5:
      - Make patch checker happy
      
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      72e96d64
  9. 24 3月, 2016 2 次提交
  10. 18 3月, 2016 1 次提交
  11. 17 3月, 2016 1 次提交
  12. 16 3月, 2016 5 次提交
  13. 04 3月, 2016 1 次提交
  14. 26 2月, 2016 2 次提交
  15. 22 2月, 2016 1 次提交
  16. 17 2月, 2016 1 次提交
  17. 02 2月, 2016 1 次提交
    • R
      drm/i915: Add PSR main link standby support back · 60e5ffe3
      Rodrigo Vivi 提交于
      Link standby support has been deprecated with 'commit 89251b17
      ("drm/i915: PSR: deprecate link_standby support for core platforms.")'
      
      The reason for that is that main link in full off offers more power
      savings and on HSW and BDW implementations on source side had known
      bugs with link standby.
      
      However that same HSD report only mentions BDW and HSW and tells that
      a fix was going to new platforms. Since on Skylake link standby
      didn't cause the bad blank flickering screens seen on HSW and BDW
      let's respect VBT again for this and future platforms.
      
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: NPaulo Zanoni <paulo.r.zanoni@intel.com>
      60e5ffe3
  18. 25 1月, 2016 2 次提交
    • A
      drm/i915/gen9: Add framework to whitelist specific GPU registers · 33136b06
      Arun Siluvery 提交于
      Some of the HW registers are privileged and cannot be written to from
      non-privileged batch buffers coming from userspace unless they are added to
      the HW whitelist. This whitelist is maintained by HW and it is different from
      SW whitelist. Userspace need write access to them to implement preemption
      related WA.
      
      The reason for using this approach is, the register bits that control
      preemption granularity at the HW level are not context save/restored; so even
      if we set these bits always in kernel they are going to change once the
      context is switched out.  We can consider making them non-privileged by
      default but these registers also contain other chicken bits which should not
      be allowed to be modified.
      
      In the later revisions controlling bits are save/restored at context level but
      in the existing revisions these are exported via other debug registers and
      should be on the whitelist. This patch adds changes to provide HW with a list
      of registers to be whitelisted. HW checks this list during execution and
      provides access accordingly.
      
      HW imposes a limit on the number of registers on whitelist and it is
      per-engine.  At this point we are only enabling whitelist for RCS and we don't
      foresee any requirement for other engines.
      
      The registers to be whitelisted are added using generic workaround list
      mechanism, even these are only enablers for userspace workarounds. But by
      sharing this mechanism we get some test assets without additional cost (Mika).
      
      v2: rebase
      
      v3: parameterize RING_FORCE_TO_NONPRIV() as _MMIO() should be limited to
      i915_reg.h (Ville), drop inline for wa_ring_whitelist_reg (Mika).
      
      v4: improvements suggested by Chris Wilson.
      Clarify that this is HW whitelist and different from the one maintained in
      driver. This list is engine specific but it gets initialized along with other
      WA which is RCS specific thing, so make it clear that we are not doing any
      cross engine setup during initialization.
      Make HW whitelist count of each engine available in debugfs.
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NArun Siluvery <arun.siluvery@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1453412634-29238-2-git-send-email-arun.siluvery@linux.intel.comSigned-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      33136b06
    • A
      drm/i915/guc: Decouple GuC engine id from ring id · 397097b0
      Alex Dai 提交于
      Previously GuC uses ring id as engine id because of same definition.
      But this is not true since this commit:
      
      commit de1add36
      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Date:   Fri Jan 15 15:12:50 2016 +0000
      
          drm/i915: Decouple execbuf uAPI from internal implementation
      
      Added GuC engine id into GuC interface to decouple it from ring id used
      by driver.
      
      v2: Keep ring name print out in debugfs; using for_each_ring() where
          possible to keep driver consistent. (Chris W.)
      Signed-off-by: NAlex Dai <yu.dai@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1453579094-29860-1-git-send-email-yu.dai@intel.com
      397097b0
  19. 21 1月, 2016 2 次提交
  20. 18 1月, 2016 1 次提交
    • T
      drm/i915: Do not call API requiring struct_mutex where it is not available · ca82580c
      Tvrtko Ursulin 提交于
      LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
      places where the struct_mutex cannot be grabbed (irq handlers).
      
      To avoid that this patch caches some interesting bits and values
      in the engine and context structures.
      
      Some usages are also removed where they are not needed like a
      few asserts which are either impossible or have been checked
      already during engine initialization.
      
      Side benefit is also that interrupt handlers and command
      submission stop evaluating invariant conditionals, like what
      Gen we are running on, on every interrupt and every command
      submitted.
      
      This patch deals with logical ring context id and descriptors
      while subsequent patches will deal with the remaining issues.
      
      v2:
       * Cache the VMA instead of the address. (Chris Wilson)
       * Incorporate Dave Gordon's good comments and function name.
      
      v3:
       * Extract ctx descriptor template to a function and group
         functions dealing with ctx descriptor & co together near
         top of the file. (Dave Gordon)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com
      ca82580c
  21. 08 1月, 2016 1 次提交
    • M
      drm/i915: Inspect subunit states on hangcheck · 61642ff0
      Mika Kuoppala 提交于
      If head seems stuck and engine in question is rcs,
      inspect subunit state transitions from undone to done,
      before deciding that this really is a hang instead of limited
      progress. Only account the transitions of subunits from
      undone to done once, to prevent unstable subunit states
      to keep us falsely active.
      
      As this adds one extra steps to hangcheck heuristics,
      before hang is declared, it adds 1500ms to to detect hang
      for render ring to a total of 7500ms. We could sample
      the subunit states on first head stuck condition but
      decide not to do so only in order to mimic old behaviour. This
      way the check order of promotion from seqno > atchd > instdone
      is consistently done.
      
      v2: Deal with unstable done states (Arun)
          Clear instdone progress on head and seqno movement (Chris)
          Report raw and accumulated instdone's in in debugfs (Chris)
          Return HANGCHECK_ACTIVE on undone->done
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=93029
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1448985372-19535-1-git-send-email-mika.kuoppala@intel.com
      61642ff0
  22. 07 1月, 2016 1 次提交
  23. 16 12月, 2015 2 次提交
  24. 10 12月, 2015 2 次提交
  25. 03 12月, 2015 2 次提交
    • A
      drm/i915/guc: Clean up locks in GuC · 5a843307
      Alex Dai 提交于
      For now, remove the spinlocks that protected the GuC's
      statistics block and work queue; they are only accessed
      by code that already holds the global struct_mutex, and
      so are redundant (until the big struct_mutex rewrite!).
      
      The specific problem that the spinlocks caused was that
      if the work queue was full, the driver would try to
      spinwait for one jiffy, but with interrupts disabled the
      jiffy count would not advance, leading to a system hang.
      The issue was found using test case igt/gem_close_race.
      
      The new version will usleep() instead, still holding
      the struct_mutex but without any spinlocks.
      
      v4: Reorganize commit message (Dave Gordon)
      v3: Remove unnecessary whitespace churn
      v2: Clean up wq_lock too
      v1: Clean up host2guc lock as well
      Signed-off-by: NAlex Dai <yu.dai@intel.com>
      Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1449104189-27591-1-git-send-email-yu.dai@intel.comSigned-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      5a843307
    • P
      drm/i915: introduce is_active/activate/deactivate to the FBC terminology · 0e631adc
      Paulo Zanoni 提交于
      The long term goal is to have enable/disable as the higher level
      functions and activate/deactivate as the lower level functions, just
      like we do for PSR and for the CRTC. This way, we'll run enable and
      disable once per modeset, while update, activate and deactivate will
      be run many times. With this, we can move the checks and code that
      need to run only once per modeset to enable(), making the code simpler
      and possibly a little faster.
      
      This patch is just the first step on the conversion: it starts by
      converting the current low level functions from enable/disable to
      activate/deactivate. This patch by itself has no benefits other than
      making review and rebase easier. Please see the next patches for more
      details on the conversion.
      
      v2:
        - Rebase.
        - Improve commit message (Chris).
      v3: Rebase after changing the patch order.
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/
      0e631adc