1. 28 7月, 2016 1 次提交
    • L
      Add braces to avoid "ambiguous ‘else’" compiler warnings · 194dc870
      Linus Torvalds 提交于
      Some of our "for_each_xyz()" macro constructs make gcc unhappy about
      lack of braces around if-statements inside or outside the loop, because
      the loop construct itself has a "if-then-else" statement inside of it.
      
      The resulting warnings look something like this:
      
        drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_dump_lrc’:
        drivers/gpu/drm/i915/i915_debugfs.c:2103:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses]
           if (ctx != dev_priv->kernel_context)
              ^
      
      even if the code itself is fine.
      
      Since the warning is fairly easy to avoid by adding a braces around the
      if-statement near the for_each_xyz() construct, do so, rather than
      disabling the otherwise potentially useful warning.
      
      (The if-then-else statements used in the "for_each_xyz()" constructs are
      designed to be inherently safe even with no braces, but in this case
      it's quite understandable that gcc isn't really able to tell that).
      
      This finally leaves the standard "allmodconfig" build with just a
      handful of remaining warnings, so new and valid warnings hopefully will
      stand out.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      194dc870
  2. 30 6月, 2016 1 次提交
  3. 04 5月, 2016 1 次提交
  4. 27 4月, 2016 1 次提交
  5. 24 4月, 2016 1 次提交
  6. 22 4月, 2016 1 次提交
  7. 15 4月, 2016 3 次提交
  8. 14 4月, 2016 3 次提交
  9. 12 4月, 2016 1 次提交
    • T
      drm/i915: Simplify for_each_fw_domain iterators · 33c582c1
      Tvrtko Ursulin 提交于
      As the vast majority of users do not use the domain id variable,
      we can eliminate it from the iterator and also change the latter
      using the same principle as was recently done for for_each_engine.
      
      For a couple of callers which do need the domain mask, store it
      in the domain array (which already has the domain id), then both
      can be retrieved thence.
      
      Result is clearer code and smaller generated binary, especially
      in the tight fw get/put loops. Also, relationship between domain
      id and mask is no longer assumed in the macro.
      
      v2: Improve grammar in the commit message and rename the
          iterator to for_each_fw_domain_masked for consistency.
          (Dave Gordon)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
      33c582c1
  10. 09 4月, 2016 2 次提交
  11. 08 4月, 2016 1 次提交
  12. 07 4月, 2016 1 次提交
  13. 04 4月, 2016 1 次提交
    • T
      drm/i915: Move execlists irq handler to a bottom half · 27af5eea
      Tvrtko Ursulin 提交于
      Doing a lot of work in the interrupt handler introduces huge
      latencies to the system as a whole.
      
      Most dramatic effect can be seen by running an all engine
      stress test like igt/gem_exec_nop/all where, when the kernel
      config is lean enough, the whole system can be brought into
      multi-second periods of complete non-interactivty. That can
      look for example like this:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
       Modules linked in: [redacted for brevity]
       CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
       Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
       Workqueue: i915 gen6_pm_rps_work [i915]
       task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
       RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
       RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
       RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
       RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
       RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
       R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
       R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
       FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
        042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
        0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
        0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
       Call Trace:
        <IRQ>
        [<ffffffff8104a716>] irq_exit+0x86/0x90
        [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
        [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
        <EOI>
        [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
        [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
        [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
        [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
        [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
        [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
        [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
        [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
        [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
        [<ffffffff8105ab29>] process_one_work+0x139/0x350
        [<ffffffff8105b186>] worker_thread+0x126/0x490
        [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
        [<ffffffff8105fa64>] kthread+0xc4/0xe0
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
        [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
      
      I could not explain, or find a code path, which would explain
      a +20 second lockup, but from some instrumentation it was
      apparent the interrupts off proportion of time was between
      10-25% under heavy load which is quite bad.
      
      When a interrupt "cliff" is reached, which was >~320k irq/s on
      my machine, the whole system goes into a terrible state of the
      above described multi-second lockups.
      
      By moving the GT interrupt handling to a tasklet in a most
      simple way, the problem above disappears completely.
      
      Testing the effect on sytem-wide latencies using
      igt/gem_syslatency shows the following before this patch:
      
      gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
      gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
      gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
      gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us
      
      This shows that the unrelated process is experiencing huge
      delays in its wake-up latency. After the patch the results
      look like this:
      
      gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
      gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
      gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
      gem_syslatency: cycles=841683, latency mean=56.914us max=1667us
      
      Showing a huge improvement in the unrelated process wake-up
      latency. It also shows an approximate halving in the number
      of total empty batches submitted during the test. This may
      not be worrying since the test puts the driver under
      a very unrealistic load with ncpu threads doing empty batch
      submission to all GPU engines each.
      
      Another benefit compared to the hard-irq handling is that now
      work on all engines can be dispatched in parallel since we can
      have up to number of CPUs active tasklets. (While previously
      a single hard-irq would serially dispatch on one engine after
      another.)
      
      More interesting scenario with regards to throughput is
      "gem_latency -n 100" which  shows 25% better throughput and
      CPU usage, and 14% better dispatch latencies.
      
      I did not find any gains or regressions with Synmark2 or
      GLbench under light testing. More benchmarking is certainly
      required.
      
      v2:
         * execlists_lock should be taken as spin_lock_bh when
           queuing work from userspace now. (Chris Wilson)
         * uncore.lock must be taken with spin_lock_irq when
           submitting requests since that now runs from either
           softirq or process context.
      
      v3:
         * Expanded commit message with more testing data;
         * converted missed locking sites to _bh;
         * added execlist_lock comment. (Chris Wilson)
      
      v4:
         * Mention dispatch parallelism in commit. (Chris Wilson)
         * Do not hold uncore.lock over MMIO reads since the block
           is already serialised per-engine via the tasklet itself.
           (Chris Wilson)
         * intel_lrc_irq_handler should be static. (Chris Wilson)
         * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
         * Document and WARN that tasklet cannot be active/pending
           on engine cleanup. (Chris Wilson/Imre Deak)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Testcase: igt/gem_exec_nop/all
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
      27af5eea
  14. 03 4月, 2016 2 次提交
  15. 31 3月, 2016 1 次提交
    • J
      drm/i915: Refer to GGTT {,VM} consistently · 72e96d64
      Joonas Lahtinen 提交于
      Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
      "vm" or indirectly through other variables like "dev_priv->ggtt.base"
      to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
      
      Refer to the GGTT as "ggtt" instead of indirectly through chaining.
      
      As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
      i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
      
      v2:
      - Added some more after grepping sources with Chris
      
      v3:
      - Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
        (Chris)
      
      v4:
      - Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
      
      v5:
      - Make patch checker happy
      
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      72e96d64
  16. 24 3月, 2016 2 次提交
  17. 18 3月, 2016 1 次提交
  18. 17 3月, 2016 1 次提交
  19. 16 3月, 2016 5 次提交
  20. 04 3月, 2016 1 次提交
  21. 26 2月, 2016 2 次提交
  22. 22 2月, 2016 1 次提交
  23. 17 2月, 2016 1 次提交
  24. 02 2月, 2016 1 次提交
    • R
      drm/i915: Add PSR main link standby support back · 60e5ffe3
      Rodrigo Vivi 提交于
      Link standby support has been deprecated with 'commit 89251b17
      ("drm/i915: PSR: deprecate link_standby support for core platforms.")'
      
      The reason for that is that main link in full off offers more power
      savings and on HSW and BDW implementations on source side had known
      bugs with link standby.
      
      However that same HSD report only mentions BDW and HSW and tells that
      a fix was going to new platforms. Since on Skylake link standby
      didn't cause the bad blank flickering screens seen on HSW and BDW
      let's respect VBT again for this and future platforms.
      
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: NPaulo Zanoni <paulo.r.zanoni@intel.com>
      60e5ffe3
  25. 25 1月, 2016 2 次提交
    • A
      drm/i915/gen9: Add framework to whitelist specific GPU registers · 33136b06
      Arun Siluvery 提交于
      Some of the HW registers are privileged and cannot be written to from
      non-privileged batch buffers coming from userspace unless they are added to
      the HW whitelist. This whitelist is maintained by HW and it is different from
      SW whitelist. Userspace need write access to them to implement preemption
      related WA.
      
      The reason for using this approach is, the register bits that control
      preemption granularity at the HW level are not context save/restored; so even
      if we set these bits always in kernel they are going to change once the
      context is switched out.  We can consider making them non-privileged by
      default but these registers also contain other chicken bits which should not
      be allowed to be modified.
      
      In the later revisions controlling bits are save/restored at context level but
      in the existing revisions these are exported via other debug registers and
      should be on the whitelist. This patch adds changes to provide HW with a list
      of registers to be whitelisted. HW checks this list during execution and
      provides access accordingly.
      
      HW imposes a limit on the number of registers on whitelist and it is
      per-engine.  At this point we are only enabling whitelist for RCS and we don't
      foresee any requirement for other engines.
      
      The registers to be whitelisted are added using generic workaround list
      mechanism, even these are only enablers for userspace workarounds. But by
      sharing this mechanism we get some test assets without additional cost (Mika).
      
      v2: rebase
      
      v3: parameterize RING_FORCE_TO_NONPRIV() as _MMIO() should be limited to
      i915_reg.h (Ville), drop inline for wa_ring_whitelist_reg (Mika).
      
      v4: improvements suggested by Chris Wilson.
      Clarify that this is HW whitelist and different from the one maintained in
      driver. This list is engine specific but it gets initialized along with other
      WA which is RCS specific thing, so make it clear that we are not doing any
      cross engine setup during initialization.
      Make HW whitelist count of each engine available in debugfs.
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NArun Siluvery <arun.siluvery@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1453412634-29238-2-git-send-email-arun.siluvery@linux.intel.comSigned-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      33136b06
    • A
      drm/i915/guc: Decouple GuC engine id from ring id · 397097b0
      Alex Dai 提交于
      Previously GuC uses ring id as engine id because of same definition.
      But this is not true since this commit:
      
      commit de1add36
      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Date:   Fri Jan 15 15:12:50 2016 +0000
      
          drm/i915: Decouple execbuf uAPI from internal implementation
      
      Added GuC engine id into GuC interface to decouple it from ring id used
      by driver.
      
      v2: Keep ring name print out in debugfs; using for_each_ring() where
          possible to keep driver consistent. (Chris W.)
      Signed-off-by: NAlex Dai <yu.dai@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1453579094-29860-1-git-send-email-yu.dai@intel.com
      397097b0
  26. 21 1月, 2016 2 次提交