1. 08 4月, 2016 1 次提交
  2. 07 4月, 2016 1 次提交
  3. 06 4月, 2016 1 次提交
  4. 04 4月, 2016 1 次提交
    • T
      drm/i915: Move execlists irq handler to a bottom half · 27af5eea
      Tvrtko Ursulin 提交于
      Doing a lot of work in the interrupt handler introduces huge
      latencies to the system as a whole.
      
      Most dramatic effect can be seen by running an all engine
      stress test like igt/gem_exec_nop/all where, when the kernel
      config is lean enough, the whole system can be brought into
      multi-second periods of complete non-interactivty. That can
      look for example like this:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
       Modules linked in: [redacted for brevity]
       CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
       Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
       Workqueue: i915 gen6_pm_rps_work [i915]
       task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
       RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
       RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
       RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
       RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
       RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
       R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
       R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
       FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
        042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
        0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
        0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
       Call Trace:
        <IRQ>
        [<ffffffff8104a716>] irq_exit+0x86/0x90
        [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
        [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
        <EOI>
        [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
        [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
        [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
        [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
        [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
        [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
        [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
        [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
        [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
        [<ffffffff8105ab29>] process_one_work+0x139/0x350
        [<ffffffff8105b186>] worker_thread+0x126/0x490
        [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
        [<ffffffff8105fa64>] kthread+0xc4/0xe0
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
        [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
        [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
      
      I could not explain, or find a code path, which would explain
      a +20 second lockup, but from some instrumentation it was
      apparent the interrupts off proportion of time was between
      10-25% under heavy load which is quite bad.
      
      When a interrupt "cliff" is reached, which was >~320k irq/s on
      my machine, the whole system goes into a terrible state of the
      above described multi-second lockups.
      
      By moving the GT interrupt handling to a tasklet in a most
      simple way, the problem above disappears completely.
      
      Testing the effect on sytem-wide latencies using
      igt/gem_syslatency shows the following before this patch:
      
      gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
      gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
      gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
      gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us
      
      This shows that the unrelated process is experiencing huge
      delays in its wake-up latency. After the patch the results
      look like this:
      
      gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
      gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
      gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
      gem_syslatency: cycles=841683, latency mean=56.914us max=1667us
      
      Showing a huge improvement in the unrelated process wake-up
      latency. It also shows an approximate halving in the number
      of total empty batches submitted during the test. This may
      not be worrying since the test puts the driver under
      a very unrealistic load with ncpu threads doing empty batch
      submission to all GPU engines each.
      
      Another benefit compared to the hard-irq handling is that now
      work on all engines can be dispatched in parallel since we can
      have up to number of CPUs active tasklets. (While previously
      a single hard-irq would serially dispatch on one engine after
      another.)
      
      More interesting scenario with regards to throughput is
      "gem_latency -n 100" which  shows 25% better throughput and
      CPU usage, and 14% better dispatch latencies.
      
      I did not find any gains or regressions with Synmark2 or
      GLbench under light testing. More benchmarking is certainly
      required.
      
      v2:
         * execlists_lock should be taken as spin_lock_bh when
           queuing work from userspace now. (Chris Wilson)
         * uncore.lock must be taken with spin_lock_irq when
           submitting requests since that now runs from either
           softirq or process context.
      
      v3:
         * Expanded commit message with more testing data;
         * converted missed locking sites to _bh;
         * added execlist_lock comment. (Chris Wilson)
      
      v4:
         * Mention dispatch parallelism in commit. (Chris Wilson)
         * Do not hold uncore.lock over MMIO reads since the block
           is already serialised per-engine via the tasklet itself.
           (Chris Wilson)
         * intel_lrc_irq_handler should be static. (Chris Wilson)
         * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
         * Document and WARN that tasklet cannot be active/pending
           on engine cleanup. (Chris Wilson/Imre Deak)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Imre Deak <imre.deak@intel.com>
      Testcase: igt/gem_exec_nop/all
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
      27af5eea
  5. 30 3月, 2016 1 次提交
    • C
      drm/i915: Exit cherryview_irq_handler() after one pass · 579de73b
      Chris Wilson 提交于
      This effectively reverts
      
      commit 8e5fd599
      Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Date:   Wed Apr 9 13:28:50 2014 +0300
      
          drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
      
      as under continuous execlists load we can saturate the IRQ handler,
      destablising the tsc clock and triggering the NMI watchdog to declare a hung
      CPU.
      
      [  552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
      [  552.756080] clocksource:                       'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
      [  552.756091] clocksource:                       'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
      [  552.756210] clocksource: Switched to clocksource refined-jiffies
      [  575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
      [  575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
      [  575.217905] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
      [  575.217915]  0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
      [  575.217935]  0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
      [  575.217951]  ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
      [  575.217967] Call Trace:
      [  575.217973]  <NMI>  [<ffffffff81288c6d>] dump_stack+0x4f/0x72
      [  575.217994]  [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
      [  575.218003]  [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
      [  575.218016]  [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
      [  575.218028]  [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
      [  575.218042]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
      [  575.218052]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
      [  575.218064]  [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
      [  575.218075]  [<ffffffff81007540>] nmi_handle+0x60/0x130
      [  575.218086]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
      [  575.218096]  [<ffffffff810079c0>] do_nmi+0x140/0x470
      [  575.218108]  [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
      [  575.218119]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
      [  575.218129]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
      [  575.218139]  [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
      [  575.218148]  <<EOE>>  [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
      [  575.218164]  [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
      [  575.218175]  [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
      [  575.218185]  [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
      [  575.218196]  [<ffffffff81033a1e>] start_secondary+0x10e/0x130
      
      However, not servicing all available IIR within the handler does hurt the
      throughput of pathological nop execbuf by about 20%, with a similar effect
      upon the dispatch latency of a series of execbuf.
      
      v2: use do {} while(0) for a smaller patch, and easier to revert again
      
      I have reasonable confidence that we do not miss GT interrupts (as
      execlists provides a stress case with a failure mechanism easily
      detected by igt), however I have less confidence about all the other
      sources of interrupts and worry that may lose a display hotplug
      interrupt, for example.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
      Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1457946117-6714-1-git-send-email-chris@chris-wilson.co.uk
      579de73b
  6. 24 3月, 2016 2 次提交
  7. 22 3月, 2016 1 次提交
  8. 17 3月, 2016 1 次提交
  9. 16 3月, 2016 5 次提交
  10. 04 3月, 2016 1 次提交
  11. 23 2月, 2016 3 次提交
  12. 13 1月, 2016 3 次提交
  13. 08 1月, 2016 3 次提交
  14. 07 1月, 2016 1 次提交
  15. 17 12月, 2015 1 次提交
    • I
      drm/i915: add support for checking if we hold an RPM reference · 1f814dac
      Imre Deak 提交于
      Atm, we assert that the device is not suspended until the point when the
      device is truly put to a suspended state. This is fine, but we can catch
      more problems if we check that RPM refcount is non-zero. After that one
      drops to zero we shouldn't access the device any more, even if the actual
      device suspend may be delayed. Change assert_rpm_wakelock_held()
      accordingly to check for a non-zero RPM refcount in addition to the
      current device-not-suspended check.
      
      For the new asserts to work we need to annotate every place explicitly in
      the code where we expect that the device is powered. The places where we
      only assume this, but may not hold an RPM reference:
      - driver load
        We assume the device to be powered until we enable RPM. Make this
        explicit by taking an RPM reference around the load function.
      - system and runtime sudpend/resume handlers
        These handlers are called when the RPM reference becomes 0 and know the
        exact point after which the device can get powered off. Disable the
        RPM-reference-held check for their duration.
      - the IRQ, hangcheck and RPS work handlers
        These handlers are flushed in the system/runtime suspend handler
        before the device is powered off, so it's guaranteed that they won't
        run while the device is powered off even though they don't hold any
        RPM reference. Disable the RPM-reference-held check for their duration.
      
      In all these cases we still check that the device is not suspended.
      These explicit annotations also have the positive side effect of
      documenting our assumptions better.
      
      This caught additional WARNs from the atomic modeset path, those should
      be fixed separately.
      
      v2:
      - remove the redundant HAS_RUNTIME_PM check (moved to patch 1) (Ville)
      v3:
      - use a new dedicated RPM wakelock refcount to also catch cases where
        our own RPM get/put functions were not called (Chris)
      - assert also that the new RPM wakelock refcount is 0 in the RPM
        suspend handler (Chris)
      - change the assert error message to be more meaningful (Chris)
      - prevent false assert errors and check that the RPM wakelock is 0 in
        the RPM resume handler too
      - prevent false assert errors in the hangcheck work too
      - add a device not suspended assert check to the hangcheck work
      v4:
      - rename disable/enable_rpm_asserts to disable/enable_rpm_wakeref_asserts
        and wakelock_count to wakeref_count
      - disable the wakeref asserts in the IRQ handlers and RPS work too
      - update/clarify commit message
      v5:
      - mark places we plan to change to use proper RPM refcounting with
        separate DISABLE/ENABLE_RPM_WAKEREF_ASSERTS aliases (Chris)
      Signed-off-by: NImre Deak <imre.deak@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1450227139-13471-1-git-send-email-imre.deak@intel.com
      1f814dac
  16. 10 12月, 2015 1 次提交
  17. 27 11月, 2015 3 次提交
  18. 26 11月, 2015 2 次提交
  19. 18 11月, 2015 2 次提交
  20. 06 11月, 2015 1 次提交
  21. 26 10月, 2015 1 次提交
  22. 23 10月, 2015 1 次提交
  23. 22 10月, 2015 1 次提交
  24. 21 10月, 2015 1 次提交
  25. 20 10月, 2015 1 次提交