1. 11 Aug, 2017 2 commits
    • C
      drm/i915/perf: Drop lockdep assert for i915_oa_init_reg_state() · 84a095e4
      Chris Wilson committed
      This is called from execlist context init, which needs to run unlocked.
      Commit f89823c2 ("drm/i915/perf: Implement
      I915_PERF_ADD/REMOVE_CONFIG interface") added a lockdep assert to this
      path for unclear reasons; remove it again!
      
      Fixes: f89823c2 ("drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-2-chris@chris-wilson.co.uk
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      84a095e4
    • C
      drm/i915/perf: Initialise dynamic sysfs group before creation · 40f75ea4
      Chris Wilson committed
      Another case where we need to call sysfs_attr_init() to set up the
      internal lockdep class prior to use:
      
      [    9.325229] BUG: key ffff880168bc7bb0 not in .data!
      [    9.325240] DEBUG_LOCKS_WARN_ON(1)
      [    9.325250] ------------[ cut here ]------------
      [    9.325280] WARNING: CPU: 1 PID: 275 at kernel/locking/lockdep.c:3156 lockdep_init_map+0x1b2/0x1c0
      [    9.325301] Modules linked in: intel_powerclamp(+) coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915(+) snd_hda_intel snd_hda_codec snd_hwdep r8169 mii snd_hda_core snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
      [    9.325375] CPU: 1 PID: 275 Comm: modprobe Not tainted 4.13.0-rc4-CI-Trybot_1040+ #1
      [    9.325395] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0045.B51.1704281422 04/28/2017
      [    9.325422] task: ffff8801721a4ec0 task.stack: ffffc900001dc000
      [    9.325440] RIP: 0010:lockdep_init_map+0x1b2/0x1c0
      [    9.325456] RSP: 0018:ffffc900001dfa10 EFLAGS: 00010282
      [    9.325473] RAX: 0000000000000016 RBX: ffff880168d54b80 RCX: 0000000000000000
      [    9.325488] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810f0800
      [    9.325505] RBP: ffffc900001dfa30 R08: 0000000000000001 R09: 0000000000000000
      [    9.325521] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880168bc7bb0
      [    9.325537] R13: 0000000000000000 R14: ffff880168bc7b98 R15: ffffffff81a263a0
      [    9.325554] FS:  00007fb60c3fd700(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000
      [    9.325574] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    9.325588] CR2: 0000006582777d80 CR3: 000000016d818000 CR4: 00000000003406e0
      [    9.325604] Call Trace:
      [    9.325618]  __kernfs_create_file+0x76/0xe0
      [    9.325632]  sysfs_add_file_mode_ns+0x8a/0x1a0
      [    9.325646]  internal_create_group+0xea/0x2c0
      [    9.325660]  sysfs_create_group+0x13/0x20
      [    9.325737]  i915_perf_register+0xde/0x220 [i915]
      [    9.325800]  i915_driver_load+0xa77/0x16c0 [i915]
      [    9.325863]  i915_pci_probe+0x37/0x90 [i915]
      [    9.325880]  pci_device_probe+0xa8/0x130
      [    9.325894]  driver_probe_device+0x29c/0x450
      [    9.325908]  __driver_attach+0xe3/0xf0
      [    9.325922]  ? driver_probe_device+0x450/0x450
      [    9.325935]  bus_for_each_dev+0x62/0xa0
      [    9.325948]  driver_attach+0x1e/0x20
      [    9.325960]  bus_add_driver+0x173/0x270
      [    9.325974]  driver_register+0x60/0xe0
      [    9.325986]  __pci_register_driver+0x60/0x70
      [    9.326044]  i915_init+0x6f/0x78 [i915]
      [    9.326066]  ? 0xffffffffa024e000
      [    9.326079]  do_one_initcall+0x43/0x170
      [    9.326094]  ? rcu_read_lock_sched_held+0x7a/0x90
      [    9.326109]  ? kmem_cache_alloc_trace+0x261/0x2d0
      [    9.326124]  do_init_module+0x5f/0x206
      [    9.326137]  load_module+0x2561/0x2da0
      [    9.326150]  ? show_coresize+0x30/0x30
      [    9.326165]  ? kernel_read_file+0x105/0x190
      [    9.326180]  SyS_finit_module+0xc1/0x100
      [    9.326192]  ? SyS_finit_module+0xc1/0x100
      [    9.326210]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [    9.326223] RIP: 0033:0x7fb60bf359f9
      [    9.326234] RSP: 002b:00007fff92b47c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [    9.326255] RAX: ffffffffffffffda RBX: ffffffff814898a3 RCX: 00007fb60bf359f9
      [    9.326271] RDX: 0000000000000000 RSI: 00000028a9ceef8b RDI: 0000000000000000
      [    9.326287] RBP: ffffc900001dff88 R08: 0000000000000000 R09: 0000000000000000
      [    9.326303] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000040000
      [    9.326319] R13: 00000028aaef2a70 R14: 0000000000000000 R15: 00000028aaeee5d0
      [    9.326339]  ? __this_cpu_preempt_check+0x13/0x20
      [    9.326353] Code: f1 39 00 85 c0 0f 84 38 ff ff ff 83 3d 9f 44 ce 01 00 0f 85 2b ff ff ff 48 c7 c6 b2 a2 c7 81 48 c7 c7 53 40 c5 81 e8 3f 82 01 00 <0f> ff e9 11 ff ff ff 0f 1f 80 00 00 00 00 55 31 c9 31 d2 31 f6
      
      Fixes: 701f8231 ("drm/i915/perf: prune OA configs")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170810175743.25401-1-chris@chris-wilson.co.uk
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      40f75ea4
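
The lockdep complaint in the trace above ("key ... not in .data") arises because an attribute allocated at runtime carries a lock class key that does not live in static storage. The following is a minimal userspace sketch of the pattern that sysfs_attr_init() relies on, not kernel code: `attr_init`, `make_dynamic_attr` and both struct definitions are hypothetical stand-ins, assuming the kernel macro works by pointing the attribute at a key declared `static` at the call site.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel types (userspace model only). */
struct lock_class_key { char dummy; };

struct attribute {
    const char *name;
    struct lock_class_key *key;   /* should point at static storage */
};

/*
 * Modelled on the idea behind sysfs_attr_init(): each call site owns one
 * statically allocated key, no matter where the attribute itself lives.
 */
#define attr_init(attr)                          \
    do {                                         \
        static struct lock_class_key key_;       \
        (attr)->key = &key_;                     \
    } while (0)

/* Allocate an attribute on the heap and hand it a static key. */
static struct attribute *make_dynamic_attr(const char *name)
{
    struct attribute *attr;

    assert(name != NULL);
    attr = calloc(1, sizeof(*attr));
    if (!attr)
        return NULL;

    attr->name = name;
    attr_init(attr);              /* key now lives in static storage */
    return attr;
}
```

Every attribute created through `make_dynamic_attr()` shares the one static key belonging to that call site, which is why the key address stays valid and recognisable even though the attribute is heap-allocated.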
  2. 04 Aug, 2017 6 commits
  3. 17 Jul, 2017 1 commit
  4. 03 Jul, 2017 2 commits
  5. 21 Jun, 2017 2 commits
  6. 15 Jun, 2017 6 commits
    • L
      drm/i915/perf: add GLK support · 28c7ef9e
      Lionel Landwerlin committed
      Add OA support for Geminilake (pretty much identical to Broxton), and
      also add the associated OA configurations.
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170613112309.4088-2-lionel.g.landwerlin@intel.com
      28c7ef9e
    • L
      drm/i915/perf: add KBL support · 6c5c1d89
      Lionel Landwerlin committed
      Add OA support for Kabylake (pretty much identical to Skylake), and
      also add the associated OA configurations.
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      6c5c1d89
    • R
      drm/i915/perf: remove perf.hook_lock · 1bef3409
      Robert Bragg committed
      In earlier iterations of the i915-perf driver we had a number of
      callbacks/hooks from other parts of the i915 driver to e.g. notify us
      when a legacy context was pinned and these could run asynchronously with
      respect to the stream file operations and might also run in atomic
      context.
      
      dev_priv->perf.hook_lock had been used for serialising access to state
      needed within these callbacks, but as the code has evolved some of the
      hooks have gone away or are implemented to avoid needing to lock any state.
      
      The remaining use of this lock was actually redundant considering how
      the gen7 oacontrol state used to be updated as part of a context pin
      hook.
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      1bef3409
    • R
      drm/i915/perf: per-gen timebase for checking sample freq · 155e941f
      Robert Bragg committed
      An oa_exponent_to_ns() utility and per-gen timebase constants were
      recently removed when updating the tail pointer race condition WA, and
      this restores those so we can update the _PROP_OA_EXPONENT validation
      done in read_properties_unlocked() to not assume we have a 12.5MHz
      timebase as we did for Haswell.
      
      Accordingly the oa_sample_rate_hard_limit value that's referenced by
      proc_dointvec_minmax defining the absolute limit for the OA sampling
      frequency is now initialized to (timestamp_frequency / 2) instead of the
      6.25MHz constant for Haswell.
      
      v2:
          Specify frequency of 19.2MHz for BXT (Ville)
          Initialize oa_sample_rate_hard_limit per-gen too (Lionel)
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      155e941f
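
The arithmetic behind the per-gen timebase in the commit above can be worked through in a few lines. This is a standalone sketch, not the driver's code: it assumes the sampling period is 2^(exponent + 1) timestamp ticks, and both function signatures are illustrative. The 12.5MHz (Haswell) and 19.2MHz (BXT) frequencies and the (timestamp_frequency / 2) limit come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sampling period in nanoseconds for a given OA exponent.
 * Assumption: one period lasts 2^(exponent + 1) timestamp ticks.
 */
static uint64_t oa_exponent_to_ns(uint64_t timestamp_frequency_hz,
                                  unsigned int exponent)
{
    /* exponent < 32 keeps the 64-bit multiplication from overflowing */
    assert(timestamp_frequency_hz != 0 && exponent < 32);
    return (1000000000ULL * (2ULL << exponent)) / timestamp_frequency_hz;
}

/* Absolute ceiling for the sampling frequency: half the timebase. */
static uint64_t oa_sample_rate_hard_limit(uint64_t timestamp_frequency_hz)
{
    return timestamp_frequency_hz / 2;
}
```

With a 12.5MHz timebase the limit comes out at 6.25MHz, matching the former Haswell constant, and exponent 0 corresponds to a 160ns period; with a 19.2MHz timebase the same exponent gives a shorter period, which is why validating the exponent against a fixed 12.5MHz assumption would be wrong on other platforms.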
    • R
      drm/i915/perf: Add OA unit support for Gen 8+ · 19f81df2
      Robert Bragg committed
      Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
      share (more-or-less) the same OA unit design.
      
      Of particular note in comparison to Haswell: some OA unit HW config
      state has become per-context state and as a consequence it is somewhat
      more complicated to manage synchronous state changes from the cpu while
      there's no guarantee of what context (if any) is currently actively
      running on the gpu.
      
      The periodic sampling frequency which can be particularly useful for
      system-wide analysis (as opposed to command stream synchronised
      MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
      have become per-context save and restored (while the OABUFFER
      destination is still a shared, system-wide resource).
      
      This support for gen8+ takes care to consider a number of timing
      challenges involved in synchronously updating per-context state
      primarily by programming all config state from the cpu and updating all
      current and saved contexts synchronously while the OA unit is still
      disabled.
      
      The driver intentionally avoids depending on command streamer
      programming to update OA state considering the lack of synchronization
      between the automatic loading of OACTXCONTROL state (that includes the
      periodic sampling state and enable state) on context restore and the
      parsing of any general purpose BB the driver can control. I.e. this
      implementation is careful to avoid the possibility of a context restore
      temporarily enabling any out-of-date periodic sampling state. In
      addition to the risk of transiently-out-of-date state being loaded
      automatically; there are also internal HW latencies involved in the
      loading of MUX configurations which would be difficult to account for
      from the command streamer (and we only want to enable the unit once
      the MUX configuration is complete).
      
      Since the Gen8+ OA unit design no longer supports clock gating the unit
      off for a single given context (which effectively stopped any progress
      of counters while any other context was running) and instead supports
      tagging OA reports with a context ID for filtering on the CPU, it means
      we can no longer hide the system-wide progress of counters from a
      non-privileged application only interested in metrics for its own
      context. Although we could theoretically try and subtract the progress
      of other contexts before forwarding reports via read() we aren't in a
      position to filter reports captured via MI_REPORT_PERF_COUNT commands.
      As a result, for Gen8+, we always require the
      dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
      if not root.
      
      v5: Drain submitted requests when enabling metric set to ensure no
          lite-restore erases the context image we just updated (Lionel)
      
      v6: In addition to drain, switch to kernel context & update all
          contexts in place (Chris)
      
      v7: Add missing mutex_unlock() if switching to kernel context fails
          (Matthew)
      
      v8: Simplify OA period/flex-eu-counters programming by using the
          batchbuffer instead of modifying ctx-image (Lionel)
      
      v9: Back to updating the context image (due to erroneous testing,
          batchbuffer programming the OA unit doesn't actually work)
          (Lionel)
          Pin context before updating context image (Chris)
          Drop MMIO programming now that we switch to a kernel context with
          right values in initial context image (Chris)
      
      v10: Just pin_map the contexts we want to modify or let the
           configuration happen on first use (Chris)
      
      v11: Update kernel context OA config through the batchbuffer rather
           than on the fly ctx-image update (Lionel)
      
      v12: Rework OA context registers update again by switching away from
           user contexts and reconfiguring the kernel context through the
           batchbuffer and updating all the other contexts' context image.
           Also take care to lock slice/subslice configuration when OA is
           on. (Lionel)
      
      v13: Request rpcs updates on all engines when updating the OA config
           (Lionel)
      
      v14: Drop any kind of rpcs management now that we monitor sseu
           configuration changes in a later patch (Lionel)
           Remove usleep after programming the NOA configs on Gen8+, this
           doesn't seem to be needed (Lionel)
      
      v15: Respect coding style for block comments (Chris)
      
      v16: Add missing i915_add_request() in case we fail to emit OA
           configuration (Matthew)
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      19f81df2
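
The access rule described in the Gen8+ commit above can be summarised as a small predicate. This is only a sketch of the policy as the commit message states it, with hypothetical function and parameter names. It assumes that before Gen8 (Haswell) a stream restricted to the caller's own context counts as unprivileged, since the unit could be clock gated for a single context, and that `paranoid` mirrors the dev.i915.perf_stream_paranoid sysctl.

```c
#include <stdbool.h>

/*
 * Does opening this OA stream expose system-wide counter progress?
 * Assumption: only a pre-Gen8 single-context stream does not.
 */
static bool oa_open_needs_privilege(bool gen8_plus, bool single_context)
{
    if (!gen8_plus && single_context)
        return false;   /* unit can be gated to the caller's context */
    return true;        /* reports reflect every context's activity */
}

/* Gate privileged opens on the paranoid sysctl unless the caller is root. */
static bool oa_open_allowed(bool gen8_plus, bool single_context,
                            bool paranoid, bool is_root)
{
    if (!oa_open_needs_privilege(gen8_plus, single_context))
        return true;
    return !paranoid || is_root;
}
```

On Gen8+ the first predicate is always true, so a non-root caller gets access only once the sysctl has been cleared, even when it asks for metrics on its own context.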
    • L
      drm/i915/perf: rework mux configurations queries · 3f488d99
      Lionel Landwerlin committed
      Gen8+ might have mux configurations per slices/subslices. Depending on
      whether slices/subslices have been fused off, only part of the
      configuration needs to be applied. This change reworks the mux
      configurations query mechanism to allow more than one set of registers
      to be programmed.
      
      v2: s/n_mux_regs/n_mux_configs/ (Matthew)
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      3f488d99
  7. 13 May, 2017 8 commits
  8. 04 May, 2017 1 commit
  9. 29 Mar, 2017 2 commits
  10. 28 Mar, 2017 2 commits
  11. 02 Mar, 2017 1 commit
  12. 19 Dec, 2016 2 commits
    • C
      drm/i915: Simplify releasing context reference · 69df05e1
      Chris Wilson committed
      A few users only take the struct_mutex in order to release a reference
      to a context. We can expose a kref_put_mutex() wrapper in order to
      simplify these users, and optimise taking of the mutex to the final
      unref.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-4-chris@chris-wilson.co.uk
      69df05e1
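
The idea of taking the mutex only for the final unref can be modelled in userspace with C11 atomics and a pthread mutex. The sketch below is an illustration rather than the kernel implementation: `ref_put_mutex`, `context_put_unlocked` and the `struct context` layout are hypothetical, and the convention that the release callback runs with the mutex held (leaving the caller to unlock) is an assumption modelled on how a kref_put_mutex() wrapper is typically used.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct ref { atomic_int count; };

/*
 * Drop one reference. The mutex is taken only when this may be the last
 * reference. Returns 1 after release() has run, with the mutex still
 * held; returns 0 otherwise, with the mutex not held.
 */
static int ref_put_mutex(struct ref *ref, void (*release)(struct ref *),
                         pthread_mutex_t *lock)
{
    int old = atomic_load(&ref->count);

    assert(old > 0);

    /* Fast path: decrement locklessly unless we would reach zero. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(&ref->count, &old, old - 1))
            return 0;
    }

    /* Slow path: we may hold the last reference, so serialise. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(&ref->count, 1) != 1) {
        pthread_mutex_unlock(lock);
        return 0;
    }
    release(ref);
    return 1;
}

/* Hypothetical context object; ref must stay the first member. */
struct context {
    struct ref ref;
    pthread_mutex_t *lock;
    int freed;
};

static void context_free(struct ref *ref)
{
    struct context *ctx = (struct context *)ref;

    ctx->freed = 1;   /* stand-in for the real teardown */
}

/* Wrapper for callers that do not already hold the mutex. */
static void context_put_unlocked(struct context *ctx)
{
    pthread_mutex_t *lock = ctx->lock;

    if (ref_put_mutex(&ctx->ref, context_free, lock))
        pthread_mutex_unlock(lock);
}
```

Callers that previously locked, dropped the reference and unlocked can call `context_put_unlocked()` instead; the common case of a non-final unref then never touches the mutex.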
    • C
      drm/i915: Unify active context tracking between legacy/execlists/guc · e8a9c58f
      Chris Wilson committed
      The requests conversion introduced a nasty bug where we could generate a
      new request in the middle of constructing a request if we needed to idle
      the system in order to evict space for a context. The request to idle
      would be executed (and waited upon) before the current one, creating a
      minor havoc in the seqno accounting, as we will consider the current
      request to already be completed (prior to deferred seqno assignment) but
      ring->last_retired_head would have been updated and still could allow
      us to overwrite the current request before execution.
      
      We also employed two different mechanisms to track the active context
      until it was switched out. The legacy method allowed for waiting upon an
      active context (it could forcibly evict any vma, including context's),
      but the execlists method took a step backwards by pinning the vma for
      the entire active lifespan of the context (the only way to evict was to
      idle the entire GPU, not individual contexts). However, to circumvent
      the tricky issue of locking (i.e. we cannot take struct_mutex at the
      time of i915_gem_request_submit(), where we would want to move the
      previous context onto the active tracker and unpin it), we take the
      execlists approach and keep the contexts pinned until retirement.
      The benefit of the execlists approach, more important for execlists than
      legacy, was the reduction in work in pinning the context for each
      request - as the context was kept pinned until idle, it could short
      circuit the pinning for all active contexts.
      
      We introduce new engine vfuncs to pin and unpin the context
      respectively. The context is pinned at the start of the request, and
      only unpinned when the following request is retired (this ensures that
      the context is idle and coherent in main memory before we unpin it). We
      move the engine->last_context tracking into the retirement itself
      (rather than during request submission) in order to allow the submission
      to be reordered or unwound without undue difficulty.
      
      And finally an ulterior motive for unifying context handling was to
      prepare for mock requests.
      
      v2: Rename to last_retired_context, split out legacy_context tracking
      for MI_SET_CONTEXT.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
      e8a9c58f
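
The pin/unpin lifetime described in the commit above (pin at the start of a request, unpin only once the following request is retired) can be modelled with plain counters. This is a simplified userspace sketch with hypothetical types and helpers; it assumes a single engine and in-order retirement, and it ignores the vfunc indirection the commit introduces.

```c
#include <assert.h>
#include <stddef.h>

struct context { int pin_count; };
struct engine  { struct context *last_retired_context; };
struct request { struct engine *engine; struct context *ctx; };

static void context_pin(struct context *ctx)
{
    ctx->pin_count++;
}

static void context_unpin(struct context *ctx)
{
    assert(ctx->pin_count > 0);
    ctx->pin_count--;
}

/* The context is pinned when a request starts being constructed. */
static void request_begin(struct request *rq, struct engine *engine,
                          struct context *ctx)
{
    context_pin(ctx);
    rq->engine = engine;
    rq->ctx = ctx;
}

/*
 * Retiring a request releases the pin of the previously retired context,
 * not its own: by now the hardware has switched away from the older
 * context, so it is idle and coherent in memory.
 */
static void request_retire(struct request *rq)
{
    struct engine *engine = rq->engine;

    if (engine->last_retired_context)
        context_unpin(engine->last_retired_context);
    engine->last_retired_context = rq->ctx;
}
```

For two requests on contexts A then B, retiring the first leaves A pinned; A is unpinned only when the request on B retires, at which point B becomes the tracked last retired context.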
  13. 09 Dec, 2016 1 commit
  14. 08 Dec, 2016 1 commit
    • R
      drm/i915/perf: use DRM_DEBUG for userspace issues · 7708550c
      Robert Bragg committed
      Avoid using DRM_ERROR for conditions userspace can trigger with a bad
      config when opening a stream or from not reading data in a timely
      fashion (whereby the OA buffer fills up). These conditions are tested
      by i-g-t which treats error messages as failures if using the test
      runner. This wasn't an issue while the i915-perf igt tests were being
      run in isolation.
      
      One message relating to seeing a spurious zeroed report was changed to
      use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be
      seen, but it's not a serious problem if it is. Considering that the
      tail margin mechanism is only a heuristic it's possible we might see
      this from time to time.
      
      Signed-off-by: Robert Bragg <robert@sixbynine.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161201172152.10893-1-robert@sixbynine.org
      7708550c
  15. 02 Dec, 2016 1 commit
  16. 29 Nov, 2016 1 commit
  17. 22 Nov, 2016 1 commit