1. 14 6月, 2014 1 次提交
    • R
      drm/i915: Force PSR exit by inactivating it. · 7c8f8a70
      Rodrigo Vivi 提交于
      The perfect solution for psr_exit is the hardware tracking the changes and
      doing the psr exit by itself. This scenario works for HSW and BDW with some
      environments like Gnome and Wayland.
      
      However there are many other scenarios that this isn't true. Mainly one right
      now is KDE users on HSW and BDW with PSR on. User would miss many screen
      updates. For instances any key typed could be seen only when mouse cursor is
      moved. So this patch introduces the ability of trigger PSR exit on kernel side
      on some common cases that.
      
      Most of the cases are coverred by psr_exit at set_domain. The remaining cases
      are coverred by triggering it at set_domain, busy_ioctl, sw_finish and
      mark_busy.
      
      The downside here might be reducing the residency time on the cases this
      already work very wall like Gnome environment. But so far let's get focused
      on fixinge issues sio PSR couild be used for everybody and we could even
      get it enabled by default. Later we can add some alternatives to choose the
      level of PSR efficiency over boot flag of even over crtc property.
      
      v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and
      also this isn't needed for BDW and HSW anyway.
      
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: NVijay Purushothaman <vijay.a.purushothaman@intel.com>
      Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      7c8f8a70
  2. 13 6月, 2014 2 次提交
    • C
      drm/i915: Prefault the entire object on first page fault · b90b91d8
      Chris Wilson 提交于
      Inserting additional PTEs has no side-effect for us as the pfn are fixed
      for the entire time the object is resident in the global GTT. The
      downside is that we pay the entire cost of faulting the object upon the
      first hit, for which we in return receive the benefit of removing the
      per-page faulting overhead.
      
      On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
      Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
      Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
      Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
      Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
      Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
      Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
      Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
      Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
      Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
      Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
      Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
      Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: "Goel, Akash" <akash.goel@intel.com>
      Testcasee: igt/gem_fence_upload/performance
      Reviewed-by: NBrad Volkin <bradley.d.volkin@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b90b91d8
    • C
      drm/i915: Use the .release hook to drop the stolen drm_mm tracking · ef0cf27c
      Chris Wilson 提交于
      Now that we have a release hook into i915_gem_object_free, we can move
      the explicit call to the internal stolen function and hook it up
      throught the callback instead.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      ef0cf27c
  3. 11 6月, 2014 1 次提交
  4. 05 6月, 2014 2 次提交
  5. 27 5月, 2014 2 次提交
    • C
      drm/i915: Prevent negative relocation deltas from wrapping · d23db88c
      Chris Wilson 提交于
      This is pure evil. Userspace, I'm looking at you SNA, repacks batch
      buffers on the fly after generation as they are being passed to the
      kernel for execution. These batches also contain self-referenced
      relocations as a single buffer encompasses the state commands, kernels,
      vertices and sampler. During generation the buffers are placed at known
      offsets within the full batch, and then the relocation deltas (as passed
      to the kernel) are tweaked as the batch is repacked into a smaller buffer.
      This means that userspace is passing negative relocations deltas, which
      subsequently wrap to large values if the batch is at a low address. The
      GPU hangs when it then tries to use the large value as a base for its
      address offsets, rather than wrapping back to the real value (as one
      would hope). As the GPU uses positive offsets from the base, we can
      treat the relocation address as the minimum address read by the GPU.
      For the upper bound, we trust that userspace will not read beyond the
      end of the buffer.
      
      So, how do we fix negative relocations from wrapping? We can either
      check that every relocation looks valid when we write it, and then
      position each object such that we prevent the offset wraparound, or we
      just special-case the self-referential behaviour of SNA and force all
      batches to be above 256k. Daniel prefers the latter approach.
      
      This fixes a GPU hang when it tries to use an address (relocation +
      offset) greater than the GTT size. The issue would occur quite easily
      with full-ppgtt as each fd gets its own VM space, so low offsets would
      often be handed out. However, with the rearrangement of the low GTT due
      to capturing the BIOS framebuffer, it is already affecting kernels 3.15
      onwards. I think only IVB+ is susceptible to this bug, but the workaround
      should only kick in rarely, so it seems sensible to always apply it.
      
      v3: Use a bias for batch buffers to prevent small negative delta relocations
      from wrapping.
      
      v4 from Daniel:
      - s/BIAS/BATCH_OFFSET_BIAS/
      - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
        were growing rather cumbersome.
      - Add a comment to eb_get_batch explaining why we do this.
      - Apply the batch offset bias everywhere but mention that we've only
        observed it on gen7 gpus.
      - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
      
      v5: Add static to eb_get_batch, spotted by 0-day tester.
      
      Testcase: igt/gem_bad_reloc
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d23db88c
    • C
      drm/i915: Fix dynamic allocation of physical handles · 00731155
      Chris Wilson 提交于
      A single object may be referenced by multiple registers fundamentally
      breaking the static allotment of ids in the current design. When the
      object is used the second time, the physical address of the first
      assignment is relinquished and a second one granted. However, the
      hardware is still reading (and possibly writing) to the old physical
      address now returned to the system. Eventually hilarity will ensue, but
      in the short term, it just means that cursors are broken when using more
      than one pipe.
      
      v2: Fix up leak of pci handle when handling an error during attachment,
      and avoid a double kmap/kunmap. (Ville)
      Rebase against -fixes.
      
      v3: And fix the error handling added in v2 (Ville)
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      00731155
  6. 23 5月, 2014 3 次提交
    • O
      drm/i915: s/i915_hw_context/intel_context · 273497e5
      Oscar Mateo 提交于
      Up until now, contexts had one (and only one) backing object that was
      used by the hardware to save/restore render ring contexts (via the
      MI_SET_CONTEXT command). Other rings did not have or need this, so
      our i915_hw_context struct had a 1:1 relationship with a a real HW
      context.
      
      With Logical Ring Contexts and Execlists, this is not possible anymore:
      all rings need a backing object, and it cannot be reused. To prepare
      for that, rename our contexts to the more generic term intel_context.
      
      No functional changes.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      273497e5
    • O
      drm/i915: Split the ringbuffers from the rings (2/3) · ee1b1e5e
      Oscar Mateo 提交于
      This refactoring has been performed using the following Coccinelle
      semantic script:
      
          @@
          struct intel_engine_cs r;
          @@
          (
          - (r).obj
          + r.buffer->obj
          |
          - (r).virtual_start
          + r.buffer->virtual_start
          |
          - (r).head
          + r.buffer->head
          |
          - (r).tail
          + r.buffer->tail
          |
          - (r).space
          + r.buffer->space
          |
          - (r).size
          + r.buffer->size
          |
          - (r).effective_size
          + r.buffer->effective_size
          |
          - (r).last_retired_head
          + r.buffer->last_retired_head
          )
      
          @@
          struct intel_engine_cs *r;
          @@
          (
          - (r)->obj
          + r->buffer->obj
          |
          - (r)->virtual_start
          + r->buffer->virtual_start
          |
          - (r)->head
          + r->buffer->head
          |
          - (r)->tail
          + r->buffer->tail
          |
          - (r)->space
          + r->buffer->space
          |
          - (r)->size
          + r->buffer->size
          |
          - (r)->effective_size
          + r->buffer->effective_size
          |
          - (r)->last_retired_head
          + r->buffer->last_retired_head
          )
      
          @@
          expression E;
          @@
          (
          - LP_RING(E)->obj
          + LP_RING(E)->buffer->obj
          |
          - LP_RING(E)->virtual_start
          + LP_RING(E)->buffer->virtual_start
          |
          - LP_RING(E)->head
          + LP_RING(E)->buffer->head
          |
          - LP_RING(E)->tail
          + LP_RING(E)->buffer->tail
          |
          - LP_RING(E)->space
          + LP_RING(E)->buffer->space
          |
          - LP_RING(E)->size
          + LP_RING(E)->buffer->size
          |
          - LP_RING(E)->effective_size
          + LP_RING(E)->buffer->effective_size
          |
          - LP_RING(E)->last_retired_head
          + LP_RING(E)->buffer->last_retired_head
          )
      
      Note: On top of this this patch also removes the now unused ringbuffer
      fields in intel_engine_cs.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      [danvet: Add note about fixup patch included here.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      ee1b1e5e
    • O
      drm/i915: s/intel_ring_buffer/intel_engine_cs · a4872ba6
      Oscar Mateo 提交于
      In the upcoming patches we plan to break the correlation between
      engine command streamers (a.k.a. rings) and ringbuffers, so it
      makes sense to refactor the code and make the change obvious.
      
      No functional changes.
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      a4872ba6
  7. 22 5月, 2014 1 次提交
  8. 20 5月, 2014 5 次提交
  9. 17 5月, 2014 1 次提交
    • C
      drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl · 5cc9ed4b
      Chris Wilson 提交于
      By exporting the ability to map user address and inserting PTEs
      representing their backing pages into the GTT, we can exploit UMA in order
      to utilize normal application data as a texture source or even as a
      render target (depending upon the capabilities of the chipset). This has
      a number of uses, with zero-copy downloads to the GPU and efficient
      readback making the intermixed streaming of CPU and GPU operations
      fairly efficient. This ability has many widespread implications from
      faster rendering of client-side software rasterisers (chromium),
      mitigation of stalls due to read back (firefox) and to faster pipelining
      of texture data (such as pixel buffer objects in GL or data blobs in CL).
      
      v2: Compile with CONFIG_MMU_NOTIFIER
      v3: We can sleep while performing invalidate-range, which we can utilise
      to drop our page references prior to the kernel manipulating the vma
      (for either discard or cloning) and so protect normal users.
      v4: Only run the invalidate notifier if the range intercepts the bo.
      v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
      v6: Recheck after reacquire mutex for lost mmu.
      v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
      v8: Fix rebasing error after forwarding porting the back port.
      v9: Limit the userptr to page aligned entries. We now expect userspace
          to handle all the offset-in-page adjustments itself.
      v10: Prevent vma from being copied across fork to avoid issues with cow.
      v11: Drop vma behaviour changes -- locking is nigh on impossible.
           Use a worker to load user pages to avoid lock inversions.
      v12: Use get_task_mm()/mmput() for correct refcounting of mm.
      v13: Use a worker to release the mmu_notifier to avoid lock inversion
      v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
           with its own locking and tree of objects for each mm/mmu_notifier.
      v15: Prevent overlapping userptr objects, and invalidate all objects
           within the mmu_notifier range
      v16: Fix a typo for iterating over multiple objects in the range and
           rearrange error path to destroy the mmu_notifier locklessly.
           Also close a race between invalidate_range and the get_pages_worker.
      v17: Close a race between get_pages_worker/invalidate_range and fresh
           allocations of the same userptr range - and notice that
           struct_mutex was presumed to be held when during creation it wasn't.
      v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
           for the struct sg_table and to clear it before reporting an error.
      v19: Always error out on read-only userptr requests as we don't have the
           hardware infrastructure to support them at the moment.
      v20: Refuse to implement read-only support until we have the required
           infrastructure - but reserve the bit in flags for future use.
      v21: use_mm() is not required for get_user_pages(). It is only meant to
           be used to fix up the kernel thread's current->mm for use with
           copy_user().
      v22: Use sg_alloc_table_from_pages for that chunky feeling
      v23: Export a function for sanity checking dma-buf rather than encode
           userptr details elsewhere, and clean up comments based on
           suggestions by Bradley.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: NBrad Volkin <bradley.d.volkin@intel.com>
      [danvet: Frob ioctl allocation to pick the next one - will cause a bit
      of fuss with create2 apparently, but such are the rules.]
      [danvet2: oops, forgot to git add after manual patch application]
      [danvet3: Appease sparse.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      5cc9ed4b
  10. 16 5月, 2014 1 次提交
    • O
      drm/i915: Gracefully handle obj not bound to GGTT in is_pin_display · 19656430
      Oscar Mateo 提交于
      Otherwise, we do a NULL pointer dereference.
      
      I've seen this happen while handling an error in
      i915_gem_object_pin_to_display_plane():
      
      If i915_gem_object_set_cache_level() fails, we call is_pin_display()
      to handle the error. At this point, the object is still not pinned
      to GGTT and maybe not even bound, so we have to check before we
      dereference its GGTT vma.
      
      The IGT kms_flip/bo-too-big tests for this bug.
      
      v2: Chris Wilson says restoring the old value is easier, but that
      is_pin_display is useful as a theory of operation. Take the solomonic
      decision: at least this way is_pin_display is a little more robust
      (until Chris can kill it off).
      
      v3: Chris suggests the WARN in i915_gem_obj_to_ggtt has outlived its
      usefulness: add a reminder to remove it.
      
      Issue: VIZ-3772
      Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
      Testcase: igt/kms_flip/bo-too-big
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      19656430
  11. 15 5月, 2014 2 次提交
    • D
      drm/i915: Only do gtt cleanup in vma_unbind for the global vma · 8b1bc9b4
      Daniel Vetter 提交于
      Otherwise we end up tearing down fences when e.g. the client quits
      way too early. Might or might not fix a fence pin_count BUG Ville has
      reported.
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      8b1bc9b4
    • D
      drm/i915: Don't drop pinned fences · aff10b30
      Daniel Vetter 提交于
      Userspace can currently provoke this when e.g. trying to use a pinned
      scanout as a cursor or overlay target. Later on that might lead to
      some fun fence pin count mayhem.
      
      Spurred by Ville's report that something goes wrong here and
      originally I've thought that this might slip through the pwrite gtt
      fastpath. But that one checks of obj tiling, so should be ok.
      
      But one thing that _does_ blow up is the vma unbinding with more than
      one address space. The next patch will fix this.
      
      v2: Use a WARN_ON - Chris pointed out that we already catch all cases
      so userspace can't provoke this like I've originally feared.
      
      While reviewing relevant code I've noticed a pile of DRM_ERROR in the
      overlay&cursor code which are all triggerable by userspace. Tune them
      down while at it.
      
      v3: Split out the DRM_ERROR->DRM_DEBUG_KMS change into a separate patch,
      as requested by Chris.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Tested-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      aff10b30
  12. 13 5月, 2014 1 次提交
    • D
      drm/i915: WARN_ON fence pin leaks · d8ffa60b
      Daniel Vetter 提交于
      The fence pin count should always be <= the bo pin count. If that's
      not the case then we have a funny problem and are leaking references
      somewhere.
      
      Which means we can catch fence pin leaks by checking for the same
      upper limit as we do for the bo pin count. Inspired by a discussion
      with Ville about a fence leak igt testcase.
      
      v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that
      might catch a leak even quicker. Also de-inline them, they're getting
      too big.
      
      v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count
      check will catch that already (Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d8ffa60b
  13. 08 5月, 2014 1 次提交
  14. 07 5月, 2014 1 次提交
    • B
      drm/i915: Make aliasing a 2nd class VM · 6e7186af
      Ben Widawsky 提交于
      There is a good debate to be had about how best to fit the aliasing
      PPGTT into the code. However, as it stands right now, getting aliasing
      PPGTT bindings is a hack, and done through implicit arguments. To make
      this absolutely clear, WARN and return an error if a driver writer tries
      to do something they shouldn't.
      
      I have no issue with an eventual revert of this patch. It makes sense
      for what we have today.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      6e7186af
  15. 05 5月, 2014 6 次提交
  16. 23 4月, 2014 1 次提交
  17. 22 4月, 2014 1 次提交
    • D
      drm/irq: remove cargo-culted locking from irq_install/uninstall · e090c53b
      Daniel Vetter 提交于
      The dev->struct_mutex locking in drm_irq.c only protects
      dev->irq_enabled. Which isn't really much at all and only prevents
      especially nasty ums userspace from concurrently installing the
      interrupt handling a few times. Or at least trying.
      
      There are tons of unlocked readers of dev->irqs_enabled in the vblank
      wait code (and by extension also in the pageflip code since that uses
      the same vblank timestamp engine).
      
      Real modesetting drivers should ensure that nothing can go haywire
      with a sane setup teardown sequence. So we only really need this for
      the drm_control ioctl, everywhere else this will just paper over
      nastiness.
      
      Note that drm/i915 is a bit specially due to the gem+ums combination.
      So there we also need to properly protect the entervt and leavevt
      ioctls. But it's definitely saner to do everything in one go than to
      drop the lock in-between.
      
      Finally there's the gpu reset code in drm/i915. That one's just race
      (concurrent userspace calls to for vblank waits of pageflips could
      spuriously fail). So wrap it up in with a nice comment since fixing
      this is more involved.
      
      v2: Rebase and fix commit message (Thierry)
      Reviewed-by: NThierry Reding <treding@nvidia.com>
      Reviewed-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      e090c53b
  18. 11 4月, 2014 1 次提交
  19. 09 4月, 2014 1 次提交
  20. 04 4月, 2014 1 次提交
    • L
      drm: Add support for two-ended allocation, v3 · 62347f9e
      Lauri Kasanen 提交于
      Clients like i915 need to segregate cache domains within the GTT which
      can lead to small amounts of fragmentation. By allocating the uncached
      buffers from the bottom and the cacheable buffers from the top, we can
      reduce the amount of wasted space and also optimize allocation of the
      mappable portion of the GTT to only those buffers that require CPU
      access through the GTT.
      
      For other drivers, allocating small bos from one end and large ones
      from the other helps improve the quality of fragmentation.
      
      Based on drm_mm work by Chris Wilson.
      
      v3: Changed to use a TTM placement flag
      v2: Updated kerneldoc
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Cc: Christian König <deathsimple@vodafone.de>
      Signed-off-by: NLauri Kasanen <cand@gmx.com>
      Signed-off-by: NDavid Airlie <airlied@redhat.com>
      62347f9e
  21. 31 3月, 2014 1 次提交
  22. 21 3月, 2014 1 次提交
  23. 19 3月, 2014 1 次提交
  24. 16 3月, 2014 1 次提交
    • D
      drm: use anon-inode instead of relying on cdevs · 6796cb16
      David Herrmann 提交于
      DRM drivers share a common address_space across all character-devices of a
      single DRM device. This allows simple buffer eviction and mapping-control.
      However, DRM core currently waits for the first ->open() on any char-dev
      to mark the underlying inode as backing inode of the device. This delayed
      initialization causes ugly conditions all over the place:
        if (dev->dev_mapping)
          do_sth();
      
      To avoid delayed initialization and to stop reusing the inode of the
      char-dev, we allocate an anonymous inode for each DRM device and reset
      filp->f_mapping to it on ->open().
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      6796cb16
  25. 13 3月, 2014 1 次提交