1. 19 June 2014, 1 commit
    • drm/i915: Remove ctx->last_ring · 14d8ec54
      Authored by Oscar Mateo
      The original comment that introduced it said:
      
      commit 0009e46c
      Author: Ben Widawsky <ben@bwidawsk.net>
      Date:   Fri Dec 6 14:11:02 2013 -0800
      
          drm/i915: Track which ring a context ran on
      
          Previously we dropped the association of a context to a ring. It is
          however very important to know which ring a context ran on (we could
          have reused the other member, but I was nitpicky).
      
          This is very important when we switch address spaces, which unlike
          context objects, do change per ring.
      
          As an example, if we have:
      
                  RCS   BCS
          ctx            A
          ctx      A
          ctx      B
          ctx            B
      
          Without tracking the last ring B ran on, we wouldn't know to switch the
          address space on BCS in the last row.
      
      But this is not really true, because we are already checking to != from (with
      "from" being = ring->last_context) and that should be enough to make sure we
      switch to the right address space.
      
      We would have a problem if we switched the context object for every ring (since
      then we would fail to do it in some situations) but we only switch it for the
      render ring, so we don't care.
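      
      As a minimal illustration of the argument above, the check reduces to
      comparing "to" against ring->last_context; the types below are simplified
      stand-ins for illustration, not the actual i915 structures:
      
          /* simplified stand-in types, not the real driver structures */
          struct sketch_context { int id; };
          struct sketch_ring    { struct sketch_context *last_context; };
      
          /* "from" is ring->last_context, so comparing it with "to" already
           * tells us whether this ring needs an address-space switch; no
           * per-context last_ring field is needed for that decision. */
          static int needs_switch(const struct sketch_ring *ring,
                                  const struct sketch_context *to)
          {
                  const struct sketch_context *from = ring->last_context;
      
                  return from != to;
          }
      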
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  2. 17 June 2014, 2 commits
    • drm/i915: Replaced Blitter ring based flips with MMIO flips · 84c33a64
      Authored by Sourab Gupta
      This patch enables the framework for using MMIO based flip calls,
      in contrast with the CS based flip calls which are being used currently.
      
      MMIO based flip calls can be enabled on architectures where
      Render and Blitter engines reside in different power wells. The
      decision to use MMIO flips can be made based on workloads to give
      100% residency for Media power well.
      
      v2: The MMIO flips now use the interrupt driven mechanism for issuing the
      flips when target seqno is reached. (Incorporating Ville's idea)
      
      v3: Rebasing on latest code. Code restructuring after incorporating
      Damien's comments
      
      v4: Addressing Ville's review comments
          -general cleanup
          -updating only base addr instead of calling update_primary_plane
          -extending patch for gen5+ platforms
      
      v5: Addressed Ville's review comments
          -Making mmio flip vs cs flip selection based on module parameter
          -Adding check for DRIVER_MODESET feature in notify_ring before calling
           notify mmio flip.
          -Other changes mostly in function arguments
      
      v6: -Having a separate function to check condition for using mmio flips (Ville)
          -propagating error code from i915_gem_check_olr (Ville)
      
      v7: -Adding __must_check with i915_gem_check_olr (Chris)
          -Renaming mmio_flip_data to mmio_flip (Chris)
          -Rebasing on latest nightly
      
      v8: -Rebasing on latest code
          -squash 3rd patch in series (mmio setbase vs page flip race) with this patch
          -Added new tiling mode update in intel_do_mmio_flip (Chris)
      
      v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
      intel_postpone_flip, as this is a more restrictive condition (Chris)
      
      v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
      These patches make the selection of CS vs MMIO flip at the page flip time, and
      make the module parameter for using mmio flips as tristate, the states being
      'force CS flips', 'force mmio flips', 'driver discretion'.
      Changed the logic for driver discretion (Chris)
      
      v11: Minor code cleanup (better readability, fixing whitespace errors, using
      lockdep to check mutex locked status in postpone_flip, removal of __must_check
      in function definition) (Chris)
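      
      As a rough sketch of the v10 tristate selection: the parameter name and
      the -1/0/1 encoding below are assumptions for illustration only, not the
      exact module parameter semantics.
      
          /* assumed encoding: -1 = force CS flips, 0 = driver discretion,
           * 1 = force MMIO flips */
          static int use_mmio_flip_param;
      
          static int sketch_use_mmio_flip(int mmio_keeps_power_well_idle)
          {
                  if (use_mmio_flip_param < 0)
                          return 0;       /* always use CS (blitter) flips */
                  if (use_mmio_flip_param > 0)
                          return 1;       /* always use MMIO flips */
      
                  /* driver discretion: prefer MMIO when it lets the
                   * render/blitter power well stay down, e.g. for media
                   * workloads */
                  return mmio_keeps_power_well_idle;
          }
      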
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb
      [danvet: Fix up parameter alignment checkpatch spotted.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Added write-enable pte bit support · 24f3a8cf
      Authored by Akash Goel
      This adds support for a write-enable bit in the entry of GTT.
      This is handled via a read-only flag in the GEM buffer object which
      is then used to see how to set the bit when writing the GTT entries.
      Currently by default the Batch buffer & Ring buffers are marked as read only.
      
      v2: Moved the pte override code for read-only bit to 'byt_pte_encode'. (Chris)
          Fixed the issue of leaving 'gt_old_ro' as unused. (Chris)
      
      v3: Removed the 'gt_old_ro' field, now setting RO bit only for Ring Buffers(Daniel).
      
      v4: Added a new 'flags' parameter to all the pte(gen6) encode & insert_entries functions,
          in lieu of overloading the cache_level enum (Daniel).
      
      v5: Removed the superfluous VLV check & changed the definition location of PTE_READ_ONLY flag (Imre)
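      
      A hedged sketch of the v4 approach of passing a flags word to the PTE
      encode helpers; the bit positions and names below are illustrative, not
      the real GEN6 PTE layout.
      
          #include <stdint.h>
      
          #define SKETCH_PTE_VALID     (1ull << 0)   /* illustrative bits */
          #define SKETCH_PTE_WRITEABLE (1ull << 1)
          #define PTE_READ_ONLY        (1u << 0)     /* caller-supplied flag */
      
          static uint64_t sketch_pte_encode(uint64_t addr, unsigned int flags)
          {
                  uint64_t pte = addr | SKETCH_PTE_VALID;
      
                  /* write-enable unless the object is marked read-only, as
                   * this patch does for ring buffers by default */
                  if (!(flags & PTE_READ_ONLY))
                          pte |= SKETCH_PTE_WRITEABLE;
      
                  return pte;
          }
      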
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  3. 14 June 2014, 1 commit
    • drm/i915: Force PSR exit by inactivating it. · 7c8f8a70
      Authored by Rodrigo Vivi
      The perfect solution for psr_exit is the hardware tracking the changes and
      doing the psr exit by itself. This scenario works for HSW and BDW with some
      environments like Gnome and Wayland.
      
      However, there are many other scenarios where this isn't true. The main
      one right now is KDE users on HSW and BDW with PSR on: the user would
      miss many screen updates. For instance, any key typed could be seen only
      when the mouse cursor is moved. So this patch introduces the ability to
      trigger PSR exit on the kernel side for some common cases.
      
      Most of the cases are covered by psr_exit at set_domain. The remaining
      cases are covered by triggering it at set_domain, busy_ioctl, sw_finish
      and mark_busy.
      
      The downside here might be reducing the residency time in the cases where
      this already works very well, like the Gnome environment. But for now
      let's focus on fixing the issues so PSR can be used by everybody, and we
      could even get it enabled by default. Later we can add some alternatives
      to choose the level of PSR efficiency via a boot flag or even a crtc
      property.
      
      v2: remove exit from connector_dpms. Daniel pointed out this is the wrong
      way and also this isn't needed for BDW and HSW anyway.
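      
      In outline the change amounts to kicking PSR from the front-buffer
      touching paths listed above; a hedged sketch with placeholder function
      names (not the exact i915 entry points):
      
          /* placeholder names; the real driver inactivates PSR and lets a
           * delayed work item re-enable it once the screen goes idle again */
          static void sketch_psr_exit(void) { /* force PSR inactive */ }
      
          static void sketch_set_domain_ioctl(void) { sketch_psr_exit(); }
          static void sketch_busy_ioctl(void)       { sketch_psr_exit(); }
          static void sketch_sw_finish_ioctl(void)  { sketch_psr_exit(); }
          static void sketch_mark_busy(void)        { sketch_psr_exit(); }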
      
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  4. 13 June 2014, 3 commits
  5. 12 June 2014, 1 commit
  6. 11 June 2014, 3 commits
  7. 05 June 2014, 1 commit
    • drm/i915: Detect if MIPI panel based on VBT and initialize only if present · 3e6bd011
      Authored by Shobhit Kumar
      It seems that by default the VBT has a MIPI configuration block as well.
      The generic driver will always assume MIPI if a MIPI configuration block
      is found. This causes a problem when there is actually eDP. Fix this by
      looking into the general definition block, which has the device
      configurations. From there we can figure out the LFP type and initialize
      MIPI only if MIPI is found.
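      
      A hedged sketch of that detection: walk the child devices of the general
      definitions block and set has_mipi only when a MIPI port shows up. The
      struct layout and the DVO_PORT_MIPIA value are assumptions for
      illustration.
      
          #include <stdbool.h>
          #include <stdint.h>
      
          #define DVO_PORT_MIPIA 21       /* assumed value, illustration only */
      
          struct sketch_child_device {    /* simplified child device config */
                  uint8_t dvo_port;
          };
      
          static bool sketch_has_mipi(const struct sketch_child_device *child,
                                      int count)
          {
                  for (int i = 0; i < count; i++)
                          if (child[i].dvo_port == DVO_PORT_MIPIA)
                                  return true;    /* only then init DSI */
                  return false;
          }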
      
      v2: Addressed review comments by Damien
          - Moved PORT definitions to intel_bios.h and renamed as DVO_PORT_MIPIA
          - renamed is_mipi to has_mipi and moved definition as suggested
          - Check has_mipi inside parse_mipi and intel_dsi_init instead of outside
      
      v3: Make has_mipi as a bitfield as suggested
      Signed-off-by: Shobhit Kumar <shobhit.kumar@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: fold in conditions to pack everything neatly below 80 chars.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  8. 27 May 2014, 2 commits
    • drm/i915: Prevent negative relocation deltas from wrapping · d23db88c
      Authored by Chris Wilson
      This is pure evil. Userspace, I'm looking at you SNA, repacks batch
      buffers on the fly after generation as they are being passed to the
      kernel for execution. These batches also contain self-referenced
      relocations as a single buffer encompasses the state commands, kernels,
      vertices and sampler. During generation the buffers are placed at known
      offsets within the full batch, and then the relocation deltas (as passed
      to the kernel) are tweaked as the batch is repacked into a smaller buffer.
      This means that userspace is passing negative relocation deltas, which
      subsequently wrap to large values if the batch is at a low address. The
      GPU hangs when it then tries to use the large value as a base for its
      address offsets, rather than wrapping back to the real value (as one
      would hope). As the GPU uses positive offsets from the base, we can
      treat the relocation address as the minimum address read by the GPU.
      For the upper bound, we trust that userspace will not read beyond the
      end of the buffer.
      
      So, how do we fix negative relocations from wrapping? We can either
      check that every relocation looks valid when we write it, and then
      position each object such that we prevent the offset wraparound, or we
      just special-case the self-referential behaviour of SNA and force all
      batches to be above 256k. Daniel prefers the latter approach.
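      
      The effect of the bias can be sketched as follows; the constant name
      follows v4 below, the helper itself is illustrative.
      
          #include <stdbool.h>
          #include <stdint.h>
      
          #define BATCH_OFFSET_BIAS (256 * 1024)
      
          /* a batch pinned at or above the bias cannot be wrapped below zero
           * by a small negative relocation delta added to its address */
          static bool sketch_vma_misplaced(uint64_t vma_offset,
                                           uint64_t required_bias)
          {
                  return vma_offset < required_bias;  /* must be re-pinned */
          }
      
          /* usage: batch buffers request required_bias = BATCH_OFFSET_BIAS,
           * all other objects request 0 */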
      
      This fixes a GPU hang when it tries to use an address (relocation +
      offset) greater than the GTT size. The issue would occur quite easily
      with full-ppgtt as each fd gets its own VM space, so low offsets would
      often be handed out. However, with the rearrangement of the low GTT due
      to capturing the BIOS framebuffer, it is already affecting kernels 3.15
      onwards. I think only IVB+ is susceptible to this bug, but the workaround
      should only kick in rarely, so it seems sensible to always apply it.
      
      v3: Use a bias for batch buffers to prevent small negative delta relocations
      from wrapping.
      
      v4 from Daniel:
      - s/BIAS/BATCH_OFFSET_BIAS/
      - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
        were growing rather cumbersome.
      - Add a comment to eb_get_batch explaining why we do this.
      - Apply the batch offset bias everywhere but mention that we've only
        observed it on gen7 gpus.
      - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
      
      v5: Add static to eb_get_batch, spotted by 0-day tester.
      
      Testcase: igt/gem_bad_reloc
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Fix dynamic allocation of physical handles · 00731155
      Authored by Chris Wilson
      A single object may be referenced by multiple registers fundamentally
      breaking the static allotment of ids in the current design. When the
      object is used the second time, the physical address of the first
      assignment is relinquished and a second one granted. However, the
      hardware is still reading (and possibly writing) to the old physical
      address now returned to the system. Eventually hilarity will ensue, but
      in the short term, it just means that cursors are broken when using more
      than one pipe.
      
      v2: Fix up leak of pci handle when handling an error during attachment,
      and avoid a double kmap/kunmap. (Ville)
      Rebase against -fixes.
      
      v3: And fix the error handling added in v2 (Ville)
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77351
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  9. 23 May 2014, 3 commits
  10. 22 May 2014, 1 commit
    • drm/i915: move bsd dispatch index somewhere better · bdf1e7e3
      Authored by Daniel Vetter
      Adding stuff at the bottom is really not how this should be done, since
      that's the place for ums/dri dungeons.
      
      This was added in
      
      commit a8ebba75
      Author: Zhao Yakui <yakui.zhao@intel.com>
      Date:   Thu Apr 17 10:37:40 2014 +0800
      
          drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3
      
      Also add a note to prevent this from happening again - people really
      should be less lazy and take more time to look for a good home of
      their new driver-global state.
      
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Zhao Yakui <yakui.zhao@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  11. 21 May 2014, 1 commit
  12. 20 May 2014, 3 commits
  13. 19 May 2014, 2 commits
  14. 17 May 2014, 1 commit
    • drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl · 5cc9ed4b
      Authored by Chris Wilson
      By exporting the ability to map user addresses and insert PTEs
      representing their backing pages into the GTT, we can exploit UMA in order
      to utilize normal application data as a texture source or even as a
      render target (depending upon the capabilities of the chipset). This has
      a number of uses, with zero-copy downloads to the GPU and efficient
      readback making the intermixed streaming of CPU and GPU operations
      fairly efficient. This ability has many widespread implications from
      faster rendering of client-side software rasterisers (chromium),
      mitigation of stalls due to read back (firefox) and to faster pipelining
      of texture data (such as pixel buffer objects in GL or data blobs in CL).
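      
      From userspace the feature is reached through a new ioctl; a hedged usage
      sketch follows. The struct mirrors the uapi added here, but treat the
      field details and the ioctl request number (passed in by the caller,
      intended to be DRM_IOCTL_I915_GEM_USERPTR) as illustrative.
      
          #include <stdint.h>
          #include <string.h>
          #include <sys/ioctl.h>
      
          struct sketch_gem_userptr {     /* simplified uapi mirror */
                  uint64_t user_ptr;      /* page-aligned user address */
                  uint64_t user_size;     /* page-aligned size in bytes */
                  uint32_t flags;         /* read-only bit reserved, keep 0 */
                  uint32_t handle;        /* returned GEM handle */
          };
      
          static int sketch_create_userptr(int drm_fd, unsigned long request,
                                           void *ptr, size_t size,
                                           uint32_t *handle)
          {
                  struct sketch_gem_userptr arg;
      
                  memset(&arg, 0, sizeof(arg));
                  arg.user_ptr  = (uintptr_t)ptr;
                  arg.user_size = size;
      
                  if (ioctl(drm_fd, request, &arg))
                          return -1;
      
                  *handle = arg.handle;   /* usable like any other GEM object */
                  return 0;
          }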
      
      v2: Compile with CONFIG_MMU_NOTIFIER
      v3: We can sleep while performing invalidate-range, which we can utilise
      to drop our page references prior to the kernel manipulating the vma
      (for either discard or cloning) and so protect normal users.
      v4: Only run the invalidate notifier if the range intercepts the bo.
      v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
      v6: Recheck after reacquire mutex for lost mmu.
      v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
      v8: Fix rebasing error after forward porting the back port.
      v9: Limit the userptr to page aligned entries. We now expect userspace
          to handle all the offset-in-page adjustments itself.
      v10: Prevent vma from being copied across fork to avoid issues with cow.
      v11: Drop vma behaviour changes -- locking is nigh on impossible.
           Use a worker to load user pages to avoid lock inversions.
      v12: Use get_task_mm()/mmput() for correct refcounting of mm.
      v13: Use a worker to release the mmu_notifier to avoid lock inversion
      v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifier
           with its own locking and tree of objects for each mm/mmu_notifier.
      v15: Prevent overlapping userptr objects, and invalidate all objects
           within the mmu_notifier range
      v16: Fix a typo for iterating over multiple objects in the range and
           rearrange error path to destroy the mmu_notifier locklessly.
           Also close a race between invalidate_range and the get_pages_worker.
      v17: Close a race between get_pages_worker/invalidate_range and fresh
           allocations of the same userptr range - and notice that
           struct_mutex was presumed to be held when during creation it wasn't.
      v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
           for the struct sg_table and to clear it before reporting an error.
      v19: Always error out on read-only userptr requests as we don't have the
           hardware infrastructure to support them at the moment.
      v20: Refuse to implement read-only support until we have the required
           infrastructure - but reserve the bit in flags for future use.
      v21: use_mm() is not required for get_user_pages(). It is only meant to
           be used to fix up the kernel thread's current->mm for use with
           copy_user().
      v22: Use sg_alloc_table_from_pages for that chunky feeling
      v23: Export a function for sanity checking dma-buf rather than encode
           userptr details elsewhere, and clean up comments based on
           suggestions by Bradley.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
      Cc: Akash Goel <akash.goel@intel.com>
      Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
      [danvet: Frob ioctl allocation to pick the next one - will cause a bit
      of fuss with create2 apparently, but such are the rules.]
      [danvet2: oops, forgot to git add after manual patch application]
      [danvet3: Appease sparse.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  15. 15 May 2014, 1 commit
  16. 14 May 2014, 2 commits
  17. 13 May 2014, 3 commits
    • drm/i915: WARN_ON fence pin leaks · d8ffa60b
      Authored by Daniel Vetter
      The fence pin count should always be <= the bo pin count. If that's
      not the case then we have a funny problem and are leaking references
      somewhere.
      
      Which means we can catch fence pin leaks by checking for the same
      upper limit as we do for the bo pin count. Inspired by a discussion
      with Ville about a fence leak igt testcase.
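      
      The invariant being enforced can be written down directly; a minimal
      sketch with stand-in types (the real code WARNs rather than asserts):
      
          #include <assert.h>
      
          struct sketch_vma   { int pin_count; };
          struct sketch_fence { int pin_count; struct sketch_vma *ggtt_vma; };
      
          /* a fence is only ever pinned through a pinned GGTT vma, so its pin
           * count exceeding the vma's indicates a leaked reference */
          static void sketch_check_fence_pin(const struct sketch_fence *fence)
          {
                  assert(fence->pin_count <= fence->ggtt_vma->pin_count);
          }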
      
      v2: Also check for fence->pin_count <= ggtt_vma->pin_count, since that
      might catch a leak even quicker. Also de-inline them, they're getting
      too big.
      
      v3: Don't separately check for MAX_PIN_COUNT since the > vma->pin_count
      check will catch that already (Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/chv: Add DPIO offset for Cherryview. v3 · a09caddd
      Authored by Chon Ming Lee
      CHV has two display phys. The first phy (IOSF offset 0x1A) has two
      channels, and the second phy (IOSF offset 0x12) has a single channel. The
      first phy is used for port B and port C, while the second phy is only for
      port D.
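      
      The port-to-phy mapping reads directly off the description above; a
      hedged sketch (the enum and helper are illustrative):
      
          enum sketch_port { SKETCH_PORT_B, SKETCH_PORT_C, SKETCH_PORT_D };
      
          static int sketch_chv_phy_iosf_offset(enum sketch_port port)
          {
                  switch (port) {
                  case SKETCH_PORT_B:
                  case SKETCH_PORT_C:
                          return 0x1A;    /* first phy, two channels */
                  case SKETCH_PORT_D:
                  default:
                          return 0x12;    /* second phy, single channel */
                  }
          }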
      
      v2: Move the pipe to determine which phy to select for
      vlv_dpio_read/vlv_dpio_write to another patch. (Daniel)
      v3: Rebase the code based on rework on how to calculate DPIO offset.
      Signed-off-by: Chon Ming Lee <chon.ming.lee@intel.com>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Use hash tables for the command parser · 44e895a8
      Authored by Brad Volkin
      For clients that submit large batch buffers the command parser has
      a substantial impact on performance. On my HSW ULT system performance
      drops as much as ~20% on some tests. Most of the time is spent in the
      command lookup code. Converting that from the current naive search to
      a hash table lookup reduces the performance drop to ~10%.
      
      The choice of value for I915_CMD_HASH_ORDER allows all commands
      currently used in the parser tables to hash to their own bucket (except
      for one collision on the render ring). The tradeoff is that it wastes
      memory. Because the opcodes for the commands in the tables are not
      particularly well distributed, reducing the order still leaves many
      buckets empty. The increased collisions don't seem to have a huge
      impact on the performance gain, but for now anyhow, the parser trades
      memory for performance.
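      
      The shape of the lookup change can be sketched as a simple bucket table;
      the order value, hash key and opcode handling below are illustrative, not
      the kernel's exact choices.
      
          #include <stddef.h>
          #include <stdint.h>
      
          #define SKETCH_CMD_HASH_ORDER 9
          #define SKETCH_CMD_HASH_SIZE  (1u << SKETCH_CMD_HASH_ORDER)
      
          struct sketch_cmd_descriptor {
                  uint32_t opcode;
                  uint32_t length;
                  struct sketch_cmd_descriptor *next;   /* bucket chain */
          };
      
          static struct sketch_cmd_descriptor *cmd_hash[SKETCH_CMD_HASH_SIZE];
      
          static unsigned int sketch_cmd_hash_key(uint32_t opcode)
          {
                  /* a mask wide enough that the opcodes in the tables mostly
                   * land in their own bucket; collisions only cost walking a
                   * short chain */
                  return opcode & (SKETCH_CMD_HASH_SIZE - 1);
          }
      
          static const struct sketch_cmd_descriptor *
          sketch_find_cmd(uint32_t opcode)
          {
                  const struct sketch_cmd_descriptor *desc;
      
                  for (desc = cmd_hash[sketch_cmd_hash_key(opcode)];
                       desc; desc = desc->next)
                          if (desc->opcode == opcode)
                                  return desc;
                  return NULL;
          }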
      
      NB: Ville noticed that the error paths through the ring init code
      will leak memory. I've not addressed that here. We can do a follow
      up pass to handle all of the leaks.
      
      v2: improved comment describing selection of hash key mask (Damien)
      replace a BUG_ON() with an error return (Tvrtko, Ville)
      commit message improvements
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  18. 08 May 2014, 1 commit
  19. 07 May 2014, 3 commits
  20. 05 May 2014, 5 commits