1. 03 Sep, 2014 1 commit
  2. 20 Aug, 2014 1 commit
  3. 15 Aug, 2014 1 commit
    • O
      drm/i915/bdw: Emission of requests with logical rings · 48e29f55
      Oscar Mateo committed
      On a previous iteration of this patch, I created an Execlists
      version of __i915_add_request and abstracted it away as a
      vfunc. Daniel Vetter wondered then why that was needed:
      
      "with the clean split in command submission I expect every
      function to know whether it'll submit to an lrc (everything in
      intel_lrc.c) or whether it'll submit to a legacy ring (existing
      code), so I don't see a need for an add_request vfunc."
      
      The honest, hairy truth is that this patch is the glue keeping
      the whole logical ring puzzle together:
      
      - i915_add_request is used by intel_ring_idle, which in turn is
        used by i915_gpu_idle, which in turn is used in several places
        inside the eviction and gtt codes.
      - Also, it is used by i915_gem_check_olr, which is littered all
        over i915_gem.c
      - ...
      
      If I were to duplicate all the code that directly or indirectly
      uses __i915_add_request, I would end up creating a separate driver.
      
      To show the differences between the existing legacy version and
      the new Execlists one, this time I have special-cased
      __i915_add_request instead of adding an add_request vfunc. I
      hope this helps to untangle this Gordian knot.
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      [danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with
      Thomas Daniel.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
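The special-casing described in this commit message can be pictured with a minimal sketch. It is not the driver's code: `toy_engine`, `toy_add_request` and the two emit helpers are hypothetical stand-ins, and the `use_execlists` switch is an assumption made for illustration. The point is only that the shared bookkeeping stays in one function and just the emission step branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real i915 structures. */
enum toy_submit_path { TOY_SUBMIT_LEGACY_RING, TOY_SUBMIT_EXECLISTS };

struct toy_engine {
    bool use_execlists;              /* assumed switch between the two paths */
    unsigned int next_seqno;         /* bookkeeping shared by both paths */
    enum toy_submit_path last_path;  /* records which emitter ran last */
};

static void toy_legacy_emit(struct toy_engine *e)
{
    e->last_path = TOY_SUBMIT_LEGACY_RING;
}

static void toy_execlists_emit(struct toy_engine *e)
{
    e->last_path = TOY_SUBMIT_EXECLISTS;
}

/* One add_request for all callers; only the emission is special-cased. */
static unsigned int toy_add_request(struct toy_engine *e)
{
    unsigned int seqno;

    assert(e != NULL);
    seqno = e->next_seqno++;         /* common to both submission paths */

    if (e->use_execlists)
        toy_execlists_emit(e);
    else
        toy_legacy_emit(e);

    return seqno;
}
```

Callers such as an idle or eviction path would keep calling the single entry point and never need to know which submission mechanism is active.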
  4. 13 Aug, 2014 5 commits
  5. 12 Aug, 2014 2 commits
    • D
      drm/i915: Some cleanups for the ppgtt lifetime handling · ee960be7
      Daniel Vetter committed
      So when reviewing Michel's patch I've noticed a few things and cleaned
      them up:
      - The early checks in ppgtt_release are now redundant: The inactive
        list should always be empty now, so we can ditch these checks. Even
        for the aliasing ppgtt (though that's a different confusion) since
        we tear that down after all the objects are gone.
      - The ppgtt handling functions are splattered all over. Consolidate
        them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for
        get/put.
      - There was a bit of confusion in ppgtt_release about whether it cares
        about the active or inactive list. It should care about them both,
        so augment the WARNINGs to check for both.
      
      There's still create_vm_for_ctx left to do, but that is blocked on the
      removal of ppgtt->ctx. Once that's done we can rename it to
      i915_ppgtt_create and move it to its siblings for handling ppgtts.
      
      v2: Move the ppgtt checks into the inline get/put functions as
      suggested by Chris.
      
      v3: Inline the now redundant ppgtt local variable.
      
      Cc: Michel Thierry <michel.thierry@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • M
      drm/i915: vma/ppgtt lifetime rules · b9d06dd9
      Michel Thierry committed
      VMAs should take a reference of the address space they use.
      
      Now, when the fd is closed, it will release the ref that the context was
      holding, but it will still be referenced by any vmas that are still
      active.
      
      ppgtt_release() should then only be called when the last thing referencing
      it releases the ref, and it can just call the base cleanup and free the
      ppgtt.
      
      Note that with this we will extend the lifetime of ppgtts which
      contain shared objects. But all the non-shared objects will get
      removed as soon as they drop off the active list, and for the shared
      ones the shrinker can eventually reap them. Since we currently can't
      evict ppgtt pagetables either, I don't think that temporary leak is
      important.
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      [danvet: Add note about potential ppgtt leak with this approach.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
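The lifetime rule in this commit (each VMA holds a reference on its address space, and release only happens when the last reference goes away) can be sketched as follows. This is a toy model under stated assumptions: `toy_ppgtt`, `toy_vma` and the helper names are hypothetical, and a plain integer stands in for the kernel's reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the address space and a VMA inside it. */
struct toy_ppgtt {
    int refcount;
    bool released;   /* set where the real code would run ppgtt_release() */
};

struct toy_vma {
    struct toy_ppgtt *vm;
};

/* The creating context holds the initial reference. */
static void toy_ppgtt_init(struct toy_ppgtt *p)
{
    p->refcount = 1;
    p->released = false;
}

static void toy_ppgtt_get(struct toy_ppgtt *p)
{
    assert(p->refcount > 0);
    p->refcount++;
}

static void toy_ppgtt_put(struct toy_ppgtt *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->released = true;   /* last user gone: clean up and free */
}

/* A VMA pins the address space it lives in for its whole lifetime. */
static void toy_vma_create(struct toy_vma *v, struct toy_ppgtt *p)
{
    toy_ppgtt_get(p);
    v->vm = p;
}

static void toy_vma_destroy(struct toy_vma *v)
{
    toy_ppgtt_put(v->vm);
    v->vm = NULL;
}
```

In this model, closing the fd corresponds to the context calling `toy_ppgtt_put()`. The address space then survives until the last still-active VMA is destroyed.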
  6. 11 Aug, 2014 6 commits
  7. 07 Aug, 2014 1 commit
  8. 24 Jul, 2014 1 commit
  9. 23 Jul, 2014 4 commits
    • C
      drm/i915: Simplify i915_gem_release_all_mmaps() · eedd10f4
      Chris Wilson committed
      An object can only have an active gtt mapping if it is currently bound
      into the global gtt. Therefore we can simply walk the list of all bound
      objects and check the flag upon those for an active gtt mapping.
      
      From commit 48018a57
      Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Date:   Fri Dec 13 15:22:31 2013 -0200
      
          drm/i915: release the GTT mmaps when going into D3
      
      Also note that the WARN is inappropriate for this function as GPU
      activity is orthogonal to GTT mmap status. Rather it is the caller that
      relies upon this condition and so it should assert that the GPU is idle
      itself.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      [danvet: cherry-pick from -next to -fixes.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
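The simplification described in this commit (walk only the bound objects and test a per-object flag) might look roughly like the sketch below. The structure and names are hypothetical, and a singly linked list stands in for the driver's bound list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object: only bound objects are linked into the list. */
struct toy_gem_obj {
    bool bound_in_ggtt;            /* invariant for every list member */
    bool has_gtt_mmap;             /* stand-in for the "active mapping" flag */
    struct toy_gem_obj *next_bound;
};

/* Walk the bound list once and drop every active GTT mmap found.
 * Returns how many mappings were released. */
static int toy_release_all_mmaps(struct toy_gem_obj *bound_list)
{
    int released = 0;
    struct toy_gem_obj *obj;

    for (obj = bound_list; obj != NULL; obj = obj->next_bound) {
        assert(obj->bound_in_ggtt);
        if (obj->has_gtt_mmap) {
            obj->has_gtt_mmap = false;
            released++;
        }
    }
    return released;
}
```

Note that the sketch deliberately carries no "GPU is idle" check. As the message argues, that condition belongs to the caller.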
    • A
      drm/i915: Do not unmap object unless no other VMAs reference it · 9490edb5
      Armin Reese committed
      When using an IOMMU, GEM objects are mapped by their DMA address as the
      physical address is unknown. This depends on the underlying IOMMU
      driver to map and unmap the physical pages properly as defined in
      intel_iommu.c.
      
      The current code will tell the IOMMU to unmap the GEM BO's pages on the
      destruction of the first VMA that "maps" that BO. This is clearly wrong
      as there may be other VMAs "mapping" that BO (using flink). The scanout
      is one such example.
      
      The patch fixes this issue by only unmapping the DMA maps when there are
      no more VMAs mapping that object. This is equivalent to when an object
      is considered unbound as can be seen by the code. On the first VMA that
      again becomes bound, we will remap.
      
      An alternate solution would be to move the dma mapping to object
      creation and destruction. I am not sure if this is considered an
      unfriendly thing to do.
      
      Some notes to backporters trying to backport full PPGTT:
      
      The bug can never be hit without enabling the IOMMU. The existing code
      will also do the right thing when the object is shared via dmabuf. The
      failure should be demonstrable with flink. In cases when not using
      intel_iommu_strict it is likely (likely, as defined by: off the top of
      my head) on current workloads to *not* hit this bug since we often
      teardown all VMAs for an object shared across multiple VMs.  We also
      finish access to that object before the first dma_unmapping.
      intel_iommu_strict with flinked buffers is likely to hit this issue.
      Signed-off-by: Armin Reese <armin.c.reese@intel.com>
      [danvet: Add the excellent commit message provided by Ben.]
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
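The fix described in this commit ties the DMA mapping to "at least one VMA is bound" rather than to any single VMA. A minimal sketch of that rule, with hypothetical names and counters standing in for the real IOMMU map and unmap calls:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer object shared by several VMAs (e.g. via flink). */
struct toy_bo {
    int bound_vmas;     /* how many VMAs currently map the object */
    bool dma_mapped;    /* whether the pages are mapped through the IOMMU */
    int map_calls;      /* counters so the behaviour can be observed */
    int unmap_calls;
};

/* First VMA to bind maps the pages; later binds reuse the mapping. */
static void toy_bo_bind_vma(struct toy_bo *bo)
{
    if (bo->bound_vmas++ == 0) {
        bo->dma_mapped = true;
        bo->map_calls++;
    }
}

/* Only the last VMA to go away unmaps the pages. */
static void toy_bo_unbind_vma(struct toy_bo *bo)
{
    assert(bo->bound_vmas > 0);
    if (--bo->bound_vmas == 0) {
        bo->dma_mapped = false;
        bo->unmap_calls++;
    }
}
```

With two VMAs bound, destroying the first leaves `dma_mapped` set, which is the behaviour the buggy code lacked. Binding again after the count reaches zero remaps the pages.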
    • J
      drm/i915: add helper for checking whether IRQs are enabled · 9df7575f
      Jesse Barnes committed
      Now that we use the runtime IRQ enable/disable functions in our suspend
      path, we can simply check the pm._irqs_disabled flag everywhere.  So
      rename it to catch the users, and add an inline for it to make the
      checks clear everywhere.
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • C
      drm/i915: Abandon oom quickly if killed by a signal · a1db2fa7
      Chris Wilson committed
      Whilst waiting to obtain our locks for the last resort shrinking before
      an oom, we check whether or not a fatal signal was pending. If there was,
      we do not need to keep waiting as the oom will be aborted.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  10. 08 Jul, 2014 3 commits
  11. 20 Jun, 2014 1 commit
    • D
      drm/i915: Track frontbuffer invalidation/flushing · f99d7069
      Daniel Vetter committed
      So these are the guts of the new beast. This tracks when a frontbuffer
      gets invalidated (due to frontbuffer rendering) and hence should be
      constantly scanned out, and when it's flushed again and can be
      compressed/one-shot-upload.
      
      Rules for flushing are simple: The frontbuffer needs one more full
      upload starting from the next vblank. Which means that the flushing
      can _only_ be called once the frontbuffer update has been latched.
      
      But this poses a problem for pageflips: We can't just delay the
      flushing until the pageflip is latched, since that would pose the risk
      that we override frontbuffer rendering that has been scheduled
      in-between the pageflip ioctl and the actual latching.
      
      To handle this track asynchronous invalidations (and also pageflip)
      state per-ring and delay any in-between flushing until the rendering
      has completed. And also cancel any delayed flushing if we get a new
      invalidation request (whether delayed or not).
      
      Also call intel_mark_fb_busy in all cases to make sure
      that we keep the screen at the highest refresh rate both on flips,
      synchronous plane updates and for frontbuffer rendering.
      
      v2: Lots of improvements
      
      Suggestions from Chris:
      - Move invalidate/flush in flush_*_domain and set_to_*_domain.
      - Drop the flush in busy_ioctl since it's redundant. Was a leftover
        from an earlier concept to track flips/delayed flushes.
      - Don't forget about the initial modeset enable/final disable.
        Suggested by Chris.
      
      Track flips accurately, too. Since flips complete independently of
      rendering we need to track pending flips in a separate mask. Again if
      an invalidate happens we need to cancel the eventual flush to avoid
      races.
      
      v3:
      Provide correct header declarations for flip functions. Currently not
      needed outside of intel_display.c, but part of the proper interface.
      
      v4: Add proper domain management to fbcon so that the fbcon buffer is
      also tracked correctly.
      
      v5: Fixup locking around the fbcon set_to_gtt_domain call.
      
      v6: More comments from Chris:
      - Split out fbcon changes.
      - Drop superfluous checks for potential scanout before calling intel_fb
        functions - we can micro-optimize this later.
      - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
        object. We already have precedent for fb_obj in the pin_and_fence
        functions.
      
      v7: Clarify the semantics of the flip flush handling by renaming
      things a bit:
      - Don't go through a gem object but take the relevant frontbuffer bits
        directly. These functions center on the plane, the actual object is
        irrelevant - even a flip to the same object as already active should
        cause a flush.
      - Add a new intel_frontbuffer_flip for synchronous plane updates. It
        currently just calls intel_frontbuffer_flush since the implementation
        differs.
      
      This way we achieve a clear split between one-shot update events on
      one side and frontbuffer rendering with potentially a very long delay
      between the invalidate and flush.
      
      Chris and I also had some discussions about mark_busy and whether it
      is appropriate to call from flush. But mark busy is a state which
      should be derived from the 3 events (invalidate, flush, flip) we now
      have by the users, like psr does by tracking relevant information in
      psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
      frontbuffer) needs to have similar logic. With that the overall
      mark_busy in the core could be removed.
      
      v8: When retiring gpu buffers, only flush the frontbuffer bits we
      actually invalidated in a batch. Just for safety, since before any
      additional usage/invalidate we should always retire current rendering.
      Suggested by Chris Wilson.
      
      v9: Actually use intel_frontbuffer_flip in all appropriate places.
      Spotted by Chris.
      
      v10: Address more comments from Chris:
      - Don't call _flip in set_base when the crtc is inactive, avoids redundancy
        in the modeset case with the initial enabling of all planes.
      - Add comments explaining that the initial/final plane enable/disable
        still has work left to do before it's fully generic.
      
      v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
      
      v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
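The delayed-flush rule in this commit message (a flush must wait while asynchronous rendering is outstanding, and a retire only flushes what that rendering invalidated) can be modelled with two bitmask operations. The sketch below is a simplified toy: the names are hypothetical, there is no locking, and the separate flip mask mentioned in v2 is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracking state: one bit per frontbuffer. */
struct toy_fb_tracking {
    unsigned int busy_bits;  /* invalidated by rendering still in flight */
    unsigned int flushed;    /* bits passed on to consumers so far */
};

/* Frontbuffer rendering was queued; asynchronous work marks bits busy. */
static void toy_invalidate(struct toy_fb_tracking *t, unsigned int bits,
                           bool on_ring)
{
    if (on_ring)
        t->busy_bits |= bits;
}

/* Flush request: anything with rendering outstanding is held back.
 * Returns the bits that were actually flushed. */
static unsigned int toy_flush(struct toy_fb_tracking *t, unsigned int bits)
{
    bits &= ~t->busy_bits;
    assert((bits & t->busy_bits) == 0);
    t->flushed |= bits;
    return bits;
}

/* Rendering retired: flush only the bits that rendering had invalidated. */
static unsigned int toy_retire(struct toy_fb_tracking *t,
                               unsigned int obj_bits)
{
    unsigned int bits = obj_bits & t->busy_bits;

    t->busy_bits &= ~bits;
    return toy_flush(t, bits);
}
```

A flush that arrives between an invalidate and the matching retire returns 0 for the busy bits, so consumers such as PSR or FBC would not see the buffer as clean too early.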
  12. 19 Jun, 2014 2 commits
    • D
      drm/i915: Introduce accurate frontbuffer tracking · a071fa00
      Daniel Vetter committed
      So from just a quick look we seem to have enough information to
      accurately figure out whether a given gem bo is used as a frontbuffer
      and where exactly: We have obj->pin_count as a first check with no
      false negatives and only negligible false positives. And then we can
      just walk the modeset objects and figure out where exactly a buffer is
      used as scanout.
      
      Except that we can't due to locking order: If we already hold
      dev->struct_mutex we can't acquire any modeset locks, so could
      potentially chase freed pointers and other evil stuff.
      
      So we need something else. For that introduce a new set of bits
      obj->frontbuffer_bits to track where a buffer object is used. That we
      can then chase without grabbing any modeset locks.
      
      Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be
      able to do their magic both when called from modeset and from gem
      code. But that can be easily achieved by adding locks for these
      specific subsystems which always nest within either kms or gem
      locking.
      
      This patch just adds the relevant update code to all places.
      
      Note that if we ever support multi-planar scanout targets then we need
      one frontbuffer tracking bit per attachment point that we expose to
      userspace.
      
      v2:
      - Fix more oopsen. Oops.
      - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix
        the bugs this brought to light.
      - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the
        fb tracking functions (fb for gem object, frontbuffer for raw bits).
        And the function name was way too long.
      
      v3: Size obj->frontbuffer_bits correctly so that all pipes fit in.
      
      v4: Don't update fb bits in set_base on failure. Noticed by Chris.
      
      v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few
      local enum pipe variables which are now no longer needed to make the
      function arguments not drop over the 80 char limit.
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
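The per-object bit tracking introduced here can be sketched as a small helper that moves a set of bits from the old scanout object to the new one. Everything below is illustrative: `toy_bo`, `toy_track_fb`, `toy_bo_free` and the `TOY_FB_BIT` layout (four bits per pipe) are assumptions, and `assert` stands in for the WARNs mentioned in v2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: four plane bits per pipe. */
#define TOY_FB_BIT(pipe, plane) (1u << ((pipe) * 4 + (plane)))

struct toy_bo {
    unsigned int frontbuffer_bits;  /* where this object is scanned out */
};

/* Move "bits" from the old scanout object to the new one.
 * Either pointer may be NULL (plane enable or plane disable). */
static void toy_track_fb(struct toy_bo *old_bo, struct toy_bo *new_bo,
                         unsigned int bits)
{
    if (old_bo != NULL) {
        assert(old_bo->frontbuffer_bits & bits);
        old_bo->frontbuffer_bits &= ~bits;
    }
    if (new_bo != NULL) {
        assert(!(new_bo->frontbuffer_bits & bits));
        new_bo->frontbuffer_bits |= bits;
    }
}

/* Freeing an object that is still tracked as a frontbuffer is a bug. */
static void toy_bo_free(struct toy_bo *bo)
{
    assert(bo->frontbuffer_bits == 0);
}
```

Because the bits live on the object itself, code holding only the gem lock can read them without touching any modeset state.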
    • D
      drm/i915: Drop schedule_back from psr_exit · 3108e99e
      Daniel Vetter committed
      It doesn't make sense to never again schedule the work, since by the
      time we might want to re-enable psr the world might have changed and
      we can do it again.
      
      The only exception is when we shut down the pipe, but that's an
      entirely different thing and needs to be handled in psr_disable.
      
      Note that a later patch will again split psr_exit into psr_invalidate
      and psr_flush. But the split is different and this simplification
      helps with the transition.
      
      v2: Improve the commit message a bit.
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  13. 18 Jun, 2014 3 commits
  14. 17 Jun, 2014 2 commits
    • S
      drm/i915: Replaced Blitter ring based flips with MMIO flips · 84c33a64
      Sourab Gupta committed
      This patch enables the framework for using MMIO based flip calls,
      in contrast with the CS based flip calls which are being used currently.
      
      MMIO based flip calls can be enabled on architectures where
      Render and Blitter engines reside in different power wells. The
      decision to use MMIO flips can be made based on workloads to give
      100% residency for Media power well.
      
      v2: The MMIO flips now use the interrupt driven mechanism for issuing the
      flips when target seqno is reached. (Incorporating Ville's idea)
      
      v3: Rebasing on latest code. Code restructuring after incorporating
      Damien's comments
      
      v4: Addressing Ville's review comments
          -general cleanup
          -updating only base addr instead of calling update_primary_plane
          -extending patch for gen5+ platforms
      
      v5: Addressed Ville's review comments
          -Making mmio flip vs cs flip selection based on module parameter
          -Adding check for DRIVER_MODESET feature in notify_ring before calling
           notify mmio flip.
          -Other changes mostly in function arguments
      
      v6: -Having a separate function to check condition for using mmio flips (Ville)
          -propagating error code from i915_gem_check_olr (Ville)
      
      v7: -Adding __must_check with i915_gem_check_olr (Chris)
          -Renaming mmio_flip_data to mmio_flip (Chris)
          -Rebasing on latest nightly
      
      v8: -Rebasing on latest code
          -squash 3rd patch in series(mmio setbase vs page flip race) with this patch
          -Added new tiling mode update in intel_do_mmio_flip (Chris)
      
      v9: -check for obj->last_write_seqno being 0 instead of obj->ring being NULL in
      intel_postpone_flip, as this is a more restrictive condition (Chris)
      
      v10: -Applied Chris's suggestions for squashing patches 2,3 into this patch.
      These patches make the selection of CS vs MMIO flip at the page flip time, and
      make the module parameter for using mmio flips as tristate, the states being
      'force CS flips', 'force mmio flips', 'driver discretion'.
      Changed the logic for driver discretion (Chris)
      
      v11: Minor code cleanup(better readability, fixing whitespace errors, using
      lockdep to check mutex locked status in postpone_flip, removal of __must_check
      in function definition) (Chris)
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb
      [danvet: Fix up parameter alignment checkpatch spotted.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
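The tristate selection described in v10 ('force CS flips', 'force mmio flips', 'driver discretion') can be sketched as below. The numeric encoding of the three states and the discretion heuristic are assumptions made for illustration; the commit message does not spell out either, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding of the tristate parameter (hypothetical values). */
enum toy_flip_policy {
    TOY_FORCE_CS = -1,
    TOY_DRIVER_DISCRETION = 0,
    TOY_FORCE_MMIO = 1
};

/* Decide at page-flip time whether to use an MMIO flip.
 * "rendered_on_other_ring" is a placeholder input for whatever the
 * driver would consult when left to its own discretion. */
static bool toy_use_mmio_flip(int param, bool rendered_on_other_ring)
{
    assert(param >= TOY_FORCE_CS && param <= TOY_FORCE_MMIO);

    if (param == TOY_FORCE_CS)
        return false;
    if (param == TOY_FORCE_MMIO)
        return true;

    /* Driver discretion: placeholder heuristic, not the real policy. */
    return rendered_on_other_ring;
}
```

Making the decision per flip, rather than once at load time, is what lets the "driver discretion" state react to the workload.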
    • C
      drm/i915: Simplify i915_gem_release_all_mmaps() · 6254b204
      Chris Wilson committed
      An object can only have an active gtt mapping if it is currently bound
      into the global gtt. Therefore we can simply walk the list of all bound
      objects and check the flag upon those for an active gtt mapping.
      
      From commit 48018a57
      Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Date:   Fri Dec 13 15:22:31 2013 -0200
      
          drm/i915: release the GTT mmaps when going into D3
      
      Also note that the WARN is inappropriate for this function as GPU
      activity is orthogonal to GTT mmap status. Rather it is the caller that
      relies upon this condition and so it should assert that the GPU is idle
      itself.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  15. 14 Jun, 2014 1 commit
    • R
      drm/i915: Force PSR exit by inactivating it. · 7c8f8a70
      Rodrigo Vivi committed
      The perfect solution for psr_exit is the hardware tracking the changes and
      doing the psr exit by itself. This scenario works for HSW and BDW with some
      environments like Gnome and Wayland.
      
      However, there are many other scenarios where this isn't true. The main
      one right now is KDE users on HSW and BDW with PSR on, who would miss many
      screen updates. For instance, a typed key might only become visible once
      the mouse cursor is moved. So this patch introduces the ability to trigger
      PSR exit on the kernel side in some common cases.
      
      Most of the cases are covered by psr_exit at set_domain. The remaining cases
      are covered by triggering it at set_domain, busy_ioctl, sw_finish and
      mark_busy.
      
      The downside here might be reducing the residency time in the cases where
      this already works very well, like the Gnome environment. But for now let's
      focus on fixing issues so PSR can be used by everybody and we can even
      get it enabled by default. Later we can add some alternatives to choose the
      level of PSR efficiency via a boot flag or even a crtc property.
      
      v2: remove exit from connector_dpms. Daniel pointed this is the wrong way and
      also this isn't needed for BDW and HSW anyway.
      
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com>
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  16. 13 Jun, 2014 2 commits
    • C
      drm/i915: Prefault the entire object on first page fault · b90b91d8
      Chris Wilson committed
      Inserting additional PTEs has no side-effect for us as the pfn are fixed
      for the entire time the object is resident in the global GTT. The
      downside is that we pay the entire cost of faulting the object upon the
      first hit, for which we in return receive the benefit of removing the
      per-page faulting overhead.
      
      On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
      Upload rate for 2 linear surfaces:	8127MiB/s -> 8134MiB/s
      Upload rate for 2 tiled surfaces:	8607MiB/s -> 8625MiB/s
      Upload rate for 4 linear surfaces:	8127MiB/s -> 8127MiB/s
      Upload rate for 4 tiled surfaces:	8611MiB/s -> 8602MiB/s
      Upload rate for 8 linear surfaces:	8114MiB/s -> 8124MiB/s
      Upload rate for 8 tiled surfaces:	8601MiB/s -> 8603MiB/s
      Upload rate for 16 linear surfaces:	8110MiB/s -> 8123MiB/s
      Upload rate for 16 tiled surfaces:	8595MiB/s -> 8606MiB/s
      Upload rate for 32 linear surfaces:	8104MiB/s -> 8121MiB/s
      Upload rate for 32 tiled surfaces:	8589MiB/s -> 8605MiB/s
      Upload rate for 64 linear surfaces:	8107MiB/s -> 8121MiB/s
      Upload rate for 64 tiled surfaces:	2013MiB/s -> 3017MiB/s
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: "Goel, Akash" <akash.goel@intel.com>
      Testcase: igt/gem_fence_upload/performance
      Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • C
      drm/i915: Use the .release hook to drop the stolen drm_mm tracking · ef0cf27c
      Chris Wilson committed
      Now that we have a release hook into i915_gem_object_free, we can move
      the explicit call to the internal stolen function and hook it up
      through the callback instead.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  17. 11 Jun, 2014 1 commit
  18. 05 Jun, 2014 2 commits
  19. 27 May, 2014 1 commit
    • C
      drm/i915: Prevent negative relocation deltas from wrapping · d23db88c
      Chris Wilson committed
      This is pure evil. Userspace, I'm looking at you SNA, repacks batch
      buffers on the fly after generation as they are being passed to the
      kernel for execution. These batches also contain self-referenced
      relocations as a single buffer encompasses the state commands, kernels,
      vertices and sampler. During generation the buffers are placed at known
      offsets within the full batch, and then the relocation deltas (as passed
      to the kernel) are tweaked as the batch is repacked into a smaller buffer.
      This means that userspace is passing negative relocation deltas, which
      subsequently wrap to large values if the batch is at a low address. The
      GPU hangs when it then tries to use the large value as a base for its
      address offsets, rather than wrapping back to the real value (as one
      would hope). As the GPU uses positive offsets from the base, we can
      treat the relocation address as the minimum address read by the GPU.
      For the upper bound, we trust that userspace will not read beyond the
      end of the buffer.
      
      So, how do we fix negative relocations from wrapping? We can either
      check that every relocation looks valid when we write it, and then
      position each object such that we prevent the offset wraparound, or we
      just special-case the self-referential behaviour of SNA and force all
      batches to be above 256k. Daniel prefers the latter approach.
      
      This fixes a GPU hang when it tries to use an address (relocation +
      offset) greater than the GTT size. The issue would occur quite easily
      with full-ppgtt as each fd gets its own VM space, so low offsets would
      often be handed out. However, with the rearrangement of the low GTT due
      to capturing the BIOS framebuffer, it is already affecting kernels 3.15
      onwards. I think only IVB+ is susceptible to this bug, but the workaround
      should only kick in rarely, so it seems sensible to always apply it.
      
      v3: Use a bias for batch buffers to prevent small negative delta relocations
      from wrapping.
      
      v4 from Daniel:
      - s/BIAS/BATCH_OFFSET_BIAS/
      - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
        were growing rather cumbersome.
      - Add a comment to eb_get_batch explaining why we do this.
      - Apply the batch offset bias everywhere but mention that we've only
        observed it on gen7 gpus.
      - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
      
      v5: Add static to eb_get_batch, spotted by 0-day tester.
      
      Testcase: igt/gem_bad_reloc
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
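The wraparound described in this commit is plain unsigned arithmetic, so a small worked example may help. The sketch assumes 32-bit GTT offsets and uses hypothetical names. The 256 KiB bias matches the "above 256k" figure in the message, but the placement helper is only an illustration of the idea, not the driver's pinning logic.

```c
#include <assert.h>
#include <stdint.h>

/* Bias taken from the "above 256k" figure in the commit message. */
#define TOY_BATCH_OFFSET_BIAS (256u * 1024u)

/* Address written into the batch: base offset plus a signed delta,
 * computed modulo 2^32 like the hardware-visible value. */
static uint32_t toy_reloc_target(uint32_t batch_offset, int32_t delta)
{
    return batch_offset + (uint32_t)delta;
}

/* Hypothetical placement rule: never put a batch below the bias. */
static uint32_t toy_place_batch(uint32_t wanted_offset)
{
    uint32_t placed = wanted_offset < TOY_BATCH_OFFSET_BIAS
                          ? TOY_BATCH_OFFSET_BIAS
                          : wanted_offset;

    assert(placed >= TOY_BATCH_OFFSET_BIAS);
    return placed;
}
```

With a batch at 0x1000 and a delta of -0x2000, the target wraps to 0xFFFFF000. After biasing the batch to 0x40000, the same delta yields 0x3E000, a valid low address. Any negative delta smaller in magnitude than the bias is safe in this model.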