1. 08 Mar, 2014 1 commit
    • D
      drm/i915: Disable full ppgtt by default · 93a25a9e
      Committed by Daniel Vetter
      There are too many outstanding issues:
      
      - Fence handling in the current code is broken. There's a patch series
        from me, but it's blocked on an extended review (which includes
        writing the testcases).
      
      - IOMMU mapping handling is broken, we need to properly refcount it -
        currently it gets destroyed when the first vma is unbound, so way
        too early.
      
      - There's a pending reset issue on snb. Since Mika's reset work and
        full ppgtt have been pulled in in separate branches and ended up
        intermittently breaking one another it's unclear who's the exact
        culprit here.
      
      - We still have persistent evidence of crazy recursion bugs through
        vma_unbind and ppgtt_release, e.g.
      
        https://bugs.freedesktop.org/show_bug.cgi?id=73383
      
        This issue (and a few others meanwhile resolved) has blocked our
        performance measuring/tuning group for 3 months.
      
      - Secure batch dispatching is broken. This has been blocking Brad
        Volkin's command checker work for 3 months.
      
      All these issues are confirmed to only happen when full ppgtt is
      enabled, falling back to aliasing ppgtt resolves them. But even
      aliasing ppgtt itself still has a regression:
      
      - We currently unconditionally bind objects into the aliasing ppgtt,
        which means all privileged objects like ringbuffers are visible to
        unprivileged access again. On top of that this also breaks the
        command checker for aliasing ppgtt, since it can't hide the
        validated batch any more.
      
      Furthermore topic/full-ppgtt has never been reviewed:
      
      - Lifetime rules around vma unbinding/release are unclear, resulting
        in this awesome hack called ppgtt_release, which seems to take the
        blame for most of the recursion fallout.
      
      - Context/ring init works differently on gpu reset than anywhere else.
        Such differences have in the past always led to really hard to
        track down bugs.
      
      - Aliasing ppgtt is treated in a bunch of places as a real address
        space, but it isn't - the real address space is always the global
        gtt in that case. This results in a bit of a mess between contexts and
        ppgtt objects, further complicating the context/ppgtt/vma lifetime
        rules.
      
      - We don't have any docs describing the overall concepts introduced
        with full ppgtt. A short, concise overview describing vmas and some
        of the strange bits around them (like the unbound vmas used by
        execbuf, or the new binding rules) really is needed.
      
      Note that a lot of the post topic/full-ppgtt merge fallout has already
      been addressed, this entire list here of 10 issues really only contains
      the still outstanding issues.
      
      Finally the 3.15 merge window is approaching and I think we need to
      use the remaining time to ensure that our fallback option of using
      aliasing ppgtt is in solid shape. Hence I think it's time to throw the
      switch. While at it demote the helper from static inline status
      because really.
      
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Cc: Dave Airlie <airlied@gmail.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      93a25a9e
  2. 06 Mar, 2014 6 commits
    • B
      drm/i915/bdw: Kill ppgtt->num_pt_pages · 5abbcca3
      Committed by Ben Widawsky
      With the original PPGTT implementation if the number of PDPs was not a
      power of two, the number of pages for the page tables would end up being
      rounded up. The code actually had a bug here afaict, but this is a
      theoretical bug as I don't believe this can actually occur with the
      current code/HW.
      
      With the rework of the page table allocations, there is no longer a
      distinction between number of page table pages, and number of page
      directory entries. To avoid confusion, kill the redundant (and newer)
      struct member.
      
      Cc: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      5abbcca3
    • B
      drm/i915: Split GEN6 PPGTT initialization up · b146520f
      Committed by Ben Widawsky
      Simply to match the GEN8 style of PPGTT initialization, split up the
      allocations and mappings. Unlike GEN8, we skip a separate dma_addr_t
      allocation function, as it is much simpler pre-gen8.
      
      With this code it would be easy to make a more general PPGTT
      initialization function with per GEN alloc/map/etc. or use a common
      helper, similar to the ringbuffer code. I don't see a benefit to doing
      this just yet, but who knows...
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b146520f
    • B
      drm/i915: Split GEN6 PPGTT cleanup · a00d825d
      Committed by Ben Widawsky
      This cleanup is similar to the GEN8 cleanup (though less necessary).
      Having everything split will make the error paths of the initialization
      path easier to understand.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      a00d825d
    • B
      drm/i915: Update i915_gem_gtt.c copyright · c4ac524c
      Committed by Ben Widawsky
      I keep meaning to do this... by now almost the entire file has been
      written by an Intel employee (including Daniel post-2010).
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      c4ac524c
    • B
      Revert "drm/i915/bdw: Limit GTT to 2GB" · 7907f45b
      Committed by Ben Widawsky
      This reverts commit 3a2ffb65.
      
      Now that the code is fixed to use smaller allocations, it should be safe
      to let the full GGTT be used on BDW.
      
      The testcase for this is anything which uses more than half of the GTT,
      thus eclipsing the old limit.
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      7907f45b
    • B
      drm/i915/bdw: Reorganize PT allocations · 7ad47cf2
      Committed by Ben Widawsky
      The previous allocation mechanism would get 2 contiguous allocations,
      one for the page directories, and one for the page tables. As each page
      table is 1 page, and there are 512 of these per page directory, this
      goes to 2MB. An unfriendly request at best. Worse still, our HW now
      supports 4 page directories, and a 2MB allocation is not allowed.
      
      In order to fix this, this patch attempts to split up each page table
      allocation into a single, discrete allocation. There is nothing really
      fancy about the patch itself, it just has to manage an extra pointer
      indirection, and have a fancier bit of logic to free up the pages.
      
      To accommodate some of the added complexity, two new helpers are
      introduced to allocate, and free the page table pages.
      
      NOTE: I really wanted to split the way we do allocations, and the way in
      which we identify the page table/page directory being used. I found
      splitting this functionality up to be too unwieldy. I apologize in
      advance to the reviewer. I'd recommend looking at the result, rather
      than the diff.
      
      v2/NOTE2: This patch predated commit:
      6f1cc993
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Tue Dec 31 15:50:31 2013 +0000
      
          drm/i915: Avoid dereference past end of page arr
      
      It fixed the same issue as that patch, but because of the limbo state of
      PPGTT, Chris' patch was merged instead. The excess churn is a result of
      my using my original patch, which has my preferred naming. Primarily
      act_* is changed to which_*, but it's mostly the same otherwise. I've
      kept the convention Chris used for the pte wrap (I had something
      slightly different, and broken - but fixable)
      
      v3: Rename which_p[..]e to drop which_ (Chris)
      Remove BUG_ON in inner loop (Chris)
      Redo the pde/pdpe wrap logic (Chris)
      
      v4: s/1MB/2MB in commit message (Imre)
      Plug leaking gen8_pt_pages in both the error path, as well as general
      free case (Imre)
      
      v5: Rename leftover "which_" variables (Imre)
      Add the pde = 0 wrap that was missed from v3 (Imre)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      [danvet: Squash in fixup from Ben.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      7ad47cf2
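The allocation change described in 7ad47cf2 can be illustrated with a small userspace sketch. It is not the driver code: `struct pt_set`, `pt_set_alloc()` and `pt_set_free()` are hypothetical names, `calloc()` stands in for the kernel page allocator, and the 4 KiB page size is an assumption. What it shows is the idea from the message: one small allocation per page table, reached through an extra pointer indirection, with a free routine that also serves as the error path.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE_BYTES 4096u /* assumed page size */
#define PT_PER_PD       512u  /* page tables per page directory, per the commit text */

/* Hypothetical container: pt[pd][i] points at one separately allocated page table. */
struct pt_set {
    unsigned int num_pd;
    void ***pt;
};

/* Frees whatever has been allocated so far; safe on a partially built set. */
static void pt_set_free(struct pt_set *s)
{
    if (!s->pt)
        return;
    for (unsigned int pd = 0; pd < s->num_pd; pd++) {
        if (!s->pt[pd])
            continue;
        for (unsigned int i = 0; i < PT_PER_PD; i++)
            free(s->pt[pd][i]); /* free(NULL) is a no-op */
        free(s->pt[pd]);
    }
    free(s->pt);
    s->pt = NULL;
    s->num_pd = 0;
}

/* Allocates num_pd * 512 page tables, one page-sized chunk each. Returns 0 or -1. */
static int pt_set_alloc(struct pt_set *s, unsigned int num_pd)
{
    assert(num_pd > 0);

    s->num_pd = num_pd;
    s->pt = calloc(num_pd, sizeof(*s->pt));
    if (!s->pt) {
        s->num_pd = 0;
        return -1;
    }

    for (unsigned int pd = 0; pd < num_pd; pd++) {
        s->pt[pd] = calloc(PT_PER_PD, sizeof(**s->pt));
        if (!s->pt[pd])
            goto fail;
        for (unsigned int i = 0; i < PT_PER_PD; i++) {
            s->pt[pd][i] = calloc(1, PAGE_SIZE_BYTES);
            if (!s->pt[pd][i])
                goto fail;
        }
    }
    return 0;

fail:
    pt_set_free(s); /* releases the partially built set, so nothing leaks */
    return -1;
}
```

With this layout no single request is larger than one page plus the pointer arrays, at the cost of the extra indirection and a slightly more involved free path.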
  3. 04 Mar, 2014 4 commits
    • B
      drm/i915: Make clear/insert vfuncs args absolute · 782f1495
      Committed by Ben Widawsky
      This patch converts insert_entries and clear_range, both functions which
      are specific to the VM. These functions tend to encapsulate the gen
      specific PTE writes. Passing absolute addresses to the insert_entries,
      and clear_range will help make the logic clearer within the functions as
      to what's going on. Currently, all callers simply do the appropriate
      page shift, which IMO, ends up looking weird with an upcoming change for
      the gen8 page table allocations.
      
      Up until now, the PPGTT was a funky 2 level page table. GEN8 changes
      this to look more like a 3 level page table, and to that extent we need
      a significant amount more memory simply for the page tables. To address
      this, the allocations will be split up in finer amounts.
      
      v2: Replace size_t with uint64_t (Chris, Imre)
      
      v3: Fix size in gen8_ppgtt_init (Ben)
      Fix Size in i915_gem_suspend_gtt_mappings/restore (Imre)
      
      Reviewed-by: Imre Deak <imre.deak@intel.com> (v2)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      782f1495
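To make the "absolute address" idea in 782f1495 concrete, here is a minimal sketch of how a byte offset into the address space could be split into per-level indices inside the callee, rather than having every caller pre-shift it. The layout is an assumption for illustration (4 KiB pages, 512 entries per level, a 32-bit address space covered by 4 top-level entries), and `struct pt_index` and `split_offset()` are hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_BITS 12u  /* assumed 4 KiB pages */
#define LEVEL_BITS      9u   /* assumed 512 entries per level */
#define LEVEL_MASK      ((1u << LEVEL_BITS) - 1u)

/* Hypothetical index triple for a 3-level walk. */
struct pt_index {
    unsigned int pdpe; /* page directory pointer entry */
    unsigned int pde;  /* page directory entry */
    unsigned int pte;  /* page table entry */
};

/* Splits an absolute byte offset into the three table indices. */
static struct pt_index split_offset(uint64_t offset)
{
    struct pt_index idx;
    uint64_t page;

    assert(offset < (1ULL << 32)); /* assumed 4 GiB address space */

    page = offset >> PAGE_SHIFT_BITS;
    idx.pte = (unsigned int)(page & LEVEL_MASK);
    idx.pde = (unsigned int)((page >> LEVEL_BITS) & LEVEL_MASK);
    idx.pdpe = (unsigned int)(page >> (2u * LEVEL_BITS));
    return idx;
}
```

For example, an offset of 2 MiB lands on the first entry of the second page table (`pde == 1`, `pte == 0`), and an offset of 1 GiB moves to the second page directory (`pdpe == 1`).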
    • B
      drm/i915/bdw: Split ppgtt initialization up · bf2b4ed2
      Committed by Ben Widawsky
      Like cleanup in an earlier patch, the code becomes much more readable,
      and easier to extend if we extract out helper functions for the various
      stages of init.
      
      Note that with this patch it becomes really simple, and tempting to begin
      using the 'goto out' idiom with explicit free/fini semantics. I've
      kept the error path as similar as possible to the cleanup() function to
      make sure cleanup is as robust as possible
      
      v2: Remove comment "NB:From here on, ppgtt->base.cleanup() should
      function properly"
      Update commit message to reflect above
      
      v3: Rebased on top of bugfixes found in the previous patch by Imre
      Moved number of pd pages assertion to the proper place (Imre)
      
      v4:
      Allocate dma address space for num_pd_pages, not num_pd_entries (Ben)
      Don't use gen8_pt_dma_addr after free on error path (Imre)
      With new fix from v4 of the previous patch.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      bf2b4ed2
    • B
      drm/i915/bdw: Reorganize PPGTT init · f3a964b9
      Committed by Ben Widawsky
      Create 3 clear stages in PPGTT init. This will help make upcoming
      changes more readable. The 3 stages are: allocation, dma mapping, and
      writing the P[DT]Es.
      
      One nice benefit to the patches is that it makes 2 very clear error
      points, allocation, and mapping, and avoids having to do any handling
      after writing PTEs (something which was likely buggy before). This
      simplified error handling I suspect will be helpful when we move to
      deferred/dynamic page table allocation and mapping.
      
      The patch also attempts to break up some of the steps into more
      logical reviewable chunks, particularly when we free.
      
      v2: Don't call cleanup on the error path since that takes down the
      drm_mm and list entry, which aren't set up at this point.
      
      v3: Fixes addressing Imre's comments from:
      <1392821989.19792.13.camel@intelbox>
      
      Don't do dynamic allocation for the page table DMA addresses. I can't
      remember why I did it in the first place. This addresses one of Imre's
      other issues.
      
      Fix error path leak of page tables.
      
      v4: Fix the fix of the error path leak. Original fix still leaked page
      tables. (Imre)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      f3a964b9
    • B
      drm/i915/bdw: Free PPGTT struct · b18b6bde
      Committed by Ben Widawsky
      GEN8 never freed the PPGTT struct. As GEN8 doesn't use full PPGTT, the
      leak is small and only found on a module reload, i.e. I don't think this
      needs to go to stable.
      
      v2: The very naive, kfree in gen8 ppgtt cleanup, is subject to a double
      free on PPGTT initialization failure. (Spotted by Imre). Instead this
      patch pulls the ppgtt struct freeing out of the cleanup and leaves it to
      the allocators/callers or the one doing the last kref_put as in standard
      convention.
      Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b18b6bde
  4. 14 Feb, 2014 2 commits
  5. 13 Feb, 2014 1 commit
  6. 28 Jan, 2014 1 commit
    • J
      drm/i915: move module parameters into a struct, in a new file · d330a953
      Committed by Jani Nikula
      With 20+ module parameters, I think referring to them via a struct
      improves clarity over just having a bunch of globals. While at it, move
      the parameter initialization and definitions into a new file
      i915_params.c to reduce clutter in i915_drv.c.
      
      Apart from the ill-named i915_enable_rc6, i915_enable_fbc and
      i915_enable_ppgtt parameters, for which we lose the "i915_" prefix
      internally, the module parameters now look the same both on the kernel
      command line and in code. For example, "i915.modeset".
      
      The downsides of the change are losing static on a couple of variables
      and not having the initialization and module_param_named() right next to
      each other. On the other hand, all module parameters are now defined in
      one place at i915_params.c. Plus you can do this to find all module
      parameter references:
      
      $ git grep "i915\." -- drivers/gpu/drm/i915
      
      v2:
      - move the definitions into a new file
      - s/i915_params/i915/
      - make i915_try_reset i915.reset, for consistency
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      d330a953
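The struct-of-parameters pattern from d330a953 can be sketched in plain userspace C. The field names `modeset`, `enable_rc6`, `enable_fbc`, `enable_ppgtt` and `reset` come from the commit message; everything else is an assumption for illustration: the struct name `i915_params_sketch`, the default values, and the `params_set()` helper, which stands in for what `module_param_named()` does in the kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of a "all module parameters in one struct" layout. */
struct i915_params_sketch {
    int modeset;
    int enable_rc6;
    int enable_fbc;
    int enable_ppgtt;
    int reset;
};

/* One global instance, so code reads params.enable_ppgtt much like "i915.enable_ppgtt". */
static struct i915_params_sketch params = {
    .modeset = -1,      /* assumed default */
    .enable_rc6 = -1,   /* assumed default */
    .enable_fbc = -1,   /* assumed default */
    .enable_ppgtt = -1, /* assumed default */
    .reset = 1,         /* assumed default */
};

/* Sets a parameter by its external name. Returns 0 on success, -1 if unknown. */
static int params_set(const char *name, int value)
{
    assert(name != NULL);

    if (strcmp(name, "modeset") == 0)
        params.modeset = value;
    else if (strcmp(name, "enable_rc6") == 0)
        params.enable_rc6 = value;
    else if (strcmp(name, "enable_fbc") == 0)
        params.enable_fbc = value;
    else if (strcmp(name, "enable_ppgtt") == 0)
        params.enable_ppgtt = value;
    else if (strcmp(name, "reset") == 0)
        params.reset = value;
    else
        return -1;
    return 0;
}
```

The benefit the message points at is visible even here: every parameter reference goes through `params.`, so a single text search finds them all.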
  7. 10 Jan, 2014 1 commit
  8. 08 Jan, 2014 1 commit
  9. 07 Jan, 2014 3 commits
    • C
      drm/i915: Avoid dereference past end of page array in gen8_ppgtt_insert_entries() · 6f1cc993
      Committed by Chris Wilson
      The bug from gen6_ppgtt_insert_entries() was replicated into
      gen8_ppgtt_insert_entries(). This applies the fix for the OOPS from the
      previous patch to the gen8 routine.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Ben Widawsky <benjamin.widawsky@intel.com>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      6f1cc993
    • C
      drm/i915: Avoid dereference past end of page array in gen6_ppgtt_insert_entries() · cc79714f
      Committed by Chris Wilson
      [   89.237347] BUG: unable to handle kernel paging request at ffff880096326000
      [   89.237369] IP: [<ffffffff81347227>] gen6_ppgtt_insert_entries+0x117/0x170
      [   89.237382] PGD 2272067 PUD 25df0e067 PMD 25de5c067 PTE 8000000096326060
      [   89.237394] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [   89.237404] CPU: 1 PID: 1981 Comm: gem_concurrent_ Not tainted 3.13.0-rc4+ #639
      [   89.237411] Hardware name: Intel Corporation 2012 Client Platform/Emerald Lake 2, BIOS ACRVMBY1.86C.0078.P00.1201161002 01/16/2012
      [   89.237420] task: ffff88024c038030 ti: ffff88024b130000 task.ti: ffff88024b130000
      [   89.237425] RIP: 0010:[<ffffffff81347227>]  [<ffffffff81347227>] gen6_ppgtt_insert_entries+0x117/0x170
      [   89.237435] RSP: 0018:ffff88024b131ae0  EFLAGS: 00010286
      [   89.237440] RAX: ffff880096325000 RBX: 0000000000000400 RCX: 0000000000001000
      [   89.237445] RDX: 0000000000000200 RSI: 0000000000000001 RDI: 0000000000000010
      [   89.237451] RBP: ffff88024b131b30 R08: ffff88024cc3aef0 R09: 0000000000000000
      [   89.237456] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88024cc3ae00
      [   89.237462] R13: ffff88024a578000 R14: 0000000000000001 R15: ffff88024a578ffc
      [   89.237469] FS:  00007ff5475d8900(0000) GS:ffff88025d020000(0000) knlGS:0000000000000000
      [   89.237475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   89.237480] CR2: ffff880096326000 CR3: 000000024d531000 CR4: 00000000001407e0
      [   89.237485] Stack:
      [   89.237488]  ffff880000000000 0000020000000000 ffff88024b23f2c0 0000000100000000
      [   89.237499]  0000000000000001 000000000007ffff ffff8801e7bf5ac0 ffff8801e7bf5ac0
      [   89.237510]  ffff88024cc3ae00 ffff880248a2ee40 ffff88024b131b58 ffffffff813455ed
      [   89.237521] Call Trace:
      [   89.237528]  [<ffffffff813455ed>] ppgtt_bind_vma+0x3d/0x60
      [   89.237534]  [<ffffffff8133d8dc>] i915_gem_object_pin+0x55c/0x6a0
      [   89.237541]  [<ffffffff8134275b>] i915_gem_execbuffer_reserve_vma.isra.14+0x5b/0x110
      [   89.237548]  [<ffffffff81342a88>] i915_gem_execbuffer_reserve+0x278/0x2c0
      [   89.237555]  [<ffffffff81343d29>] i915_gem_do_execbuffer.isra.22+0x699/0x1250
      [   89.237562]  [<ffffffff81344d91>] ? i915_gem_execbuffer2+0x51/0x290
      [   89.237569]  [<ffffffff81344de6>] i915_gem_execbuffer2+0xa6/0x290
      [   89.237575]  [<ffffffff813014f2>] drm_ioctl+0x4d2/0x610
      [   89.237582]  [<ffffffff81080bf1>] ? cpuacct_account_field+0xa1/0xc0
      [   89.237588]  [<ffffffff81080b55>] ? cpuacct_account_field+0x5/0xc0
      [   89.237597]  [<ffffffff811371c0>] do_vfs_ioctl+0x300/0x520
      [   89.237603]  [<ffffffff810757a1>] ? vtime_account_user+0x91/0xa0
      [   89.237610]  [<ffffffff810e40eb>] ?  context_tracking_user_exit+0x9b/0xe0
      [   89.237617]  [<ffffffff81083d7d>] ? trace_hardirqs_on+0xd/0x10
      [   89.237623]  [<ffffffff81137425>] SyS_ioctl+0x45/0x80
      [   89.237630]  [<ffffffff815afffa>] tracesys+0xd4/0xd9
      [   89.237634] Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 84 00 00 00 00 00 83 45 bc 01 49 8b 84 24 78 01 00 00 65 ff 0c 25 e0 b8 00 00 8b 55 bc <4c> 8b 2c d0 65 ff 04 25 e0 b8 00 00 49 8b 45 00 48 c1 e8 2d 48
      [   89.237741] RIP  [<ffffffff81347227>] gen6_ppgtt_insert_entries+0x117/0x170
      [   89.237749]  RSP <ffff88024b131ae0>
      [   89.237753] CR2: ffff880096326000
      [   89.237758] ---[ end trace 27416ba8b18d496c ]---
      
      This bug dates back to the original introduction of
      gen6_ppgtt_insert_entries().
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Ben Widawsky <benjamin.widawsky@intel.com>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      [danvet: Dropped cc: stable since without full ppgtt there's no way
      we'll access the last page directory with this function since that
      range is occupied (only in the allocator) with the ppgtt pdes. Without
      aliasing we can start to use that range and blow up.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      cc79714f
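The two "dereference past end of page array" fixes above do not spell out the mechanism, so the following is only a guess at the general class of bug, written as a userspace sketch. A loop that fetches the next page table eagerly, right after filling the current one, will index one element past the array when the final entry lands exactly on the last slot of the last table. Fetching lazily, at the top of the iteration that actually needs the table, avoids that. All names here (`insert_entries()`, `PT_ENTRIES`) are hypothetical, and the table size is shrunk to 4 entries to keep the example small.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define PT_ENTRIES 4u /* tiny table size, chosen only for the demo */

/*
 * Writes `count` values starting at entry `first_entry`, walking across
 * page tables. Returns how many times a page table pointer was fetched
 * from pt_pages[], which makes the lazy lookup observable.
 */
static unsigned int insert_entries(uint32_t **pt_pages, unsigned int num_pt,
                                   unsigned int first_entry,
                                   const uint32_t *vals, unsigned int count)
{
    unsigned int act_pt = first_entry / PT_ENTRIES;
    unsigned int act_pte = first_entry % PT_ENTRIES;
    unsigned int lookups = 0;
    uint32_t *pt = NULL;

    assert(first_entry + count <= num_pt * PT_ENTRIES);

    for (unsigned int i = 0; i < count; i++) {
        if (pt == NULL) {
            /* Only touch pt_pages[] when an entry really needs this table. */
            assert(act_pt < num_pt);
            pt = pt_pages[act_pt];
            lookups++;
        }

        pt[act_pte] = vals[i];

        if (++act_pte == PT_ENTRIES) {
            /* Table full: advance the index but defer the next fetch. */
            pt = NULL;
            act_pt++;
            act_pte = 0;
        }
    }
    return lookups;
}
```

Filling two 4-entry tables completely performs exactly two fetches; an eager variant would attempt a third fetch at index 2, which is outside the array.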
    • C
      drm/i915: Mention when we enable the Ironlake iommu workarounds · c0a7f818
      Committed by Chris Wilson
      The iommu and gfx on Ironlake do not like each other and require a
      big hammer to prevent hard machine hangs. In
      
      commit 5c042287
      Author: Ben Widawsky <ben@bwidawsk.net>
      Date:   Mon Oct 17 15:51:55 2011 -0700
      
          drm/i915: ILK + VT-d workaround
      
      we added the workaround, but never emitted any debug message that it was
      active. Doing so should help identify known performance regressions.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      c0a7f818
  10. 06 Jan, 2014 1 commit
  11. 18 Dec, 2013 18 commits
    • B
      drm/i915: Add PPGTT dumper · 87d60b63
      Committed by Ben Widawsky
      Dump the aliasing PPGTT with it. The aliasing PPGTT should actually
      always be empty.
      
      TODO: Broadwell. Since we don't yet use full PPGTT on Broadwell, not
      having the dumper is okay.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      87d60b63
    • B
      drm/i915: Remove extraneous mm_switch in ppgtt enable · d2ff7192
      Committed by Ben Widawsky
      Originally this commit message said:
      Now that do_switch does the mm switch, and we always enable the aliasing
      PPGTT, and contexts at the same time, there is no need to continue doing
      this during PPGTT enabling.
      
      Since originally writing the patch however, I introduced the concept of
      synchronous mm switching (using MMIO). Since this is generally not
      recommended in the spec (for reasons unknown), I've isolated its usage
      as much as possible. As such the "extraneous" switch only ever will
      occur when we have full PPGTT.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      d2ff7192
    • B
      drm/i915: Use multiple VMs -- the point of no return · 7e0d96bc
      Committed by Ben Widawsky
      As with processes which run on the CPU, the goal of multiple VMs is to
      provide process isolation. Specific to GEN, there is also the ability to
      map more objects per process (2GB each instead of 2GB-2k total).
      
      For the most part, all the pipes have been laid, and all we need to do
      is remove asserts and actually start changing address spaces with the
      context switch. Since prior to this we've converted the setting of the
      page tables to a streamed version, this is quite easy.
      
      One important thing to point out (since it'd been hotly contested) is
      that with this patch, every context created will have its own address
      space (provided the HW can do it).
      
      v2: Disable BDW on rebase
      
      NOTE: I tried to make this commit as small as possible. I needed one
      place where I could "turn everything on" and that is here. It could be
      split into finer commits, but I didn't really see much point.
      
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      7e0d96bc
    • B
      drm/i915: Do aliasing PPGTT init with contexts · bdf4fd7e
      Committed by Ben Widawsky
      We have a default context which suits the aliasing PPGTT well. Tie them
      together so it looks like any other context/PPGTT pair. This makes the
      code cleaner as it won't have to special case aliasing as often.
      
      The patch has one slightly tricky part in the default context creation
      function. In the future (and on aliased setup) we create a new VM for a
      context (potentially). However, if we have aliasing PPGTT, which occurs
      at this point in time for all platforms GEN6+, we can simply manage the
      refcounting to allow things to behave as normal. Now is a good time to
      recall that the aliasing_ppgtt doesn't have a real VM, it uses the GGTT
      drm_mm.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      bdf4fd7e
    • B
      drm/i915: Restore PDEs for all VMs · 80da2161
      Committed by Ben Widawsky
      In keeping with the old restore code, we must now restore every PPGTT's
      PDEs, since they aren't proper GEM objects.
      
      v2: Rebased on BDW. Only do restore pdes for gen6 & 7
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      80da2161
    • B
      drm/i915: Write PDEs at init instead of enable · 9f273d48
      Committed by Ben Widawsky
      We won't be calling enable() for all PPGTTs. We do need to write PDEs
      for all PPGTTs however. By moving the writing to init (which is called
      for all PPGTTs) we should accomplish this.
      
      ADD NOTE ABOUT PDE restore
      
      TODO: Eventually, we should allocate the page tables on demand.
      
      v2: Rebased on BDW. Only do PDEs for pre-gen8
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      9f273d48
    • B
      drm/i915: Add VM to context · c7c48dfd
      Committed by Ben Widawsky
      Pretty straightforward so far except for the bit about the refcounting.
      The PPGTT will potentially be shared amongst multiple contexts. Because
      contexts themselves have a refcounted lifecycle, the easiest way to
      manage this will be to refcount the PPGTT. To achieve this, we piggy
      back off of the existing context refcount, and will increment and
      decrement the PPGTT refcount with context creation, and destruction.
      
      To put it more clearly, if context A, and context B both use PPGTT 0, we
      can't free the PPGTT until both A, and B are destroyed.
      
      Note that because the PPGTT is permanently pinned (for now), it really
      just matters for the PPGTT destruction, as opposed to making space under
      memory pressure.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      c7c48dfd
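The lifetime rule in c7c48dfd (a PPGTT shared by contexts A and B cannot be freed until both are destroyed) can be sketched with a plain counter. This is a userspace illustration only: the kernel would use `kref`, and `struct ppgtt_sketch`, `struct context_sketch`, `context_create()`, `context_destroy()` and the `live_ppgtts` counter are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stdlib.h>

struct ppgtt_sketch {
    unsigned int refcount; /* number of contexts using this address space */
};

struct context_sketch {
    struct ppgtt_sketch *vm;
};

/* Demo-only bookkeeping so the lifetime can be observed from outside. */
static unsigned int live_ppgtts;

/* Creates a context; pass NULL to get a fresh PPGTT, or an existing one to share it. */
static struct context_sketch *context_create(struct ppgtt_sketch *shared)
{
    struct context_sketch *ctx = malloc(sizeof(*ctx));

    if (!ctx)
        return NULL;

    if (!shared) {
        shared = calloc(1, sizeof(*shared));
        if (!shared) {
            free(ctx);
            return NULL;
        }
        live_ppgtts++;
    }

    shared->refcount++; /* taken on context creation */
    ctx->vm = shared;
    return ctx;
}

/* Destroys a context and frees the PPGTT when the last user goes away. */
static void context_destroy(struct context_sketch *ctx)
{
    assert(ctx && ctx->vm && ctx->vm->refcount > 0);

    if (--ctx->vm->refcount == 0) { /* dropped on context destruction */
        free(ctx->vm);
        live_ppgtts--;
    }
    free(ctx);
}
```

Creating B with `context_create(a->vm)` and then destroying A leaves the PPGTT alive with a count of one; only destroying B releases it.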
    • B
      drm/i915: Reorganize intel_enable_ppgtt · 246cbfb5
      Committed by Ben Widawsky
      This patch consolidates the way in which we handle the various supported
      PPGTT by module parameter in addition to what the hardware supports. It
      strives to make doing the right thing in the code as simple as possible,
      with the USES_ macros.
      
      I've opted to add the full PPGTT argument simply so one can see how I
      intend to use this function. It will not/cannot be used until later.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      246cbfb5
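A minimal sketch of the consolidation described in 246cbfb5: one function combines what the user asked for with what the hardware can do and returns the PPGTT mode to use. The numeric encoding of the request (-1 auto, 0 off, 1 aliasing, 2 full), the `full_by_default` policy flag and all identifiers are assumptions made for this example; they are not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>

enum ppgtt_mode {
    PPGTT_NONE = 0,
    PPGTT_ALIASING = 1,
    PPGTT_FULL = 2,
};

/*
 * requested: assumed encoding, -1 = auto, 0 = disabled, 1 = aliasing, 2 = full.
 * hw_aliasing / hw_full: what the hardware supports.
 * full_by_default: policy knob deciding what "auto" means on capable hardware.
 */
static enum ppgtt_mode resolve_ppgtt(int requested, bool hw_aliasing,
                                     bool hw_full, bool full_by_default)
{
    assert(!hw_full || hw_aliasing); /* full PPGTT implies aliasing support */

    if (requested == 0 || !hw_aliasing)
        return PPGTT_NONE;

    if (hw_full && (requested == 2 || (requested < 0 && full_by_default)))
        return PPGTT_FULL;

    /* Explicit aliasing, auto without the full default, or an unsupported request. */
    return PPGTT_ALIASING;
}
```

Callers then branch on the returned mode instead of re-checking the parameter and the hardware separately. Flipping `full_by_default` to false is the kind of one-line switch that the later "Disable full ppgtt by default" change (93a25a9e) describes.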
    • B
      drm/i915: Generalize PPGTT init · d6660add
      Committed by Ben Widawsky
      Rearrange the initialization code to try to special case the aliasing
      PPGTT less, and provide usable interfaces for the general case later.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      d6660add
    • B
      drm/i915: Flush TLBs after !RCS PP_DIR_BASE · 90252e5c
      Committed by Ben Widawsky
      I've found this by accident. The docs don't really come out and say you
      need to do this. What the docs do tell you is you need to flush the TLBs
      before you set the PP_DIR_BASE, and that the RCS will invalidate its
      TLBs upon setting the new PP_DIR_BASE. It makes no such comment about
      any of the other rings.
      
      Empirically, this indeed fixes a really obvious bug whereby the batches
      being sent to the blitter were not executing (we were executing the
      HWSP somehow instead).
      
      NOTE: This should make no difference with the current code. It only
      applies when we start using multiple VMs.
      
      NOTE2: HSW appears to be immune to this.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      90252e5c
    • B
      drm/i915: Use LRI for switching PP_DIR_BASE · 48a10389
      Committed by Ben Widawsky
      The docs seem to suggest this is the appropriate method (though it
      doesn't say so outright). In other words, we probably should have done
      this before. We certainly must do this for switching VMs on the fly,
      since synchronizing the rings to MMIO updates isn't acceptable.
      
      v2:
      Make the reset code actually work for all rings. Note that this was
      fixed in subsequent commits, but was indeed broken for this commit.
      
      Add a posting read to the reset case. It probably should have existed
      beforehand, but since we have no failures, there is no reason to make
      it a separate commit.
      
      Make IS_GEN6 not use the ring because I am seeing crashes when using it.
      It is a bit of a hack in this patch, it will get fixed up in a couple of
      patches.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      48a10389
    • B
      drm/i915: Extract mm switching to function · eeb9488e
      Committed by Ben Widawsky
      In order to do the full context switch with address space, it's
      convenient to have a way to switch the address space. We already have
      this in our code - just pull it out to be called by the context switch
      code later.
      
      v2: Rebased on BDW support. Required adding BDW.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      eeb9488e
    • B
      drm/i915: Use platform specific ppgtt enable · b4a74e3a
      Committed by Ben Widawsky
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b4a74e3a
    • B
      drm/i915: One hopeful eviction on PPGTT alloc · e3cc1995
      Committed by Ben Widawsky
      The patch before this changed the way in which we allocate space for the
      PPGTT PDEs. It began carving out the PPGTT PDEs (which live in the
      Global GTT) from the GGTT's drm_mm. Prior to that patch, the PDEs were
      hidden from the drm_mm, and therefore could never fail to be allocated.
      
      In unfortunate cases, the drm_mm may be full when we want to allocate
      the space. This can technically occur whenever we try to allocate, which
      happens in two places currently. Practically, it can only really ever
      happen at GPU reset.
      
      Later, when we allocate more PDEs for multiple PPGTTs this will
      potentially be even more useful.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      e3cc1995
    • B
      drm/i915: Use drm_mm for PPGTT PDEs · c8d4c0d6
      Committed by Ben Widawsky
      When PPGTT support was originally enabled, it was only designed to
      support 1 PPGTT. It therefore made sense to simply hide the GGTT space
      required to enable this from the drm_mm allocator.
      
      Since we intend to support full PPGTT, which means more than 1, and they
      can be created and destroyed ad hoc it will be required to use the
      proper allocation techniques we already have.
      
      The first step here is to make the existing single PPGTT use the
      allocator.
      
      The astute observer will notice that we are reserving space in the GGTT
      for the PDEs for the lifetime of the address space, and would be right
      to question whether or not this is a good idea. It does not make a
      difference with this patch, which has only the aliasing PPGTT (indeed the
      PDEs should still be hidden from the shrinker). For the future, we are
      allocating from top to bottom to avoid using the precious "gtt
      space". The GGTT space at that point should only be used for scanout, HW
      contexts, ringbuffers, HWSP, PDEs, and a couple of other small buffers
      (potentially) used by the kernel. Everything else should be mapped into
      a PPGTT. To put the consumption in more tangible terms, it takes
      approximately 4 sets of PDEs to equal one 19x10 framebuffer (with no
      fancy stride or alignment constraints). 3/4 of the total [average] GGTT
      can be used for PDEs, and hopefully never touch the 1/4 that the
      framebuffer needs.
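
      The "4 sets of PDEs per framebuffer" estimate can be checked with a
      little arithmetic. The sketch below rests on assumptions that are
      not stated in this message: 512 PDEs per PPGTT, each occupying one
      4 KiB page worth of GGTT address space, and "19x10" read as a
      1920x1080 framebuffer at 4 bytes per pixel. All toy_* names are
      hypothetical.

```c
/* Assumed parameters; none of these values come from the commit itself. */
enum {
	TOY_PAGE_SIZE          = 4096, /* bytes of GGTT space per entry */
	TOY_PD_ENTRIES         = 512,  /* PDEs per PPGTT */
	TOY_FB_WIDTH           = 1920,
	TOY_FB_HEIGHT          = 1080,
	TOY_FB_BYTES_PER_PIXEL = 4,
};

/* GGTT address space consumed by one set of PDEs. */
static unsigned long toy_pde_ggtt_bytes(void)
{
	return (unsigned long)TOY_PD_ENTRIES * TOY_PAGE_SIZE;
}

/* Size of a linear framebuffer with no stride or alignment padding. */
static unsigned long toy_fb_bytes(void)
{
	return (unsigned long)TOY_FB_WIDTH * TOY_FB_HEIGHT *
	       TOY_FB_BYTES_PER_PIXEL;
}

/* Number of PDE sets needed to cover one framebuffer, rounded up. */
static unsigned long toy_pde_sets_per_fb(void)
{
	unsigned long set = toy_pde_ggtt_bytes();

	return (toy_fb_bytes() + set - 1) / set;
}
```

      Under these assumptions one set of PDEs covers 2 MiB of GGTT space
      and the framebuffer needs 8294400 bytes, so the ratio rounds up
      to 4.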
      
      The astute, and persistent observer might ask about the page tables
      which are also pinned for the address space. This waste is unfortunate.
      We use 2MB of memory per address space. We leave wrapping the PDEs as a
      real GEM object as a TODO.
      
      v2: Align PDEs to 64b in GTT
      Allocate the node dynamically so we can use drm_mm_put_block
      Now tested on IGT
      Allocate node at the top to avoid fragmentation (Chris)
      
      v3: Use Chris' top down allocator
      
      v4: Embed drm_mm_node into ppgtt struct (Jesse)
      Remove hunks which didn't belong (Jesse)
      
      v5: Don't subtract guard page since we now killed the guard page prior
      to this patch. (Ben)
      
      v6: Rebased and removed guard page stuff.
      Added a chunk to the commit message
      Allow adding a context to mappable region
      
      v7: Undo v3, so we can make the drm patch last in the series
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v4)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      
      squash: drm/i915: allow PPGTT to use mappable
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      a3d67d23
    • B
      drm/i915: Create bind/unbind abstraction for VMAs · 6f65e29a
      Ben Widawsky committed
      To sum up what goes on here: we abstract the VMA binding, similarly to
      the previous object binding. This helps distinguish legacy binding
      from modern binding. To keep the code churn as small as possible, I am
      leaving in insert_entries(); it basically serves as the per-platform
      PTE writer. bind_vma and insert_entries do share a lot of
      similarities, and I did have designs to combine the two, but as
      mentioned already, that is too much churn in an already massive patchset.
      
      What follows are the 3 commits which existed discretely in the original
      submissions. Upon rebasing on Broadwell support, it became clear that
      separation was not good, and only made for more error prone code. Below
      are the 3 commit messages with all their history.
      
      drm/i915: Add bind/unbind object functions to VMA
      drm/i915: Use the new vm [un]bind functions
      drm/i915: reduce vm->insert_entries() usage
      
      drm/i915: Add bind/unbind object functions to VMA
      
      As we plumb the code with more VM information, it has become more
      obvious that the easiest way to deal with bind and unbind is to simply
      put the function pointers in the vm, and let those choose the correct
      way to handle the page table updates. This change allows many places in
      the code to simply be vm->bind, and not have to worry about
      distinguishing PPGTT vs GGTT.
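
      A rough sketch of the function-pointer approach described above,
      assuming a much simplified model: each address space carries its
      own bind/unbind callbacks and callers dispatch through them without
      checking the address space type. All toy_* types, fields and
      counters are hypothetical and do not mirror the real i915
      structures.

```c
struct toy_vma;

/* Hypothetical address space carrying its own page table update hooks. */
struct toy_address_space {
	void (*bind_vma)(struct toy_vma *vma);
	void (*unbind_vma)(struct toy_vma *vma);
};

struct toy_vma {
	struct toy_address_space *vm;
	int bound;
};

/* Counters that record which backend ran, for illustration only. */
static int toy_ggtt_binds;
static int toy_ppgtt_binds;

static void toy_ggtt_bind(struct toy_vma *vma)
{
	toy_ggtt_binds++;
	vma->bound = 1;
}

static void toy_ppgtt_bind(struct toy_vma *vma)
{
	toy_ppgtt_binds++;
	vma->bound = 1;
}

/* Both toy backends share one unbind implementation. */
static void toy_common_unbind(struct toy_vma *vma)
{
	vma->bound = 0;
}

/* Callers bind through the VM and never test for PPGTT vs GGTT. */
static void toy_bind(struct toy_vma *vma)
{
	vma->vm->bind_vma(vma);
}

static void toy_unbind(struct toy_vma *vma)
{
	vma->vm->unbind_vma(vma);
}
```

      The benefit is that call sites collapse to a single indirect call;
      the cost is that each address space type has to install a valid
      pair of callbacks at creation time.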
      
      Notice that this patch has no impact on functionality. I've decided to
      save the actual change until the next patch because I think it's easier
      to review that way. I'm happy to squash the two, or let Daniel do it on
      merge.
      
      v2:
      Make ggtt handle the quirky aliasing ppgtt
      Add flags to bind object to support above
      Don't ever call bind/unbind directly for PPGTT until we have real, full
      PPGTT (use NULLs to assert this)
      Make sure we rebind the ggtt if there already is a ggtt binding.  This
      happens on set cache levels.
      Use VMA for bind/unbind (Daniel, Ben)
      
      v3: Reorganize ggtt_vma_bind to be more concise and easier to read
      (Ville). Change logic in unbind to only unbind ggtt when there is a
      global mapping, and to remove a redundant check if the aliasing ppgtt
      exists.
      
      v4: Make the bind function a bit smarter about the cache levels to avoid
      unnecessary multiple remaps. "I accept it is a wart, I think unifying
      the pin_vma / bind_vma could be unified later" (Chris)
      Removed the git notes, and put version info here. (Daniel)
      
      v5: Update the comment to not suck (Chris)
      
      v6:
      Move bind/unbind to the VMA. It makes more sense in the VMA structure
      (always has, but I was previously lazy). With this change, it will allow
      us to keep a distinct insert_entries.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      
      drm/i915: Use the new vm [un]bind functions
      
      Building on the last patch which created the new function pointers in
      the VM for bind/unbind, here we actually put those new function pointers
      to use.
      
      Split out as a separate patch to aid in review. I'm fine with squashing
      into the previous patch if people request it.
      
      v2: Updated to address the smart ggtt which can do aliasing as needed
      Make sure we bind to global gtt when mappable and fenceable. I thought
      we could get away without this initially, but we cannot.
      
      v3: Make the global GTT binding explicitly use the ggtt VM for
      bind_vma(). While at it, use the new ggtt_vma helper (Chris)
      
      At this point the original mailing list thread diverges, i.e.
      
      v4^:
      use target_obj instead of obj for gen6 relocate_entry
      vma->bind_vma() can be called safely during pin. So simply do that
      instead of the complicated conditionals.
      Don't restore PPGTT bound objects on resume path
      Bug fix in resume path for globally bound BOs
      Properly handle secure dispatch
      Rebased on vma bind/unbind conversion
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      
      drm/i915: reduce vm->insert_entries() usage
      
      FKA: drm/i915: eliminate vm->insert_entries()
      
      With bind/unbind function pointers in place, we no longer need
      insert_entries. We could, and want to, remove clear_range; however, it's
      not totally easy at this point, since it's still used in a couple of
      places that don't only deal in objects: setup, ppgtt init, and restore
      gtt mappings.
      
      v2: Don't actually remove insert_entries, just limit its usage. It will
      be useful when we introduce gen8. It will always be called from the vma
      bind/unbind.
      
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Provide PDP updates via MMIO · e178f705
      Ben Widawsky committed
      The initial implementation of this function used MMIO to write the PDPs.
      Upon review it was determined (correctly) that the docs say to use LRI.
      The issue is there are times where we want to do a synchronous write
      (GPU reset).
      
      I've tested this, and it works. I've verified with as many people as
      possible that it should work.
      
      This should fix the failing reset problems.
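
      To illustrate why a synchronous path is wanted, here is a toy
      model, not the driver code: an LRI-style write is only queued and
      takes effect when the GPU executes the ring, while an MMIO write
      lands immediately. The register file, the ring and every toy_* name
      are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_REGS   8
#define TOY_RING_SLOTS 16

/* One queued "load register immediate" style command. */
struct toy_cmd {
	uint32_t reg;
	uint32_t val;
};

struct toy_gpu {
	uint32_t regs[TOY_NUM_REGS];
	struct toy_cmd ring[TOY_RING_SLOTS];
	size_t ring_tail;
};

/* Queue a register write; nothing changes until the ring is executed. */
static int toy_emit_lri(struct toy_gpu *gpu, uint32_t reg, uint32_t val)
{
	if (gpu->ring_tail == TOY_RING_SLOTS)
		return -1;
	gpu->ring[gpu->ring_tail].reg = reg;
	gpu->ring[gpu->ring_tail].val = val;
	gpu->ring_tail++;
	return 0;
}

/* Simulate the GPU consuming every queued command. */
static void toy_run_ring(struct toy_gpu *gpu)
{
	size_t i;

	for (i = 0; i < gpu->ring_tail; i++)
		gpu->regs[gpu->ring[i].reg] = gpu->ring[i].val;
	gpu->ring_tail = 0;
}

/* Write a PDP-like register, synchronously via MMIO or queued via LRI. */
static int toy_write_pdp(struct toy_gpu *gpu, uint32_t reg, uint32_t val,
			 int synchronous)
{
	if (reg >= TOY_NUM_REGS)
		return -1;
	if (synchronous) {
		gpu->regs[reg] = val; /* takes effect immediately */
		return 0;
	}
	return toy_emit_lri(gpu, reg, val);
}
```

      During a reset the ring cannot be relied upon to execute queued
      commands, which is the situation where the immediate path matters
      in this model.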
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  12. 26 Nov, 2013 1 commit