1. 06 9月, 2021 2 次提交
    • D
      drm/i915: Stop rcu support for i915_address_space · dcc5d820
      Daniel Vetter 提交于
      The full audit is quite a bit of work:
      
      - i915_dpt has very simple lifetime (somehow we create a display pagetable vm
        per object, so its _very_ simple, there's only ever a single vma in there),
        and uses i915_vm_close(), which internally does a i915_vm_put(). No rcu.
      
        Aside: wtf is i915_dpt doing in the intel_display.c garbage collector as a new
        feature, instead of added as a separate file with some clean-ish interface.
      
        Also, i915_dpt unfortunately re-introduces some coding patterns from
        pre-dma_resv_lock conversion times.
      
      - i915_gem_proto_ctx is fully refcounted and no rcu, all protected by
        fpriv->proto_context_lock.
      
      - i915_gem_context is itself rcu protected, and that might leak to anything it
        points at. Before
      
      	commit cf977e18
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Wed Dec 2 11:21:40 2020 +0000
      
      	    drm/i915/gem: Spring clean debugfs
      
        and
      
      	commit db80a129
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Mon Jan 18 11:08:54 2021 +0000
      
      	    drm/i915/gem: Remove per-client stats from debugfs/i915_gem_objects
      
        we had a bunch of debugfs files that relied on rcu protecting everything, but
        those are gone now. The main one was removed even earlier with
      
        There doesn't seem to be anything left that's actually protecting
        stuff now that the ctx->vm itself is invariant. See
      
      	commit ccbc1b97
      	Author: Jason Ekstrand <jason@jlekstrand.net>
      	Date:   Thu Jul 8 10:48:30 2021 -0500
      
      	    drm/i915/gem: Don't allow changing the VM on running contexts (v4)
      
        Note that we drop the vm refcount before the final release of the gem context
        refcount, so this is all very dangerous even without rcu. Note that aside from
        later on creating new engines (a defunct feature) and debug output we're never
        looked at gem_ctx->vm for anything functional, hence why this is ok.
        Fingers crossed.
      
        Preceeding patches removed all vestiges of rcu use from gem_ctx->vm
        derferencing to make it clear it's really not used.
      
        The gem_ctx->rcu protection was introduced in
      
      	commit a4e7ccda
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Fri Oct 4 14:40:09 2019 +0100
      
      	    drm/i915: Move context management under GEM
      
        The commit message is somewhat entertaining because it fails to
        mention this fact completely, and compensates that by an in-commit
        changelog entry that claims that ctx->vm is protected by ctx->mutex.
        Which was the case _before_ this commit, but no longer after it.
      
      - intel_context holds a full reference. Unfortunately intel_context is also rcu
        protected and the reference to the ->vm is dropped before the
        rcu barrier - only the kfree is delayed. So again we need to check
        whether that leaks anywhere on the intel_context->vm. RCU is only
        used to protect intel_context sitting on the breadcrumb lists, which
        don't look at the vm anywhere, so we are fine.
      
        Nothing else relies on rcu protection of intel_context and hence is
        fully protected by the kref refcount alone, which protects
        intel_context->vm in turn.
      
        The breadcrumbs rcu usage was added in
      
      	commit c744d503
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Thu Nov 26 14:04:06 2020 +0000
      
      	    drm/i915/gt: Split the breadcrumb spinlock between global and contexts
      
        its parent commit added the intel_context rcu protection:
      
      	commit 14d1eaf0
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Thu Nov 26 14:04:05 2020 +0000
      
      	    drm/i915/gt: Protect context lifetime with RCU
      
        given some credence to my claim that I've actually caught them all.
      
      - drm_i915_gem_object's shares_resv_from pointer has a full refcount to the
        dma_resv, which is a sub-refcount that's released after the final
        i915_vm_put() has been called. Safe.
      
        Aside: Maybe we should have a struct dma_resv_shared which is just dma_resv +
        kref as a stand-alone thing. It's a pretty useful pattern which other drivers
        might want to copy.
      
        For a bit more context see
      
      	commit 4d8151ae
      	Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
      	Date:   Tue Jun 1 09:46:41 2021 +0200
      
      	    drm/i915: Don't free shared locks while shared
      
      - the fpriv->vm_xa was relying on rcu_read_lock for lookup, but that
        was updated in a prep patch too to just be a spinlock-protected
        lookup.
      
      - intel_gt->vm is set at driver load in intel_gt_init() and released
        in intel_gt_driver_release(). There seems to be some issue that
        in some error paths this is called twice, but otherwise no rcu to be
        found anywhere. This was added in the below commit, which
        unfortunately doesn't explain why this complication exists.
      
      	commit e6ba7648
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Sat Dec 21 16:03:24 2019 +0000
      
      	    drm/i915: Remove i915->kernel_context
      
        The proper fix most likely for this is to start using drmm_ at large
        scale, but that's also huge amounts of work.
      
      - i915_vma->vm is some real pain, because rcu is rcu protected, at
        least in the vma lookup in the context lookup cache in
        eb_lookup_vma(). This was added in
      
      	commit 4ff4b44c
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Fri Jun 16 15:05:16 2017 +0100
      
      	    drm/i915: Store a direct lookup from object handle to vma
      
        This was changed to a radix tree from the hashtable in, but with the
        locking unchanged, in
      
      	commit d1b48c1e
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Wed Aug 16 09:52:08 2017 +0100
      
      	    drm/i915: Replace execbuf vma ht with an idr
      
        In
      
      	commit 93159e12
      	Author: Chris Wilson <chris@chris-wilson.co.uk>
      	Date:   Mon Mar 23 09:28:41 2020 +0000
      
      	    drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
      
        the locking was changed from dev->struct_mutex to rcu, which added
        the requirement to rcu protect i915_vma. Somehow this was missed in
        review (or I'm completely blind).
      
        Irrespective of all that the vma lookup cache rcu_read_lock grabs a
        full reference of the vma and the rcu doesn't leak further. So no
        impact on i915_address_space from that.
      
        I have not found any other rcu use for i915_vma, but given that it
        seems broken I also didn't bother to do a careful in-depth audit.
      
      Alltogether there's nothing left in-tree anymore which requires that a
      pointer deref to an i915_address_space is safe undre rcu_read_lock
      only.
      
      rcu protection of i915_address_space was introduced in
      
      commit b32fa811
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Jun 20 19:37:05 2019 +0100
      
          drm/i915/gtt: Defer address space cleanup to an RCU worker
      
      by mixing up a bugfixing (i915_address_space needs to be released from
      a worker) with enabling rcu support. The commit message also seems
      somewhat confused, because it talks about cleanup of WC pages
      requiring sleep, while the code and linked bugzilla are about a
      requirement to take dev->struct_mutex (which yes sleeps but it's a
      much more specific problem). Since final kref_put can be called from
      pretty much anywhere (including hardirq context through the
      scheduler's i915_active cleanup) we need a worker here. Hence that
      part must be kept.
      
      Ideally all these reclaim workers should have some kind of integration
      with our shrinkers, but for some of these it's rather tricky. Anyway,
      that's a preexisting condition in the codeebase that we wont fix in
      this patch here.
      
      We also remove the rcu_barrier in ggtt_cleanup_hw added in
      
      commit 60a4233a
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon Jul 29 14:24:12 2019 +0100
      
          drm/i915: Flush the i915_vm_release before ggtt shutdown
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-11-daniel.vetter@ffwll.ch
      dcc5d820
    • D
      drm/i915: Rename i915_gem_context_get_vm_rcu to i915_gem_context_get_eb_vm · c6d04e48
      Daniel Vetter 提交于
      The important part isn't so much that this does an rcu lookup - that's
      more an implementation detail, which will also be removed.
      
      The thing that makes this different from other functions is that it's
      gettting you the vm that batchbuffers will run in for that gem
      context, which is either a full ppgtt stored in gem->ctx, or the ggtt.
      
      We'll make more use of this function later on.
      Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210902142057.929669-5-daniel.vetter@ffwll.ch
      c6d04e48
  2. 03 9月, 2021 5 次提交
  3. 01 9月, 2021 1 次提交
  4. 27 8月, 2021 1 次提交
  5. 25 8月, 2021 2 次提交
  6. 24 8月, 2021 1 次提交
  7. 21 8月, 2021 1 次提交
  8. 20 8月, 2021 1 次提交
    • M
      drm/i915: Fix syncmap memory leak · faf89098
      Matthew Brost 提交于
      A small race exists between intel_gt_retire_requests_timeout and
      intel_timeline_exit which could result in the syncmap not getting
      free'd. Rather than work to hard to seal this race, simply cleanup the
      syncmap on fini.
      
      unreferenced object 0xffff88813bc53b18 (size 96):
        comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s)
        hex dump (first 32 bytes):
          01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00  ................
          00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00  ........kkkk....
        backtrace:
          [<00000000120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915]
          [<00000000042f6959>] __sync_set+0x1bb/0x240 [i915]
          [<0000000090f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915]
          [<0000000056a48219>] i915_request_await_object+0x222/0x360 [i915]
          [<00000000aaac4ee3>] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915]
          [<000000003c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915]
          [<00000000fd7a8e68>] drm_ioctl_kernel+0xb0/0xf0 [drm]
          [<00000000e721ee87>] drm_ioctl+0x305/0x3c0 [drm]
          [<000000008b0d8986>] __x64_sys_ioctl+0x71/0xb0
          [<0000000076c362a4>] do_syscall_64+0x33/0x80
          [<00000000eb7a4831>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: NMatthew Brost <matthew.brost@intel.com>
      Fixes: 531958f6 ("drm/i915/gt: Track timeline activeness in enter/exit")
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210730195342.110234-1-matthew.brost@intel.com
      faf89098
  9. 19 8月, 2021 1 次提交
    • M
      drm/i915/dg2: Maintain backward-compatible nested batch behavior · 9e9dfd08
      Matt Roper 提交于
      For tgl+, the per-context setting of MI_MODE[12] determines whether
      the bits of a nested MI_BATCH_BUFFER_START instruction should be
      interpreted in the traditional manner or whether they should
      instead use a new tgl+ meaning that breaks backward compatibility, but
      allows nesting into 3rd-level batchbuffers.  For previous platforms,
      the hardware default for this register bit is to maintain
      backward-compatible behavior unless a context intentionally opts into
      the new behavior; however Xe_HPG flips the hardware default behavior.
      
      From a SW perspective, we want to maintain the backward-compatible
      behavior for userspace, so we'll apply a fake workaround to set it back
      to the legacy behavior on platforms where the hardware default is to
      break compatibility.  At the moment there is no Linux userspace that
      utilizes third-level batchbuffers, so this will avoid userspace from
      needing to make any changes.  using the legacy meaning is the correct
      thing to do.  If/when we have userspace consumers that want to utilize
      third-level batch nesting, we can provide a context parameter to allow
      them to opt-in.
      
      Bspec: 45974, 45718
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Signed-off-by: NMatt Roper <matthew.d.roper@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210805163647.801064-9-matthew.d.roper@intel.comReviewed-by: NAnusha Srivatsa <anusha.srivatsa@intel.com>
      9e9dfd08
  10. 13 8月, 2021 2 次提交
  11. 11 8月, 2021 4 次提交
  12. 05 8月, 2021 4 次提交
  13. 04 8月, 2021 15 次提交