1. 19 April 2017, 1 commit
    • mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU · 5f0d5a3a
      Committed by Paul E. McKenney
      A group of Linux kernel hackers reported chasing a bug that resulted
      from their assumption that SLAB_DESTROY_BY_RCU provided an existence
      guarantee, that is, that no block from such a slab would be reallocated
      during an RCU read-side critical section.  Of course, that is not the
      case.  Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire
      slab of blocks.
      
      However, there is a phrase for this, namely "type safety".  This commit
      therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order
      to avoid future instances of this sort of confusion.
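      
      The practical consequence is that a reader must revalidate the object
      after acquiring a reference: the block may have been freed and reused
      for another object of the same type within the read-side critical
      section.  A minimal sketch of that lookup pattern (hash_lookup, key,
      refcnt, and put_object are hypothetical names, purely for
      illustration):
      
              rcu_read_lock();
              obj = hash_lookup(key);
              if (obj && !atomic_inc_not_zero(&obj->refcnt))
                      obj = NULL;                     /* being freed */
              if (obj && obj->key != key) {           /* block was reused */
                      put_object(obj);
                      obj = NULL;
              }
              rcu_read_unlock();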
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <linux-mm@kvack.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      [ paulmck: Add comments mentioning the old name, as requested by Eric
        Dumazet, in order to help people familiar with the old name find
        the new one. ]
      Acked-by: David Rientjes <rientjes@google.com>
  2. 03 January 2017, 1 commit
  3. 23 December 2016, 1 commit
  4. 19 December 2016, 1 commit
    • drm/i915: Unify active context tracking between legacy/execlists/guc · e8a9c58f
      Committed by Chris Wilson
      The requests conversion introduced a nasty bug where we could generate a
      new request in the middle of constructing a request if we needed to idle
      the system in order to evict space for a context. The request to idle
      would be executed (and waited upon) before the current one, creating
      minor havoc in the seqno accounting, as we would consider the current
      request to already be completed (prior to deferred seqno assignment) but
      ring->last_retired_head would have been updated and could still allow
      us to overwrite the current request before execution.
      
      We also employed two different mechanisms to track the active context
      until it was switched out. The legacy method allowed for waiting upon an
      active context (it could forcibly evict any vma, including the context's),
      but the execlists method took a step backwards by pinning the vma for
      the entire active lifespan of the context (the only way to evict was to
      idle the entire GPU, not individual contexts). However, to circumvent
      the tricky issue of locking (i.e. we cannot take struct_mutex at the
      time of i915_gem_request_submit(), where we would want to move the
      previous context onto the active tracker and unpin it), we take the
      execlists approach and keep the contexts pinned until retirement.
      The benefit of the execlists approach, more important for execlists than
      for legacy, was the reduction in per-request pinning work: as the
      context was kept pinned until idle, it could short-circuit the pinning
      for all active contexts.
      
      We introduce new engine vfuncs to pin and unpin the context
      respectively. The context is pinned at the start of the request, and
      only unpinned when the following request is retired (this ensures that
      the context is idle and coherent in main memory before we unpin it). We
      move the engine->last_context tracking into the retirement itself
      (rather than during request submission) in order to allow the submission
      to be reordered or unwound without undue difficulty.
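      
      Sketched with simplified signatures (illustrative only; the exact
      declarations in the driver may differ), the new hooks take roughly
      this shape:
      
              /* Per-engine context management, plus tracking of the last
               * context known to be idle and coherent in main memory. */
              int (*context_pin)(struct intel_engine_cs *engine,
                                 struct i915_gem_context *ctx);
              void (*context_unpin)(struct intel_engine_cs *engine,
                                    struct i915_gem_context *ctx);
              struct i915_gem_context *last_retired_context;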
      
      And finally an ulterior motive for unifying context handling was to
      prepare for mock requests.
      
      v2: Rename to last_retired_context, split out legacy_context tracking
      for MI_SET_CONTEXT.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
  5. 15 November 2016, 4 commits
  6. 11 November 2016, 1 commit
  7. 08 November 2016, 1 commit
  8. 29 October 2016, 6 commits
  9. 25 October 2016, 1 commit
    • dma-buf: Rename struct fence to dma_fence · f54d1867
      Committed by Chris Wilson
      I plan to usurp the short name of struct fence for a core kernel struct,
      and so I need to rename the specialised fence/timeline for DMA
      operations to make room.
      
      A consensus was reached in
      https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
      that making clear this fence applies to DMA operations was a good thing.
      Since then the patch has grown a bit as usage has increased, so
      hopefully it remains a good thing!
      
      (v2...: rebase, rerun spatch)
      v3: Compile on msm, spotted a manual fixup that I broke.
      v4: Try again for msm, sorry Daniel
      
      coccinelle script:
      @@
      
      @@
      - struct fence
      + struct dma_fence
      @@
      
      @@
      - struct fence_ops
      + struct dma_fence_ops
      @@
      
      @@
      - struct fence_cb
      + struct dma_fence_cb
      @@
      
      @@
      - struct fence_array
      + struct dma_fence_array
      @@
      
      @@
      - enum fence_flag_bits
      + enum dma_fence_flag_bits
      @@
      
      @@
      (
      - fence_init
      + dma_fence_init
      |
      - fence_release
      + dma_fence_release
      |
      - fence_free
      + dma_fence_free
      |
      - fence_get
      + dma_fence_get
      |
      - fence_get_rcu
      + dma_fence_get_rcu
      |
      - fence_put
      + dma_fence_put
      |
      - fence_signal
      + dma_fence_signal
      |
      - fence_signal_locked
      + dma_fence_signal_locked
      |
      - fence_default_wait
      + dma_fence_default_wait
      |
      - fence_add_callback
      + dma_fence_add_callback
      |
      - fence_remove_callback
      + dma_fence_remove_callback
      |
      - fence_enable_sw_signaling
      + dma_fence_enable_sw_signaling
      |
      - fence_is_signaled_locked
      + dma_fence_is_signaled_locked
      |
      - fence_is_signaled
      + dma_fence_is_signaled
      |
      - fence_is_later
      + dma_fence_is_later
      |
      - fence_later
      + dma_fence_later
      |
      - fence_wait_timeout
      + dma_fence_wait_timeout
      |
      - fence_wait_any_timeout
      + dma_fence_wait_any_timeout
      |
      - fence_wait
      + dma_fence_wait
      |
      - fence_context_alloc
      + dma_fence_context_alloc
      |
      - fence_array_create
      + dma_fence_array_create
      |
      - to_fence_array
      + to_dma_fence_array
      |
      - fence_is_array
      + dma_fence_is_array
      |
      - trace_fence_emit
      + trace_dma_fence_emit
      |
      - FENCE_TRACE
      + DMA_FENCE_TRACE
      |
      - FENCE_WARN
      + DMA_FENCE_WARN
      |
      - FENCE_ERR
      + DMA_FENCE_ERR
      )
       (
       ...
       )
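       
       For reference, a script like this is typically applied with the
       Coccinelle tool; assuming it is saved as dma-fence.cocci (a
       hypothetical filename), an invocation along these lines would
       rewrite a tree in place:
       
               spatch --sp-file dma-fence.cocci --dir . --in-place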
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
      Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
  10. 09 September 2016, 7 commits
  11. 22 August 2016, 1 commit
    • drm/i915: Ensure consistent control flow in __i915_gem_active_get_rcu · c75870d8
      Committed by Daniel Vetter
      The issue here is (I think) purely theoretical, since a compiler
      would need to be especially foolish to recompute the value of
      i915_gem_request_completed() right after it was already used. Hence the
      additional barrier() is also not really a restriction.
      
      But I believe this to be at least permissible, and since our RCU
      trickery is a beast it's worth annotating all the corner cases.
      Chris proposed to instead just wrap a READ_ONCE around
      request->fence.seqno in i915_gem_request_completed. But that has a
      measurable impact on code size, and everywhere we hold a full
      reference to the underlying request it's also not needed. And
      personally I'd like to have just enough barriers and locking needed
      for correctness, but not more - it makes it much easier in the future
      to understand what's going on.
      
      Since the busy ioctl has now fully embraced its races, there's no
      point annotating it there too. We really only need it in
      active_get_rcu, since that function _must_ deliver a correct snapshot
      of the active fences (and not chase something else).
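      
      The hazard, sketched in isolation with hypothetical helper names
      (this is not the driver code, just the compiler-reload pattern the
      barrier guards against):
      
              bool done = completed(req);     /* reads req->fence.seqno */
              if (!done)
                      take_reference(req);
              barrier();      /* forbid re-evaluating completed(req) below;
                               * READ_ONCE(req->fence.seqno) inside
                               * completed() would also work */
              if (done)
                      req = NULL;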
      
      v2: Polish the comment a bit more (Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1471856122-466-1-git-send-email-daniel.vetter@ffwll.ch
  12. 15 August 2016, 2 commits
  13. 10 August 2016, 1 commit
  14. 09 August 2016, 4 commits
  15. 05 August 2016, 3 commits
    • drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutex · dcff85c8
      Committed by Chris Wilson
      The principal motivation for this was to try and eliminate the
      struct_mutex from i915_gem_suspend - but for now we still need to hold
      the mutex for i915_gem_context_lost(). (The issue there is that there
      may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and
      struct_mutex via the stop_machine().) For the moment, enabling last
      request tracking for the engine allows us to do busyness checking and
      waiting without requiring the struct_mutex - which is useful in its own
      right.
      
      As a side-effect of having a robust means for tracking engine busyness,
      we can replace our other busyness heuristic, that of comparing against
      the last submitted seqno. For paranoid reasons, we have a semi-ordered
      check of that seqno inside the hangchecker, which we can now improve to
      an ordered check of the engine's busyness (removing a locked xchg in the
      process).
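      
      Sketched with hypothetical names (not the hangcheck code as merged),
      the improvement amounts to replacing a locked read-modify-write with
      an ordered plain load of the engine's tracked request:
      
              static bool engine_is_busy(struct intel_engine_cs *engine)
              {
                      /* NULL once the last request has been retired */
                      return READ_ONCE(engine->last_request) != NULL;
              }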
      
      v2: Pass along "bool interruptible", as being unlocked we cannot rely on
      i915->mm.interruptible being stable or even under our control.
      v3: Replace the Ironlake i915_gpu_busy() check with the common
      precalculated value.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
    • drm/i915: Introduce i915_gem_active_wait_unlocked() · 2467658e
      Committed by Chris Wilson
      It is useful to be able to wait on pending rendering without grabbing
      the struct_mutex. We can do this by using the i915_gem_active_get_rcu()
      primitive to acquire a reference to the pending request without
      requiring struct_mutex, just the RCU read lock, and then calling
      i915_wait_request().
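      
      The pattern, sketched with simplified signatures (illustrative; the
      exact arguments in the driver differ, and 'active' stands in for any
      i915_gem_active tracker):
      
              struct drm_i915_gem_request *request;
      
              /* takes the RCU read lock on our behalf, see v2 below */
              request = i915_gem_active_get_unlocked(&active);
              if (request) {
                      i915_wait_request(request);     /* no struct_mutex held */
                      i915_gem_request_put(request);
              }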
      
      v2: Rebase onto new i915_gem_active_get_unlocked() semantics that take
      the RCU read lock on behalf of the caller.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-1-git-send-email-chris@chris-wilson.co.uk
    • drm/i915: Enable lockless lookup of request tracking via RCU · 0eafec6d
      Committed by Chris Wilson
      If we enable RCU for the requests (providing a grace period where we can
      inspect a "dead" request before it is freed), we can allow callers to
      carefully perform lockless lookup of an active request.
      
      However, by enabling deferred freeing of requests, we can potentially
      hog a lot of memory when dealing with tens of thousands of requests per
      second - with a quick insertion of a synchronize_rcu() inside our
      shrinker callback, that issue disappears.
      
      v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
      hogging memory with the delayed slab frees. At the moment, we wait for a
      grace period in the shrinker, and block for all RCU callbacks on oom.
      Suggested alternatives focus on flushing our RCU callback when we have a
      certain number of outstanding request frees, and blocking on that flush
      after a second high watermark. (So rather than wait for the system to
      run out of memory, we stop issuing requests - both are nondeterministic.)
      
      Paul E. McKenney wrote:
      
      Another approach is synchronize_rcu() after some largish number of
      requests.  The advantage of this approach is that it throttles the
      production of callbacks at the source.  The corresponding disadvantage
      is that it slows things up.
      
      Another approach is to use call_rcu(), but if the previous call_rcu()
      is still in flight, block waiting for it.  Yet another approach is
      the get_state_synchronize_rcu() / cond_synchronize_rcu() pair.  The
      idea is to do something like this:
      
              cond_synchronize_rcu(cookie);
              cookie = get_state_synchronize_rcu();
      
      You would of course do an initial get_state_synchronize_rcu() to
      get things going.  This would not block unless there was less than
      one grace period's worth of time between invocations.  But this
      assumes a busy system, where there is almost always a grace period
      in flight.  But you can make that happen as follows:
      
              cond_synchronize_rcu(cookie);
              cookie = get_state_synchronize_rcu();
              call_rcu(&my_rcu_head, noop_function);
      
      Note that you need additional code to make sure that the old callback
      has completed before doing a new one.  Setting and clearing a flag
       with appropriate memory ordering control suffices (e.g., smp_load_acquire()
      and smp_store_release()).
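       
       That flag protocol, sketched (reusing my_rcu_head and noop_function
       from above; the flag name is illustrative):
       
               static int callback_done = 1;
       
               static void noop_function(struct rcu_head *head)
               {
                       smp_store_release(&callback_done, 1);
               }
       
               /* on the request-free path */
               if (smp_load_acquire(&callback_done)) {
                       callback_done = 0;
                       cond_synchronize_rcu(cookie);
                       cookie = get_state_synchronize_rcu();
                       call_rcu(&my_rcu_head, noop_function);
               }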
      
      v3: Added more comments on compiler and processor ordering within the
      RCU lookup, and discovered we can use rcu_access_pointer() here instead.
      
      v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: "Goel, Akash" <akash.goel@intel.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
  16. 04 August 2016, 5 commits