1. 10 Mar, 2020 1 commit
    • drm/i915: Mark racy read of intel_engine_cs.saturated · 60900add
      Committed by Chris Wilson
      [ 3783.276728] BUG: KCSAN: data-race in __i915_request_submit [i915] / i915_request_await_dma_fence [i915]
      [ 3783.276766]
      [ 3783.276787] write to 0xffff8881f1bc60a0 of 1 bytes by interrupt on cpu 2:
      [ 3783.277187]  __i915_request_submit+0x47e/0x4a0 [i915]
      [ 3783.277580]  __execlists_submission_tasklet+0x997/0x2780 [i915]
      [ 3783.277973]  execlists_submission_tasklet+0xd3/0x170 [i915]
      [ 3783.278006]  tasklet_action_common.isra.0+0x42/0xa0
      [ 3783.278035]  __do_softirq+0xd7/0x2cd
      [ 3783.278063]  irq_exit+0xbe/0xe0
      [ 3783.278089]  do_IRQ+0x51/0x100
      [ 3783.278114]  ret_from_intr+0x0/0x1c
      [ 3783.278140]  finish_task_switch+0x72/0x260
      [ 3783.278170]  __schedule+0x1e5/0x510
      [ 3783.278198]  schedule+0x45/0xb0
      [ 3783.278226]  smpboot_thread_fn+0x23e/0x300
      [ 3783.278256]  kthread+0x19a/0x1e0
      [ 3783.278283]  ret_from_fork+0x1f/0x30
      [ 3783.278305]
      [ 3783.278327] read to 0xffff8881f1bc60a0 of 1 bytes by task 19440 on cpu 3:
      [ 3783.278724]  i915_request_await_dma_fence+0x2a6/0x530 [i915]
      [ 3783.279130]  i915_request_await_object+0x2fe/0x470 [i915]
      [ 3783.279524]  i915_gem_do_execbuffer+0x45dc/0x4c20 [i915]
      [ 3783.279908]  i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915]
      [ 3783.279940]  drm_ioctl_kernel+0xe4/0x120
      [ 3783.279968]  drm_ioctl+0x297/0x4c7
      [ 3783.279996]  ksys_ioctl+0x89/0xb0
      [ 3783.280021]  __x64_sys_ioctl+0x42/0x60
      [ 3783.280047]  do_syscall_64+0x6e/0x2c0
      [ 3783.280074]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200309132726.28358-1-chris@chris-wilson.co.uk
      60900add
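
The report above shows a one-byte field written from the submission tasklet while another CPU reads it without a lock. The patch itself is not reproduced on this page, so the following is only a minimal userspace sketch of what marking such a read usually looks like. The `READ_ONCE`/`WRITE_ONCE` macros here are simplified stand-ins for the kernel's (they rely on the GCC/Clang `__typeof__` extension), and `struct engine`, `mark_saturated()` and `already_saturated()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(): the
 * volatile access makes the compiler emit exactly one load or store and
 * documents that the access is intentionally unlocked.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical, cut-down engine holding only the one-byte mask. */
struct engine {
	uint8_t saturated;
};

/* Writer side (assumed to run in the submission path). */
static void mark_saturated(struct engine *e, uint8_t mask)
{
	WRITE_ONCE(e->saturated, READ_ONCE(e->saturated) | mask);
}

/* Reader side: a lockless, advisory peek at the mask. */
static int already_saturated(struct engine *e, uint8_t mask)
{
	return (READ_ONCE(e->saturated) & mask) != 0;
}
```

The marked read does not add any ordering; it only tells the compiler, and tools such as KCSAN, that a stale value is acceptable at this point.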
  2. 07 Mar, 2020 1 commit
    • drm/i915: Do not poison i915_request.link on removal · dff2a11b
      Committed by Chris Wilson
      Do not poison the timeline link on the i915_request to allow both
      forward/backward list traversal under RCU.
      
      [ 9759.139229] RIP: 0010:active_request+0x2a/0x90 [i915]
      [ 9759.139240] Code: 41 56 41 55 41 54 55 48 89 fd 53 48 89 f3 48 83 c5 60 e8 49 de dc e0 48 8b 83 e8 01 00 00 48 39 c5 74 12 48 8d 90 20 fe ff ff <48> 8b 80 50 fe ff ff a8 01 74 11 e8 66 20 dd e0 48 89 d8 5b 5d 41
      [ 9759.139251] RSP: 0018:ffffc9000014ce80 EFLAGS: 00010012
      [ 9759.139260] RAX: dead000000000122 RBX: ffff888817cac040 RCX: 0000000000022000
      [ 9759.139267] RDX: deacffffffffff42 RSI: ffff888817cac040 RDI: ffff888851fee900
      [ 9759.139275] RBP: ffff888851fee960 R08: 000000000000001a R09: ffffffffa04702e0
      [ 9759.139282] R10: ffffffff82187ea0 R11: 0000000000000002 R12: 0000000000000004
      [ 9759.139289] R13: ffffffffa04d5179 R14: ffff8887f994ae40 R15: ffff888857b9a068
      [ 9759.139296] FS:  0000000000000000(0000) GS:ffff88885ed80000(0000) knlGS:0000000000000000
      [ 9759.139304] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9759.139311] CR2: 00007fff5bdec000 CR3: 00000008534fe001 CR4: 00000000001606e0
      [ 9759.139318] Call Trace:
      [ 9759.139325]  <IRQ>
      [ 9759.139389]  execlists_reset+0x14d/0x310 [i915]
      [ 9759.139400]  ? _raw_spin_unlock_irqrestore+0xf/0x30
      [ 9759.139445]  ? fwtable_read32+0x90/0x230 [i915]
      [ 9759.139499]  execlists_submission_tasklet+0xf6/0x150 [i915]
      [ 9759.139508]  tasklet_action_common.isra.17+0x32/0xa0
      [ 9759.139516]  __do_softirq+0x114/0x3dc
      [ 9759.139525]  ? handle_irq_event_percpu+0x59/0x70
      [ 9759.139533]  irq_exit+0xa1/0xc0
      [ 9759.139540]  do_IRQ+0x76/0x150
      [ 9759.139547]  common_interrupt+0xf/0xf
      [ 9759.139554]  </IRQ>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20200306140115.3495686-1-chris@chris-wilson.co.uk
      dff2a11b
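
In the oops above, RAX holds dead000000000122, which matches the pattern of a poisoned `prev` pointer: a walker followed the link of an entry that had already been removed and poisoned. The sketch below is a toy doubly linked list, not the kernel's `list_head` API, that contrasts the two removal styles. All names and the small poison constants are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next, *prev;
};

/* Illustrative poison values; real kernels add an arch-specific offset. */
#define POISON_NEXT ((struct node *)0x100)
#define POISON_PREV ((struct node *)0x122)

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Unlink only: the neighbours skip over n, but n keeps its own pointers,
 * so a reader that is still standing on n can step forwards or backwards
 * and land on a live entry.
 */
static void unlink_entry(struct node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
}

/* Unlink and poison: any later step from n dereferences garbage. */
static void unlink_and_poison(struct node *n)
{
	unlink_entry(n);
	n->next = POISON_NEXT;
	n->prev = POISON_PREV;
}
```

With `unlink_entry()` the removed node still points at its former neighbours, which is what a concurrent traversal in either direction needs. The price is that a genuine use-after-removal bug no longer faults immediately.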
  3. 06 Mar, 2020 1 commit
  4. 05 Mar, 2020 2 commits
  5. 04 Mar, 2020 2 commits
  6. 29 Feb, 2020 1 commit
  7. 28 Feb, 2020 1 commit
  8. 20 Feb, 2020 1 commit
  9. 12 Feb, 2020 4 commits
  10. 06 Feb, 2020 1 commit
  11. 03 Feb, 2020 1 commit
  12. 23 Jan, 2020 1 commit
  13. 17 Jan, 2020 1 commit
  14. 06 Jan, 2020 1 commit
  15. 23 Dec, 2019 1 commit
  16. 22 Dec, 2019 1 commit
  17. 20 Dec, 2019 2 commits
  18. 19 Dec, 2019 1 commit
  19. 18 Dec, 2019 1 commit
  20. 17 Dec, 2019 1 commit
  21. 16 Dec, 2019 1 commit
    • drm/i915: Copy across scheduler behaviour flags across submit fences · 99de9536
      Committed by Chris Wilson
      We want the bonded request to have the same scheduler properties as its
      master so that it is placed at the same depth in the queue. For example,
      consider we have requests A, B and B', where B & B' are a bonded pair to
      run in parallel on two engines.
      
              A -> B
                   \- B'
      
      B will run after A and so may be scheduled on an idle engine and wait on
      A using a semaphore. B' sees B being executed and so enters the queue on
      the same engine as A. As B' did not inherit the semaphore-chain from B,
      it may have higher precedence than A and so preempts execution. However,
      B' then sits on a semaphore waiting for B, who is waiting for A, who is
      blocked by B.
      
      Ergo B' needs to inherit the scheduler properties from B (i.e. the
      semaphore chain) so that it is scheduled with the same priority as B and
      will not be executed ahead of B's dependencies.
      
      Furthermore, to prevent the priorities changing via the exposed fence on
      B', we need to couple in the dependencies for PI. This requires us to
      relax our sanity-checks that dependencies are strictly in order.
      
      v2: Synchronise (B, B') execution on all platforms, regardless of using
      a scheduler, any no-op syncs should be elided.
      
      Fixes: ee113690 ("drm/i915/execlists: Virtual engine bonding")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
      Testcase: igt/gem_exec_balancer/bonded-chain
      Testcase: igt/gem_exec_balancer/bonded-semaphore
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
      (cherry picked from commit c81471f5)
      Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      99de9536
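
The A, B, B' scenario described above can be reduced to a toy model. This is a sketch under stated assumptions, not the driver's code: the flag name, the one-level priority boost for requests that carry no semaphore chain, and the helper names are all hypothetical. It only shows why copying the master's scheduler flags onto the bonded request removes the priority gap between the pair.

```c
#include <assert.h>

/* Hypothetical flag: the request (or a dependency) waits on a semaphore. */
#define HAS_SEMAPHORE_CHAIN (1u << 0)
/* Hypothetical boost given to requests that will not busy-wait. */
#define NO_SEMAPHORE_BOOST 1

struct request {
	unsigned int sched_flags;
	int base_prio;
};

/* Requests without a semaphore chain are assumed to be favoured. */
static int effective_prio(const struct request *rq)
{
	int boost = (rq->sched_flags & HAS_SEMAPHORE_CHAIN) ? 0 : NO_SEMAPHORE_BOOST;

	return rq->base_prio + boost;
}

/* Give the bonded request the same scheduler behaviour as its master. */
static void inherit_sched_flags(struct request *bonded,
				const struct request *master)
{
	bonded->sched_flags |= master->sched_flags;
}
```

In this model, B' starts with no flags and so outranks B. After `inherit_sched_flags(&b_prime, &b)` both report the same effective priority, so B' can no longer be placed ahead of the work that B is waiting on.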
  22. 14 Dec, 2019 2 commits
  23. 12 Dec, 2019 1 commit
  24. 11 Dec, 2019 1 commit
    • drm/i915: Copy across scheduler behaviour flags across submit fences · c81471f5
      Committed by Chris Wilson
      We want the bonded request to have the same scheduler properties as its
      master so that it is placed at the same depth in the queue. For example,
      consider we have requests A, B and B', where B & B' are a bonded pair to
      run in parallel on two engines.
      
              A -> B
                   \- B'
      
      B will run after A and so may be scheduled on an idle engine and wait on
      A using a semaphore. B' sees B being executed and so enters the queue on
      the same engine as A. As B' did not inherit the semaphore-chain from B,
      it may have higher precedence than A and so preempts execution. However,
      B' then sits on a semaphore waiting for B, who is waiting for A, who is
      blocked by B.
      
      Ergo B' needs to inherit the scheduler properties from B (i.e. the
      semaphore chain) so that it is scheduled with the same priority as B and
      will not be executed ahead of B's dependencies.
      
      Furthermore, to prevent the priorities changing via the exposed fence on
      B', we need to couple in the dependencies for PI. This requires us to
      relax our sanity-checks that dependencies are strictly in order.
      
      v2: Synchronise (B, B') execution on all platforms, regardless of using
      a scheduler, any no-op syncs should be elided.
      
      Fixes: ee113690 ("drm/i915/execlists: Virtual engine bonding")
      Closes: https://gitlab.freedesktop.org/drm/intel/issues/464
      Testcase: igt/gem_exec_balancer/bonded-chain
      Testcase: igt/gem_exec_balancer/bonded-semaphore
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191210151332.3902215-1-chris@chris-wilson.co.uk
      c81471f5
  25. 07 Dec, 2019 1 commit
  26. 22 Nov, 2019 1 commit
    • drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request · 67a3acaa
      Committed by Chris Wilson
      As we start peeking into requests for longer and longer, e.g.
      incorporating use of spinlocks when only protected by an
      rcu_read_lock(), we need to be careful in how we reset the request when
      recycling and need to preserve any barriers that may still be in use as
      the request is reset for reuse.
      
      Quoting Linus Torvalds:
      
      > If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?
      
        .. because the object can be accessed (by RCU) after the refcount has
        gone down to zero, and the thing has been released.
      
        That's the whole and only point of SLAB_TYPESAFE_BY_RCU.
      
        That flag basically says:
      
        "I may end up accessing this object *after* it has been free'd,
        because there may be RCU lookups in flight"
      
        This has nothing to do with constructors. It's ok if the object gets
        reused as an object of the same type and does *not* get
        re-initialized, because we're perfectly fine seeing old stale data.
      
        What it guarantees is that the slab isn't shared with any other kind
        of object, _and_ that the underlying pages are free'd after an RCU
        quiescent period (so the pages aren't shared with another kind of
        object either during an RCU walk).
      
        And it doesn't necessarily have to have a constructor, because the
        thing that a RCU walk will care about is
      
          (a) guaranteed to be an object that *has* been on some RCU list (so
          it's not a "new" object)
      
          (b) the RCU walk needs to have logic to verify that it's still the
          *same* object and hasn't been re-used as something else.
      
        In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
        immediately, but because it gets reused as the same kind of object,
        the RCU walker can "know" what parts have meaning for re-use, in a way
        it couldn't if the re-use was random.
      
        That said, it *is* subtle, and people should be careful.
      
      > So the re-use might initialize the fields lazily, not necessarily using a ctor.
      
        If you have a well-defined refcount, and use "atomic_inc_not_zero()"
        to guard the speculative RCU access section, and use
        "atomic_dec_and_test()" in the freeing section, then you should be
        safe wrt new allocations.
      
        If you have a completely new allocation that has "random stale
        content", you know that it cannot be on the RCU list, so there is no
        speculative access that can ever see that random content.
      
        So the only case you need to worry about is a re-use allocation, and
        you know that the refcount will start out as zero even if you don't
        have a constructor.
      
        So you can think of the refcount itself as always having a zero
        constructor, *BUT* you need to be careful with ordering.
      
        In particular, whoever does the allocation needs to then set the
        refcount to a non-zero value *after* it has initialized all the other
        fields. And in particular, it needs to make sure that it uses the
        proper memory ordering to do so.
      
        NOTE! One thing to be very worried about is that re-initializing
        whatever RCU lists means that now the RCU walker may be walking on the
        wrong list so the walker may do the right thing for this particular
        entry, but it may miss walking *other* entries. So then you can get
        spurious lookup failures, because the RCU walker never walked all the
        way to the end of the right list. That ends up being a much more
        subtle bug.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
      67a3acaa
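
The quoted advice boils down to three rules: publish the refcount only after the other fields are initialised, take a speculative reference only if the count is non-zero, and re-check the object's identity once the reference is held. The sketch below illustrates those rules with C11 atomics in userspace. It is not the i915 code; `struct object`, its `key` field and every function name are assumptions for illustration, and the RCU read-side section and the slab itself are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct object {
	atomic_int refcount; /* 0 means free: lookups must back off */
	int key;             /* identity that a lookup re-checks */
};

/* (Re)allocation: initialise the fields first, publish the refcount last. */
static void object_init(struct object *obj, int key)
{
	obj->key = key;
	atomic_store_explicit(&obj->refcount, 1, memory_order_release);
}

/* Counterpart of atomic_inc_not_zero(): never revive a freed object. */
static bool object_get_unless_zero(struct object *obj)
{
	int old = atomic_load_explicit(&obj->refcount, memory_order_relaxed);

	while (old != 0) {
		if (atomic_compare_exchange_weak_explicit(&obj->refcount,
							  &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return true;
	}
	return false;
}

/* Counterpart of atomic_dec_and_test(): true when the last ref is gone. */
static bool object_put(struct object *obj)
{
	return atomic_fetch_sub_explicit(&obj->refcount, 1,
					 memory_order_acq_rel) == 1;
}

/* Speculative lookup: take a reference, then verify the identity. */
static struct object *lookup(struct object *candidate, int key)
{
	if (!object_get_unless_zero(candidate))
		return NULL;
	if (candidate->key != key) {
		/* Recycled as a different object of the same type. A real
		 * implementation would free it if this drops the last ref. */
		object_put(candidate);
		return NULL;
	}
	return candidate;
}
```

The identity check after taking the reference corresponds to point (b) in the quote: the memory is guaranteed to still be an object of the same type, but not necessarily the same object.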
  27. 27 Oct, 2019 1 commit
  28. 26 Oct, 2019 1 commit
  29. 24 Oct, 2019 1 commit
  30. 18 Oct, 2019 1 commit
  31. 15 Oct, 2019 1 commit
  32. 14 Oct, 2019 1 commit
  33. 10 Oct, 2019 1 commit