1. 22 Sep, 2020 2 commits
  2. 22 Aug, 2020 1 commit
  3. 30 Jul, 2020 1 commit
  4. 25 Jul, 2020 8 commits
  5. 19 Jul, 2020 1 commit
  6. 24 Jun, 2020 1 commit
  7. 01 Jun, 2020 1 commit
  8. 25 May, 2020 1 commit
    • habanalabs: don't set default fence_ops->wait · ed65bfd9
      Committed by Daniel Vetter
      It's the default.
      
      Also so much for "we're not going to tell the graphics people how to
      review their code", dma_fence is a pretty core piece of gpu driver
      infrastructure. And it's very much uapi relevant, including piles of
      corresponding userspace protocols and libraries for how to pass these
      around.
      
      Would be great if habanalabs would not use this (from a quick look
      it's not needed at all), since open-sourcing the userspace and playing
      by the usual rules isn't on the table. If that's not possible (because
      it's actually using the uapi part of dma_fence to interact with gpu
      drivers) then we have exactly what everyone promised we'd want to
      avoid.
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
  9. 19 May, 2020 3 commits
    • habanalabs: add signal/wait to CS IOCTL operations · b75f2250
      Committed by Omer Shpigelman
      Add the following two operations to the CS IOCTL:
      
      Signal:
      
      The signal operation is basically a command submission that is created by
      the driver upon user request. It will be implemented using a dedicated PQE
      that will increment a specific SOB. There will be a new flag:
      HL_CS_FLAGS_SIGNAL. When the user sets this flag in the CS IOCTL structure,
      the driver will execute a dedicated code path that will prepare this
      special PQE and submit it. The user only needs to provide a queue index on
      which to put the signal.
      
      Wait:
      
      The wait operation is also a command submission that is created by the
      driver upon user request. It will be implemented using a dedicated PQE that
      will contain packets of "ARM a monitor" + FENCE packet. There will be a new
      flag: HL_CS_FLAGS_WAIT. When the user sets this flag in the CS structure,
      the driver will execute a dedicated code path that will prepare this
      special PQE and submit it.
      
      The user needs to provide the following parameters:
      1. queue ID
      2. an array of signal_seq numbers and the number of signals to wait on
         (the length of signal_seq_arr).
      
      The IOCTL will return the CS sequence number of the wait it put on the
      queue ID.
      
      Currently, the code supports only signal_seq_nr == 1, but this API
      definition will allow us to put a single PQE that waits on multiple signals.
      
      To correctly configure the monitor and fence, the driver will need to
      retrieve the specified signal CS object that contains the relevant SOB and
      its expected value. If the signal CS has already completed, there is no
      point in adding a wait operation. In this case, the driver will return to
      the user *without* putting anything on the PQ. The return code should tell
      the user that the signal was completed, as we won't return a CS sequence
      number for this wait.
      Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
      Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
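      The wait flow described in this commit can be sketched in C as below.
      This is only a user-space model under stated assumptions: every type and
      function name (`wait_request`, `queue_model`, `submit_wait`, and so on)
      is hypothetical and is not taken from the driver or its uapi header. It
      shows the one decision the message spells out: a wait on an
      already-completed signal queues nothing and yields no sequence number.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical names throughout: a user-space model, not driver code. */
      enum wait_result {
          WAIT_QUEUED,           /* a wait entry was put on the queue */
          WAIT_ALREADY_SIGNALED, /* signal CS already completed; nothing queued */
          WAIT_INVALID           /* unknown signal or unsupported argument */
      };

      struct signal_cs {
          uint64_t seq;     /* CS sequence number of the signal */
          bool completed;
          uint32_t sob_id;  /* SOB that the signal increments */
          uint32_t sob_val; /* value the SOB holds once the signal has run */
      };

      struct wait_request {
          uint32_t queue_id;
          const uint64_t *signal_seq_arr;
          uint32_t signal_seq_nr; /* length of signal_seq_arr */
      };

      /* Single modelled queue; a real driver would select the PQ by queue_id. */
      struct queue_model {
          uint64_t next_seq;
          uint32_t nr_entries;
          uint32_t armed_sob_id;  /* monitor target of the last queued wait */
          uint32_t armed_sob_val;
      };

      static const struct signal_cs *find_signal(const struct signal_cs *tbl,
                                                 size_t n, uint64_t seq)
      {
          for (size_t i = 0; i < n; i++)
              if (tbl[i].seq == seq)
                  return &tbl[i];
          return NULL;
      }

      static enum wait_result submit_wait(struct queue_model *q,
                                          const struct signal_cs *tbl, size_t n,
                                          const struct wait_request *req,
                                          uint64_t *wait_seq)
      {
          const struct signal_cs *sig;

          assert(q && req && wait_seq);
          if (req->signal_seq_nr != 1) /* only one signal is supported for now */
              return WAIT_INVALID;

          sig = find_signal(tbl, n, req->signal_seq_arr[0]);
          if (!sig)
              return WAIT_INVALID;
          if (sig->completed)          /* no point in queueing a wait */
              return WAIT_ALREADY_SIGNALED;

          /* Configure the monitor from the signal CS, then queue the wait. */
          q->armed_sob_id = sig->sob_id;
          q->armed_sob_val = sig->sob_val;
          q->nr_entries++;
          *wait_seq = q->next_seq++;
          return WAIT_QUEUED;
      }
      ```

      A caller in this model would treat `WAIT_ALREADY_SIGNALED` as success
      with no sequence number to wait on, which mirrors the return-code
      requirement in the message.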
    • habanalabs: handle the h/w sync object · b0b5d925
      Committed by Omer Shpigelman
      Define a structure representing the h/w sync object (SOB).
      
      A SOB can represent values up to 2^15. Each signal CS will increment the SOB
      by 1, so after some time we will reach the maximum number the SOB can
      represent. When that happens, the driver needs to move to a different SOB
      for the signal operation.
      
      A SOB can be in one of four states:
      
      1. Working state with value < 2^15
      
      2. We reached a value of 2^15, but the signal operations weren't completed
      yet OR there are pending waits on this signal. For the next submission, the
      driver will move to another SOB.
      
      3. ALL the signal operations on the SOB have finished AND there are no more
      pending waits on the SOB AND we reached a value of 2^15 (This basically
      means the refcnt of the SOB is 0 - see explanation below). When that
      happens, the driver can clear the SOB by simply doing WREG32 0 to it and
      set the refcnt back to 1.
      
      4. The SOB is cleared and can be used next time by the driver when it needs
      to reuse a SOB.
      
      Per SOB, the driver will maintain a single refcnt, that will be initialized
      to 1. When a signal or wait operation on this SOB is submitted to the PQ,
      the refcnt will be incremented. When a signal or wait operation on this SOB
      completes, the refcnt will be decremented. After the submission of the
      signal operation that increments the SOB to a value of 2^15, the refcnt is
      also decremented.
      Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
      Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
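      The refcnt rules in this commit can be modelled in a few lines of C.
      This is a sketch under assumptions, not the driver's structure: the
      `sob_model` type, its fields, and the helper names are hypothetical, and
      the `hw_val` field merely stands in for the hardware register that the
      driver would clear with WREG32.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      #define SOB_MAX_VAL (1u << 15) /* 2^15, as stated in the commit message */

      /* Hypothetical user-space model of one h/w sync object. */
      struct sob_model {
          uint32_t hw_val;   /* stands in for the SOB register */
          uint32_t next_val; /* value reached once all submitted signals run */
          int refcnt;
          bool exhausted;    /* a signal reaching 2^15 has been submitted */
      };

      static void sob_init(struct sob_model *s)
      {
          s->hw_val = 0;
          s->next_val = 0;
          s->refcnt = 1; /* initial reference */
          s->exhausted = false;
      }

      /* Drop one reference; on reaching 0, clear the SOB and re-arm it. */
      static void sob_put(struct sob_model *s)
      {
          assert(s->refcnt > 0);
          if (--s->refcnt == 0) {
              s->hw_val = 0; /* the driver would write 0 to the register here */
              s->next_val = 0;
              s->exhausted = false;
              s->refcnt = 1;
          }
      }

      /* Returns false when the caller must move to a different SOB. */
      static bool sob_submit_signal(struct sob_model *s)
      {
          if (s->exhausted)
              return false;
          s->refcnt++;
          if (++s->next_val == SOB_MAX_VAL) {
              s->exhausted = true;
              sob_put(s); /* drop the initial reference */
          }
          return true;
      }

      static void sob_submit_wait(struct sob_model *s)
      {
          s->refcnt++;
      }

      static void sob_complete_signal(struct sob_model *s)
      {
          s->hw_val++;
          sob_put(s);
      }

      static void sob_complete_wait(struct sob_model *s)
      {
          sob_put(s);
      }
      ```

      Because the initial reference is only dropped by the submission that
      reaches 2^15, the count cannot hit zero while the SOB is still in its
      working state, which is what makes the clear-and-reuse step safe in this
      model.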
    • uapi: habanalabs: add signal/wait operations · f9e5f295
      Committed by Omer Shpigelman
      This is a prerequisite to upstreaming GAUDI support.
      
      Signal/wait operations are done by the user to perform sync between two
      Primary Queues (PQs). The sync is done using the sync manager and it is
      usually resolved inside the device, but sometimes it can be resolved in the
      host, i.e. the user should be able to wait in the host until a signal has
      been completed.
      
      The driver implements the mechanism for defining signal and wait operations
      because it requires atomicity and serialization, which the driver already
      provides when submitting work to the different queues.
      
      To implement this feature, the driver "takes" a couple of h/w resources,
      and this is reflected by the defines added to the uapi file.
      
      The signal/wait operations are done via the existing CS IOCTL, and they use
      the same data structure. There is a difference in the meaning of some of
      the parameters, and for that we added unions to make the code more
      readable.
      Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
      Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
  10. 17 May, 2020 1 commit
    • habanalabs: handle barriers in DMA QMAN streams · 926ba4cc
      Committed by Oded Gabbay
      When we have a DMA QMAN with multiple streams, we need to know whether the
      command buffer contains at least one DMA packet in order to configure the
      barriers correctly when adding the 2xMSG_PROT at the end of the JOB. If
      there is no DMA packet, then there is no need to add an engine barrier. This
      is relevant only for GAUDI, as GOYA doesn't have streams, so the engine
      can't be kept busy by another stream.
      Reviewed-by: Tomer Tayar <ttayar@habana.ai>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
  11. 24 Mar, 2020 3 commits
  12. 14 Dec, 2019 1 commit
    • habanalabs: rate limit error msg on waiting for CS · 018e0e35
      Committed by Oded Gabbay
      If a user submits a CS, the submission fails, and the user doesn't check
      the return value but instead uses the error return value as a valid
      sequence number of a CS and asks to wait on it, the driver will print an
      error and return an error code for that wait.
      
      The real problem happens if the user then ignores the error of the wait and
      tries to wait again and again. This can lead to a flood of error messages
      from the driver and even to a soft-lockup event.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
      Reviewed-by: Tomer Tayar <ttayar@habana.ai>
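      The effect of rate limiting on such a message flood can be illustrated
      with a small window-and-burst limiter in C. This is a user-space sketch
      and an assumption about the general technique, not the driver's code:
      the `ratelimit_model` type and `ratelimit_allow` function are
      hypothetical, and in the kernel this job would normally fall to an
      existing helper such as `dev_err_ratelimited`.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical limiter: at most `burst` messages per `interval`. */
      struct ratelimit_model {
          uint64_t interval;     /* window length, in caller-chosen time units */
          uint32_t burst;        /* messages allowed per window */
          uint64_t window_start; /* time at which the current window began */
          uint32_t printed;      /* messages let through in this window */
          uint32_t suppressed;   /* messages dropped so far */
      };

      /* Returns true if a message may be printed at time `now`. */
      static bool ratelimit_allow(struct ratelimit_model *rl, uint64_t now)
      {
          assert(rl->interval > 0);

          /* Start a fresh window once the current one has elapsed. */
          if (now - rl->window_start >= rl->interval) {
              rl->window_start = now;
              rl->printed = 0;
          }

          if (rl->printed < rl->burst) {
              rl->printed++;
              return true;
          }

          rl->suppressed++;
          return false;
      }
      ```

      With `interval = 5` and `burst = 2`, a caller that retries the failing
      wait once per time unit would see only two messages in each window of
      five, however many times it retries.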
  13. 21 Nov, 2019 3 commits
  14. 05 Sep, 2019 3 commits
  15. 29 Jul, 2019 1 commit
  16. 09 May, 2019 1 commit
    • habanalabs: change polling functions to macros · a08b51a9
      Committed by Oded Gabbay
      This patch changes two polling functions to macros, in order to make their
      API the same as the standard readl_poll_timeout, so that the caller can
      define the "condition for exit" when invoking these macros.
      
      This will simplify the code, as it will eliminate the need to check both
      for timeout and for the (cond) in the calling function.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
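      Why a macro lets the caller state the exit condition inline can be shown
      with a simplified user-space analogue. The `POLL_UNTIL` macro below is
      hypothetical and is not the driver's macro; unlike readl_poll_timeout,
      it bounds the loop by a try count rather than by a timeout in
      microseconds, and `fake_read` only stands in for a register read.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /*
       * Hypothetical poll-with-condition macro. `cond` is an expression that
       * may refer to `val`; `ret` receives 0 on success and -1 if the
       * condition never became true within `max_tries` reads.
       */
      #define POLL_UNTIL(read_op, val, cond, max_tries, ret)          \
          do {                                                        \
              unsigned int tries_ = (max_tries);                      \
              (ret) = -1; /* reported if cond never becomes true */   \
              while (tries_--) {                                      \
                  (val) = (read_op);                                  \
                  if (cond) {                                         \
                      (ret) = 0;                                      \
                      break;                                          \
                  }                                                   \
              }                                                       \
          } while (0)

      /* Stand-in for a device register whose value grows on every read. */
      static uint32_t fake_reg;

      static uint32_t fake_read(void)
      {
          return ++fake_reg;
      }

      /* The caller writes the exit condition inline and checks one result. */
      static int wait_for_value(uint32_t target, unsigned int max_tries)
      {
          uint32_t val;
          int rc;

          assert(max_tries > 0);
          POLL_UNTIL(fake_read(), val, val >= target, max_tries, rc);
          return rc;
      }
      ```

      A function could not accept `val >= target` as an argument without a
      callback, which is the simplification the commit message refers to: the
      caller no longer checks both the timeout and the condition itself.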
  17. 01 May, 2019 1 commit
  18. 26 Apr, 2019 1 commit
  19. 27 Mar, 2019 1 commit
  20. 03 Mar, 2019 1 commit
    • habanalabs: perform accounting for active CS · cbaa99ed
      Committed by Oded Gabbay
      This patch adds accounting for active CS. Active means that the CS was
      submitted to the H/W queues and has not completed yet.
      
      This is necessary to support suspend operation. Because the device will be
      reset upon suspend, we can only suspend after all active CS have been
      completed. Hence, we need to perform accounting on their number.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
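      The accounting idea can be sketched with a single atomic counter in C11.
      This is a user-space illustration under assumptions: `device_model` and
      the three helpers are hypothetical names, and the commit message does
      not say how the driver stores or updates its count.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      /* Hypothetical device state holding the number of active CS. */
      struct device_model {
          atomic_int active_cs;
      };

      /* Called when a CS has been placed on the H/W queues. */
      static void cs_submitted(struct device_model *d)
      {
          atomic_fetch_add(&d->active_cs, 1);
      }

      /* Called when a CS has completed. */
      static void cs_completed(struct device_model *d)
      {
          int prev = atomic_fetch_sub(&d->active_cs, 1);

          assert(prev > 0); /* a completion must match an earlier submission */
          (void)prev;
      }

      /* Suspend resets the device, so it is only safe with no active CS. */
      static bool can_suspend(struct device_model *d)
      {
          return atomic_load(&d->active_cs) == 0;
      }
      ```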
  21. 28 Feb, 2019 1 commit
    • habanalabs: soft-reset device if context-switch fails · af5f7eea
      Committed by Oded Gabbay
      This patch fixes a bug in the driver: if the TPC or MME remains in a
      non-IDLE state even after all the command submissions are done (due to a
      user bug or a malicious user), then future command submissions will fail in
      the context-switch stage and the driver will remain in "stuck" mode.
      
      The fix is to do a soft-reset of the device in case the context-switch
      fails, because the device should be IDLE during context-switch. If it is
      not IDLE, then something is wrong and we should reset the compute engines.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  22. 05 Mar, 2019 1 commit
    • habanalabs: ratelimit warnings at start of IOCTLs · 680cb399
      Committed by Oded Gabbay
      At the start of some IOCTLs we check if the device is disabled or in reset.
      If it is, we return -EBUSY and print a message to the kernel log.
      
      Because these IOCTLs can be called at very high frequency, use ratelimit
      to avoid spamming the kernel log. Also use the same type of message -
      dev_warn - in all the relevant IOCTLs.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
  23. 18 Feb, 2019 2 commits
    • habanalabs: add debugfs support · c2164773
      Committed by Oded Gabbay
      This patch adds debugfs support to the driver. It allows user space to
      display information that is contained in the internal structures of the
      driver, such as:
      - active command submissions
      - active user virtual memory mappings
      - number of allocated command buffers
      
      It also enables the user to perform reads and writes through Goya's PCI
      bars.
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • habanalabs: add command submission module · eff6f4a0
      Committed by Oded Gabbay
      This patch adds the main flow for the user to submit work to the device.
      
      Each unit of work is described by a command submission object (CS). The CS
      contains 3 arrays of command buffers (CBs): one for execution, and two for
      context-switch (store and restore).
      
      For each CB, the user specifies on which queue to put that CB. In case of
      an internal queue, the entry doesn't contain a pointer to the CB but the
      address in on-chip memory where the CB resides.
      
      The driver parses some of the CBs to enforce security restrictions.
      
      The user receives a sequence number that represents the CS object. The user
      can then query the driver regarding the status of the CS, using that
      sequence number.
      
      If the CS doesn't finish before the timeout expires, the driver will
      perform a soft-reset of the device.
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>