1. 17 May, 2019: 1 commit
    • habanalabs: don't limit packet size for device CPU · cbb10f1e
      Committed by Oded Gabbay
      This patch removes a limitation on the maximum packet size that is read by
      the device CPU, as that limitation is not needed.
      
      Therefore, the patch also removes an elaborate calculation based on this
      limitation, which is no longer needed either. Instead, it uses a fixed
      value for the size of the packets' memory pool.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
  2. 16 May, 2019: 1 commit
  3. 14 May, 2019: 1 commit
  4. 13 May, 2019: 1 commit
  5. 12 May, 2019: 1 commit
  6. 09 May, 2019: 3 commits
  7. 08 May, 2019: 2 commits
  8. 04 May, 2019: 1 commit
    • habanalabs: force user to set device debug mode · 19734970
      Committed by Oded Gabbay
      This patch adds the implementation of the HL_DEBUG_OP_SET_MODE opcode in
      the DEBUG IOCTL.
      
      It forces a user who wants to debug the device to put the device into
      debug mode before configuring the debug engines. The patch also makes
      sure debug mode is disabled when the user releases the FD, in case the
      user forgot to disable it.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
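The gating rule described above (no engine configuration until debug mode is set, and debug mode forced off on FD release) can be sketched as a small state machine. This is a hypothetical userspace model: `struct dbg_device`, `dbg_ioctl()`, `dbg_release()`, the opcode names, and the choice of `-EPERM` as the rejection code are all assumptions, not the driver's actual code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; not the driver's real structure. */
struct dbg_device {
    bool debug_mode;   /* set via a SET_MODE-style opcode */
    int  engines_cfg;  /* stand-in for debug-engine configuration */
};

/* Hypothetical opcodes modeled on the commit's description. */
enum dbg_op { DBG_OP_SET_MODE, DBG_OP_CONFIGURE };

/* Dispatch a debug request; configuration is refused until debug mode is on.
 * Returning -EPERM here is an assumption for illustration. */
int dbg_ioctl(struct dbg_device *dev, enum dbg_op op, int arg)
{
    switch (op) {
    case DBG_OP_SET_MODE:
        dev->debug_mode = (arg != 0);
        return 0;
    case DBG_OP_CONFIGURE:
        if (!dev->debug_mode)
            return -EPERM;
        dev->engines_cfg = arg;
        return 0;
    default:
        return -EINVAL;
    }
}

/* Called when the user releases the FD: force debug mode off even if the
 * user forgot to disable it, and drop any engine configuration. */
void dbg_release(struct dbg_device *dev)
{
    dev->debug_mode = false;
    dev->engines_cfg = 0;
}
```

Keeping the check inside the dispatcher means every configuration path goes through one gate, and the release hook guarantees the next user starts from a clean state.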
  9. 05 May, 2019: 2 commits
  10. 04 May, 2019: 1 commit
  11. 20 June, 2019: 1 commit
  12. 04 June, 2019: 1 commit
  13. 03 June, 2019: 1 commit
  14. 29 May, 2019: 1 commit
    • habanalabs: fix bug in checking huge page optimization · d7241701
      Committed by Oded Gabbay
      This patch fixes a bug in the MMU code that checks whether we can use huge
      page mappings for host pages.
      
      The code is supposed to enable huge page mappings only if ALL DMA
      addresses are aligned to 2MB AND the number of pages in each DMA chunk is
      a multiple of the number of pages in 2MB. However, the code skipped the
      first requirement for the first DMA chunk.
      
      This patch fixes that issue by making sure the address alignment
      requirement is validated against all DMA chunks.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
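The corrected check can be sketched as a loop that applies both requirements to every chunk, the first one included. This is a hypothetical model, not the driver's MMU code: `struct dma_chunk` and `can_use_huge_pages()` are illustrative names, and a 4 KB base page (so 512 pages per 2MB) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ      (4ULL << 10)          /* assumed 4 KB base page */
#define HUGE_SZ      (2ULL << 20)          /* 2MB huge page */
#define PGS_PER_HUGE (HUGE_SZ / PAGE_SZ)   /* 512 with the assumption above */

/* Hypothetical view of one DMA-mapped chunk. */
struct dma_chunk {
    uint64_t dma_addr;   /* DMA address of the chunk */
    uint64_t npages;     /* number of PAGE_SZ pages in the chunk */
};

/* True only if EVERY chunk, including the first, starts on a 2MB boundary
 * and its page count is a multiple of the pages in 2MB. */
bool can_use_huge_pages(const struct dma_chunk *chunks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (chunks[i].dma_addr & (HUGE_SZ - 1))
            return false;               /* address not 2MB aligned */
        if (chunks[i].npages % PGS_PER_HUGE)
            return false;               /* partial 2MB region at the end */
    }
    return true;
}
```

The bug described in the commit corresponds to starting the alignment test at `i = 1`, which would wrongly accept a misaligned first chunk.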
  15. 25 May, 2019: 3 commits
    • habanalabs: Avoid using a non-initialized MMU cache mutex · 8d45f1de
      Committed by Tomer Tayar
      The MMU cache mutex is used in the ASIC hw_init() functions, but it is
      initialized only later, in hl_mmu_init().
      This patch fixes that by moving the initialization to the
      device_early_init() function.
      Signed-off-by: Tomer Tayar <ttayar@habana.ai>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
    • habanalabs: fix debugfs code · 8438846c
      Committed by Jann Horn
      This fixes multiple things in the habanalabs debugfs code, in particular:
      
       - mmu_write() was unnecessarily verbose, copying around between multiple
         buffers
       - mmu_write() could write a user-specified, unbounded amount of userspace
         memory into a kernel buffer (out-of-bounds write)
       - multiple debugfs read handlers ignored the user-supplied count,
         potentially corrupting out-of-bounds userspace data
       - hl_device_read() was unnecessarily verbose
       - hl_device_write() could read uninitialized stack memory
       - multiple debugfs read handlers copied terminating null characters to
         userspace
      Signed-off-by: Jann Horn <jannh@google.com>
      Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
      Cc: stable@vger.kernel.org
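The read-handler fixes listed above (respect the user-supplied count, and don't copy the terminating null) follow the pattern of the kernel helper `simple_read_from_buffer()`. The sketch below is a userspace analog under stated assumptions: `bounded_read()` is a hypothetical name, `memcpy` stands in for `copy_to_user()`, and `long` stands in for `loff_t`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Copy at most `count` bytes of `src` (length `avail`) starting at *ppos.
 * Userspace analog of simple_read_from_buffer(); memcpy replaces
 * copy_to_user(). Callers pass strlen(text), not strlen(text) + 1, so the
 * terminating null is never handed to the reader. */
ssize_t bounded_read(char *dst, size_t count, long *ppos,
                     const char *src, size_t avail)
{
    if (*ppos < 0)
        return -EINVAL;
    size_t pos = (size_t)*ppos;
    if (pos >= avail || count == 0)
        return 0;                 /* EOF or nothing requested */
    size_t n = avail - pos;
    if (n > count)
        n = count;                /* never exceed the user-supplied count */
    memcpy(dst, src + pos, n);
    *ppos += (long)n;
    return (ssize_t)n;
}
```

Clamping to `count` is what prevents writing past the user's buffer, and advancing `*ppos` lets a reader consume the text over several calls.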
    • habanalabs: halt debug engines on user process close · 89225ce4
      Committed by Omer Shpigelman
      This patch fixes a potential bug where a user process closes
      unexpectedly without disabling the debug engines. In that case, the debug
      engines might keep running, but because the user's MMU mappings are
      going away, we will get page fault errors.
      
      This behavior also contradicts the general rule that nothing runs on
      the device after the user process closes.
      
      The patch stops the debug H/W engines upon process termination and thus
      makes sure nothing runs on the device after the process goes away.
      Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
  16. 21 May, 2019: 1 commit
  17. 02 May, 2019: 2 commits
  18. 30 April, 2019: 1 commit
  19. 01 May, 2019: 4 commits
  20. 30 April, 2019: 1 commit
  21. 29 April, 2019: 1 commit
    • habanalabs: Use single pool for CPU accessible host memory · 03d5f641
      Committed by Tomer Tayar
      The device's CPU accessible memory on host is managed in a dedicated
      pool, except for 2 regions - Primary Queue (PQ) and Event Queue (EQ) -
      which are allocated from generic DMA pools.
      Due to address length limitations of the CPU, the addresses of all these
      memory regions must have the same MSBs starting at bit 40.
      This patch changes the PQ and EQ to also be allocated from the
      dedicated pool, to ensure compliance with this limitation.
      Signed-off-by: Tomer Tayar <ttayar@habana.ai>
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
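The constraint above ("the same MSBs starting at bit 40") can be expressed as a simple check: shift every address right by 40 and compare. This is an illustrative sketch only; `same_upper_bits()` and `CPU_ADDR_SHIFT` are hypothetical names, and the driver enforces the rule by allocating from one pool rather than by checking afterwards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPU_ADDR_SHIFT 40   /* from the commit: bits from 40 upward must match */

/* Return true if all addresses share the same bits [63:40]. */
bool same_upper_bits(const uint64_t *addrs, size_t n)
{
    if (n == 0)
        return true;
    uint64_t msb = addrs[0] >> CPU_ADDR_SHIFT;
    for (size_t i = 1; i < n; i++)
        if ((addrs[i] >> CPU_ADDR_SHIFT) != msb)
            return false;
    return true;
}
```

A single pool makes the check hold by construction, since all its allocations sit inside one contiguous region that does not cross a 2^40-byte boundary (assuming the pool itself was placed that way).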
  22. 28 April, 2019: 1 commit
    • habanalabs: return old dram bar address upon change · a38693d7
      Committed by Oded Gabbay
      This patch modifies the ASIC interface function that moves the DRAM BAR
      window, so that it returns the old address the DRAM BAR pointed to
      instead of an error code.
      
      This simplifies the code that uses this function (mainly in debugfs) to
      restore the BAR to its old setting.
      
      This will also make it easier to support future ASICs.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
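Returning the previous BAR address enables a "move, access, restore" pattern with no separate bookkeeping. The sketch below is a toy model under stated assumptions: `struct bar_dev`, `set_dram_bar_base()` and `peek_dram()` are hypothetical, the window base doubles as an array index, and error handling and window alignment are omitted.

```c
#include <stdint.h>

/* Hypothetical device with one movable DRAM BAR window. */
struct bar_dev {
    uint64_t dram_bar_base;
};

/* Move the window and return where it pointed before (the old address
 * replaces an error code as the return value). */
uint64_t set_dram_bar_base(struct bar_dev *dev, uint64_t addr)
{
    uint64_t old = dev->dram_bar_base;
    dev->dram_bar_base = addr;
    return old;
}

/* debugfs-style read: point the window at addr, read through it, then put
 * it back. In this toy, `dram` is a backing array indexed by the base. */
uint64_t peek_dram(struct bar_dev *dev, const uint64_t *dram, uint64_t addr)
{
    uint64_t old = set_dram_bar_base(dev, addr);
    uint64_t val = dram[dev->dram_bar_base];   /* stand-in for a BAR read */
    set_dram_bar_base(dev, old);               /* restore previous window */
    return val;
}
```

The caller never needs to query the current window first; the restore value comes back from the same call that moves it.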
  23. 26 April, 2019: 1 commit
  24. 22 April, 2019: 1 commit
    • habanalabs: use ASIC functions interface for rreg/wreg · b2377e03
      Committed by Oded Gabbay
      This patch slightly changes the RREG32 and WREG32 macros, which are
      used to read from and write to registers.
      
      Instead of directly calling a function in the common code, these macros
      now call a function from the ASIC functions interface.
      
      This change allows us to share much more code between real ASICs and
      simulators, which in turn reduces the maintenance burden and the chance
      of forgetting to port code between the ASIC files.
      
      The patch also implements the hl_poll_timeout macro instead of calling
      the generic readl_poll_timeout macro. This is required to allow the use
      of this macro in the simulator files.
      
      As a result of this change, more functions in goya.c are shared with the
      simulator and therefore should not be defined as static.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
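Dispatching register access through a per-ASIC function table is what lets shared code run unchanged against real hardware or a simulator. The sketch below is a hypothetical model: `struct asic_funcs`, `struct hl_dev_sketch`, the array-backed simulator, `set_bits()` and `poll_reg()` are illustrative, and `poll_reg()` uses an attempt budget in place of the real timer-based hl_poll_timeout.

```c
#include <errno.h>
#include <stdint.h>

struct hl_dev_sketch;

/* Hypothetical ASIC functions interface: each backend supplies accessors. */
struct asic_funcs {
    uint32_t (*rreg)(struct hl_dev_sketch *hdev, uint32_t reg);
    void     (*wreg)(struct hl_dev_sketch *hdev, uint32_t reg, uint32_t val);
};

struct hl_dev_sketch {
    const struct asic_funcs *funcs;
    uint32_t regs[16];   /* register file used by the simulator backend */
};

/* Macros dispatch through the interface; they expect `hdev` in scope. */
#define RREG32(reg)    (hdev->funcs->rreg(hdev, (reg)))
#define WREG32(reg, v) (hdev->funcs->wreg(hdev, (reg), (v)))

/* Simulator backend: registers live in an array. */
static uint32_t sim_rreg(struct hl_dev_sketch *hdev, uint32_t reg)
{
    return hdev->regs[reg % 16];
}

static void sim_wreg(struct hl_dev_sketch *hdev, uint32_t reg, uint32_t val)
{
    hdev->regs[reg % 16] = val;
}

const struct asic_funcs sim_funcs = { .rreg = sim_rreg, .wreg = sim_wreg };

/* Shared code: identical for any backend plugged into hdev->funcs. */
uint32_t set_bits(struct hl_dev_sketch *hdev, uint32_t reg, uint32_t mask)
{
    WREG32(reg, RREG32(reg) | mask);
    return RREG32(reg);
}

/* Poll through the interface until all mask bits are set, giving up after
 * max_tries reads (a loop budget instead of a real timeout). */
int poll_reg(struct hl_dev_sketch *hdev, uint32_t reg, uint32_t mask,
             int max_tries)
{
    for (int i = 0; i < max_tries; i++)
        if ((RREG32(reg) & mask) == mask)
            return 0;
    return -ETIMEDOUT;
}
```

Because the polling helper also reads via `RREG32`, it works with the simulator too. That is the reason the commit gives for not using readl_poll_timeout, which reads MMIO directly.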
  25. 21 April, 2019: 2 commits
  26. 10 April, 2019: 1 commit
  27. 06 April, 2019: 3 commits
    • habanalabs: prevent device PTE read/write during hard-reset · 9f201aba
      Committed by Oded Gabbay
      During hard-reset, contexts are closed as part of the tear-down process.
      After a context is closed, the driver cleans up the page tables of that
      context in the device's DRAM. This action is both dangerous and
      unnecessary.
      
      It is unnecessary, because the device is going through a hard-reset, which
      means the device's DRAM contents are no longer valid and the device's MMU
      is being reset.
      
      It is dangerous, because if the hard-reset came as a result of a PCI
      freeze, this action may cause the entire host machine to hang.
      
      Therefore, prevent all device PTE updates when a hard-reset operation is
      pending.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
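The rule above amounts to a guard at the lowest PTE-write path, so context teardown becomes a no-op while a hard reset is pending. This is a hypothetical sketch: only the flag name hard_reset_pending comes from these commits; `struct pte_dev`, `write_pte()`, `ctx_fini()` and the write counter are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state; hard_reset_pending mirrors the driver's flag. */
struct pte_dev {
    bool hard_reset_pending;
    uint64_t dram_ptes[8];   /* stand-in for page tables in device DRAM */
    unsigned int writes;     /* number of PTE writes actually performed */
};

/* Write a PTE unless a hard reset is pending: the DRAM contents are about
 * to become invalid, and touching a frozen PCI device could hang the host. */
void write_pte(struct pte_dev *dev, unsigned int idx, uint64_t val)
{
    if (dev->hard_reset_pending)
        return;              /* skip: the MMU is being reset anyway */
    dev->dram_ptes[idx % 8] = val;
    dev->writes++;
}

/* Context teardown clears the context's PTEs; with the guard above this
 * does nothing to the device during hard reset. */
void ctx_fini(struct pte_dev *dev)
{
    for (unsigned int i = 0; i < 8; i++)
        write_pte(dev, i, 0);
}
```

Putting the guard in the single write helper, rather than in each caller, covers every teardown path at once.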
    • habanalabs: improve IOCTLs behavior when disabled or reset · 3f5398cf
      Committed by Oded Gabbay
      This patch improves how IOCTLs behave when the device is disabled or
      under reset.
      
      The new code checks, at the start of every IOCTL, whether the device is
      disabled or in reset. If so, it prints an appropriate kernel message and
      returns -EBUSY to user-space.
      
      In addition, the patch changes where the hard_reset_pending flag is set
      and cleared:
      
      1. It is now cleared immediately after the reset *tear-down* flow
         finishes, but before the re-initialization flow begins.
      
      2. It is now set in the device's remove function, so that the behavior
         matches the hard-reset flow.
      
      There are two exceptions to the disabled/in-reset check:
      
      1. The HL_INFO_DEVICE_STATUS opcode in the INFO IOCTL. This opcode lets
         the user query the status of the device: operational, in reset, or
         malfunctioning (disabled). If the driver blocked this IOCTL, the user
         would be unable to retrieve the status when the device is
         malfunctioning or in reset.
      
      2. The WAIT_FOR_CS IOCTL. This IOCTL lets the user query the status of a
         CS. We want the user to be able to keep doing so even after a
         soft-reset process has started, so that the user gets the correct
         error code for each CS they submitted.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
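The entry check and its two exceptions can be sketched as a single gate function called before dispatch. This is a hypothetical model: the enum values, `struct gate_dev` and `ioctl_gate()` are illustrative names, and `fprintf` stands in for the kernel message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical IOCTL and INFO opcode identifiers. */
enum ioctl_id { IOCTL_INFO, IOCTL_CS, IOCTL_WAIT_FOR_CS, IOCTL_MEMORY };
enum info_op  { INFO_DEVICE_STATUS, INFO_HW_IP };

struct gate_dev {
    bool disabled;   /* device malfunction */
    bool in_reset;   /* soft or hard reset in progress */
};

/* Returns 0 if the IOCTL may proceed, -EBUSY otherwise. */
int ioctl_gate(const struct gate_dev *dev, enum ioctl_id id, enum info_op op)
{
    if (!dev->disabled && !dev->in_reset)
        return 0;
    /* Exception 1: the status query must work so the user can learn why. */
    if (id == IOCTL_INFO && op == INFO_DEVICE_STATUS)
        return 0;
    /* Exception 2: WAIT_FOR_CS still reports per-CS error codes. */
    if (id == IOCTL_WAIT_FOR_CS)
        return 0;
    fprintf(stderr, "device is %s, rejecting IOCTL %d\n",
            dev->disabled ? "disabled" : "in reset", (int)id);
    return -EBUSY;
}
```

Only the INFO IOCTL's status opcode is exempt, not the whole INFO IOCTL, which is why the gate inspects the opcode as well as the IOCTL number.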
    • habanalabs: all FD must be closed before removing device · caa3c8e5
      Committed by Oded Gabbay
      This patch fixes a bug in the implementation of the function that removes
      the device.
      
      The bug can happen when the device is removed but the driver itself is
      not (e.g. the OS removes the device due to a PCI freeze on the Power
      architecture).
      
      In that case, there may be open users calling IOCTLs while the device is
      being removed. This is a possible race condition that the driver must
      handle; otherwise, a kernel panic may occur.
      
      This race is prevented in the hard-reset flow, because the driver makes
      sure the users are closed before continuing with the hard-reset. The
      race cannot occur when the driver itself is removed, because the OS makes
      sure all the file descriptors are closed.
      
      The fix is to make sure the open users close their file descriptors and,
      if they don't (after a certain amount of time), to have the driver send
      them a SIGKILL, because device removal can't be stopped.
      
      The patch reuses the same code that is called from the hard-reset flow.
      Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
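The "wait, then kill" policy can be modeled as a bounded wait loop followed by a forced step. This is a simulation, not kernel code: `struct rm_dev`, `wait_for_users_or_kill()`, the tick-based wait and the `closes_per_tick` parameter are hypothetical. Setting `killed` stands in for sending SIGKILL, and the model assumes killed processes release their FDs immediately.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for device removal. */
struct rm_dev {
    int open_fds;   /* user file descriptors still open */
    bool killed;    /* true if remaining users had to be killed */
};

/* Wait up to max_ticks periods for users to close their FDs (each period,
 * closes_per_tick users close voluntarily), then forcibly kill whoever is
 * left, since device removal cannot be aborted. Returns true if a kill was
 * needed. */
bool wait_for_users_or_kill(struct rm_dev *dev, int max_ticks,
                            int closes_per_tick)
{
    for (int t = 0; t < max_ticks && dev->open_fds > 0; t++) {
        dev->open_fds -= closes_per_tick;   /* stand-in for sleep + recheck */
        if (dev->open_fds < 0)
            dev->open_fds = 0;
    }
    if (dev->open_fds > 0) {
        dev->killed = true;                 /* the driver would send SIGKILL */
        dev->open_fds = 0;                  /* assume killed users release FDs */
    }
    return dev->killed;
}
```

The grace period lets well-behaved users exit cleanly. The forced step guarantees removal always completes, which matters because the OS, not the user, initiated it.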