- 28 1月, 2021 30 次提交
-
-
由 Ofir Bitton 提交于
In order to support completions that arrive directly to the user, the driver needs to supply the user with the first available msix interrupt available. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 farah kassabri 提交于
Currently hint address is ignored in case va block page size is not power of 2. We need to support th user hint address also in this case, but only if the hint address is aligned to page size. Signed-off-by: Nfarah kassabri <fkassabri@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
In order to improve driver security debuggability, we add security violations dump to debugfs. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
In order to support operation mode in which BMC is not active, driver must not take BMC errors into consideration. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
Driver must print sync manager SEI information upon receiving interrupt from FW. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Christophe JAILLET 提交于
Axe 'hl_pci_set_dma_mask()' and replace it with an equivalent 'dma_set_mask_and_coherent()' call. This makes the code a bit less verbose. It also removes an erroneous comment, because 'hl_pci_set_dma_mask()' does not try to use a fall-back value. Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
Due to HW limitation we must remove all direct access to SM registers, in order to do that we will access SM registers using the HW QMANS. When possible and no user context is present, we can directly access the HW QMANS. Whenever there is an active user, driver will prepare a pending command buffer list which will be sent upon user submissions. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
In order to support scnenarios in which driver needs access to HW components but it cannot access them directly, we add support for scheduling command buffers internally. These command buffers will be transmitted upon next user command submission context. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
A CS must increment the relevant context reference count. We want to increment the reference inside the CS allocation function as opposed for today where we increment it outside. This is logical since we want to avoid explicitly incrementing the context every time we call the CS allocate function. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
We separate some of the common code source files to different folders for a better maintainability and testability. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
Boot cpu can report errors in various boot stages. Current implementaion does not take into consideration errors reported in late stages, hence we will check for errors at the most late stage when fetching cpucp information. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
In case MMU is enabled, we must take MMU page size into consideration when reporting dram size to the user. This is because the MMU page size can be a value which is NOT a power-of-2 value. As a result, the total DRAM size (which is always a power-of-2 value) needed to be rounded-down. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Moti Haimovski 提交于
DRAM physical page sizes depend of the amount of HBMs available in the device. this number is device-dependent and may also be subject to binning when one or more of the DRAM controllers are found to to be faulty. Such a configuration may lead to partitioning the DRAM to non-power-of-2 pages. To support this feature we also need to add infrastructure of address scarmbling. Signed-off-by: NMoti Haimovski <mhaimovski@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
Accessing kernel allocated memory through debugfs should not be allowed as it introduces a security vulnerability. We remove the option to read/write kernel memory for all asics. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
Initialize local variable that is returned by the function, in case it is never assigned. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Alon Mizrahi 提交于
When working with DRAM MMU, we should supply the userspace with the virtual start address of the DRAM instead of the physical one. This is because the physical one has no meaning for the user as he only knows the virtual address range. Signed-off-by: NAlon Mizrahi <amizrahi@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Oded Gabbay 提交于
Update the latest version of this file that the F/W exports Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Oded Gabbay 提交于
The number of functional HBMs in the same ASIC can be different due to malfunctioning HBM banks. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
In order to have more information while debugging boot issues, we should print the firmware security status at every boot stage. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Omer Shpigelman 提交于
For consistency, modify all memory ioctl functions to get the ioctl arguments structure rather than the arguments themselves. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Omer Shpigelman 提交于
Change all memory functions documentation according to kernel doc format. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Alon Mizrahi 提交于
Often WARN is defined in data-centers as BUG and we would like to avoid hanging the entire server on some internal error of the driver (important as it might be). Therefore, use dev_crit instead. Signed-off-by: NAlon Mizrahi <amizrahi@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Moti Haimovski 提交于
Instead of having it hard-coded as a define, pass it to the user in runtime. Signed-off-by: NMoti Haimovski <mhaimovski@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ohad Sharabi 提交于
Currently mmu_prepare is located at context switch. Since we support a single context, no reason to reconfigure the MMU registers every context switch. Signed-off-by: NOhad Sharabi <osharabi@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
As all packets use the same CTL register masks, we remove duplicated masks and use common masks instead. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
In order to support the staged submission feature, user must be allowed to use the same CS sequence for all submissions in the same staged submission. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
As part of the staged submission feature, we need Gaudi to support command submissions that will never get a completion. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
In order for reserving VA ranges for kernel memory, we need to allow the VM module to be initiated with kernel context. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ohad Sharabi 提交于
remove mmu_cache_lock as it protects a section which is already protected by mmu_lock. in addition, wrap mmu cache invalidate calls in hl_vm_ctx_fini with mmu_lock. Signed-off-by: NOhad Sharabi <osharabi@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Oded Gabbay 提交于
Update to latest firmware hl_boot_if.h file. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
- 22 1月, 2021 3 次提交
-
-
由 Oded Gabbay 提交于
When device is removed, we need to make sure the F/W won't send us any more events because during the remove process we disable the interrupts. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Oded Gabbay 提交于
Need to take the lower 32 bits of the driver's 64-bit idle mask and put it in the legacy 32-bit variable that the userspace reads to know the idle mask. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Ofir Bitton 提交于
Driver does not zero some pci counters packets before sending to FW. This causes an out of sync PI/CI between driver and FW. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
- 12 1月, 2021 3 次提交
-
-
由 Oded Gabbay 提交于
When using Deep learning framework such as tensorflow or pytorch, there are tens of thousands of host memory mappings. When the user frees all those mappings at the same time, the process of unmapping and unpinning them can take a long time, which may cause a soft lockup bug. To prevent this, we need to free the core to do other things during the unmapping process. For now, we chose to do it every 32K unmappings (each unmap is a single 4K page). Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Oded Gabbay 提交于
There are some points in the reset process where if the code fails for some reason, and the system admin tries to initiate the reset process again we will get a kernel panic. This is because there aren't any protections in different fini functions that are called during the reset process. The protections that are added in this patch make sure that if the fini functions are called multiple times, without calling init functions between them, there won't be double release of already released resources. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Oded Gabbay 提交于
When doing dma_alloc_coherent in the driver, we add a certain hard-coded offset to the DMA address before returning to the callee function. This offset is needed when our device use this DMA address to perform outbound transactions to the host. However, if we want to map the DMA'able memory to the user via dma_mmap_coherent(), we need to pass the original dma address, without this offset. Otherwise, we will get erronouos mapping. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
- 30 12月, 2020 1 次提交
-
-
由 Dinghao Liu 提交于
When kzalloc() fails, we should execute hl_mmu_fini() to release the MMU module. It's the same when hl_ctx_init() fails. Signed-off-by: NDinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
- 28 12月, 2020 3 次提交
-
-
由 Oded Gabbay 提交于
When the device is in reset or needs to be reset, the disabled property is don't-care. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Oded Gabbay 提交于
We need to make sure our device is idle when rebooting a virtual machine. This is done in the driver level. The firmware will later handle FLR but we want to be extra safe and stop the devices until the FLR is handled. Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-
由 Alon Mizrahi 提交于
Up until now validation errors were counted in the parsing field of the cs_counters struct, so we added a new counter and increased it when needed. In addition, there were some locations where only one of the counters was updated (ctx or aggregate) so add the second one to be updated as well. Signed-off-by: NAlon Mizrahi <amizrahi@habana.ai> Reviewed-by: NOded Gabbay <ogabbay@kernel.org> Signed-off-by: NOded Gabbay <ogabbay@kernel.org>
-