- 22 9月, 2020 5 次提交
-
-
由 Oded Gabbay 提交于
When shifting a boolean variable by more than 31 bits and putting the result into a u64 variable, we need to cast the boolean into unsigned 64 bits to prevent possible overflow. Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
ArmCP mandates that the device CPU is always an ARM processor, which might be wrong in the future. Most of this change is an internal renaming of variables, functions and defines but there are two entries in sysfs which have armcp in their names. Add identical cpucp entries but don't remove yet the armcp entries. Those will be deprecated next year. Add the documentation about it in sysfs documentation. Signed-off-by: NMoti Haimovski <mhaimovski@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 farah kassabri 提交于
change busy engines bitmask to 64 bits in order to represent more engines, needed for future ASIC support. Signed-off-by: Nfarah kassabri <fkassabri@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Dotan Barak 提交于
If there is a failure during the testing of a queue, to ease up debugging - print the queue id. Signed-off-by: NDotan Barak <dbarak@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Ofir Bitton 提交于
Update firmware header with new API for getting pcie info such as tx/rx throughput and replay counter. These counters are needed by customers for monitor and maintenance of multiple devices. Add new opcodes to the INFO ioctl to retrieve these counters. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 22 8月, 2020 1 次提交
-
-
由 Ofir Bitton 提交于
During command buffer parsing, driver extracts packet id from user buffer. Driver must validate this packet id, since it is being used in order to extract information from internal structures. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 29 7月, 2020 2 次提交
-
-
由 kernel test robot 提交于
Signed-off-by: Nkernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20200729000313.GA14680@e442e3f624c4Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Greg Kroah-Hartman 提交于
There's no need to try to be cute with the include file locations in the Makefile, so just specify exactly where the files are. Bonus is this fixes the problem of building with O= as well as trying to just build the subdirectory alone. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Omer Shpigelman <oshpigelman@habana.ai> Cc: Tomer Tayar <ttayar@habana.ai> Cc: Moti Haimovski <mhaimovski@habana.ai> Cc: Ofir Bitton <obitton@habana.ai> Cc: Ben Segal <bpsegal20@gmail.com> Cc: Christine Gharzuzi <cgharzuzi@habana.ai> Cc: Pawel Piskorski <ppiskorski@habana.ai> Link: https://lore.kernel.org/r/20200728171851.55842-1-gregkh@linuxfoundation.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 7月, 2020 7 次提交
-
-
由 Ofir Bitton 提交于
Create a device MMU-mapped internal command buffer pool, in order to allow the driver to allocate CBs for the signal/wait operations that are fetched by the queues when they are configured with the user's address space ID. We must pre-map this internal pool due to performance issues. This pool is needed for future ASIC support and it is currently unused in GOYA and GAUDI. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
Currently the driver halts the device CPU in the halt engines function, which halts all the engines of the ASIC. The problem is that if later on we stop the reset process (due to inability to clean memory mappings in time), the CPU will remain in halt mode. This creates many issues, such as thermal/power control and FLR handling. Therefore, move the halting of the device CPU to the very end of the reset process, just before writing to the registers to initiate the reset. In addition, the driver now needs to send a message to the device F/W to disable it from sending interrupts to the host machine because during halt engines function the driver disables the MSI/MSI-X interrupts. Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com> Reviewed-by: NTomer Tayar <ttayar@habana.ai>
-
由 Ofir Bitton 提交于
Currently the amount of maximum queues is statically configured. Using a static value is causing redundunt cycles when traversing all queues and consumes more memory than actually needed. In this patch we configure each asic with the exact number of queues needed. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Ofir Bitton 提交于
Divide iATU initialization into inbound/outbound methods. We must separate it in order to enable different match mode per PCIe region. In addition, added support for PCI address match mode. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Adam Aharon 提交于
The profiler needs to know the PLL values for correctly showing the profiling data. Because our firmware can use different PLL configurations, we need to read the PLL values from the ASIC to pass them to the profiler. Signed-off-by: NAdam Aharon <aaharon@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Ofir Bitton 提交于
Currently sync stream is limited only for external queues. We want to remove this constraint by adding a new queue property dedicated for sync stream. In addition we move the initialization and reset methods to the common code since we can re-use them with slight changes. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Ofir Bitton 提交于
Training schemes requires much more concurrent command submissions than inference does. In addition, training command submissions can be completed in a non serialized manner. Hence, we add support in which each ASIC will be able to configure the amount of concurrent pending command submissions, rather than use a predefined amount. This change will enhance performance by allowing the user to add more concurrent work without waiting for the previous work to be completed. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 11 7月, 2020 2 次提交
-
-
由 Oded Gabbay 提交于
We see that sometimes the CPU in GOYA and GAUDI is occupied by the power/thermal loop and can't answer requests from the driver fast enough. Therefore, to avoid false notifications on timeouts, increase the timeout to 4 seconds on each message sent to the device CPU. Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com> Reviewed-by: NTomer Tayar <ttayar@habana.ai>
-
由 Oded Gabbay 提交于
For debugging purposes, we need to allow the root user better control of the clock gating feature of the DMA and compute engines. Therefore, change the clock gating debugfs interface to be bitmask instead of true/false. Each bit represents a different engine, according to gaudi_engine_id enum. See debugfs documentation for more details. Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com> Reviewed-by: NOmer Shpigelman <oshpigelman@habana.ai>
-
- 01 7月, 2020 1 次提交
-
-
由 Lee Jones 提交于
Seeing as 'addr' is unsigned, it would be impossible for the assigned value to be anything other than zero or positive. Squashes the following W=1 warnings: drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_read32’: drivers/misc/habanalabs/goya/goya.c:3945:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 3945 | } else if ((addr >= DRAM_PHYS_BASE) && | ^~ drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_write32’: drivers/misc/habanalabs/goya/goya.c:4002:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 4002 | } else if ((addr >= DRAM_PHYS_BASE) && | ^~ drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_read64’: drivers/misc/habanalabs/goya/goya.c:4047:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 4047 | } else if ((addr >= DRAM_PHYS_BASE) && | ^~ drivers/misc/habanalabs/goya/goya.c: In function ‘goya_debugfs_write64’: drivers/misc/habanalabs/goya/goya.c:4091:19: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits] 4091 | } else if ((addr >= DRAM_PHYS_BASE) && | ^~ drivers/misc/habanalabs/pci.c:328: warning: Excess function parameter 'dma_mask' description in 'hl_pci_set_dma_mask' drivers/misc/habanalabs/goya/goya_coresight.c: In function ‘goya_debug_coresight’: drivers/misc/habanalabs/goya/goya_coresight.c:643:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable] 643 | u32 val; | ^~~ Cc: Oded Gabbay <oded.gabbay@gmail.com> Cc: Tomer Tayar <ttayar@habana.ai> Signed-off-by: NLee Jones <lee.jones@linaro.org> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Link: https://lore.kernel.org/r/20200701085853.164358-7-lee.jones@linaro.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 5月, 2020 2 次提交
-
-
由 Omer Shpigelman 提交于
MMU cache invalidation timeout indicates that the device is unstable and therefore unusable. Hence in such case do hard reset and return an error to the user if was called from ioctl. In addition, change the print to error level and rephrase its text. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
GAUDI does not support soft-reset as it leaves the NIC ports in an awkward state, where their QMANs were reset but the NIC itself is still working. In addition, there is not much sense in doing soft-reset when training is done on multiple GAUDIs. Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com> Reviewed-by: NTomer Tayar <ttayar@habana.ai>
-
- 19 5月, 2020 6 次提交
-
-
由 Rachel Stahl 提交于
The patch_cb_size is not updated for Wreg32 in its validate function, so updated in goya_validate_cb. Signed-off-by: NRachel Stahl <rstahl@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
In Gaudi there is a feature of clock gating certain engines. Therefore, add this property to the device structure. In addition, due to a limitation of this feature, the driver needs to dynamically enable or disable this feature during run-time. Therefore, add ASIC interface functions to enable/disable this function from the common code. Moreover, this feature must be turned off when the user wishes to debug the ASIC by reading/writing registers and/or memory through the driver's debugfs. Therefore, add an option to enable/disable clock gating via the debugfs interface. Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Omer Shpigelman 提交于
Coresight is not supported on simulator, therefore add a boolean for checking that (currently used by un-upstreamed code). Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Omer Shpigelman 提交于
This feature requires handling h/w resources which are a bit different from one ASIC to the other. Therefore, we need to define a set of interfaces the ASIC code provides to the common code to signal, wait, reset sync object and to reset and init a queue. As this feature is not supported in Goya, provide an empty implementation of those functions. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Ofir Bitton 提交于
Load CPU device boot loader during driver boot time in order to avoid flash write for every boot loader update. To preserve backward-compatibility, skip the device boot load if the device doesn't request it. Signed-off-by: NOfir Bitton <obitton@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Tomer Tayar 提交于
Add a new opcode to the INFO IOCTL that retrieves the device time alongside the host time, to allow a user application that want to measure device time together with host time (such as a profiler) to synchronize these times. Signed-off-by: NTomer Tayar <ttayar@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 17 5月, 2020 6 次提交
-
-
由 Oded Gabbay 提交于
When we have DMA QMAN with multiple streams, we need to know whether the command buffer contains at least one DMA packet in order to configure the barriers correctly when adding the 2xMSG_PROT at the end of the JOB. If there is no DMA packet, then there is no need to put engine barrier. This is relevant only for GAUDI as GOYA doesn't have streams so the engine can't be busy by another stream. Reviewed-by: NTomer Tayar <ttayar@habana.ai> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
Retrieve from the firmware the DMA mask value we need to set according to the device's PCI controller configuration. This is needed when working on POWER9 machines, as the device's PCI controller is configured in a different way in those machines. Reviewed-by: NTomer Tayar <ttayar@habana.ai> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
Move the code of device CPU initialization from being ASIC-Dependent to common code. In addition, add support for the new error reporting feature of the firmware boot code. Reviewed-by: NOmer Shpigelman <oshpigelman@habana.ai> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Omer Shpigelman 提交于
We want to remove the following restrictions/assumptions in our driver: 1. The H/W queue index is also the completion queue index. 2. The H/W queue index is also the IRQ number of the completion queue. 3. All queues of the same type have consecutive indexes. Therefore we add the support for H/W queues of the same type with nonconsecutive indexes and completion queue index and IRQ number different than the H/W queue index. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Omer Shpigelman 提交于
Stop-on-error mode in DMA is useful as it stops the transaction immediately upon error e.g. page fault. But it may cause the next command submission to fail as is leaves the DMA in unstable state. Therefore we remove the stop-on-error configuration from the DMA. Stop-on-err is still available for debug. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
Upon reset of the ASIC, the driver would have waited for the CPU to come out of reset before finishing the reset process. This was done for the purpose of making the CPU available to answer FLR requests. However, when a VM shuts down, the driver isn't removed so a reset never happens. Therefore, remove this waiting period as we don't need it. Reviewed-by: NOmer Shpigelman <oshpigelman@habana.ai> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 24 3月, 2020 8 次提交
-
-
由 Omer Shpigelman 提交于
Add print upon clock slow down due to power consumption or overheating. In addition, add print when back to optimal clock. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Jules Irenge 提交于
Sparse reports a warning at goya_hw_queues_unlock() warning: context imbalance in goya_hw_queues_unlock() - unexpected unlock The root cause is a missing annotation at goya_hw_queues_unlock() Add the missing __releases(&goya->hw_queues_lock) annotation Signed-off-by: NJules Irenge <jbi.octave@gmail.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Jules Irenge 提交于
Sparse reports a warning at goya_hw_queues_lock() warning: context imbalance in goya_hw_queues_lock() - wrong count at exit The root cause is a missing annotation at goya_hw_queues_lock() Add the missing __acquires(&goya->hw_queues_lock) annotation Signed-off-by: NJules Irenge <jbi.octave@gmail.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Oded Gabbay 提交于
The compute engines can perform millions of transactions per second. If there is a bug in the S/W stack, we could get a lot of interrupts and spam the kernel log. Therefore, ratelimit these prints Reviewed-by: NOmer Shpigelman <oshpigelman@habana.ai> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Moti Haimovski 提交于
Allow debug user to write/read 64-bit data through debugfs. This will expedite the dump process of the (large) internal memories of the device done during debug. Signed-off-by: NMoti Haimovski <mhaimovski@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Omer Shpigelman 提交于
DRAM_PHYS_BASE is already taken into account in MMU_PAGE_TABLES_ADDR. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Omer Shpigelman 提交于
Host memory may be allocated with huge pages. A different virtual range may be used for mapping in this case. Add Huge PCI MMU (HPMMU) properties to support it. This patch is a prerequisite for future ASICs support and has no effect on Goya ASIC as currently a single virtual host range is used for all page sizes. Signed-off-by: NOmer Shpigelman <oshpigelman@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Pawel Piskorski 提交于
Optimize hl_mmu_map and hl_mmu_unmap by not calling flush(ctx) within per-page loop. Signed-off-by: NPawel Piskorski <ppiskorski@habana.ai> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-