Commits · 7fc40bcaa63127d274e926dc1e9d62a72a01b1b5 · openeuler / Kernel

24 Mar, 2020 1 commit

habanalabs: flush only at the end of the map/unmap · 7fc40bca

Authored by Pawel Piskorski on Dec 06, 2019

Optimize hl_mmu_map and hl_mmu_unmap by not calling flush(ctx)
within the per-page loop.
Signed-off-by: Pawel Piskorski <ppiskorski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
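
The idea behind this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver's code: `struct mmu_ctx`, `map_one_page()`, `map_range()` and the counters are hypothetical stand-ins, and only the name `flush(ctx)` comes from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's MMU context. */
struct mmu_ctx {
    unsigned int mapped_pages; /* pages mapped so far */
    unsigned int flush_count;  /* number of times flush() ran */
};

/* Stand-in for the per-page work (page-table writes). */
void map_one_page(struct mmu_ctx *ctx, uint64_t virt, uint64_t phys)
{
    (void)virt;
    (void)phys;
    ctx->mapped_pages++;
}

/* Stand-in for the comparatively expensive flush of pending writes. */
void flush(struct mmu_ctx *ctx)
{
    ctx->flush_count++;
}

/* Map npages consecutive pages, then flush once after the loop. */
void map_range(struct mmu_ctx *ctx, uint64_t virt, uint64_t phys,
               size_t npages, uint64_t page_size)
{
    for (size_t i = 0; i < npages; i++)
        map_one_page(ctx, virt + i * page_size, phys + i * page_size);

    flush(ctx); /* one flush per range instead of one per page */
}
```

With this shape, mapping eight pages costs a single flush rather than eight.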

21 Nov, 2019 10 commits

habanalabs: split MMU properties to PCI/DRAM · 54bb6744

Authored by Omer Shpigelman on Nov 14, 2019

Split the properties used for MMU mappings to DRAM and PCI (host) types.
This is a prerequisite for future ASIC support.
Note that in Goya ASIC, the PMMU and DMMU are the same (except for page
sizes) as only one MMU mechanism is used for both of the mapping types.
Hence this patch should not have any effect on current behavior.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: type specific MMU cache invalidation · 7b6e4ea0

Authored by Omer Shpigelman on Nov 14, 2019

Add the ability to invalidate the necessary MMU cache only.
This ability is a prerequisite for future ASIC support.
Note that in Goya ASIC, a single cache is used for both host/DRAM
mappings and hence this patch should not have any effect on current
behavior.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: re-factor memory module code · 7f74d4d3

Authored by Omer Shpigelman on Aug 12, 2019

Some of the functions in the memory module code were too long and/or
contained multiple operations that are not always done together. Re-factor
the code by dividing those functions into smaller functions which are more
readable and maintainable.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: export uapi defines to user-space · 5d101257

Authored by Oded Gabbay on Nov 10, 2019

The two defines that control the maximum size of a command buffer and the
maximum number of JOBS per CS need to be exported to the user as they are
part of the API towards user-space.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>

habanalabs: increase max jobs number to 512 · bd4c8cb1

Authored by Oded Gabbay on Nov 09, 2019

In training, there is a need for a large amount of patching to the recipe.
This results in many command buffers that contain a lot of DMA packets. The
number of command buffers per CS is larger than the current maximum of 64,
which is an arbitrary number that is enough for inference, but it has no
real effect on the code and/or resources of the host machine.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>

habanalabs: add opcode to INFO IOCTL to return clock rate · 62c1e124

Authored by Oded Gabbay on Oct 10, 2019

Add a new opcode to the INFO IOCTL to allow the user application to
retrieve the ASIC's current and maximum clock rate. The rate is
returned in MHz.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>

habanalabs: set TPC Icache to 16 cache lines · 8fdacf2a

Authored by Oded Gabbay on Oct 02, 2019

Reduce latency to memory during TPC kernel execution.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>

habanalabs: Add a new H/W queue type · cb596aee

Authored by Tomer Tayar on Oct 03, 2019

This patch adds support for a new H/W queue type.
This type of queue is for DMA and compute engine jobs, for which
completion notifications are sent by H/W.
Command buffer for this queue can be created either through the CB
IOCTL and using the retrieved CB handle, or by preparing a buffer on the
host or device SRAM/DRAM, and using the device address to that buffer.
The patch includes the handling of the 2 options, as well as the
initialization of the H/W queue and its job scheduling.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: Mark queue as expecting CB handle or address · df762375

Authored by Tomer Tayar on Oct 03, 2019

Jobs on some queues must be provided with a handle to a driver command
buffer object, while for other queues, jobs must be provided with an
address to a command buffer.
Currently the distinction is made based on the queue type, which is less
flexible if the same queue type behaves differently on different
types of ASICs.
This patch adds a new queue property for this purpose, which is
configured per queue type per ASIC type.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: Fix typos · f435614f

Authored by Tomer Tayar on Oct 02, 2019

s/paerser/parser/
s/requeusted/requested/
s/an JOB/a JOB/
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

05 Sep, 2019 10 commits

habanalabs: correctly cast variable to __le32 · 6dc66f7c

Authored by Oded Gabbay on Sep 03, 2019

When using the macro le32_to_cpu(x), we need to correctly convert x to be
__le32 in case it is defined as a u32 variable.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>

habanalabs: stop using the acronym KMD · 4c172bbf

Authored by Oded Gabbay on Aug 30, 2019

We want to stop using the acronym KMD. Therefore, replace all locations
(except for register names we can't modify) where KMD is written to other
terms such as "Linux kernel driver" or "Host kernel driver", etc.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>

habanalabs: add uapi to retrieve aggregate H/W events · e9730763

Authored by Oded Gabbay on Aug 28, 2019

Add a new opcode to INFO IOCTL to retrieve aggregate H/W events, i.e. the
event counters are NOT cleared upon device reset, but count from the
loading of the driver.

Add the code to support it in the device event handling function.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>

habanalabs: add uapi to retrieve device utilization · 75b3cb2b

Authored by Oded Gabbay on Aug 28, 2019

Users and sysadmins usually want to know what the device utilization is, as
a level 0 indication of whether they are using the device efficiently.

Add a new opcode to the INFO IOCTL that will return the device utilization
over the last period of 100-1000ms. The return value is 0-100,
representing the total utilization rate as a percentage.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>

habanalabs: Expose devices after initialization is done · ea451f88

Authored by Tomer Tayar on Aug 08, 2019

The char devices are currently exposed to the user before the device and
driver initialization is done.
This patch moves the cdev and device adding to the system to the end of
the initialization sequence, while keeping the creation of the
structures at the beginning to allow the usage of dev_*().
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: create two char devices per ASIC · 4d6a7751

Authored by Oded Gabbay on Jul 30, 2019

This patch changes the driver to create two char devices for each ASIC
it discovers. This is done to allow system/monitoring applications to
query the device for stats, information, idle state and more, while also
allowing the deep-learning application to send work to the ASIC.

One char device is the original device, hlX. IOCTL calls through this
device file can perform any task on the device (compute, memory, queries).
The open function for this device will fail if it was called before but
the file-descriptor it created was not completely released yet (the
release callback function is not called from the kernel until all
instances of that FD are closed). The driver needs to keep this behavior
to support backward compatibility with existing userspace, which counts
on the open failing if the device is "occupied".

The second char device is called "hl_controlDx", where x is the same index
as the main device, with a minor number of the original char device + 1.
Applications that open this device can only call the INFO IOCTL. There is
no limitation on the number of applications opening this device.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

habanalabs: maintain a list of file private data objects · eb7caf84

Authored by Oded Gabbay on Jul 30, 2019

This patch adds a new list to the driver's device structure. The list will
keep the file private data structures that the driver creates when a user
process opens the device.

This change is needed because it is useless to try to count how many FDs
are open. Instead, track our own private data structure per open file and
once it is released, remove it from the list. As long as the list is not
empty, it means we have a user that can do something with our device.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

habanalabs: rename user_ctx as compute_ctx · 86d5307a

Authored by Oded Gabbay on Jul 30, 2019

This patch renames the "user_ctx" field in the device structure to
"compute_ctx". This better reflects the meaning of this context.

In addition, we also check in the ctx_fini() that the debug mode should be
disabled only if the context being destroyed is the compute context. This
has no effect right now as we only have a single process and a single
context, but this makes the code more ready for multiple process support.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

habanalabs: add handle field to context structure · b888751a

Authored by Oded Gabbay on Jul 15, 2019

This patch adds a field to the context's structure that will hold a unique
handle for the context.

This will be needed when the user creates the context.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

habanalabs: cap simulator timeout · ed0fc505

Authored by Oded Gabbay on Jul 18, 2019

In the driver timeout functions, we give the simulator a factor of 10
in the timeout. This is necessary when the requested timeout is small,
but if it is a few seconds, this can result in a very large timeout, which
is unnecessary.

This patch caps the maximum timeout of the simulator to 10 seconds, which
is our largest timeout in the code. That is more than enough for anything
the simulator is doing.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
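
The capping rule described above (scale by 10 on the simulator, but never beyond 10 seconds) might look roughly like the following. The function and macro names are hypothetical and chosen for illustration; only the factor of 10 and the 10-second ceiling come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_TIMEOUT_FACTOR 10u         /* factor from the commit message */
#define SIM_MAX_TIMEOUT_US 10000000ull /* 10 seconds, in microseconds */

/* Hypothetical helper: compute the timeout that is actually used. */
uint64_t effective_timeout_us(uint64_t timeout_us, bool on_simulator)
{
    uint64_t scaled;

    if (!on_simulator)
        return timeout_us; /* real hardware: use the request as-is */

    scaled = timeout_us * SIM_TIMEOUT_FACTOR;
    return scaled < SIM_MAX_TIMEOUT_US ? scaled : SIM_MAX_TIMEOUT_US;
}
```

A 1 ms request on the simulator becomes 10 ms, while a 5 s request is clamped to 10 s instead of growing to 50 s.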

12 Aug, 2019 1 commit

habanalabs: fix endianness handling for internal QMAN submission · b9040c99

Authored by Oded Gabbay on Aug 08, 2019

The PQs of internal H/W queues (QMANs) can be located in different memory
areas for different ASICs. Therefore, when writing PQEs, we need to use
the correct function according to the location of the PQ. For example, if
the PQ is located in the device's memory (SRAM or DRAM), we need to use
memcpy_toio() so it would work in architectures that have separate
address ranges for IO memory.

This patch makes the code that writes the PQE ASIC-specific so we
can handle this properly per ASIC.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Ben Segal <bpsegal20@gmail.com>

29 Jul, 2019 1 commit

habanalabs: fix host memory polling in BE architecture · 2aa4e410

Authored by Ben Segal on Jul 18, 2019

This patch fixes a bug in the host memory polling macro. The bug is that the
memory being polled can be written by the device, which always writes it
in LE. However, if the host is running Linux in BE mode, we need to
convert the value that was written by the device before matching it to the
required value that the caller has given to the macro.
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
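
The bug class described above can be illustrated outside the kernel. In the driver the conversion would go through the kernel's byte-order helpers; the sketch below instead decodes the bytes by hand so that it behaves the same on LE and BE hosts. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value byte by byte.
 * The result does not depend on the host's endianness. */
uint32_t le32_to_host(const uint8_t b[4])
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/* Hypothetical poll condition: convert the device-written word first,
 * then compare it to the value the caller expects. */
bool device_word_equals(const uint8_t mem[4], uint32_t expected)
{
    return le32_to_host(mem) == expected;
}
```

Comparing the raw word without the conversion would only succeed on an LE host, which is the failure the commit addresses.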

01 Jul, 2019 2 commits

habanalabs: Add busy engines bitmask to HW idle IOCTL · e8960ca0

Authored by Tomer Tayar on Jul 01, 2019

The information which is currently provided as a response to the
"HL_INFO_HW_IDLE" IOCTL is merely a general boolean value.
This patch extends it and also provides a bitmask that indicates which
of the device engines are busy.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
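
A boolean plus a per-engine bitmask could be assembled along these lines. This is a sketch only: `struct hw_idle_info`, its field names and the bit assignment (bit i for engine i) are assumptions made for the example and are not taken from the driver's uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reply layout: overall flag plus one bit per busy engine. */
struct hw_idle_info {
    bool is_idle;
    uint32_t busy_engines_mask;
};

/* Build the reply from per-engine busy flags (at most 32 engines here). */
struct hw_idle_info make_idle_info(const bool busy[], unsigned int num_engines)
{
    struct hw_idle_info info = { true, 0 };

    for (unsigned int i = 0; i < num_engines && i < 32; i++) {
        if (busy[i]) {
            info.busy_engines_mask |= (uint32_t)1 << i;
            info.is_idle = false; /* any busy engine means not idle */
        }
    }
    return info;
}
```

A caller that only needs the old behavior can keep reading the boolean, while a monitoring tool can test individual bits of the mask.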

habanalabs: Add debugfs node for engines status · 06deb86a

Authored by Tomer Tayar on Jul 01, 2019

Command submissions sent to the device are composed of command buffers
which are targeted to different device engines, like DMA and compute
entities. When a command submission gets stuck, knowing which engine
is stuck is crucial for debugging.
This patch adds a debugfs node that exports this information, by
displaying the engines' various registers that assemble their idle/busy
status.
The information retrieval is based on the is_device_idle ASIC function.
The printout in this function, of the first detected busy engine, is
removed because it becomes redundant in the presence of the more
elaborate info of the new debugfs node.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

29 May, 2019 1 commit

habanalabs: add MMU mappings for Goya CPU · 95b5a8b8

Authored by Oded Gabbay on May 29, 2019

This patch adds the necessary MMU mappings for the Goya CPU to access the
device DRAM and the host memory.

The first 256MB of the device DRAM is being mapped. That's where the F/W
is running.

The 2MB area located on the host memory for the purpose of communication
between the driver and the device CPU is also being mapped.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

25 May, 2019 1 commit

habanalabs: halt debug engines on user process close · 89225ce4

Authored by Omer Shpigelman on May 01, 2019

This patch fixes a potential bug where a user's process has closed
unexpectedly without disabling the debug engines. In that case, the debug
engines might continue running but because the user's MMU mappings are
going away, we will get page fault errors.

This behavior is also opposed to the general rule where nothing runs on
the device after the user process closes.

The patch stops the debug H/W engines upon process termination and thus
makes sure nothing runs on the device after the process goes away.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

17 May, 2019 1 commit

habanalabs: don't limit packet size for device CPU · cbb10f1e

Authored by Oded Gabbay on May 17, 2019

This patch removes a limitation on the maximum packet size that is read by
the device CPU as that limitation is not needed.

Therefore, the patch also removes an elaborate calculation that is based
on this limitation which is also not needed now. Instead, use a fixed
value for the memory pool size of the packets.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

13 May, 2019 1 commit

habanalabs: increase PCI ELBI timeout for Palladium · a1e537b3

Authored by Omer Shpigelman on May 13, 2019

This patch increases the timeout for PCI ELBI configuration to support low
frequency Palladium images.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

12 May, 2019 1 commit

habanalabs: pass device pointer to asic-specific function · 921a465b

Authored by Oded Gabbay on May 12, 2019

This patch adds a new parameter that is passed to the
add_end_of_cb_packets() asic-specific function.

The parameter is the pointer to the driver's device structure. The
function needs this pointer for future ASICs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

09 May, 2019 1 commit

habanalabs: change polling functions to macros · a08b51a9

Authored by Oded Gabbay on May 09, 2019

This patch changes two polling functions to macros, in order to make their
API the same as the standard readl_poll_timeout so we would be able to
define the "condition for exit" when calling these macros.

This will simplify the code as it will eliminate the need to check both
for timeout and for the (cond) in the calling function.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
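
Why a macro helps here: the exit condition can be passed as an expression that refers to the value just read, in the style of the kernel's readl_poll_timeout(addr, val, cond, ...). The userspace sketch below is a simplified imitation that counts attempts instead of measuring time. `POLL_UNTIL`, `read_fake_reg()` and the attempt limit are hypothetical names invented for this example.

```c
#include <stdint.h>

/* Re-read until (cond) holds or max_tries reads were made.
 * Sets ret to 0 on success and to -1 on "timeout". */
#define POLL_UNTIL(read_expr, val, cond, max_tries, ret)  \
    do {                                                  \
        unsigned int tries_ = (max_tries);                \
        (ret) = -1;                                       \
        while (tries_--) {                                \
            (val) = (read_expr);                          \
            if (cond) {                                   \
                (ret) = 0;                                \
                break;                                    \
            }                                             \
        }                                                 \
    } while (0)

/* Hypothetical register read: returns 1, 2, 3, ... on successive calls. */
static uint32_t fake_reads;

uint32_t read_fake_reg(void)
{
    return ++fake_reads;
}
```

The caller writes the condition inline, for example `POLL_UNTIL(read_fake_reg(), val, val >= 3, 10, ret);`, and then checks only `ret`.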

04 May, 2019 1 commit

habanalabs: force user to set device debug mode · 19734970

Authored by Oded Gabbay on May 04, 2019

This patch adds the implementation of the HL_DEBUG_OP_SET_MODE opcode in
the DEBUG IOCTL.

It forces the user who wants to debug the device to set the device into
debug mode before they can configure the debug engines. The patch also makes
sure to disable debug mode upon user releasing FD, in case the user forgot
to disable debug mode.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

05 May, 2019 1 commit

habanalabs: minor documentation and prints fixes · d1287493

Authored by Omer Shpigelman on May 05, 2019

This patch fixes comments on various structure members and some spelling
errors in log messages.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

30 Apr, 2019 1 commit

habanalabs: increase timeout if working with simulator · b1b53771

Authored by Dalit Ben Zoor on Apr 30, 2019

When there is a spike in the CPU consumption, it may cause
random failures in the C/I since the KMD timeout for CPU
and/or QMAN0 jobs expires and it stops communicating with the simulator.
This commit fixes it by increasing the timeout on polling functions
if working with the simulator.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

01 May, 2019 3 commits

habanalabs: remove redundant member from parser struct · 5809e18e

Authored by Dalit Ben Zoor on May 01, 2019

use_virt_addr member was used for telling whether to treat the
addresses in the CB as virtual during parsing. We disabled it only
when calling the parser from the driver memset device function,
and since this call had been removed, it should always be enabled.
Signed-off-by: Dalit Ben Zoor <dbenzoor@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: Manipulate DMA addresses in ASIC functions · 94cb669c

Authored by Tomer Tayar on May 01, 2019

Routing device accesses to the host memory requires the usage of a base
offset, which is canceled by the iATU just before leaving the device.
The value of the base offset might differ between different ASIC
types.
The manipulation of the addresses is currently used throughout the
driver code, and one should be aware of it whenever providing a host
memory address to the device.
This patch removes this manipulation from the driver common code, and
moves it to the ASIC specific functions that are responsible for
host memory allocation/mapping.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

habanalabs: rename functions to improve code readability · d9c3aa80

Authored by Oded Gabbay on May 01, 2019

This patch renames four functions in the ASIC-specific functions section,
so it will be easier to differentiate them from the generic kernel
functions with the same name.

This will help in future code reviews, to make sure we don't use the
kernel functions directly.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

29 Apr, 2019 1 commit

habanalabs: Use single pool for CPU accessible host memory · 03d5f641

Authored by Tomer Tayar on Apr 28, 2019

The device's CPU accessible memory on host is managed in a dedicated
pool, except for 2 regions - Primary Queue (PQ) and Event Queue (EQ) -
which are allocated from generic DMA pools.
Due to address length limitations of the CPU, the addresses of all these
memory regions must have the same MSBs starting at bit 40.
This patch modifies the allocation of the PQ and EQ to be also from the
dedicated pool, to ensure compliance with the limitation.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
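
The constraint "same MSBs starting at bit 40" can be expressed as a simple check. This is only an illustration of the stated limitation, not how the driver enforces it (the commit achieves compliance by allocating from one pool). The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPU_ADDR_MSB_SHIFT 40 /* bit position named in the commit message */

/* True when both addresses agree on every bit from bit 40 upwards. */
bool same_msbs(uint64_t a, uint64_t b)
{
    return (a >> CPU_ADDR_MSB_SHIFT) == (b >> CPU_ADDR_MSB_SHIFT);
}
```

Two buffers carved from one contiguous pool that does not cross a 2^40 boundary pass this check by construction, which is the appeal of the single-pool approach.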

28 Apr, 2019 1 commit

habanalabs: return old dram bar address upon change · a38693d7

Authored by Oded Gabbay on Apr 28, 2019

This patch changes the ASIC interface function that changes the DRAM bar
window. The change is to return the old address that the DRAM bar pointed
to instead of an error code.

This simplifies the code that uses this function (mainly in debugfs) to
restore the bar to the old setting.

This is also needed for easier support in future ASICs.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
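
Returning the previous value from a setter makes the save-and-restore pattern a two-liner. The sketch below shows the pattern with hypothetical names (`struct bar_state`, `set_dram_bar_base()`); it does not reproduce the driver's function or its error handling.

```c
#include <stdint.h>

/* Hypothetical stand-in for the device state that tracks the window. */
struct bar_state {
    uint64_t dram_bar_base;
};

/* Move the window to addr and return where it pointed before. */
uint64_t set_dram_bar_base(struct bar_state *dev, uint64_t addr)
{
    uint64_t old = dev->dram_bar_base;

    dev->dram_bar_base = addr;
    return old;
}
```

A caller can write `old = set_dram_bar_base(dev, target);`, do its access, and then call `set_dram_bar_base(dev, old);` to restore the window without tracking the previous setting separately.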

26 Apr, 2019 1 commit

habanalabs: rename restore to ctx_switch when appropriate · 027d35d0

Authored by Oded Gabbay on Apr 25, 2019

This patch only does renaming of certain variables and structure members,
and their accompanying comments.

This is done to better reflect the actions these variables and members
represent.

There is no functional change in this patch.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

openeuler / Kernel synced successfully more than 1 year ago