Commits · cb056b9fd5138748dca7b679ea5f16b6bd24fb6c · openeuler / Kernel

17 May, 2020 5 commits

habanalabs: retrieve DMA mask indication from firmware · cb056b9f

Authored by Oded Gabbay on Mar 29, 2020

Retrieve from the firmware the DMA mask value we need to set according to
the device's PCI controller configuration. This is needed when working on
POWER9 machines, as the device's PCI controller is configured in a
different way on those machines.
Reviewed-by: Tomer Tayar <ttayar@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: increase timeout during reset · 7a65ee04

Authored by Oded Gabbay on Mar 27, 2020

When doing training, the DL framework (e.g. TensorFlow) performs hundreds
of thousands of memory allocations and mappings. In case the driver needs
to perform a hard reset during training, the driver kills the application
and unmaps all those memory allocations. Unfortunately, because of the
large number of mappings, the driver isn't able to do that within the
current timeout (5 seconds). Therefore, increase the timeout significantly
to 30 seconds to avoid a situation where the driver resets the device with
active mappings, which can sometimes cause a kernel bug.

Note that this doesn't mean we will spend the full 30 seconds, because the
reset thread checks once per second whether the unmap operation is done.
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
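
The polling behaviour described in this commit can be sketched in plain C as follows. This is a minimal userspace illustration, not the driver's code: `wait_for_unmap`, `RESET_WAIT_MAX_SEC`, the callback types, and the two test doubles are hypothetical names. Only the 30-second limit and the once-per-second check come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 30 seconds is the limit from the commit message; the macro name is hypothetical. */
#define RESET_WAIT_MAX_SEC 30

/* Hypothetical callbacks: one reports whether all mappings are gone, one sleeps. */
typedef bool (*unmap_done_fn)(void *ctx);
typedef void (*sleep_sec_fn)(unsigned int seconds);

/*
 * Poll once per second until the unmap work is done or the limit is reached.
 * Returns the number of seconds waited, or -1 on timeout.
 */
int wait_for_unmap(unmap_done_fn done, void *ctx, sleep_sec_fn sleep_fn)
{
    assert(done != NULL && sleep_fn != NULL);

    for (int waited = 0; waited < RESET_WAIT_MAX_SEC; waited++) {
        if (done(ctx))
            return waited; /* finished early: the full timeout is not spent */
        sleep_fn(1);
    }
    return done(ctx) ? RESET_WAIT_MAX_SEC : -1;
}

/* Test double: reports completion once the poll budget in *ctx reaches zero. */
bool fake_unmap_done(void *ctx)
{
    unsigned int *polls_left = ctx;

    if (*polls_left == 0)
        return true;
    (*polls_left)--;
    return false;
}

/* Test double: does not actually sleep. */
void fake_sleep(unsigned int seconds)
{
    (void)seconds;
}
```

With `fake_unmap_done` reporting completion after three polls, `wait_for_unmap` returns 3 instead of waiting for the full limit, which is the point made in the last paragraph of the commit message.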


habanalabs: unify and improve device cpu init · 7e1c07dd

Authored by Oded Gabbay on Mar 26, 2020

Move the code of device CPU initialization from being ASIC-dependent to
common code. In addition, add support for the new error reporting feature
of the firmware boot code.
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: re-factor H/W queues initialization · 1fa185c6

Authored by Omer Shpigelman on Mar 01, 2020

We want to remove the following restrictions/assumptions in our driver:
1. The H/W queue index is also the completion queue index.
2. The H/W queue index is also the IRQ number of the completion queue.
3. All queues of the same type have consecutive indexes.

Therefore, we add support for H/W queues of the same type with
non-consecutive indexes, and for a completion queue index and IRQ number
that differ from the H/W queue index.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: remove stop-on-error flag from DMA · 76cedc73

Authored by Omer Shpigelman on Mar 22, 2020

Stop-on-error mode in DMA is useful as it stops the transaction
immediately upon an error, e.g. a page fault.
However, it may cause the next command submission to fail, as it leaves
the DMA in an unstable state.
Therefore, we remove the stop-on-error configuration from the DMA.
Stop-on-error is still available for debugging.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


24 Mar, 2020 7 commits

habanalabs: show unsupported message for GAUDI · 6966d9e1

Authored by Oded Gabbay on Mar 21, 2020

If a GAUDI device is present in the system, display an error message that
it is not supported by the current kernel.
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: modify the return values of hl_read/write routines · d57b83c3

Authored by Moti Haimovski on Jan 23, 2020

The hl read and write routines implement the hwmon_ops read and write
interface routines respectively.
These routines are expected to return a completion status when called,
which was not the case until this commit.
This commit modifies these routines to return 0 upon success and a
negative error value upon failure.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: support temperature offset via sysfs · 5557b138

Authored by Moti Haimovski on Jan 21, 2020

This commit adds support for offsetting the temperature readings
by a specified value, as defined in
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface
using the standard sysfs attributes defined for hwmon.
This is required by system administrators to inject errors in order to
test their monitoring applications in data centers.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: add debugfs write64/read64 · 5cce5146

Authored by Moti Haimovski on Nov 12, 2019

Allow a debug user to write/read 64-bit data through debugfs.
This will expedite the dump process of the (large) internal
memories of the device done during debug.
Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: Modify CS jobs counter to u16 · f3a838c0

Authored by Tomer Tayar on Jan 05, 2020

As HL_MAX_JOBS_PER_CS is 512, it is possible that more than 255 CS jobs
will be submitted for a certain queue. Hence, modify the
"jobs_in_queue_cnt" parameter of the "hl_cs" structure to be u16 instead
of u8.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
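
The overflow this commit guards against can be shown with a small userspace sketch. The helper names `count_jobs_u8` and `count_jobs_u16` are hypothetical and exist only for illustration; the value 512 for `HL_MAX_JOBS_PER_CS` is taken from the commit message, and `uint8_t`/`uint16_t` stand in for the kernel's `u8`/`u16`.

```c
#include <assert.h>
#include <stdint.h>

/* Value taken from the commit message. */
#define HL_MAX_JOBS_PER_CS 512

/* Count n jobs in an 8-bit counter, as before the change: wraps modulo 256. */
uint8_t count_jobs_u8(unsigned int n)
{
    uint8_t cnt = 0;

    for (unsigned int i = 0; i < n; i++)
        cnt++;
    return cnt;
}

/* Count n jobs in a 16-bit counter, as after the change. */
uint16_t count_jobs_u16(unsigned int n)
{
    uint16_t cnt = 0;

    assert(n <= HL_MAX_JOBS_PER_CS); /* the per-CS limit fits easily in 16 bits */
    for (unsigned int i = 0; i < n; i++)
        cnt++;
    return cnt;
}
```

Counting 300 jobs yields 44 in the 8-bit counter (300 mod 256) but 300 in the 16-bit one, so only the wider type can represent every job count up to the per-CS maximum.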


habanalabs: split the host MMU properties · 64a7e295

Authored by Omer Shpigelman on Jan 05, 2020

Host memory may be allocated with huge pages.
A different virtual range may be used for mapping in this case.
Add Huge PCI MMU (HPMMU) properties to support it.
This patch is a prerequisite for future ASIC support and has no effect on
the Goya ASIC, as currently a single virtual host range is used for all
page sizes.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: flush only at the end of the map/unmap · 7fc40bca

Authored by Pawel Piskorski on Dec 06, 2019

Optimize hl_mmu_map and hl_mmu_unmap by not calling flush(ctx)
within the per-page loop.
Signed-off-by: Pawel Piskorski <ppiskorski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
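
The shape of this optimization can be sketched as below. This is a userspace illustration under stated assumptions, not the driver's `hl_mmu_map`: `struct mmu_ctx`, `map_one_page`, and both `map_range_*` functions are hypothetical, and `flush` here merely counts calls instead of touching hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an MMU context; it only counts operations. */
struct mmu_ctx {
    unsigned int mapped_pages;
    unsigned int flush_calls;
};

void map_one_page(struct mmu_ctx *ctx)
{
    ctx->mapped_pages++;
}

void flush(struct mmu_ctx *ctx)
{
    ctx->flush_calls++;
}

/* Before the change: flush inside the per-page loop. */
void map_range_flush_per_page(struct mmu_ctx *ctx, unsigned int num_pages)
{
    assert(ctx != NULL);
    for (unsigned int i = 0; i < num_pages; i++) {
        map_one_page(ctx);
        flush(ctx);
    }
}

/* After the change: map every page first, then flush once at the end. */
void map_range_flush_once(struct mmu_ctx *ctx, unsigned int num_pages)
{
    assert(ctx != NULL);
    for (unsigned int i = 0; i < num_pages; i++)
        map_one_page(ctx);
    flush(ctx);
}
```

For an 8-page range, the first variant performs 8 flushes and the second performs 1, while both map the same 8 pages.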


21 Nov, 2019 10 commits

habanalabs: split MMU properties to PCI/DRAM · 54bb6744

Authored by Omer Shpigelman on Nov 14, 2019

Split the properties used for MMU mappings to DRAM and PCI (host) types.
This is a prerequisite for future ASIC support.
Note that in the Goya ASIC, the PMMU and DMMU are the same (except for page
sizes) as only one MMU mechanism is used for both of the mapping types.
Hence this patch should not have any effect on current behavior.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: type specific MMU cache invalidation · 7b6e4ea0

Authored by Omer Shpigelman on Nov 14, 2019

Add the ability to invalidate the necessary MMU cache only.
This ability is a prerequisite for future ASIC support.
Note that in the Goya ASIC, a single cache is used for both host/DRAM
mappings and hence this patch should not have any effect on current
behavior.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: re-factor memory module code · 7f74d4d3

Authored by Omer Shpigelman on Aug 12, 2019

Some of the functions in the memory module code were too long and/or
contained multiple operations that are not always done together. Re-factor
the code by dividing those functions into smaller functions which are more
readable and maintainable.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: export uapi defines to user-space · 5d101257

Authored by Oded Gabbay on Nov 10, 2019

The two defines that control the maximum size of a command buffer and the
maximum number of JOBS per CS need to be exported to the user as they are
part of the API towards user-space.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>


habanalabs: increase max jobs number to 512 · bd4c8cb1

Authored by Oded Gabbay on Nov 09, 2019

In training, there is a need for a large amount of patching to the recipe.
This results in many command buffers that contain a lot of DMA packets. The
number of command buffers per CS is larger than the current maximum of 64,
which is an arbitrary number that is enough for inference, but it has no
real effect on the code and/or resources of the host machine.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>


habanalabs: add opcode to INFO IOCTL to return clock rate · 62c1e124

Authored by Oded Gabbay on Oct 10, 2019

Add a new opcode to the INFO IOCTL to allow the user application to
retrieve the ASIC's current and maximum clock rate. The rate is
returned in MHz.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>


habanalabs: set TPC Icache to 16 cache lines · 8fdacf2a

Authored by Oded Gabbay on Oct 02, 2019

Reduce latency to memory during TPC kernel execution.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>


habanalabs: Add a new H/W queue type · cb596aee

Authored by Tomer Tayar on Oct 03, 2019

This patch adds support for a new H/W queue type.
This type of queue is for DMA and compute engine jobs, for which
completion notifications are sent by H/W.
A command buffer for this queue can be created either through the CB
IOCTL and using the retrieved CB handle, or by preparing a buffer on the
host or device SRAM/DRAM, and using the device address of that buffer.
The patch includes the handling of the 2 options, as well as the
initialization of the H/W queue and its job scheduling.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: Mark queue as expecting CB handle or address · df762375

Authored by Tomer Tayar on Oct 03, 2019

Jobs on some queues must be provided with a handle to a driver command
buffer object, while for other queues, jobs must be provided with an
address of a command buffer.
Currently the distinction is made based on the queue type, which is less
flexible if the same queue type behaves differently on different
types of ASICs.
This patch adds a new queue property for this purpose, which is
configured per queue type per ASIC type.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: Fix typos · f435614f

Authored by Tomer Tayar on Oct 02, 2019

s/paerser/parser/
s/requeusted/requested/
s/an JOB/a JOB/
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


05 Sep, 2019 10 commits

habanalabs: correctly cast variable to __le32 · 6dc66f7c

Authored by Oded Gabbay on Sep 03, 2019

When using the macro le32_to_cpu(x), we need to correctly convert x to be
__le32 in case it is defined as a u32 variable.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Tomer Tayar <ttayar@habana.ai>


habanalabs: stop using the acronym KMD · 4c172bbf

Authored by Oded Gabbay on Aug 30, 2019

We want to stop using the acronym KMD. Therefore, replace KMD everywhere
it is written (except for register names we can't modify) with other
terms such as "Linux kernel driver" or "Host kernel driver", etc.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>


habanalabs: add uapi to retrieve aggregate H/W events · e9730763

Authored by Oded Gabbay on Aug 28, 2019

Add a new opcode to the INFO IOCTL to retrieve aggregate H/W events, i.e.
the event counters are NOT cleared upon device reset, but count from the
loading of the driver.

Add the code to support it in the device event handling function.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>


habanalabs: add uapi to retrieve device utilization · 75b3cb2b

Authored by Oded Gabbay on Aug 28, 2019

Users and sysadmins usually want to know the device utilization as a
level 0 indication of whether they are using the device efficiently.

Add a new opcode to the INFO IOCTL that will return the device utilization
over the last period of 100-1000 ms. The return value is 0-100,
representing the total utilization rate as a percentage.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>


habanalabs: Expose devices after initialization is done · ea451f88

Authored by Tomer Tayar on Aug 08, 2019

The char devices are currently exposed to the user before the device and
driver initialization are done.
This patch moves the addition of the cdev and device to the system to the
end of the initialization sequence, while keeping the creation of the
structures at the beginning to allow the usage of dev_*().
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


habanalabs: create two char devices per ASIC · 4d6a7751

Authored by Oded Gabbay on Jul 30, 2019

This patch changes the driver to create two char devices for each ASIC
it discovers. This is done to allow system/monitoring applications to
query the device for stats, information, idle state and more, while also
allowing the deep-learning application to send work to the ASIC.

One char device is the original device, hlX. IOCTL calls through this
device file can perform any task on the device (compute, memory, queries).
The open function for this device will fail if it was called before but
the file-descriptor it created was not completely released yet (the
release callback function is not called from the kernel until all
instances of that FD are closed). The driver needs to keep this behavior
to support backward compatibility with existing userspace, which counts
on the open failing if the device is "occupied".

The second char device is called "hl_controlDx", where x is the same index
as that of the main device, with a minor number equal to that of the
original char device + 1.
Applications that open this device can only call the INFO IOCTL. There is
no limitation on the number of applications opening this device.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


habanalabs: maintain a list of file private data objects · eb7caf84

Authored by Oded Gabbay on Jul 30, 2019

This patch adds a new list to the driver's device structure. The list will
keep the file private data structures that the driver creates when a user
process opens the device.

This change is needed because it is useless to try to count how many FDs
are open. Instead, track our own private data structure per open file and
once it is released, remove it from the list. As long as the list is not
empty, it means we have a user that can do something with our device.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


habanalabs: rename user_ctx as compute_ctx · 86d5307a

Authored by Oded Gabbay on Jul 30, 2019

This patch renames the "user_ctx" field in the device structure to
"compute_ctx". This better reflects the meaning of this context.

In addition, we also make sure in ctx_fini() that debug mode is disabled
only if the context being destroyed is the compute context. This has no
effect right now as we only have a single process and a single context,
but it makes the code more ready for multiple process support.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


habanalabs: add handle field to context structure · b888751a

Authored by Oded Gabbay on Jul 15, 2019

This patch adds a field to the context's structure that will hold a unique
handle for the context.

This will be needed when the user creates the context.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


habanalabs: cap simulator timeout · ed0fc505

Authored by Oded Gabbay on Jul 18, 2019

In the driver timeout functions, we give the simulator a factor of 10
in the timeout. This is necessary when the requested timeout is small,
but if it is a few seconds, this can result in a very large timeout which
is unnecessary.

This patch caps the maximum timeout of the simulator to 10 seconds, which
is our largest timeout in the code. That is more than enough for anything
the simulator is doing.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Omer Shpigelman <oshpigelman@habana.ai>
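
The capping rule reads naturally as a "multiply, then clamp" helper. The sketch below is a userspace illustration only: the function and macro names are hypothetical, and expressing timeouts in microseconds is an assumption. The factor of 10 and the 10-second cap come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Factor and cap from the commit message; names and the usec unit are assumptions. */
#define SIM_TIMEOUT_FACTOR   10ULL
#define SIM_MAX_TIMEOUT_USEC 10000000ULL /* 10 seconds */

/* Scale a requested timeout for the simulator, never exceeding the cap. */
uint64_t sim_timeout_usec(uint64_t timeout_usec)
{
    /* Comparing before multiplying also avoids overflow on huge inputs. */
    if (timeout_usec >= SIM_MAX_TIMEOUT_USEC / SIM_TIMEOUT_FACTOR)
        return SIM_MAX_TIMEOUT_USEC;
    return timeout_usec * SIM_TIMEOUT_FACTOR;
}

/* Usage examples, written as assertions. */
void sim_timeout_examples(void)
{
    /* A small 1 ms timeout still gets the full factor of 10. */
    assert(sim_timeout_usec(1000) == 10000);
    /* A 5 s timeout would become 50 s uncapped; it is limited to 10 s. */
    assert(sim_timeout_usec(5000000) == SIM_MAX_TIMEOUT_USEC);
}
```

Small timeouts keep the tenfold slack the simulator needs, while timeouts of a second or more all collapse to the same 10-second ceiling.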


12 Aug, 2019 1 commit

habanalabs: fix endianness handling for internal QMAN submission · b9040c99

Authored by Oded Gabbay on Aug 08, 2019

The PQs of internal H/W queues (QMANs) can be located in different memory
areas for different ASICs. Therefore, when writing PQEs, we need to use
the correct function according to the location of the PQ. For example, if
the PQ is located in the device's memory (SRAM or DRAM), we need to use
memcpy_toio() so it works on architectures that have separate
address ranges for IO memory.

This patch makes the code that writes the PQE ASIC-specific so we
can handle this properly per ASIC.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Tested-by: Ben Segal <bpsegal20@gmail.com>


29 Jul, 2019 1 commit

habanalabs: fix host memory polling in BE architecture · 2aa4e410

Authored by Ben Segal on Jul 18, 2019

This patch fixes a bug in the host memory polling macro. The bug is that
the memory being polled can be written by the device, which always writes
it in LE. However, if the host is running Linux in BE mode, we need to
convert the value that was written by the device before matching it to the
required value that the caller has given to the macro.
Signed-off-by: Ben Segal <bpsegal20@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
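
The conversion step can be illustrated in portable userspace C. In the kernel this job is done by `le32_to_cpu()`; the helpers below (`le32_load`, `host_mem_matches`) are hypothetical stand-ins that decode the bytes explicitly, so the sketch behaves the same on LE and BE hosts. It is not the driver's polling macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from raw bytes, on a host of any endianness. */
uint32_t le32_load(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0] |
           ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}

/* One polling step: convert the device-written value first, then compare. */
bool host_mem_matches(const uint8_t *device_written, uint32_t expected)
{
    return le32_load(device_written) == expected;
}
```

If the device stores 0x12345678 as the bytes 78 56 34 12, comparing the decoded value against 0x12345678 succeeds on any host, whereas comparing the raw memory word would only succeed on a little-endian one.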


01 Jul, 2019 2 commits

habanalabs: Add busy engines bitmask to HW idle IOCTL · e8960ca0

Authored by Tomer Tayar on Jul 01, 2019

The information which is currently provided as a response to the
"HL_INFO_HW_IDLE" IOCTL is merely a general boolean value.
This patch extends it and also provides a bitmask that indicates which
of the device engines are busy.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
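
How a single boolean relates to a per-engine bitmask can be sketched as follows. Everything here is hypothetical: the engine numbering, `busy_engines_mask`, and `device_is_idle` are illustrative names, and the real layout of the bitmask returned by the IOCTL is defined by the driver, not by this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical engine numbering; the real driver defines its own engine IDs. */
enum engine_id { ENGINE_DMA_0, ENGINE_MME_0, ENGINE_TPC_0, ENGINE_COUNT };

/* Set bit i in the mask when engine i is busy. */
uint32_t busy_engines_mask(const bool busy[], unsigned int num_engines)
{
    uint32_t mask = 0;

    assert(busy != NULL && num_engines <= 32);
    for (unsigned int i = 0; i < num_engines; i++)
        if (busy[i])
            mask |= UINT32_C(1) << i;
    return mask;
}

/* The old boolean answer is recoverable from the mask: idle means no bit set. */
bool device_is_idle(uint32_t mask)
{
    return mask == 0;
}
```

With the MME and TPC engines busy and the DMA engine idle, the mask is 0x6 and `device_is_idle` returns false, so a caller learns both that the device is busy and which engines are responsible.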


habanalabs: Add debugfs node for engines status · 06deb86a

Authored by Tomer Tayar on Jul 01, 2019

Command submissions sent to the device are composed of command buffers
which are targeted to different device engines, like DMA and compute
entities. When a command submission gets stuck, knowing which engine it
is stuck in is crucial for debugging.
This patch adds a debugfs node that exports this information, by
displaying the engines' various registers that make up their idle/busy
status.
The information retrieval is based on the is_device_idle ASIC function.
The printout in this function, of the first detected busy engine, is
removed because it becomes redundant in the presence of the more
elaborate info of the new debugfs node.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


29 May, 2019 1 commit

habanalabs: add MMU mappings for Goya CPU · 95b5a8b8

Authored by Oded Gabbay on May 29, 2019

This patch adds the necessary MMU mappings for the Goya CPU to access the
device DRAM and the host memory.

The first 256MB of the device DRAM is being mapped. That's where the F/W
is running.

The 2MB area located on the host memory for the purpose of communication
between the driver and the device CPU is also being mapped.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


25 May, 2019 1 commit

habanalabs: halt debug engines on user process close · 89225ce4

Authored by Omer Shpigelman on May 01, 2019

This patch fixes a potential bug where a user's process has closed
unexpectedly without disabling the debug engines. In that case, the debug
engines might continue running but because the user's MMU mappings are
going away, we will get page fault errors.

This behavior is also contrary to the general rule that nothing runs on
the device after the user process closes.

The patch stops the debug H/W engines upon process termination and thus
makes sure nothing runs on the device after the process goes away.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


17 May, 2019 1 commit

habanalabs: don't limit packet size for device CPU · cbb10f1e

Authored by Oded Gabbay on May 17, 2019

This patch removes a limitation on the maximum packet size that is read by
the device CPU as that limitation is not needed.

Therefore, the patch also removes an elaborate calculation that is based
on this limitation which is also not needed now. Instead, use a fixed
value for the memory pool size of the packets.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>


13 May, 2019 1 commit

habanalabs: increase PCI ELBI timeout for Palladium · a1e537b3

Authored by Omer Shpigelman on May 13, 2019

This patch increases the timeout for PCI ELBI configuration to support low
frequency Palladium images.
Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>

