Improve backend performance
Created by: Superjomn
Currently, parts of the backend implementation are naive: they work, but slowly. Users embed the backend SDK into their training phase, where the backend is triggered frequently, so the logger's performance is crucial.
A rough look at the details shows several modules that should be tuned. They are listed below in order of importance:
- storage/Storage::PersistToDisk: this method saves all the tablets from memory to disk, even if some of them have not changed at all.
- WRITE_GUARD: a trick that uses a counter modulo some frequency to avoid the need for concurrency. But the write operation still carries overhead, so it would be better to use an async operation instead.
- Adding a record is expensive. For example, adding an Image record needs to rescale all the pixels; such operations should become async.
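To make the counter-and-frequency trick from the list concrete, here is a minimal sketch. It is not the actual WRITE_GUARD code; the class name `WriteGuard`, the `Tick` method, and the `flush` callback are hypothetical names chosen for illustration. It also shows why the write cost matters: the flush runs synchronously on the caller's thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical stand-in for the WRITE_GUARD idea: run `flush` once every
// `frequency` calls to Tick(), instead of on every write.
class WriteGuard {
 public:
  WriteGuard(std::size_t frequency, std::function<void()> flush)
      : frequency_(frequency), flush_(std::move(flush)) {
    assert(frequency_ > 0);
  }

  // Call after each write. Returns true when this call triggered a flush.
  bool Tick() {
    if (++counter_ % frequency_ != 0) return false;
    flush_();  // synchronous: the calling (training) thread pays this cost
    return true;
  }

 private:
  std::size_t frequency_;
  std::size_t counter_ = 0;
  std::function<void()> flush_;
};
```

With a frequency of 3, seven calls to `Tick()` trigger the flush twice (on the third and sixth call). The remaining cost sits inside `flush_()`, which is the part the issue proposes to move to an async operation.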
All in all, there are two aspects to improve. First, PersistToDisk should skip the tablets that haven't changed; second, some expensive operations should become async ones.
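One way to skip unchanged tablets is a per-tablet dirty flag. The sketch below is only an illustration under stated assumptions: the `Tablet` and `Storage` types here are simplified, hypothetical stand-ins (not the real VisualDL classes), records are plain strings, and `WriteTablet` is a placeholder for the real serialization.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified, hypothetical tablet: a list of records plus a dirty flag.
struct Tablet {
  std::vector<std::string> records;
  bool dirty = false;  // true when the tablet changed since the last persist
};

class Storage {
 public:
  void AddRecord(const std::string& tag, const std::string& record) {
    assert(!tag.empty());
    Tablet& tablet = tablets_[tag];
    tablet.records.push_back(record);
    tablet.dirty = true;  // mark for the next PersistToDisk()
  }

  // Writes only the tablets that changed; returns how many were written.
  std::size_t PersistToDisk() {
    std::size_t written = 0;
    for (auto& entry : tablets_) {
      if (!entry.second.dirty) continue;  // unchanged: skip the disk write
      WriteTablet(entry.first, entry.second);
      entry.second.dirty = false;
      ++written;
    }
    return written;
  }

 private:
  // Placeholder for the real serialization, which is out of scope here.
  void WriteTablet(const std::string&, const Tablet&) {}

  std::map<std::string, Tablet> tablets_;
};
```

A second `PersistToDisk()` call with no writes in between touches no tablets, so the cost of persisting scales with the amount of change rather than with the total number of tablets.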
The first issue is quite intuitive; let's focus on the second one.
For async tasks, a thread queue is a common choice, but it is not suitable for this task. The operations on tablets have dependencies that are hard to describe with stateless threads, and it would be painful to introduce more condition variables or mutexes. A dependency engine is a better fit: it handles dependencies naturally and supports concurrent programming without the need for condition variables or mutexes in the calling code.
Dependency engine as a concurrent programming framework
VisualDL might be used in a parallel system; that is, the SDK might be called in parallel. The dependency engine is similar to a task queue: tasks can be added in parallel, with a single mutex protecting the internal state.
Tasks can be executed in parallel by a thread pool. Both the state control and the thread pool are hidden inside the dependency engine, so the only change to VisualDL is the task-pushing logic.
For example, the heavy Image::SetSample can be wrapped in a task and accelerated by the underlying thread pool.
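The idea in this section can be sketched as a very small engine. This is a simplified illustration, not a proposed final design: the class `DependencyEngine` and its `Push`/`WaitAll` methods are hypothetical, and each task names exactly one variable (for instance, one tablet id), whereas a full dependency engine would track several read and write dependencies per task. Tasks on the same variable run in push order; tasks on different variables may run concurrently on the pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical minimal engine: one variable id per task (a simplification).
class DependencyEngine {
 public:
  using Task = std::function<void()>;

  explicit DependencyEngine(std::size_t num_threads) {
    assert(num_threads > 0);
    for (std::size_t i = 0; i < num_threads; ++i)
      workers_.emplace_back([this] { WorkerLoop(); });
  }

  ~DependencyEngine() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    work_cv_.notify_all();
    for (auto& worker : workers_) worker.join();  // workers drain the queues first
  }

  // Safe to call from many threads; only the single internal mutex is taken.
  void Push(int var, Task task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      pending_[var].push_back(std::move(task));
      ++outstanding_;
      // Schedule the variable only if it is neither queued nor running.
      if (active_.insert(var).second) ready_.push_back(var);
    }
    work_cv_.notify_one();
  }

  // Blocks until every task pushed so far has finished.
  void WaitAll() {
    std::unique_lock<std::mutex> lock(mu_);
    idle_cv_.wait(lock, [this] { return outstanding_ == 0; });
  }

 private:
  void WorkerLoop() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      work_cv_.wait(lock, [this] { return stop_ || !ready_.empty(); });
      if (ready_.empty()) return;  // stop requested and nothing left to run
      int var = ready_.front();
      ready_.pop_front();
      Task task = std::move(pending_[var].front());
      pending_[var].pop_front();

      lock.unlock();
      task();  // run outside the lock so other variables can make progress
      lock.lock();

      if (pending_[var].empty()) {
        active_.erase(var);
      } else {
        ready_.push_back(var);  // the next task on this variable is runnable
        work_cv_.notify_one();
      }
      if (--outstanding_ == 0) idle_cv_.notify_all();
    }
  }

  std::mutex mu_;
  std::condition_variable work_cv_;  // wakes workers when a variable is ready
  std::condition_variable idle_cv_;  // wakes WaitAll() when all tasks are done
  std::unordered_map<int, std::deque<Task>> pending_;  // per-variable FIFO
  std::unordered_set<int> active_;  // variables that are queued or running
  std::deque<int> ready_;           // variables whose head task can run now
  std::size_t outstanding_ = 0;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};
```

Under these assumptions, the SDK side would only call something like `engine.Push(tablet_id, task)` and return immediately, with the heavy work (such as the pixel rescaling behind Image::SetSample) captured in `task`. The mutex and condition variables stay inside the engine, so calling code does not need its own synchronization for work that goes through it.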
Performance stats
We can refer to a CPU profile to get some details of the backend performance.