Created by: botov
The root cause of the leak is a programming error. According to my understanding, each hl_init() call that allocates memory (GPU and CPU) should have a corresponding hl_fini() call.
Five places have been identified where this rule is broken: four in MultiGradientMachine and one in ParallelNeuralNetwork.
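One way to make the init/fini pairing hard to break is a scope guard. The following is only a minimal sketch of that idea, not the code in this change: `fake_hl_init`, `fake_hl_fini`, `DeviceContextGuard`, `run_worker` and `live_contexts_after` are hypothetical names, and the stand-in functions only count live contexts so the pairing can be checked without a GPU. The real hl_init()/hl_fini() signatures may differ.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for hl_init()/hl_fini(); they only count live
// contexts so the pairing can be checked without a GPU.
static int g_live_contexts = 0;
void fake_hl_init(int /*device*/) { ++g_live_contexts; }
void fake_hl_fini() { --g_live_contexts; }

// Hypothetical guard: acquires in the constructor, releases in the destructor.
class DeviceContextGuard {
 public:
  explicit DeviceContextGuard(int device) { fake_hl_init(device); }
  ~DeviceContextGuard() { fake_hl_fini(); }
  // Non-copyable, so one init can never lead to two fini calls.
  DeviceContextGuard(const DeviceContextGuard&) = delete;
  DeviceContextGuard& operator=(const DeviceContextGuard&) = delete;
};

// Worker body with an early exit; the guard still releases the context.
void run_worker(int device, bool fail) {
  DeviceContextGuard guard(device);
  if (fail) throw std::runtime_error("worker failed");
  // ... per-thread compute would go here ...
}

// Runs one worker and returns the number of contexts still alive afterwards.
int live_contexts_after(bool fail) {
  try {
    run_worker(0, fail);
  } catch (const std::runtime_error&) {
    // Stack unwinding has already run ~DeviceContextGuard().
  }
  return g_live_contexts;
}
```

With a guard like this, both the normal path and the failing path leave the counter at zero, which is the invariant the rule above asks for.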
We use DeepSpeech running on PaddlePaddle V2 in our project. The leak was consistently reproducible during inference (2 training threads on 2 CUDA devices). We used the nvidia-smi tool to detect the leak. No leaks have been observed after the fix: we ran a 12-hour test session in which ~180 000 audio files were inferred, and no GPU memory growth was detected.