Unable to minimize the CTC training error on large sequences (>1000 timesteps)
Created by: F0REacH
I've tried to compare the warp-ctc torch7 bindings with Paddle's CTC (test repo, see logs). The warp-ctc torch implementation works just fine, but Paddle, with the same data and the same network structure, can't successfully minimize the training error.
I've been unable to isolate the problem by changing hyperparameters, network type/size, or optimization functions, or by switching between CPU and GPU. The only workaround that works so far is truncating the sequences to ~500 timesteps.
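For illustration, this is a minimal sketch of the truncation workaround mentioned above, assuming preprocessing is done with plain NumPy before the data reaches Paddle; the function and constant names are hypothetical and not part of either library.

import numpy as np

MAX_LEN = 500  # the ~500-timestep limit at which training converges

def truncate_sequence(features, labels, max_len=MAX_LEN):
    """Keep only the first max_len frames of a (T, D) feature matrix.

    The label sequence is returned unchanged; CTC needs no frame-level
    alignment, but a label sequence longer than the truncated input can
    no longer be emitted, so such samples may need shortening or skipping.
    """
    if features.shape[0] <= max_len:
        return features, labels
    return features[:max_len], labels

# Example usage with random data standing in for a real utterance:
feats = np.random.randn(3200, 120).astype(np.float32)  # 3200 frames, 120-dim
labels = [7, 3, 42, 5]
feats, labels = truncate_sequence(feats, labels)
assert feats.shape[0] == 500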
Are there any possible solutions or suggestions for making this work with long, variable-length sequences (3000-5000 timesteps each)?
Using the latest master, built as:
cmake -DWITH_GPU=ON -DWITH_DOC=OFF -DCMAKE_INSTALL_PREFIX=/opt/paddle ..