Is it possible to apply a CNN layer to non-square (spectrogram) inputs?
Created by: F0REacH
I am trying to replicate a model similar to DeepSpeech2 and the neon example ( [spectrogram: ~4000 timesteps, each of size 512] -> [1D or 2D invariant convolutions] -> [bidirectional GRU] -> [fully connected layer] -> [warp-ctc] )
No problem with:
[bidirectional GRU] -> [fully connected layer] -> [warp-ctc]
But I have no idea how to add a CNN layer.
The docs say:
img_conv_layer - Convolution layer for image. Paddle only support square input currently and thus input image’s width equals height.
I am providing data as:
settings.input_types = [dense_vector_sequence(512)]
Are non-square CNN inputs really not supported, or is there some way to load the data so that a convolution can be applied?
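
To make the shapes concrete, here is a minimal sketch in plain Python (no Paddle calls) of the output size a non-square 2D convolution would produce over one spectrogram of ~4000 timesteps x 512 bins. The helper `conv_output_size` and the kernel, stride and padding values are assumptions chosen only for illustration; they are not taken from the Paddle docs or from any DeepSpeech2 or neon configuration.

```python
def conv_output_size(input_size, kernel, stride=1, padding=0):
    """Output length along one axis of a standard (non-dilated) convolution."""
    return (input_size + 2 * padding - kernel) // stride + 1


# Spectrogram from the question: ~4000 timesteps, 512 values per timestep.
time_steps, freq_bins = 4000, 512

# Hypothetical kernel/stride/padding, chosen only to illustrate the arithmetic.
kernel_t, kernel_f = 11, 41
stride_t, stride_f = 2, 2
pad_t, pad_f = 5, 20

# Each axis is computed independently, so nothing requires a square input.
out_t = conv_output_size(time_steps, kernel_t, stride_t, pad_t)
out_f = conv_output_size(freq_bins, kernel_f, stride_f, pad_f)

print(out_t, out_f)  # 2000 256
```

The two axes are handled independently, which is why I would expect a rectangular (time x frequency) input to be workable in principle if the layer accepted separate height and width.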