Created by: mkliegl
This closes #4574.
This is an unoptimized Eigen-based port of the existing code:
Examples of possible future optimizations include:
- For M >> log N or N >> log M, it could make sense to use FFT to do circular convolution.
- Instead of looping over the batch dimension on the outside, it could be faster to do a batchwise operation inside the loop, like `out.col(i) = x.col(index) * y.col(j)`. This would make the most sense if we either had column-major storage order or if the batch dimension were last rather than first.
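
To make the loop-reordering idea concrete, here is a minimal sketch in plain C++ (using `std::vector` rather than Eigen, so it is not the code in this PR). The `Matrix` struct, the two function names, and the index convention `index = (i + j - (M - 1) / 2) mod N` are assumptions for illustration only. The first function loops over the batch on the outside; the second moves the batch loop innermost, which is the loop form of the `out.col(i)` expression above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: element (b, i) lives at data[b * cols + i].
// Rows are batch entries, columns are positions along the width.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float& at(std::size_t b, std::size_t i) { return data[b * cols + i]; }
  float at(std::size_t b, std::size_t i) const { return data[b * cols + i]; }
};

// Variant A: batch loop on the outside (one circular convolution per row).
Matrix ConvShiftBatchOuter(const Matrix& x, const Matrix& y) {
  // Assumed preconditions: same batch size, odd kernel width, kernel no wider than x.
  assert(x.rows == y.rows && y.cols % 2 == 1 && y.cols <= x.cols);
  const std::size_t n = x.cols, m = y.cols, half = (m - 1) / 2;
  Matrix out(x.rows, n);
  for (std::size_t b = 0; b < x.rows; ++b) {
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < m; ++j) {
        // (i + j - half) mod n; adding n first keeps the unsigned value non-negative.
        const std::size_t index = (i + j + n - half) % n;
        out.at(b, i) += x.at(b, index) * y.at(b, j);
      }
    }
  }
  return out;
}

// Variant B: batch loop innermost, i.e. one "column" update per (i, j) pair.
Matrix ConvShiftBatchInner(const Matrix& x, const Matrix& y) {
  assert(x.rows == y.rows && y.cols % 2 == 1 && y.cols <= x.cols);
  const std::size_t n = x.cols, m = y.cols, half = (m - 1) / 2;
  Matrix out(x.rows, n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < m; ++j) {
      const std::size_t index = (i + j + n - half) % n;
      // With row-major storage this inner loop strides by `cols`, which is why
      // the reordering would pay off mainly with column-major or batch-last layout.
      for (std::size_t b = 0; b < x.rows; ++b) {
        out.at(b, i) += x.at(b, index) * y.at(b, j);
      }
    }
  }
  return out;
}
```

Both variants accumulate the terms for each output element in the same order, so they should produce identical results; only the memory access pattern differs.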
I would probably leave such optimizations to the future when we can profile an actual use case and see whether the improvements are worth it.
@dzhwinter Would you mind taking a look? This is my first PR to this repo - your feedback would be much appreciated!