Why is there a constant score for OOV?
Created by: ankitmundada
This line gives a score of -1000 (which is declared here) to any n-gram that contains an OOV word. Why did you use this approach instead of getting an OOV score from the language model itself? My guess is that the LM you use was built from the Common Crawl dataset and is heavily pruned, so a much stricter OOV check makes sense. Is the same approach necessary for a language model that is built from our own higher-quality dataset rather than from Common Crawl?
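
To make the two options concrete, here is a minimal sketch of what I mean. It is not the repository's code: `TOY_LM`, `score_constant_oov`, and `score_lm_oov` are hypothetical names, the log-probabilities are made-up values, and a toy unigram table stands in for the real language model. Only the -1000 constant comes from the code in question.

```python
# Hypothetical toy unigram LM (log10 probabilities); values are made up.
# The "<unk>" entry stands in for the model's own OOV estimate.
TOY_LM = {"the": -1.0, "cat": -2.0, "sat": -2.5, "<unk>": -4.0}

OOV_PENALTY = -1000.0  # the constant score asked about above


def score_constant_oov(ngram):
    """Current behaviour as I understand it: any OOV word gives the fixed penalty."""
    if any(word not in TOY_LM for word in ngram):
        return OOV_PENALTY
    return sum(TOY_LM[word] for word in ngram)


def score_lm_oov(ngram):
    """Alternative: score OOV words with the model's own <unk> log-probability."""
    return sum(TOY_LM.get(word, TOY_LM["<unk>"]) for word in ngram)


if __name__ == "__main__":
    ngram = ["the", "dog"]  # "dog" is OOV in the toy model
    print(score_constant_oov(ngram))
    print(score_lm_oov(ngram))
```

With the constant penalty, an n-gram containing an OOV word is effectively ruled out no matter how likely its other words are. With the LM-based score it is only penalised by however much mass the model assigns to unknown words, which is the behaviour I would expect to be safer with a less heavily pruned model.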