The tool was built by Lucy Linder under the supervision of Jean Hennebert and Andreas Fischer. It comprises more than half a million sentences, which were generated using a customized web scraping tool that could also be applied to other low-resource languages.
Want to inspect the code? Click here.
Want to know a bit more about the process? Read the arXiv paper here.
And/or read the #LREC2020 paper here.