Persian ATIS
2021-11
Although Persian is among the ten most used languages on the internet according to Wikipedia, it is still treated as a low-resource language in academic research and Natural Language Processing (NLP). The shortage is especially clear for intent detection and slot filling datasets, two core components of dialogue systems.
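For readers unfamiliar with the two tasks, here is a minimal sketch of what an annotated utterance looks like. It assumes the commonly used ATIS-style format of one intent label per utterance and one BIO slot tag per token. The English utterance is an illustrative example rather than a sample taken from our dataset, and `extract_slots` is a hypothetical helper, not part of our code.

```python
from typing import Dict, List

# Illustrative ATIS-style example (an assumption, not a row from Persian ATIS):
# the intent labels the whole utterance, the slot tags label each token.
example = {
    "tokens": ["show", "flights", "from", "new", "york", "to", "denver"],
    "tags": ["O", "O", "O", "B-fromloc.city_name", "I-fromloc.city_name",
             "O", "B-toloc.city_name"],
    "intent": "atis_flight",
}


def extract_slots(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group BIO-tagged tokens into slot values keyed by slot name."""
    slots: Dict[str, List[str]] = {}
    name = None            # slot name of the span currently being built
    span: List[str] = []   # tokens collected for that span

    def flush() -> None:
        # Store the finished span, if any, under its slot name.
        if name is not None:
            slots.setdefault(name, []).append(" ".join(span))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            name, span = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            span.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag closes the current span.
            flush()
            name, span = None, []
    flush()
    return slots


slots = extract_slots(example["tokens"], example["tags"])
```

Intent detection predicts the `intent` field for the whole utterance, while slot filling predicts the `tags` sequence; the helper only shows how the tags map back to slot values.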
To address this gap, our team at the NLPIC lab set out to build a dataset and benchmark for Persian. We started by translating the well-known ATIS dataset into Persian, which let us examine how different models perform on the translated data.
We then collected state-of-the-art models and evaluated them on both the original and the Persian versions of the dataset. We documented the findings in a paper, adding to the growing body of NLP research for Persian.
The code and dataset are available on GitHub (Persian ATIS), and the paper describing the research is available on arXiv.