Topic Modeling
2022-06
I was familiar with topic modeling, particularly the LDA algorithm (Blei et al.). However, LDA works best on longer documents, which led me to wonder: what about short texts, like tweets? How do algorithms perform across different languages, and how do you choose the best one for each case?
These questions shaped our project. My supervisor gave me significant responsibility, which pushed me to improve rapidly. We curated a dataset of short texts from Twitter, labeled them, and tested various topic modeling algorithms to evaluate their performance. We also developed a tool that lets other researchers experiment with these algorithms and run their own evaluations. Code and dataset: GitHub. Paper: PrePrint.