A COMPARATIVE ANALYSIS OF TEXT SIMILARITY MEASURES AND ALGORITHMS IN RESEARCH PAPER RECOMMENDER SYSTEMS
Abstract
Recent advances in internet and web technologies have driven a surge in the number of research articles published online. Because of this information explosion, academics and other internet users struggle to find relevant and reliable material. The goal of this work is to identify the optimal combination of algorithms and similarity measures for article search and recommendation in research paper recommender systems. In this study, we combine text similarity measures with non-linear classification methods. Several similarity measures are evaluated on existing datasets, and an offline evaluation procedure is used to assess the accuracy and performance of the resulting models. Machine learning techniques, namely Boosted, Recursive PARTitioning (rpart), and Random Forest, are applied to datasets that capture the similarity of research papers. With an average accuracy of 80.73% and an average run time of 2.354628 seconds, the rpart method outperformed both the Boosted and Random Forest algorithms. Among the similarity measures compared, cosine similarity performed best. We also propose new similarity metrics and measures. This study shows that some combinations of metrics and algorithms are better suited than others for building models for research paper similarity assessment and recommendation, and it identifies several open problems and unanswered questions.
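As a concrete illustration of the kind of pipeline evaluated in this work, the sketch below computes cosine similarity between two term-frequency vectors and fits a small rpart classification tree on similarity features. All names and values here (cosine_sim, paper_a, paper_b, the jaccard feature, the relevance labels) are illustrative assumptions, not the paper's actual data or code.

# Minimal sketch, assuming term-frequency vectors and made-up training labels
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Hypothetical term-frequency vectors for two papers
paper_a <- c(3, 0, 1, 2, 0)
paper_b <- c(1, 1, 0, 2, 1)
cosine_sim(paper_a, paper_b)  # ~0.707 for these non-negative vectors

# Hypothetical training set: similarity features -> relevance label
library(rpart)
set.seed(1)
train <- data.frame(cosine = runif(100), jaccard = runif(100))
train$relevant <- factor(ifelse(0.7 * train$cosine + 0.3 * train$jaccard > 0.5, "yes", "no"))

# Classification tree on the similarity features, then a single prediction
model <- rpart(relevant ~ cosine + jaccard, data = train, method = "class")
predict(model, data.frame(cosine = 0.8, jaccard = 0.4), type = "class")

In the actual study, such similarity scores would be computed over full paper texts and fed to the Boosted, rpart, and Random Forest models that are compared in the experiments.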