A COMPARATIVE ANALYSIS OF TEXT SIMILARITY MEASURES AND ALGORITHMS IN RESEARCH PAPER RECOMMENDER SYSTEMS
Abstract
Recent advances in internet and web technologies have driven a surge in the number of research articles published online. Because of this information explosion, academics and other internet users struggle to find relevant and reliable material. The goal of this work is to identify the optimal combination of algorithms and similarity measures for article search and recommendation in research paper recommender systems. In this study, we combine text similarity measures with non-linear classification methods. Several similarity measures are evaluated on existing datasets, and an offline evaluation procedure is used to assess the accuracy and performance of the resulting models. Machine learning techniques, namely Boosted, Recursive PARTitioning (rpart), and Random Forest, are applied to datasets that capture the similarity of research papers. With an average accuracy of 80.73% and an average run time of 2.354628 seconds, the rpart method outperformed both the Boosted and Random Forest algorithms. Among the similarity measures compared, cosine similarity performed best. We also propose new similarity metrics and measures. This study shows that some combinations of metrics and algorithms are better suited than others for building models for research paper similarity assessment and recommendation, and it identifies several open problems and unanswered questions.
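As a concrete illustration of the kind of pipeline evaluated in this work, the sketch below computes cosine similarity between two term-frequency vectors and fits a small rpart classification tree on similarity features. All names and values here (cosine_sim, paper_a, paper_b, the jaccard feature, the relevance labels) are illustrative assumptions, not the paper's actual data or code.

# Minimal sketch, assuming term-frequency vectors and made-up training labels
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Hypothetical term-frequency vectors for two papers
paper_a <- c(3, 0, 1, 2, 0)
paper_b <- c(1, 1, 0, 2, 1)
cosine_sim(paper_a, paper_b)  # ~0.707 for these non-negative vectors

# Hypothetical training set: similarity features -> relevance label
library(rpart)
set.seed(1)
train <- data.frame(cosine = runif(100), jaccard = runif(100))
train$relevant <- factor(ifelse(0.7 * train$cosine + 0.3 * train$jaccard > 0.5, "yes", "no"))

# Classification tree on the similarity features, then a single prediction
model <- rpart(relevant ~ cosine + jaccard, data = train, method = "class")
predict(model, data.frame(cosine = 0.8, jaccard = 0.4), type = "class")

In the actual study, such similarity scores would be computed over full paper texts and fed to the Boosted, rpart, and Random Forest models that are compared in the experiments.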