An analysis of text mining factors enhancing the identification of relevant studies

Mouayad Khashfeh, Moamin A. Mahmoud, Mohd Sharifuddin Ahmad

Research output: Contribution to journal > Article

1 Citation (Scopus)

Abstract

The development of science and the spread of knowledge coincide with a growing number of publications, and the volume of online content continues to grow at a rapid rate. For some queries, search engines may return thousands of documents of questionable relevance. In this paper, we analyze the literature and identify the text mining factors that influence the identification of relevant studies. Five factors are identified: text typography, paragraph length, term frequency, coordination, and strict search. We then propose an agent-based text mining model that facilitates the identification of relevant studies in large databases. The model consists of four components: an interface, a search process, a parsing process, and storage. The interface provides a means of communication between a user and the user's counterpart agent (the Personal Agent), and also serves as an input tool for the user's search preferences. The search process is driven by pattern matching, the parsing process is driven by a text mining algorithm, and the storage is managed by a Monitor Agent. The proposed framework would be useful as an alternative means of finding highly relevant studies in large databases.
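As a rough illustration of how three of the factors named in the abstract might be combined into a relevance score, here is a minimal Python sketch. The abstract does not define the factors, so the definitions below (term frequency as the share of document tokens that are query terms, coordination as the fraction of distinct query terms present, strict search as an exact phrase match) are assumptions, as are the function names `tokenize`, `score`, and `rank` and the equal weighting. Text typography and paragraph length are left out because plain strings carry no such information. This is not the authors' model.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Score a document against a query using three illustrative signals.

    Assumed definitions (not taken from the paper):
      - term frequency: share of document tokens that are query terms
      - coordination: fraction of distinct query terms found in the document
      - strict search: 1.0 if the query occurs as an exact phrase, else 0.0
    """
    q_terms = tokenize(query)
    d_terms = tokenize(document)
    if not q_terms or not d_terms:
        return 0.0

    counts = Counter(d_terms)
    unique_q = set(q_terms)

    term_frequency = sum(counts[t] for t in unique_q) / len(d_terms)
    coordination = sum(1 for t in unique_q if counts[t] > 0) / len(unique_q)

    # Pad with spaces so the phrase only matches on whole-token boundaries.
    phrase = " " + " ".join(q_terms) + " "
    body = " " + " ".join(d_terms) + " "
    strict = 1.0 if phrase in body else 0.0

    # Equal weights are an arbitrary choice for this sketch.
    return (term_frequency + coordination + strict) / 3.0


def rank(query, documents):
    """Return the documents sorted from most to least relevant."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)
```

Calling `rank("text mining", titles)` on a list of title strings would place titles containing the exact phrase ahead of titles that merely contain both words. In practice the weights would need tuning against labelled relevance judgments.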

Original language: English
Pages (from-to): 3896-3907
Number of pages: 12
Journal: Journal of Theoretical and Applied Information Technology
Volume: 96
Issue number: 12
Publication status: Published - 30 Jun 2018

Fingerprint

Text Mining
Parsing
Pattern matching
Search engines
Identification (control systems)
Monitor
Continue
Query
Alternatives
Communication
Term
Model

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

@article{f9062f1da24d4763ae7fcedbdf47ecce,
title = "An analysis of text mining factors enhancing the identification of relevant studies",
abstract = "The development of science and the spread of knowledge coincide with a growing number of publications, and the volume of online content continues to grow at a rapid rate. For some queries, search engines may return thousands of documents of questionable relevance. In this paper, we analyze the literature and identify the text mining factors that influence the identification of relevant studies. Five factors are identified: text typography, paragraph length, term frequency, coordination, and strict search. We then propose an agent-based text mining model that facilitates the identification of relevant studies in large databases. The model consists of four components: an interface, a search process, a parsing process, and storage. The interface provides a means of communication between a user and the user's counterpart agent (the Personal Agent), and also serves as an input tool for the user's search preferences. The search process is driven by pattern matching, the parsing process is driven by a text mining algorithm, and the storage is managed by a Monitor Agent. The proposed framework would be useful as an alternative means of finding highly relevant studies in large databases.",
author = "Khashfeh, Mouayad and Mahmoud, {Moamin A.} and Ahmad, {Mohd Sharifuddin}",
year = "2018",
month = "6",
day = "30",
language = "English",
volume = "96",
pages = "3896--3907",
journal = "Journal of Theoretical and Applied Information Technology",
issn = "1992-8645",
publisher = "Asian Research Publishing Network (ARPN)",
number = "12",

}

An analysis of text mining factors enhancing the identification of relevant studies. / Khashfeh, Mouayad; Mahmoud, Moamin A.; Ahmad, Mohd Sharifuddin.

In: Journal of Theoretical and Applied Information Technology, Vol. 96, No. 12, 30.06.2018, p. 3896-3907.

TY - JOUR

T1 - An analysis of text mining factors enhancing the identification of relevant studies

AU - Khashfeh, Mouayad

AU - Mahmoud, Moamin A.

AU - Ahmad, Mohd Sharifuddin

PY - 2018/6/30

Y1 - 2018/6/30

AB - The development of science and the spread of knowledge coincide with a growing number of publications, and the volume of online content continues to grow at a rapid rate. For some queries, search engines may return thousands of documents of questionable relevance. In this paper, we analyze the literature and identify the text mining factors that influence the identification of relevant studies. Five factors are identified: text typography, paragraph length, term frequency, coordination, and strict search. We then propose an agent-based text mining model that facilitates the identification of relevant studies in large databases. The model consists of four components: an interface, a search process, a parsing process, and storage. The interface provides a means of communication between a user and the user's counterpart agent (the Personal Agent), and also serves as an input tool for the user's search preferences. The search process is driven by pattern matching, the parsing process is driven by a text mining algorithm, and the storage is managed by a Monitor Agent. The proposed framework would be useful as an alternative means of finding highly relevant studies in large databases.

UR - http://www.scopus.com/inward/record.url?scp=85049435241&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049435241&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:85049435241

VL - 96

SP - 3896

EP - 3907

JO - Journal of Theoretical and Applied Information Technology

JF - Journal of Theoretical and Applied Information Technology

SN - 1992-8645

IS - 12

ER -