Document classification based on kNN algorithm by term vector space reduction

Aiman Moldagulova, Rosnafisah Bte Sulaiman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Interest in unstructured data analysis is growing. The vast majority of unstructured data is text, and retrieving useful information from huge volumes of unstructured text is a very challenging task. Text mining is a thought-provoking research area because it tries to discover knowledge from unstructured text. This paper deals with methods for handling unstructured text data, in particular document classification problems. Most document classification methods are based on the term vector space model for representing unstructured textual data. The term vector space model is easy to implement and provides a uniform representation for documents. However, the feature space for a large collection of documents can reach millions of dimensions and be sparse. One of the issues is therefore to reduce the dimension of the term-document matrix. In this research we propose an approach for reducing the term vector space in the kNN algorithm.
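The abstract does not say which reduction rule the authors use, so the following is only a minimal sketch of the general idea: build term vectors, shrink the vocabulary, then classify with kNN. The document-frequency threshold `min_df`, the cosine similarity measure, the toy documents, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def build_vocabulary(docs, min_df=1):
    """Return the sorted terms that occur in at least min_df documents.

    Raising min_df drops rare terms and so reduces the term vector space.
    This threshold rule is an illustrative stand-in, not the paper's method.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return sorted(term for term, n in doc_freq.items() if n >= min_df)


def vectorize(text, vocabulary):
    """Map a document to raw term counts over the given vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(query, docs, labels, vocabulary, k=3):
    """Label the query by majority vote among its k most similar documents."""
    q = vectorize(query, vocabulary)
    scored = sorted(
        ((cosine(q, vectorize(doc, vocabulary)), label)
         for doc, label in zip(docs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training collection (hypothetical data for illustration only).
train_docs = [
    "the striker scored a late goal",
    "the team won the football match",
    "the goal decided the match",
    "the bank raised interest rates",
    "the market reacted to interest rates",
    "the bank reported market losses",
]
train_labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

full_vocab = build_vocabulary(train_docs, min_df=1)     # every term
reduced_vocab = build_vocabulary(train_docs, min_df=2)  # rare terms dropped

prediction = knn_classify(
    "interest rates at the bank", train_docs, train_labels, reduced_vocab, k=3
)
print(len(full_vocab), len(reduced_vocab), prediction)
```

In this toy collection the reduced vocabulary is much smaller than the full one, yet the nearest neighbours of the query are unchanged. Whether that holds on a real corpus depends on the reduction rule and the data.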

Original language: English
Title of host publication: International Conference on Control, Automation and Systems
Publisher: IEEE Computer Society
Pages: 387-391
Number of pages: 5
ISBN (Electronic): 9788993215151
Publication status: Published - 10 Dec 2018
Event: 18th International Conference on Control, Automation and Systems, ICCAS 2018 - PyeongChang, Korea, Republic of
Duration: 17 Oct 2018 to 20 Oct 2018

Publication series

Name: International Conference on Control, Automation and Systems
Volume: 2018-October
ISSN (Print): 1598-7833

Other

Other: 18th International Conference on Control, Automation and Systems, ICCAS 2018
Country: Korea, Republic of
City: PyeongChang
Period: 17/10/18 to 20/10/18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Moldagulova, A., & Sulaiman, R. B. (2018). Document classification based on kNN algorithm by term vector space reduction. In International Conference on Control, Automation and Systems (pp. 387-391). [8571540] (International Conference on Control, Automation and Systems; Vol. 2018-October). IEEE Computer Society.