Document classification based on kNN algorithm by term vector space reduction

Aiman Moldagulova, Rosnafisah Sulaiman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Interest in unstructured data analysis is growing. The vast majority of unstructured data is text. Retrieving useful information from a huge volume of unstructured text is a very challenging task. Text mining is a thought-provoking research area because it tries to discover knowledge from unstructured text. This paper deals with methods for handling unstructured text data, in particular document classification problems. Most document classification methods are based on the term vector space model for representing unstructured textual data. The term vector space model is easy to implement and provides a uniform representation for documents. However, the feature space for a large collection of documents can reach millions of dimensions and be sparse. One of the issues is therefore to reduce the dimension of the term-document matrix. In this research we propose an approach for reducing the term vector space in the kNN algorithm.
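The abstract does not say how the term vector space is reduced, so the following is only a minimal sketch of the general setting: documents become term-frequency vectors, rare terms are dropped, and a kNN vote over cosine similarity assigns the class. The document-frequency cut-off (`min_df`), the toy corpus, the labels, and every function name are assumptions made for illustration; they are not the authors' method or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, min_df=2):
    """Keep only terms that occur in at least `min_df` documents.

    Assumption: this document-frequency cut-off stands in for the
    (unspecified) reduction step; it is not the paper's approach.
    """
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return sorted(term for term, count in df.items() if count >= min_df)


def vectorize(text, vocabulary):
    """Map a text to a raw term-frequency vector over `vocabulary`."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(text, train_docs, train_labels, vocabulary, k=3):
    """Return the majority label among the k most similar training docs."""
    query = vectorize(text, vocabulary)
    scored = sorted(
        ((cosine(query, vectorize(doc, vocabulary)), label)
         for doc, label in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this sketch.
train_docs = [
    "the team won the football match",
    "the football team scored a late goal",
    "the player scored in the match",
    "the bank raised interest rates",
    "the central bank cut interest rates",
    "markets fell as rates rose",
]
train_labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

full_vocabulary = build_vocabulary(train_docs, min_df=1)
vocabulary = build_vocabulary(train_docs, min_df=2)
print(len(full_vocabulary), "terms before reduction,", len(vocabulary), "after")
print(knn_classify("the team scored a goal in the football match",
                   train_docs, train_labels, vocabulary))
```

On a realistic corpus the vectors would be stored sparsely and weighted (for example with TF-IDF) rather than kept as dense raw counts; the dense lists here only keep the sketch short.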

Original language: English
Title of host publication: International Conference on Control, Automation and Systems
Publisher: IEEE Computer Society
Pages: 387-391
Number of pages: 5
ISBN (Electronic): 9788993215151
Publication status: Published - 10 Dec 2018
Event: 18th International Conference on Control, Automation and Systems, ICCAS 2018 - PyeongChang, Korea, Republic of
Duration: 17 Oct 2018 - 20 Oct 2018

Publication series

Name: International Conference on Control, Automation and Systems
Volume: 2018-October
ISSN (Print): 1598-7833

Other

Other: 18th International Conference on Control, Automation and Systems, ICCAS 2018
Country: Korea, Republic of
City: PyeongChang
Period: 17/10/18 - 20/10/18

Fingerprint

Vector spaces

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Moldagulova, A., & Sulaiman, R. (2018). Document classification based on kNN algorithm by term vector space reduction. In International Conference on Control, Automation and Systems (pp. 387-391). [8571540] (International Conference on Control, Automation and Systems; Vol. 2018-October). IEEE Computer Society.
Moldagulova, Aiman ; Sulaiman, Rosnafisah. / Document classification based on kNN algorithm by term vector space reduction. International Conference on Control, Automation and Systems. IEEE Computer Society, 2018. pp. 387-391 (International Conference on Control, Automation and Systems).
@inproceedings{5f306a9099ff426fa6a01ea7fbfb27cb,
title = "Document classification based on kNN algorithm by term vector space reduction",
author = "Aiman Moldagulova and Rosnafisah Sulaiman",
year = "2018",
month = "12",
day = "10",
language = "English",
series = "International Conference on Control, Automation and Systems",
publisher = "IEEE Computer Society",
pages = "387--391",
booktitle = "International Conference on Control, Automation and Systems",
address = "United States",

}

Moldagulova, A & Sulaiman, R 2018, Document classification based on kNN algorithm by term vector space reduction. in International Conference on Control, Automation and Systems., 8571540, International Conference on Control, Automation and Systems, vol. 2018-October, IEEE Computer Society, pp. 387-391, 18th International Conference on Control, Automation and Systems, ICCAS 2018, PyeongChang, Korea, Republic of, 17/10/18.

Document classification based on kNN algorithm by term vector space reduction. / Moldagulova, Aiman; Sulaiman, Rosnafisah.

International Conference on Control, Automation and Systems. IEEE Computer Society, 2018. p. 387-391 8571540 (International Conference on Control, Automation and Systems; Vol. 2018-October).

TY - GEN

T1 - Document classification based on kNN algorithm by term vector space reduction

AU - Moldagulova, Aiman

AU - Sulaiman, Rosnafisah

PY - 2018/12/10

Y1 - 2018/12/10

UR - http://www.scopus.com/inward/record.url?scp=85060480043&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060480043&partnerID=8YFLogxK

M3 - Conference contribution

T3 - International Conference on Control, Automation and Systems

SP - 387

EP - 391

BT - International Conference on Control, Automation and Systems

PB - IEEE Computer Society

ER -

Moldagulova A, Sulaiman R. Document classification based on kNN algorithm by term vector space reduction. In International Conference on Control, Automation and Systems. IEEE Computer Society. 2018. p. 387-391. 8571540. (International Conference on Control, Automation and Systems).