Abstract
Interest in unstructured data analysis is growing. The vast majority of unstructured data is text. Retrieving useful information from huge volumes of unstructured text is a very challenging task. Text mining is a thought-provoking research area because it tries to discover knowledge from unstructured text. This paper deals with methods for handling unstructured text data, in particular document classification problems. Most document classification methods are based on the term vector space model for representing unstructured textual data. The term vector space model is easy to implement and provides a uniform representation for documents. However, the feature space for a large collection of documents can reach millions of dimensions and be sparse. One of the issues is therefore to reduce the dimension of the term-document matrix. In this research we propose an approach for reducing the term vector space in the kNN algorithm.
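The abstract does not say how the term vector space is reduced, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple document-frequency cutoff as the reduction rule (the `min_df` parameter, the helper names, and the toy corpus are all hypothetical) and classifies a query by majority vote among its k most cosine-similar training documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems would also strip punctuation.
    return text.lower().split()


def build_vocabulary(docs, min_df=2):
    # ASSUMPTION: reduce the term space by keeping only terms that occur
    # in at least `min_df` documents. The paper's own rule is not given.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return sorted(term for term, n in doc_freq.items() if n >= min_df)


def vectorize(doc, vocab):
    # Term-count vector of `doc` over the (possibly reduced) vocabulary.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_classify(query, train_docs, train_labels, vocab, k=3):
    # Rank training documents by cosine similarity, then take a majority vote.
    q = vectorize(query, vocab)
    scored = sorted(
        ((cosine(q, vectorize(d, vocab)), label)
         for d, label in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "the robot controller tunes the motor gain",
    "motor control loop with feedback gain",
    "feedback controller for the robot arm",
    "neural network learns text features",
    "text classification with a neural model",
    "the network classifies text documents",
]
train_labels = ["control", "control", "control", "ai", "ai", "ai"]

full_vocab = build_vocabulary(train_docs, min_df=1)  # unreduced term space
vocab = build_vocabulary(train_docs, min_df=2)       # reduced term space
predicted = knn_classify("gain of the motor controller",
                         train_docs, train_labels, vocab, k=3)
print(len(full_vocab), len(vocab), predicted)
```

On this toy corpus the cutoff shrinks the vocabulary while the query is still assigned to the expected class. On real collections the trade-off between a smaller term-document matrix and classification accuracy would have to be measured.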
Original language | English |
---|---|
Title of host publication | International Conference on Control, Automation and Systems |
Publisher | IEEE Computer Society |
Pages | 387-391 |
Number of pages | 5 |
ISBN (Electronic) | 9788993215151 |
Publication status | Published - 10 Dec 2018 |
Event | 18th International Conference on Control, Automation and Systems, ICCAS 2018 - PyeongChang, Korea, Republic of; Duration: 17 Oct 2018 → 20 Oct 2018 |
Publication series
Name | International Conference on Control, Automation and Systems |
---|---|
Volume | 2018-October |
ISSN (Print) | 1598-7833 |
Other
Other | 18th International Conference on Control, Automation and Systems, ICCAS 2018 |
---|---|
Country | Korea, Republic of |
City | PyeongChang |
Period | 17/10/18 → 20/10/18 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Science Applications
- Control and Systems Engineering
- Electrical and Electronic Engineering
Cite this
Document classification based on kNN algorithm by term vector space reduction. / Moldagulova, Aiman; Sulaiman, Rosnafisah.
International Conference on Control, Automation and Systems. IEEE Computer Society, 2018. p. 387-391 8571540 (International Conference on Control, Automation and Systems; Vol. 2018-October).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Document classification based on kNN algorithm by term vector space reduction
AU - Moldagulova, Aiman
AU - Sulaiman, Rosnafisah
PY - 2018/12/10
Y1 - 2018/12/10
AB - Interest in unstructured data analysis is growing. The vast majority of unstructured data is text. Retrieving useful information from huge volumes of unstructured text is a very challenging task. Text mining is a thought-provoking research area because it tries to discover knowledge from unstructured text. This paper deals with methods for handling unstructured text data, in particular document classification problems. Most document classification methods are based on the term vector space model for representing unstructured textual data. The term vector space model is easy to implement and provides a uniform representation for documents. However, the feature space for a large collection of documents can reach millions of dimensions and be sparse. One of the issues is therefore to reduce the dimension of the term-document matrix. In this research we propose an approach for reducing the term vector space in the kNN algorithm.
UR - http://www.scopus.com/inward/record.url?scp=85060480043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060480043&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85060480043
T3 - International Conference on Control, Automation and Systems
SP - 387
EP - 391
BT - International Conference on Control, Automation and Systems
PB - IEEE Computer Society
ER -