Improvement of the Fast Clustering Algorithm Improved by K-Means in the Big Data

Clustering, a fundamental unsupervised learning task, is an important method of data analysis, and K-means is arguably the most popular clustering algorithm. In this paper, we consider clustering in feature space to address the low efficiency of K-means on Big Data. Unlike traditional methods, the proposed algorithm guarantees consistent clustering accuracy before and after dimensionality reduction, accelerates K-means when the cluster centers and distance functions satisfy certain conditions, fully matches the preprocessing step to the clustering step, and improves both efficiency and accuracy. Experimental results demonstrate the effectiveness of the proposed algorithm.

Language: English

Publication frequency: once a year
Journal subjects: Biological Sciences, Biological Sciences (other), Mathematics, Applied Mathematics, General Mathematics, Physics, Physics (other)

Ting Xie

Ruihua Liu

Zhengyuan Wei

Published: 20 Jan 2020

Page range: 1 - 10

Received: 23 Sep 2019

Accepted: 26 Dec 2019

DOI: https://doi.org/10.2478/amns.2020.1.00001

Keywords: Big Data, Clustering, K-means, Feature space

© 2020 Ting Xie et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
