A Fast Adaptive k-means with No Bounds

What is Ball k-means?

This paper presents a novel accelerated exact k-means called as "Ball k-means" by using the ball to describe each cluster, which focus on reducing the point-centroid distance computation. It can exactly find its neighbor clusters for each cluster, resulting distance computations only between a point and its neighbor clusters' centroids instead of all centroids. What's more, each cluster can be divided into "stable area" and "active area", and the latter one is further divided into some exact "annular area". The assignment of the points in the "stable area" is not changed while the points in each "annular area" will be adjusted within a few neighbor clusters. There are no upper or lower bounds in the whole process. Moreover, ball k-means uses ball clusters and neighbor searching along with multiple novel stratagems for reducing centroid distance computations. In comparison with the current state-of-the art accelerated exact bounded methods, the Yinyang algorithm and the Exponion algorithm, as well as other top-of-the-line tree-based and bounded methods, the ball k-means attains both higher performance and performs fewer distance calculations, especially for large-k problems. The faster speed, no extra parameters and simpler design of "Ball k-means" make it an all-around replacement of the naive k-means.(https://blog.csdn.net/xia_shuyin/article/details/107351876)

Getting Ball k-means

The latest version of Ball k-means can be downloaded from here:

Requirements

Minimal installation requirements:
C++ compiler supporting C++11

Linux operating system or Windows operating system

Eigen 3 template library

Optional but recommended:
BLAS implementation, we recommend this one : http://www.openblas.net/

Intel MKL implementation, we recommend this one : https://software.intel.com/en-us/mkl

Installation

Eigen 3: In order to use Eigen, you just need to download and extract Eigen's source code : http://eigen.tuxfamily.org/index.php?title=Main_Page

ball_k_means_noRingVersion.cpp and ball_k_means_RingVersion.cpp both can be executed directly, only need to import Eigen library.

Using


Cpp Version: In 'main' function, modify the location of the dataset and the centroids, loaded by the 'load_data' function.

Python Version: In 'main' function, modify the location of the dataset and the centroids, loaded by the 'dataset_address' function and the 'centriod_address' function.

In function 'ball_k_means', you can change the param 'isRing' to control 'Ring' version and 'noRing' version: 'isRing' == 0 represent alg with rings and 'isRing' == others represent alg with no rings.

Questions/Comments

If you have any questions or comments please contact Yong Zheng at zhengyongv3@163.com.

 

 

 

 

 

 

CopyRight @ Chongqing University of Posts and Telecommunications |The research team of Shuyin Xia