SHOGUN  v1.1.0
CKMeans Class Reference

Detailed Description

KMeans clustering partitions the data into k clusters, where k is specified a priori.

It minimizes

\[ \sum_{i=1}^k\sum_{x_j\in S_i} (x_j-\mu_i)^2 \]

where $\mu_i$ are the cluster centers and $S_i,\;i=1,\dots,k$ are the index sets of the clusters.
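
As an illustration of the objective above, the following is a minimal, self-contained sketch that evaluates the sum for a given assignment of points to clusters. It assumes the squared Euclidean distance. The names `Point`, `squared_distance` and `kmeans_objective` are hypothetical helpers for this example and are not part of SHOGUN.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch; not a SHOGUN class.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
    {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// k-means objective: for every point x_j, add the squared distance to the
// center mu_i of the cluster it belongs to. assignment[j] holds the index i
// of the cluster S_i that contains x_j.
double kmeans_objective(const std::vector<Point>& points,
                        const std::vector<Point>& centers,
                        const std::vector<std::size_t>& assignment)
{
    assert(points.size() == assignment.size());
    double total = 0.0;
    for (std::size_t j = 0; j < points.size(); ++j)
    {
        assert(assignment[j] < centers.size());
        total += squared_distance(points[j], centers[assignment[j]]);
    }
    return total;
}
```

For the one-dimensional points 0, 2, 10, 12 with centers 1 and 11 and the assignment (0, 0, 1, 1), every point lies at distance 1 from its center, so the objective is 4.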

Beware that this algorithm is only guaranteed to find a local optimum; the result depends on the initial cluster centers.
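
The dependence on the starting point can be seen in a small sketch. The code below uses the textbook Lloyd iteration (assign each point to its nearest center, then move each center to the mean of its points), which is an assumption made for illustration and not necessarily the update scheme implemented in KMeans.cpp. `Point`, `squared_distance` and `lloyd_kmeans` are hypothetical names, not SHOGUN API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch; not a SHOGUN class.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
    {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Runs plain Lloyd iterations starting from `centers` (updated in place)
// and returns the objective of the final assignment.
double lloyd_kmeans(const std::vector<Point>& points,
                    std::vector<Point>& centers, int max_iter)
{
    assert(!points.empty() && !centers.empty() && max_iter > 0);
    const std::size_t dim = points[0].size();
    const std::size_t k = centers.size();
    std::vector<std::size_t> assignment(points.size(), 0);

    for (int iter = 0; iter < max_iter; ++iter)
    {
        // Assignment step: attach every point to its nearest center.
        bool changed = (iter == 0);
        for (std::size_t j = 0; j < points.size(); ++j)
        {
            std::size_t best = 0;
            double best_dist = squared_distance(points[j], centers[0]);
            for (std::size_t i = 1; i < k; ++i)
            {
                const double dist = squared_distance(points[j], centers[i]);
                if (dist < best_dist)
                {
                    best = i;
                    best_dist = dist;
                }
            }
            if (best != assignment[j])
            {
                assignment[j] = best;
                changed = true;
            }
        }
        if (!changed)
            break; // converged: no point switched clusters

        // Update step: move each non-empty cluster's center to its mean.
        std::vector<Point> sums(k, Point(dim, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t j = 0; j < points.size(); ++j)
        {
            ++counts[assignment[j]];
            for (std::size_t d = 0; d < dim; ++d)
                sums[assignment[j]][d] += points[j][d];
        }
        for (std::size_t i = 0; i < k; ++i)
        {
            if (counts[i] == 0)
                continue; // an empty cluster keeps its previous center
            for (std::size_t d = 0; d < dim; ++d)
                centers[i][d] = sums[i][d] / static_cast<double>(counts[i]);
        }
    }

    double objective = 0.0;
    for (std::size_t j = 0; j < points.size(); ++j)
        objective += squared_distance(points[j], centers[assignment[j]]);
    return objective;
}
```

Take the four corners of a 4 x 1 rectangle, (0,0), (0,1), (4,0), (4,1), with k = 2. Starting from the centers (0,0.5) and (4,0.5), the iteration stays at the left/right split with objective 1. Starting from (2,0) and (2,1), it settles on the bottom/top split with objective 16 and never leaves it, even though the other partition is far better.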

cf. http://en.wikipedia.org/wiki/K-means_algorithm

Definition at line 39 of file KMeans.h.

[Inheritance diagram for CKMeans]

Public Member Functions

 CKMeans ()
 
 CKMeans (int32_t k, CDistance *d)
 
virtual ~CKMeans ()
 
virtual EClassifierType get_classifier_type ()
 
virtual bool load (FILE *srcfile)
 
virtual bool save (FILE *dstfile)
 
void set_k (int32_t p_k)
 
int32_t get_k ()
 
void set_max_iter (int32_t iter)
 
float64_t get_max_iter ()
 
SGVector< float64_t > get_radiuses ()
 
SGMatrix< float64_t > get_cluster_centers ()
 
int32_t get_dimensions ()
 
virtual const char * get_name () const
 
- Public Member Functions inherited from CDistanceMachine
 CDistanceMachine ()
 
virtual ~CDistanceMachine ()
 
void set_distance (CDistance *d)
 
CDistance * get_distance ()
 
void distances_lhs (float64_t *result, int32_t idx_a1, int32_t idx_a2, int32_t idx_b)
 
void distances_rhs (float64_t *result, int32_t idx_b1, int32_t idx_b2, int32_t idx_a)
 
virtual CLabels * apply ()
 
virtual CLabels * apply (CFeatures *data)
 
virtual float64_t apply (int32_t num)
 
- Public Member Functions inherited from CMachine
 CMachine ()
 
virtual ~CMachine ()
 
virtual bool train (CFeatures *data=NULL)
 
virtual void set_labels (CLabels *lab)
 
virtual CLabels * get_labels ()
 
virtual float64_t get_label (int32_t i)
 
void set_max_train_time (float64_t t)
 
float64_t get_max_train_time ()
 
void set_solver_type (ESolverType st)
 
ESolverType get_solver_type ()
 
virtual void set_store_model_features (bool store_model)
 
- Public Member Functions inherited from CSGObject
 CSGObject ()
 
 CSGObject (const CSGObject &orig)
 
virtual ~CSGObject ()
 
virtual bool is_generic (EPrimitiveType *generic) const
 
template<class T >
void set_generic ()
 
void unset_generic ()
 
virtual void print_serializable (const char *prefix="")
 
virtual bool save_serializable (CSerializableFile *file, const char *prefix="")
 
virtual bool load_serializable (CSerializableFile *file, const char *prefix="")
 
void set_global_io (SGIO *io)
 
SGIO * get_global_io ()
 
void set_global_parallel (Parallel *parallel)
 
Parallel * get_global_parallel ()
 
void set_global_version (Version *version)
 
Version * get_global_version ()
 
SGVector< char * > get_modelsel_names ()
 
char * get_modsel_param_descr (const char *param_name)
 
index_t get_modsel_param_index (const char *param_name)
 
template<>
void set_generic ()
 

Protected Member Functions

void clustknb (bool use_old_mus, float64_t *mus_start)
 
virtual bool train_machine (CFeatures *data=NULL)
 
virtual void store_model_features ()
 
- Protected Member Functions inherited from CSGObject
virtual void load_serializable_pre () throw (ShogunException)
 
virtual void load_serializable_post () throw (ShogunException)
 
virtual void save_serializable_pre () throw (ShogunException)
 
virtual void save_serializable_post () throw (ShogunException)
 

Protected Attributes

int32_t max_iter
 maximum number of iterations
 
int32_t k
 the k parameter in KMeans
 
int32_t dimensions
 number of dimensions
 
SGVector< float64_t > R
 radii of the clusters (size k)
 
- Protected Attributes inherited from CDistanceMachine
CDistance * distance
 
- Protected Attributes inherited from CMachine
float64_t max_train_time
 
CLabels * labels
 
ESolverType solver_type
 
bool m_store_model_features
 

Additional Inherited Members

- Public Attributes inherited from CSGObject
SGIO * io
 
Parallel * parallel
 
Version * version
 
Parameter * m_parameters
 
Parameter * m_model_selection_parameters
 
- Static Protected Member Functions inherited from CDistanceMachine
static void * run_distance_thread_lhs (void *p)
 
static void * run_distance_thread_rhs (void *p)
 

Constructor & Destructor Documentation

CKMeans ( )

default constructor

Definition at line 29 of file KMeans.cpp.

CKMeans ( int32_t  k,
CDistance * d
)

constructor

Parameters
k: parameter k
d: distance

Definition at line 35 of file KMeans.cpp.

~CKMeans ( )
virtual

Definition at line 43 of file KMeans.cpp.

Member Function Documentation

void clustknb ( bool  use_old_mus,
float64_t * mus_start
)
protected

clustknb

Parameters
use_old_mus: whether old mus (cluster centers) shall be used
mus_start: starting values for the mus

replace rhs feature vectors

set rhs to mus_start

update rhs

Definition at line 179 of file KMeans.cpp.

virtual EClassifierType get_classifier_type ( )
virtual

get classifier type

Returns
classifier type KMEANS

Reimplemented from CMachine.

Definition at line 57 of file KMeans.h.

SGMatrix< float64_t > get_cluster_centers ( )

get centers

Returns
cluster centers, or an empty matrix if no radii are available (not trained yet)

Definition at line 115 of file KMeans.cpp.

int32_t get_dimensions ( )

get dimensions

Returns
number of dimensions

Definition at line 127 of file KMeans.cpp.

int32_t get_k ( )

get k

Returns
the parameter k

Definition at line 94 of file KMeans.cpp.

float64_t get_max_iter ( )

get maximum number of iterations

Returns
maximum number of iterations

Definition at line 105 of file KMeans.cpp.

virtual const char* get_name ( ) const
virtual
Returns
object name

Reimplemented from CDistanceMachine.

Definition at line 116 of file KMeans.h.

SGVector< float64_t > get_radiuses ( )

get radiuses

Returns
radiuses

Definition at line 110 of file KMeans.cpp.

bool load ( FILE *  srcfile)
virtual

load distance machine from file

Parameters
srcfile: file to load from
Returns
whether loading was successful

Reimplemented from CMachine.

Definition at line 73 of file KMeans.cpp.

bool save ( FILE *  dstfile)
virtual

save distance machine to file

Parameters
dstfile: file to save to
Returns
whether saving was successful

Reimplemented from CMachine.

Definition at line 80 of file KMeans.cpp.

void set_k ( int32_t  p_k)

set k

Parameters
p_k: new k

Definition at line 88 of file KMeans.cpp.

void set_max_iter ( int32_t  iter)

set maximum number of iterations

Parameters
iter: the new maximum

Definition at line 99 of file KMeans.cpp.

void store_model_features ( )
protected virtual

Ensures cluster centers are in lhs of underlying distance

Reimplemented from CDistanceMachine.

Definition at line 464 of file KMeans.cpp.
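
Keeping the cluster centers on one side of the distance makes it possible to label a vector by looking up its closest center. The sketch below illustrates that idea under two assumptions: that applying the trained machine assigns each vector the index of its nearest center, and that the distance is Euclidean. `Point`, `squared_distance` and `nearest_center` are hypothetical names and not SHOGUN API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type for this sketch; not a SHOGUN class.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
    {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Returns the index of the center closest to `query`.
// Ties are resolved in favour of the lowest index.
std::size_t nearest_center(const std::vector<Point>& centers,
                           const Point& query)
{
    assert(!centers.empty());
    std::size_t best = 0;
    double best_dist = squared_distance(centers[0], query);
    for (std::size_t i = 1; i < centers.size(); ++i)
    {
        const double dist = squared_distance(centers[i], query);
        if (dist < best_dist)
        {
            best = i;
            best_dist = dist;
        }
    }
    return best;
}
```

With centers (0,0) and (10,10), the query (1,2) is assigned to cluster 0 and the query (9,7) to cluster 1.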

bool train_machine ( CFeatures * data = NULL)
protected virtual

train k-means

Parameters
data: training data (the parameter can be omitted if distance- or kernel-based classifiers are used and the distance/kernel has already been initialized with the training data)
Returns
whether training was successful

Reimplemented from CMachine.

Definition at line 48 of file KMeans.cpp.

Member Data Documentation

int32_t dimensions
protected

number of dimensions

Definition at line 150 of file KMeans.h.

int32_t k
protected

the k parameter in KMeans

Definition at line 147 of file KMeans.h.

int32_t max_iter
protected

maximum number of iterations

Definition at line 144 of file KMeans.h.

SGVector<float64_t> R
protected

radii of the clusters (size k)

Definition at line 153 of file KMeans.h.


The documentation for this class was generated from the following files:

KMeans.h
KMeans.cpp

SHOGUN Machine Learning Toolbox - Documentation