But, wait a second. How do you achieve the same for unsupervised learning?

How to interpret unsupervised learning, according to scikit-learn (http://scikit-learn.org/stable/index.html):

The term “unsupervised” refers to the fact that there is no “target” to predict and thus nothing resembling an accuracy measure to guide the selection of a best model.

This means that there is no objectively "correct" clustering algorithm; as has been noted, "clustering is in the eye of the beholder." Unless there is a mathematical reason to prefer one cluster model over another, the most appropriate clustering algorithm for a particular problem often needs to be chosen experimentally.

Unsupervised learning can be described in several related ways:

- Finding groups in data
- Finding patterns in data
- A form of data compression
- A form of dimensionality reduction

Regardless of the definition we choose, one central matter when dealing with unsupervised learning is how to measure the quality of the clustering. What does confidence mean in the context of feature/group mappings for unsupervised learning?

Unsupervised learning can be done in many ways, the most common being neural networks, clustering (k-means, etc.), and dimensionality reduction techniques such as PCA. Regardless of how the features or groups were extracted, let us refer to the result as a clustering.

### Clustering classification

Clusterings can be roughly distinguished as:

- hard clustering: each object belongs to one cluster only. The assignment is a function: each sample is associated with one and only one cluster.
- soft clustering: also known as fuzzy clustering. Each object belongs to each cluster to a certain degree (e.g. a likelihood of belonging to the cluster).
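The difference between the two kinds of clustering can be seen in the shape of the output. The following is a minimal sketch, assuming scikit-learn is available; the synthetic data, the choice of k-means as the hard method, and the choice of a Gaussian mixture as the soft method are illustrative assumptions, not something the text prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data: 300 points drawn around 3 centers (illustrative only).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Hard clustering: exactly one integer label per sample.
hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Soft clustering: one membership probability per (sample, cluster) pair.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
soft_memberships = gmm.predict_proba(X)

print(hard_labels.shape)       # one label per sample
print(soft_memberships.shape)  # one row of memberships per sample
print(np.round(soft_memberships[:3], 3))
```

Each row of `soft_memberships` sums to 1, and taking the largest entry of each row turns the soft clustering back into a hard one.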

### Measuring quality

### Internal evaluation methods

One way is to define an internal evaluation method. Such a method does not rely on any external knowledge; it simply describes a set of desired characteristics of the mapping. It can be specified:

- by the definition of an optimization function (for instance, minimize SSE in k-means)
- by creating an error metric

**SSE method**

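Plot the sum of squared errors (SSE) against the number of clusters.

For a method that minimizes SSE, such as k-means, SSE decreases monotonically as the number of clusters increases, so the minimum alone cannot be used to pick a model.

The knee (elbow) points on the curve suggest good candidates for the number of clusters.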
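As a rough illustration of the SSE curve, here is a minimal sketch assuming scikit-learn's `KMeans`, whose `inertia_` attribute is the within-cluster sum of squared distances. The synthetic data and the helper name `sse_curve` are hypothetical, chosen only for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 groups (illustrative assumption).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

def sse_curve(X, k_values):
    """Return the k-means SSE (inertia_) for each candidate number of clusters."""
    sse = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        sse.append(model.inertia_)
    return sse

k_values = list(range(1, 9))
sse = sse_curve(X, k_values)

# Print the curve; the knee is where the drop between consecutive k flattens.
for k, value in zip(k_values, sse):
    print(f"k={k}: SSE={value:.1f}")
```

Plotting `sse` against `k_values` (for instance with matplotlib) makes the knee easier to spot than reading the numbers.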

**Spectral clustering**

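Measure the eigengap, the difference between consecutive eigenvalues of the graph Laplacian, and choose the number of clusters that maximizes it.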
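The eigengap idea can be sketched with NumPy as follows. This is only one possible reading, under stated assumptions: an RBF affinity matrix with `gamma=1.0`, the symmetric normalized Laplacian, and a search limited to the first ten eigenvalues. The data and the variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

# Three well-separated groups (illustrative assumption).
centers = [[0, 0], [6, 6], [-6, 6]]
X, _ = make_blobs(n_samples=150, centers=centers, cluster_std=0.5, random_state=0)

# Affinity matrix W and node degrees d.
W = rbf_kernel(X, gamma=1.0)
d = W.sum(axis=1)

# Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
d_inv_sqrt = 1.0 / np.sqrt(d)
L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Eigenvalues in ascending order; gaps[i] is the gap after the (i+1)-th one.
eigenvalues = np.linalg.eigvalsh(L)
max_k = 10
gaps = np.diff(eigenvalues[: max_k + 1])

# The largest gap suggests the number of clusters.
k_estimate = int(np.argmax(gaps)) + 1
print("smallest eigenvalues:", np.round(eigenvalues[:6], 3))
print("estimated number of clusters:", k_estimate)
```

The estimate depends on the affinity scale (`gamma` here), so in practice it is worth checking how it changes over a few values.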

**Penalty Method**

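Add a penalty for model complexity to the measure of fit, for example with the Bayesian Information Criterion (BIC), and select the model with the best penalized score.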
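A penalty-based selection can be sketched with scikit-learn's `GaussianMixture`, which exposes a `bic` method (lower is better). Using a Gaussian mixture as the cluster model is an assumption made for this example, as are the synthetic data and the range of candidate values.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three well-separated groups (illustrative assumption).
centers = [[0, 0], [8, 8], [-8, 8]]
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=1.0, random_state=1)

# Fit one mixture per candidate number of components and record its BIC.
bic_by_k = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, n_init=3, random_state=0).fit(X)
    bic_by_k[k] = gmm.bic(X)

# The model with the lowest BIC balances fit against complexity.
best_k = min(bic_by_k, key=bic_by_k.get)
for k, bic in bic_by_k.items():
    print(f"k={k}: BIC={bic:.1f}")
print("selected number of components:", best_k)
```

Unlike SSE, the BIC does not keep improving as components are added, because each extra component pays a penalty proportional to its number of parameters.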

**Stability-based method**

- Stability: the method repeatedly produces similar clusterings on data originating from the same source.
- A high level of agreement among a set of clusterings indicates that the clustering model (k) is appropriate for the data.
- Evaluate multiple models, and select the model resulting in the highest level of stability.
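One way to make the stability idea concrete is sketched below. It assumes that "data originating from the same source" is approximated by random subsamples of one dataset, that agreement is measured with the Adjusted Rand Index, and that k-means is the clustering method; the helper name `stability` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated groups (illustrative assumption).
centers = [[0, 0], [7, 7], [-7, 7]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0, random_state=3)

def stability(X, k, n_rounds=10, fraction=0.8, seed=0):
    """Mean pairwise ARI between models fitted on random subsamples of X."""
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_rounds):
        # Fit on a subsample, then label every point so labelings are comparable.
        idx = rng.choice(len(X), size=int(fraction * len(X)), replace=False)
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[idx])
        labelings.append(model.predict(X))
    scores = [
        adjusted_rand_score(a, b)
        for i, a in enumerate(labelings)
        for b in labelings[i + 1:]
    ]
    return float(np.mean(scores))

stability_by_k = {k: stability(X, k) for k in range(2, 7)}
for k, score in stability_by_k.items():
    print(f"k={k}: mean ARI={score:.3f}")
```

A score close to 1 means the subsamples lead to nearly the same partition. Small values of k can also look stable, so this score is usually read together with another criterion rather than on its own.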

### External evaluation methods

If true class labels (ground truth) are known, the validity of a clustering can be verified by comparing the class labels with the cluster labels.

Several measures have been developed for evaluating unsupervised models against ground truth, such as the Rand Index and the Adjusted Rand Index (its chance-corrected variant). Purity and Normalized Mutual Information can also be used to assess the quality of the model.
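These comparisons against ground truth can be sketched with scikit-learn's metrics. The label vectors below are made up for illustration, and `purity_score` is a hypothetical helper, since scikit-learn does not ship a purity function; it is built here from `contingency_matrix`.

```python
from sklearn.metrics import (
    adjusted_rand_score,
    normalized_mutual_info_score,
    rand_score,
)
from sklearn.metrics.cluster import contingency_matrix

def purity_score(true_labels, cluster_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    # Rows are true classes, columns are clusters.
    table = contingency_matrix(true_labels, cluster_labels)
    return table.max(axis=0).sum() / table.sum()

# Made-up example: cluster ids are arbitrary and need not match class ids.
true_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_labels = [1, 1, 0, 0, 0, 0, 2, 2, 2]

ri = rand_score(true_labels, cluster_labels)
ari = adjusted_rand_score(true_labels, cluster_labels)
nmi = normalized_mutual_info_score(true_labels, cluster_labels)
purity = purity_score(true_labels, cluster_labels)

print(f"Rand Index:          {ri:.3f}")
print(f"Adjusted Rand Index: {ari:.3f}")
print(f"NMI:                 {nmi:.3f}")
print(f"Purity:              {purity:.3f}")
```

All four measures ignore the actual label values and only compare how the samples are grouped, which is why a clustering with permuted cluster ids can still score perfectly.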

**Convert it to a supervised model**

- by means of a panel (most of the time made up of humans / experts)
- by means of ground truth (for instance, by accessing other data that classifies the samples)

### References

**Clustering**

http://en.wikipedia.org/wiki/Data_clustering

**Clustering methodologies**

http://en.wikipedia.org/wiki/OPTICS_algorithm

http://en.wikipedia.org/wiki/Spectral_clustering

http://en.wikipedia.org/wiki/Expectation-maximization_algorithm

http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html

**Density estimation**

http://en.wikipedia.org/wiki/Parzen_window

http://en.wikipedia.org/wiki/Density_estimation

**Kernels separation**

https://www.youtube.com/watch?v=bUv9bfMPMb4

**Slides and presentations**

http://1.salford-systems.com/Portals/160602/docs/Unsupervised_Learning_slides.pdf

http://web.engr.oregonstate.edu/~xfern/classes/cs534/notes/Unsupervised-model-11.pdf

http://www.public.asu.edu/~salelyan/MyPub/FS4clustering_chapter.pdf