Clustering
Differences between clustering algorithms
Clustering is the task of dividing data points into groups such that points in the same group are more similar to one another than to points in other groups. In essence, it is a grouping of objects on the basis of the similarity and dissimilarity between them.
Clustering Methods:

Density-Based Methods: These methods consider clusters to be dense regions of the space that are internally similar and distinct from the surrounding lower-density regions. They offer good accuracy and the ability to merge two clusters. Examples: DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure), etc.
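As a minimal sketch of the density-based idea, the following uses scikit-learn's DBSCAN on a synthetic two-moons dataset; the dataset and the eps/min_samples values are illustrative choices, not from the text:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: dense, non-convex shapes separated by a
# lower-density gap (illustrative synthetic data).
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps: neighbourhood radius; min_samples: points required for a dense region.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# DBSCAN labels noise points -1; the rest are cluster indices.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

With this spacing, each moon should come out as one cluster even though the shapes are non-convex, which is exactly where centroid methods struggle.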

Hierarchical-Based Methods: The clusters formed by these methods make up a tree-type structure based on the hierarchy, with new clusters formed from previously formed ones. They fall into two categories:
> Agglomerative (bottom-up approach)
> Divisive (top-down approach)
Examples: CURE (Clustering Using Representatives), BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), etc.
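A small sketch of the bottom-up (agglomerative) variant using scikit-learn's AgglomerativeClustering; the hand-made blob data and parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Three well-separated synthetic blobs (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(50, 2))
               for c in ((0, 0), (5, 5), (0, 5))])

# Bottom-up: every point starts as its own cluster, and the two closest
# clusters are merged repeatedly until n_clusters remain.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
```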
Centroid (Partitioning) Methods: These methods partition the objects into k clusters, and each partition forms one cluster. They optimize an objective criterion, typically a similarity function in which distance is the major parameter. Examples: K-Means, CLARANS (Clustering Large Applications based upon RANdomized Search), etc.
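A minimal K-Means sketch with scikit-learn; the blob data, the explicit centers, and k=4 are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic blob data with a known number of groups (illustrative).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
                  cluster_std=0.6, random_state=0)

# k-means++ seeding spreads the initial centroids out, which usually speeds
# up convergence; note k must still be chosen up front.
km = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_  # one centroid per partition
```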

Distribution Methods: These clustering models are based on the notion of how probable it is that all data points in a cluster belong to the same distribution (for example, a normal/Gaussian distribution). These models often suffer from overfitting. A popular example is the Expectation-Maximization algorithm, which uses multivariate normal distributions.
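A sketch of the distribution-based approach via scikit-learn's GaussianMixture, which runs Expectation-Maximization under the hood; the data and component count are illustrative assumptions:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data drawn from three Gaussian-like blobs (illustrative).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 0], [0, 6]],
                  cluster_std=1.0, random_state=1)

# EM alternates an E-step (soft responsibilities per point) with an
# M-step (re-estimating each Gaussian's mean and covariance).
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=1).fit(X)
```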
Centroid Methods

K-Means (Mini-Batch K-Means, K-Means++)
Pros:
> Simple to understand
> Easily adaptable
> Works well on small or large datasets
> Fast, efficient and performant
Cons:
> The number of clusters must be chosen in advance
> Assumes clusters are spherical, so it does not work efficiently on data with complex (mostly non-linear) geometry
> Hard assignment can lead to mis-grouping
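One common workaround for the "must choose the number of clusters" drawback is the elbow heuristic: run K-Means for several values of k and look for where inertia stops dropping sharply. A sketch, using synthetic data with four true groups as an illustrative assumption:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Four well-separated blobs, so the "elbow" should appear at k = 4.
X, _ = make_blobs(n_samples=400, centers=[[0, 0], [6, 0], [0, 6], [6, 6]],
                  cluster_std=0.7, random_state=0)

# Inertia = within-cluster sum of squared distances to the centroid; it
# always decreases with k, but the decrease flattens past the true k.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 7)}
```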
Affinity Propagation
Pros:
> Does not need a preset number of clusters
> Works well on small or large datasets
> Clusters of arbitrary shape and size
> Often produces better clusters than K-Means
Cons:
> Much slower than K-Means
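A sketch showing that Affinity Propagation needs no preset cluster count; the data and the preference value follow the common scikit-learn example setup and are illustrative:

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

# Three blobs; note that no cluster count is passed to the model.
X, _ = make_blobs(n_samples=300, centers=[[1, 1], [-1, -1], [1, -1]],
                  cluster_std=0.5, random_state=0)

# preference controls how readily points become exemplars; the number of
# clusters emerges from message passing between points.
ap = AffinityPropagation(preference=-50, random_state=0).fit(X)
n_clusters = len(ap.cluster_centers_indices_)
```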
Spectral Clustering
Pros:
> Elegant and mathematically well-founded
> Works quite well when relations are approximately transitive
> Excellent quality under many different data forms
Cons:
> Not appropriate for very noisy datasets
> Much slower than K-Means
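A sketch of spectral clustering on concentric circles, a shape K-Means cannot separate; the data and the nearest-neighbours affinity are illustrative choices:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# Two concentric rings: linearly inseparable, but the eigenvectors of the
# similarity graph's Laplacian expose the ring structure.
X, y = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            n_neighbors=10, random_state=0).fit_predict(X)

# Agreement with the true rings, up to label permutation.
agreement = max((labels == y).mean(), (labels != y).mean())
```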
Hierarchical Models

Hierarchical Clustering
Pros:
> The optimal number of clusters can be obtained by the model itself
> Practical visualisation with the dendrogram
Cons:
> Not appropriate for large datasets
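The dendrogram workflow can be sketched with SciPy: linkage() builds the full merge tree once, dendrogram() can plot it, and fcluster() cuts it at any level without re-running the algorithm (the data here is illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two synthetic groups (illustrative data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2))
               for c in ((0, 0), (4, 4))])

# Z encodes every merge; scipy.cluster.hierarchy.dendrogram(Z) would plot it.
Z = linkage(X, method="ward")

# Cut the tree into 2 flat clusters (fcluster labels are 1-based).
labels = fcluster(Z, t=2, criterion="maxclust")
```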
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies)
Pros:
> Designed for clustering large amounts of numerical data
Cons:
> Works well only for spherical clusters
> Can handle only numeric data
> Sensitive to the order of the data records
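A sketch of BIRCH's large-data usage: because it condenses points into a compact CF-tree, data can be fed in chunks via partial_fit; the chunking and parameters here are illustrative:

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Purely numeric synthetic data (BIRCH handles only numeric features).
X, _ = make_blobs(n_samples=600, centers=[[0, 0], [5, 5], [0, 5]],
                  cluster_std=0.5, random_state=0)

# threshold bounds each CF-subcluster's radius; n_clusters sets the final
# global clustering computed over the subclusters.
model = Birch(threshold=0.5, n_clusters=3)
for chunk in np.array_split(X, 10):  # simulate data arriving in batches
    model.partial_fit(chunk)

labels = model.predict(X)
```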
Distribution Methods

GMM (Expectation-Maximization using Gaussian Mixture Models)
Pros:
> Much more flexible in terms of cluster covariance than K-Means
> A data point can belong to multiple clusters, i.e. GMMs support mixed membership
> Does not assume clusters to be of any particular geometry; works well with non-linear geometric distributions as well
> Does not bias cluster sizes toward specific (circular) structures, as K-Means does
Cons:
> Slow convergence
> Cannot estimate the asymptotic variance-covariance matrix of the maximum likelihood estimator (MLE)
> Difficult to interpret
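The mixed-membership point can be made concrete with predict_proba, which returns a probability distribution over components for every point rather than a hard label (the data is illustrative):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Two Gaussian-like groups (illustrative data).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [4, 4]],
                  cluster_std=1.0, random_state=0)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft assignment: each row is a distribution over the two components,
# so borderline points carry partial membership in both clusters.
probs = gmm.predict_proba(X)
```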
Density-Based Models

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Pros:
> Clusters of arbitrary shape and size
> Robust to noise
> Does not need a preset number of clusters
> Deterministic
Cons:
> Requires connected regions of sufficiently high density
> Data sets with varying densities are problematic
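For the varying-density problem, OPTICS (mentioned earlier alongside DBSCAN) is often suggested, since it orders points by reachability instead of fixing a single eps. A sketch with two clusters of very different density; the data and parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import OPTICS

# One tight cluster and one diffuse cluster (illustrative data).
rng = np.random.default_rng(0)
dense = rng.normal(loc=(0, 0), scale=0.2, size=(100, 2))
sparse = rng.normal(loc=(6, 6), scale=1.2, size=(100, 2))
X = np.vstack([dense, sparse])

# xi extracts clusters from steep changes in the reachability plot,
# so no single global density threshold is required.
labels = OPTICS(min_samples=10, xi=0.05).fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```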
Mean-Shift Clustering
Pros:
> Does not need a preset number of clusters
> Simple to understand
Cons:
> Selection of the window size/radius "r" can be non-trivial
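The window-selection drawback can be softened with scikit-learn's estimate_bandwidth helper, which derives a radius from the data; the dataset and the quantile value are illustrative choices:

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Three synthetic groups (illustrative data).
X, _ = make_blobs(n_samples=400, centers=[[1, 1], [-1, -1], [1, -1]],
                  cluster_std=0.4, random_state=0)

# quantile tunes how local the estimate is: smaller values give a smaller
# window radius and therefore more modes.
bandwidth = estimate_bandwidth(X, quantile=0.2, random_state=0)
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
n_clusters = len(ms.cluster_centers_)
```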
Overall Model Performance
[Figure: side-by-side comparison of clustering results for Mini-Batch K-Means, Affinity Propagation, Mean-Shift, Spectral Clustering, Ward, Agglomerative Clustering, DBSCAN, BIRCH and Gaussian Mixture.]