
#### user3806649


*Agglomerative Hierarchical Clustering in Python using DTW distance*

I am new to both data science and Python. I have a dataset of time-dependent samples on which I want to run agglomerative hierarchical clustering. I have found that Dynamic Time Warping (DTW) is a useful method for finding alignments between two time series that may vary in time or speed.
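For reference, DTW is usually computed with a dynamic program: `cost[i][j]` holds the cheapest cumulative cost of aligning the first `i` points of one series with the first `j` points of the other, and each cell extends the cheapest of its three neighbours (diagonal match, insertion, deletion). The sketch below is a plain-Python illustration assuming the absolute difference as the local cost; `dtw_distance` is a hypothetical name, and this is not `mlpy`'s implementation.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic time warping distance between two sequences.

    Local cost is |x[i] - y[j]|; cost[i][j] is the minimal cumulative cost
    of aligning the first i points of x with the first j points of y.
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: diagonal match, insertion, deletion
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


# The same shape shifted in time (and of a different length) still aligns perfectly
a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 3, 2, 1, 0]
print(dtw_distance(a, b))  # 0.0
```

Because one point may be matched to several points of the other series, the two sequences may have different lengths and a pure time shift costs nothing. The table takes O(n * m) time and memory, so this is only meant to show the idea.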

To cluster my data, I have found `dtw_std` in the `mlpy` library and `scipy.cluster.hierarchy` in `SciPy`. From the SciPy docs, I see that I can use a custom distance function:

> `metric` : str or function, optional
>
> The distance metric to use in the case that `y` is a collection of observation vectors; ignored otherwise. See the `pdist` function for a list of valid distance metrics. A custom distance function can also be used.
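As a small check of what the docs describe: when a Python callable is passed as `metric`, SciPy calls it with two 1-D rows and expects a single float back. The sketch below uses toy data and a hypothetical `manhattan` function only because its result can be compared against SciPy's built-in `'cityblock'` metric; a DTW function would plug in the same way.

```python
import numpy as np
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import pdist


def manhattan(u, v):
    # SciPy passes two 1-D rows of X and expects one float back
    return float(np.abs(u - v).sum())


# Toy observations: one row per sample
X = np.array([[0.0, 1.0, 2.0],
              [0.0, 1.0, 3.0],
              [5.0, 5.0, 5.0],
              [6.0, 5.0, 5.0]])

# The callable yields the same condensed matrix as the built-in 'cityblock' metric
dm = pdist(X, metric=manhattan)
print(np.allclose(dm, pdist(X, metric='cityblock')))  # True

# Either pass observations plus metric, or pass the precomputed condensed matrix
Z_obs = hac.linkage(X, method='average', metric=manhattan)
Z_dm = hac.linkage(dm, method='average')
print(np.allclose(Z_obs, Z_dm))  # True
```

Both routes give the same linkage. A Python callable is invoked once per pair of rows, so with many samples it may be worth computing `dm` once and reusing it across different `method` values.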

But I am stuck on how to put these pieces together to implement the clustering.

My dataset is a pandas `DataFrame` in which each row corresponds to a sample. Here are my questions:

1- How can I provide a distance matrix to the `linkage` function?

2- How do I set my custom distance function?
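Regarding the first question, here is a sketch under the assumption that the pairwise DTW distances have already been collected in a square NumPy matrix: `linkage` expects the condensed form (a 1-D vector of the upper triangle, in the same order `pdist` returns), and `scipy.spatial.distance.squareform` converts between the two. The matrix values below are toy numbers, not real DTW results.

```python
import numpy as np
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import squareform

# Toy symmetric matrix of pairwise distances for 4 samples (zero diagonal)
D = np.array([
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 9.5, 8.5],
    [9.0, 9.5, 0.0, 1.5],
    [8.0, 8.5, 1.5, 0.0],
])

# Condensed form: upper triangle flattened row by row, length n * (n - 1) / 2
dm = squareform(D)
print(dm.tolist())  # [1.0, 9.0, 8.0, 9.5, 8.5, 1.5]

Z = hac.linkage(dm, method='average')
print(hac.fcluster(Z, t=2, criterion='maxclust'))
```

Passing the square matrix `D` itself is a common pitfall: a 2-D input is treated as observation vectors, so `linkage` would compute Euclidean distances between the rows of `D`. Recent SciPy versions emit a `ClusterWarning` when the input looks like an uncondensed distance matrix.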

Code:

```
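import pandas as pd
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as hac
import mlpy

dataset = pd.read_csv("dataset.csv", encoding='utf-8')
# raw observations, one time series per row (assumes no NaN padding);
# linkage then builds the DTW distance matrix itself via `metric`
X = dataset.values
Z = hac.linkage(X, metric=mlpy.dtw_std, method='average')  # keyword is `metric`, not `metrics`
cluster = hac.fcluster(Z, t=100, criterion='maxclust')
leader = hac.leaders(Z, cluster)  # leaders takes the flat clustering from fcluster
fig = plt.figure(figsize=(25, 10))
dn = hac.dendrogram(Z)
plt.show()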
```

Edit:

Here is how I compute the distance matrix and then pass it to `linkage`:

Code:

```
from scipy.spatial.distance import pdist

def dtw_distance(u, v):
    # dropna() strips NaN padding (e.g. from rows of different lengths) before DTW
    return mlpy.dtw_std(pd.Series(u).dropna().values, pd.Series(v).dropna().values, dist_only=True)

dm = pdist(dataset, dtw_distance)  # condensed distance matrix
z = hac.linkage(dm, method='average')
cluster = hac.fcluster(z, t=100, criterion='maxclust')
leader = hac.leaders(z, cluster)  # (L, M): leader node ids and their flat cluster ids
```
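An end-to-end sketch of the same pipeline that runs without `mlpy`, assuming the CSV has one time series per row and rows of different lengths, so pandas pads the shorter ones with `NaN`. The plain-Python `dtw_distance` is a stand-in for `mlpy.dtw_std`, `dtw_ignoring_nan` is a hypothetical helper, and the CSV content is a toy example. It also shows what `fcluster` and `leaders` return.

```python
import io

import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import pdist


def dtw_distance(x, y):
    # Plain DTW with |x_i - y_j| local cost; a stand-in for mlpy.dtw_std
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])


def dtw_ignoring_nan(u, v):
    # Drop the NaN padding so each series keeps only its real values
    return dtw_distance(u[~np.isnan(u)], v[~np.isnan(v)])


# Toy header-less CSV, one sample per row, rows of different lengths
csv = "0,1,2,3,2,1\n0,0,1,2,3\n5,5,5,5\n5,6,5,5,5,5\n"
dataset = pd.read_csv(io.StringIO(csv), header=None)

dm = pdist(dataset.to_numpy(dtype=float), dtw_ignoring_nan)  # condensed matrix
z = hac.linkage(dm, method='average')
cluster = hac.fcluster(z, t=2, criterion='maxclust')  # at most 2 flat clusters
L, M = hac.leaders(z, cluster)  # L[k]: linkage node that roots flat cluster M[k]

dataset['cluster'] = cluster  # attach the flat cluster label to each sample
print(dataset)
```

`criterion='maxclust'` asks for at most `t` flat clusters; the sketch uses `t=2` only because it has four samples. `leaders` needs the flat assignment returned by `fcluster`, not the `t`/`criterion` arguments.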
