user3806649 Asks: Agglomerative Hierarchical Clustering in Python using DTW distance
I am new to both data science and Python. I have a dataset of time-dependent samples on which I want to run agglomerative hierarchical clustering. I have found that Dynamic Time Warping (DTW) is a useful method for finding alignments between two time series that may vary in time or speed.
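As a rough illustration of what a DTW distance computes, here is a minimal pure-Python sketch of the classic dynamic-programming recurrence. The function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example; it is not the implementation behind mlpy's dtw_std, which may use a different local cost or normalization.

```python
def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            # extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


slow = [0, 0, 1, 1, 2, 2]  # same shape as `fast`, played at half speed
fast = [0, 1, 2]
print(dtw_distance(slow, fast))  # 0.0
```

The two series differ in length and speed, yet the warped alignment matches them exactly, which is why DTW suits samples that vary in time or speed.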
I have found dtw_std in the mlpy library and scipy.cluster.hierarchy in SciPy, which I would like to use to cluster my data. From the SciPy docs, I see that I can use a custom distance function:
metric : str or function, optional The distance metric to use in the case that y is a collection of observation vectors; ignored otherwise. See the pdist function for a list of valid distance metrics. A custom distance function can also be used.
But I am stuck matching this information to implement clustering.
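To make the quoted `metric` parameter concrete, the sketch below passes a plain Python function to `scipy.spatial.distance.pdist`. The `manhattan` function and the tiny array `X` are hypothetical stand-ins chosen for the example. Any callable that takes two 1-D arrays and returns a number can be used the same way.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def manhattan(u, v):
    # Hypothetical custom metric: sum of absolute coordinate differences.
    return float(np.abs(u - v).sum())


# Three observations, one per row.
X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])

# pdist calls the function once per pair of rows and returns the distances
# in condensed form, ordered as pairs (0,1), (0,2), (1,2).
condensed = pdist(X, metric=manhattan)

# squareform expands the condensed vector into the full symmetric matrix.
square = squareform(condensed)
print(condensed)
print(square.shape)
```

The condensed vector is the form that `linkage` expects when it is given distances rather than raw observations.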
My dataset is a DataFrame in which each row corresponds to a sample.
Here are my questions:
1- How can I provide a distance matrix to the linkage function?
2- How do I set my custom distance function?
Code:
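import pandas as pd
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as hac
import mlpy

dataset = pd.read_csv("dataset.csv", encoding='utf-8')
X = dataset.values  # observation matrix, one row per sample (not yet a distance matrix)
# linkage forwards `metric` (not `metrics`) to pdist when X holds raw observations
Z = hac.linkage(X, method='average', metric=mlpy.dtw_std)
cluster = hac.fcluster(Z, t=100, criterion='maxclust')
# leaders takes the linkage matrix and the flat cluster labels
leader = hac.leaders(Z, cluster)
fig = plt.figure(figsize=(25, 10))
dn = hac.dendrogram(Z)
plt.show()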
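Regarding the first question, one possible approach when the distances are already available as a full square matrix is to convert it to condensed form with `squareform` before calling `linkage`. The 3x3 matrix below is made-up data for illustration only.

```python
import numpy as np
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import squareform

# Hypothetical precomputed pairwise distances (symmetric, zero diagonal).
square = np.array([
    [0.0, 1.0, 6.0],
    [1.0, 0.0, 5.0],
    [6.0, 5.0, 0.0],
])

# linkage interprets a 1-D input as a condensed distance vector; a 2-D input
# would be treated as raw observations, so the square matrix is converted first.
condensed = squareform(square)
Z = hac.linkage(condensed, method='average')

# Each row of Z is one merge: [cluster_a, cluster_b, distance, new_cluster_size].
print(Z)
```

With this input, samples 0 and 1 merge first at distance 1.0, and the resulting pair then joins sample 2 at the average of 6.0 and 5.0.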
edit:
Here is how I compute the distance matrix and then pass it to linkage:
Code:
from scipy.spatial.distance import pdist
# computing the condensed distance matrix; NaN values are dropped from each row first
def clean(row):
    return pd.Series(row).dropna().values.tolist()
dm = pdist(dataset, lambda u, v: mlpy.dtw_std(clean(u), clean(v), dist_only=True))
z = hac.linkage(dm, method='average')
cluster = hac.fcluster(z, t=100, criterion='maxclust')
# leaders takes the linkage matrix and the flat labels returned by fcluster
leader = hac.leaders(z, cluster)
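For reference, here is a small sketch of how `fcluster` and `leaders` fit together once a linkage matrix exists. It uses made-up one-dimensional points and the default Euclidean distance rather than DTW, and `t=3` instead of 100, purely to keep the example small.

```python
import numpy as np
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import pdist

# Hypothetical data: two tight pairs and one isolated point.
points = np.array([[0.0], [0.1], [5.0], [5.2], [10.0]])

z = hac.linkage(pdist(points), method='average')

# With criterion='maxclust', t is the maximum number of flat clusters to form.
labels = hac.fcluster(z, t=3, criterion='maxclust')

# leaders expects the linkage matrix plus the flat labels from fcluster and
# returns, for each flat cluster, the tree node that roots it and its label.
leader_nodes, leader_labels = hac.leaders(z, labels)

print(labels)
print(leader_nodes, leader_labels)
```

Node ids below the number of samples refer to original observations, while larger ids refer to merged clusters in the order the rows of `z` were created.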