scikit-learn: HCA


Difficulty: 2 | Problem written by hemdan219@gmail.com

Educational Resource: https://towardsdatascience.com/a-beginners-guide-to-scikit-learn-14b7e51d71a4



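Hierarchical Clustering Algorithm (HCA) is an unsupervised clustering algorithm that builds a hierarchy of nested clusters, ordered from the top (a single cluster containing every point) to the bottom (one cluster per point). The agglomerative variant described here builds this hierarchy bottom-up by repeatedly merging the closest clusters.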

The steps for HCA are as follows:

  1. Make each data point a single-point cluster. This creates N clusters.
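  2. Take the two closest data points and make them one cluster. There are now N-1 clusters.
  3. Take the two closest clusters and make them one cluster. There are now N-2 clusters. (How the distance between two clusters is measured is set by the linkage criterion.)
  4. Repeat until only one cluster remains. To obtain a fixed number of clusters, stop merging once that many clusters remain.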
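The steps above can be sketched directly in pure Python. This is a minimal, naive sketch for small inputs only, not the scikit-learn implementation. It assumes single linkage (the distance between two clusters is the smallest Euclidean distance between any pair of their points), which is a different criterion from the Ward linkage required below, and it stops at `n_clusters` instead of merging all the way to one cluster. The function name `naive_hca` is hypothetical.

```python
import math


def naive_hca(points, n_clusters):
    """Naive agglomerative clustering following the steps above.

    Assumption: single linkage, i.e. the distance between two clusters is the
    smallest Euclidean distance between any point of one and any point of the other.
    """
    # Step 1: every point starts as its own cluster (a list of point indices).
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    # Steps 2-4: merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is still valid

    # Label each point with the index of the cluster it ended up in (from 0).
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


print(naive_hca([(0, 0), (0, 1), (10, 10), (10, 11)], 2))  # [0, 0, 1, 1]
```

On the sample input below, this single-linkage sketch returns `[0, 0, 0, 0, 1]`: the points are equally spaced, so single linkage keeps chaining the next neighbour onto the first cluster. This shows that the choice of linkage can change the final grouping.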

You are given as input a list of tuples, datapoints, representing 2D coordinates, and an integer, n, giving the number of clusters.

You should return a list with the cluster number of each input data point, with cluster numbers starting at 0.

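You may use scikit-learn's AgglomerativeClustering class to solve this problem. Set the following parameters: affinity='euclidean', linkage='ward'. Note that in scikit-learn 1.2 the affinity parameter was renamed to metric, and affinity was removed in 1.4; Euclidean distance is the default in every version, and it is the only metric Ward linkage accepts.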
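A minimal sketch of a solution built on `AgglomerativeClustering`, assuming a recent scikit-learn and NumPy. The function name `hca` is a hypothetical choice, not something the problem specifies. The Euclidean metric is left at its default rather than passed explicitly, so the same call works whether the installed version expects `affinity` or `metric`.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def hca(datapoints, n):
    """Return one cluster label (starting at 0) per input 2D point."""
    X = np.asarray(datapoints, dtype=float)  # shape (N, 2)
    # Ward linkage merges the pair of clusters whose union least increases the
    # within-cluster variance; it requires Euclidean distance (the default metric).
    model = AgglomerativeClustering(n_clusters=n, linkage="ward")
    labels = model.fit_predict(X)  # NumPy array of ints in [0, n)
    return labels.tolist()         # plain Python list, as the problem asks


print(hca([(0, 0), (0, 1), (10, 10), (10, 11)], 2))
```

`fit_predict` fits the model and returns `labels_` in one call. The numeric ids themselves are arbitrary, so which of the two groups above receives label 0 is not guaranteed.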


Sample Input:
datapoints (list): [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
n (int): 2

Expected Output (list):
[1, 1, 0, 0, 0]
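Cluster numbers are only identifiers: `[0, 0, 1, 1, 1]` describes the same grouping as the expected output above, with the two labels swapped. If you want to compare a result with the sample while ignoring which group gets label 0, a small hypothetical helper such as `same_partition` can check that two labelings split the points identically. It says nothing about how the MLPro grader itself compares outputs.

```python
def same_partition(labels_a, labels_b):
    """True if two labelings group the points identically, ignoring label ids."""
    if len(labels_a) != len(labels_b):
        return False
    forward, backward = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Each label must map to exactly one label on the other side, both ways.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


print(same_partition([1, 1, 0, 0, 0], [0, 0, 1, 1, 1]))  # True
print(same_partition([1, 1, 0, 0, 0], [0, 0, 0, 1, 1]))  # False
```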

Note: numpy has already been imported as np (import numpy as np).