
scikit-learn: HCA

Unsupervised

Difficulty: 2 | Problem written by hemdan219@gmail.com

Educational Resource: https://towardsdatascience.com/a-beginners-guide-to-scikit-learn-14b7e51d71a4


Problem reported in interviews at

Amazon
Apple
Facebook
Google
Netflix

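Hierarchical Clustering Algorithm (HCA) is an unsupervised clustering algorithm that builds a hierarchy of nested clusters. The agglomerative (bottom-up) variant described here starts with every data point in its own cluster and merges clusters step by step.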

The steps for HCA are as follows:

  1. Make each data point a single-point cluster. This creates N clusters.
  2. Take the two closest data points and make them one cluster. There are now N-1 clusters.
  3. Take the two closest clusters and make them one cluster. There are now N-2 clusters.
  4. Repeat until only one cluster remains.
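The steps above can be sketched in plain Python. This is a naive illustration, not how scikit-learn implements it: the function names (centroid, ward_cost, naive_hca) are hypothetical, "closest" is assumed to mean the smallest Ward merge cost, and merging stops once n clusters remain instead of continuing down to one.

```python
from itertools import combinations


def centroid(points):
    """Mean of a non-empty list of 2D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(points), sum(ys) / len(points))


def ward_cost(a, b):
    """Increase in within-cluster squared error if clusters a and b merged."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    squared_dist = (ax - bx) ** 2 + (ay - by) ** 2
    return len(a) * len(b) / (len(a) + len(b)) * squared_dist


def naive_hca(datapoints, n):
    """Merge the cheapest pair of clusters until only n clusters remain."""
    if not 1 <= n <= len(datapoints):
        raise ValueError("n must be between 1 and the number of data points")
    # Step 1: every point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(datapoints))]
    while len(clusters) > n:
        # Steps 2-3: find the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(
                [datapoints[k] for k in clusters[pair[0]]],
                [datapoints[k] for k in clusters[pair[1]]],
            ),
        )
        # i < j, so popping j does not shift the position of cluster i.
        clusters[i] += clusters.pop(j)
    # Label each point with the index of the cluster it ended up in.
    labels = [0] * len(datapoints)
    for label, members in enumerate(clusters):
        for k in members:
            labels[k] = label
    return labels
```

For the points (1, 1) through (5, 5) with n = 2, this sketch groups the first two points together and the last three together. Its cluster numbering is arbitrary and may be the reverse of what a library returns.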

You are given a list of tuples, datapoints, representing 2D coordinates, and an integer n, the number of clusters to form.

You should return a list with the cluster number of each input data point, with cluster numbers starting at 0.

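You may use scikit-learn's AgglomerativeClustering class to solve this problem. Set the following parameters: affinity='euclidean', linkage='ward'. Note that in recent scikit-learn releases the affinity parameter has been renamed to metric; Euclidean distance is the default and the only option that Ward linkage accepts.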
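One possible solution is sketched below, assuming scikit-learn and NumPy are installed. The function name cluster_points is a placeholder, not part of the problem statement. The sketch leaves the distance setting at its default: Ward linkage works on Euclidean distances, and the keyword for that setting has differed between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def cluster_points(datapoints, n):
    """Return one cluster label per input point, as a plain Python list."""
    # Convert the list of (x, y) tuples into an array of shape (N, 2).
    X = np.asarray(datapoints, dtype=float)
    # Ward linkage uses Euclidean distance, which is also the default,
    # so the distance parameter is deliberately left unset here.
    model = AgglomerativeClustering(n_clusters=n, linkage="ward")
    # fit_predict returns a NumPy array of labels starting at 0.
    return model.fit_predict(X).tolist()
```

Calling cluster_points([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)], 2) returns a list of five labels drawn from 0 and 1. Which group receives which number is decided by the library.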

 

Sample Input:
<class 'list'>
datapoints: [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
n: 2

Expected Output:
<class 'list'>
[1, 1, 0, 0, 0]

Note: NumPy has already been imported as np (import numpy as np).