Balancing Data

Data Wrangling

Difficulty: 2 | Problem written by zeyad_omar
Sometimes in ML projects the dataset is imbalanced, meaning the number of elements differs substantially from class to class. To address this, engineers often use a weighting technique to compensate for the imbalance (as it is often difficult to get more data).

For example, the class with the smaller size might be assigned a larger weight than the class with the larger size.
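
As a rough illustration of that idea, here is a minimal sketch of inverse-frequency weighting, normalized so the weights sum to 1. The function name `inverse_frequency_weights` is hypothetical, and this is only one common scheme; it is not necessarily the one this problem's expected output follows.

```python
def inverse_frequency_weights(data):
    """Weight each class by 1 / count, then normalize so the weights sum to 1.

    Illustrative only: the name and the exact scheme are assumptions.
    """
    # Count occurrences; dicts keep first-appearance order (Python 3.7+).
    counts = {}
    for label in data:
        counts[label] = counts.get(label, 0) + 1

    # Rarer classes get a larger raw weight.
    raw = [1.0 / count for count in counts.values()]
    total = sum(raw)
    return [w / total for w in raw]


weights = inverse_frequency_weights([1, 1, 1, 2])
print(weights)
```

For `[1, 1, 1, 2]`, the minority class `2` ends up with roughly three times the weight of class `1` (about 0.75 versus 0.25).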

In this problem, you are given a 1D vector containing a dataset of two or more classes, and you are asked to assign a weight to each class such that the weights sum to 1.

Return a list with the weight of each class, ordered by each class's first appearance in the given dataset.

Sample Input:
<class 'list'>
data: [1, 1, 1, 2]

Expected Output:
<class 'list'>
[0.75, 0.25]
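
A minimal sketch of one way to reproduce the sample above, assuming (as the expected output suggests) that each class's weight is its relative frequency in the data. The function name `class_weights` is illustrative and not part of the problem statement.

```python
def class_weights(data):
    """Return each class's share of the data, in first-appearance order.

    Assumption: weight = count / total, which matches the sample output.
    """
    # Plain dicts preserve insertion order (Python 3.7+), so classes stay
    # in the order they first appear in `data`.
    counts = {}
    for label in data:
        counts[label] = counts.get(label, 0) + 1

    total = len(data)
    return [count / total for count in counts.values()]


print(class_weights([1, 1, 1, 2]))  # [0.75, 0.25]
```

Because every element belongs to exactly one class, the shares add up to 1 by construction.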


abhishek_kumar • 3 months, 1 week ago


In real-life usage,

from collections import Counter

will be more useful. Counter is a predefined class in the collections library that creates a count dictionary for the list.


1. Collections In Python : Everything You Need To Know About Python Collections
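
A short sketch of what the `Counter` suggestion could look like, assuming the weights are relative frequencies as in the sample output. The function name `class_weights_counter` is hypothetical.

```python
from collections import Counter


def class_weights_counter(data):
    """Relative frequency of each class, using collections.Counter.

    Assumption: weight = count / total, as in the sample output.
    """
    # Counter is a dict subclass, so it keeps first-appearance order
    # (Python 3.7+).
    counts = Counter(data)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


print(class_weights_counter([1, 1, 1, 2]))  # [0.75, 0.25]
```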

jackwimbish • 5 months, 2 weeks ago


It's not clear from the problem description, but your output needs to list the frequencies in the order in which the classes first appear in the data, not in sorted order.
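
To make the ordering difference concrete, here is a small sketch comparing the two orders on a hypothetical input where they disagree. The variable names are illustrative.

```python
from collections import Counter

data = [2, 1, 1, 1]  # class 2 appears first, but sorts after class 1
counts = Counter(data)
total = len(data)

# Order of first appearance: iterate over the Counter's own key order.
by_appearance = [counts[label] / total for label in counts]

# Sorted order: iterate over the sorted class labels instead.
by_sorted = [counts[label] / total for label in sorted(counts)]

print(by_appearance)  # [0.25, 0.75]
print(by_sorted)      # [0.75, 0.25]
```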

