NLP: Tokenization and Stop Word Removal

Data Wrangling

Difficulty: 2 | Problem written by peter.washington
Problem reported in interviews at


Given a set of stop words and an input text, return a list of tokens with all stop words removed.

Sample Input:
<class 'str'>
text: MLPro is the best website ever!

Expected Output:
<class 'list'>
['mlpro', 'best', 'website', 'ever']

This is a premium problem, to view more details of this problem please sign up for MLPro Premium. MLPro premium offers access to actual machine learning and data science interview questions and coding challenges commonly asked at tech companies all over the world

MLPro Premium also allows you to access all our high quality MCQs which are not available on the free tier.

Not able to solve a problem? MLPro premium brings you access to solutions for all problems available on MLPro

Get access to Premium only exclusive educational content available to only Premium users.

Have an issue, the MLPro support team is available 24X7 to Premium users.

This is a premium feature.
To access this and other such features, click on upgrade below.

Log in to post a comment

Jump to comment-108
uozcan12 • 5¬†months, 3¬†weeks ago


My solution is

import string

# Please do not change the below function name and parameters
def stop_word_removal(text):
    stop_words = {'their', 'then', 'not', 'ma', 'here', 'other', 'won', 'up', 'weren', 'being', 'we', 'those', 'an', 'them', 'which', 'him', 'so', 'yourselves', 'what', 'own', 'has', 'should', 'above', 'in', 'myself', 'against', 'that', 'before', 't', 'just', 'into', 'about', 'most', 'd', 'where', 'our', 'or', 'such', 'ours', 'of', 'doesn', 'further', 'needn', 'now', 'some', 'too', 'hasn', 'more', 'the', 'yours', 'her', 'below', 'same', 'how', 'very', 'is', 'did', 'you', 'his', 'when', 'few', 'does', 'down', 'yourself', 'i', 'do', 'both', 'shan', 'have', 'itself', 'shouldn', 'through', 'themselves', 'o', 'didn', 've', 'm', 'off', 'out', 'but', 'and', 'doing', 'any', 'nor', 'over', 'had', 'because', 'himself', 'theirs', 'me', 'by', 'she', 'whom', 'hers', 're', 'hadn', 'who', 'he', 'my', 'if', 'will', 'are', 'why', 'from', 'am', 'with', 'been', 'its', 'ourselves', 'ain', 'couldn', 'a', 'aren', 'under', 'll', 'on', 'y', 'can', 'they', 'than', 'after', 'wouldn', 'each', 'once', 'mightn', 'for', 'this', 'these', 's', 'only', 'haven', 'having', 'all', 'don', 'it', 'there', 'until', 'again', 'to', 'while', 'be', 'no', 'during', 'herself', 'as', 'mustn', 'between', 'was', 'at', 'your', 'were', 'isn', 'wasn'}
    translate_table = dict((ord(char), None) for char in string.punctuation)
    text = text.translate(translate_table)
    text = text.lower().strip().split()
    text = [t for t in text if t not in stop_words]
    return text



Input Test Case

Please enter only one test case at a time
numpy has been already imported as np (import numpy as np)