Personality Prediction (MBTI)


Difficulty: 5 | Problem written by ankita


The Myers-Briggs Type Indicator (MBTI for short) is a personality type system that sorts people into 16 distinct personality types along 4 axes:

Introversion (I) – Extroversion (E)

Intuition (N) – Sensing (S)

Thinking (T) – Feeling (F)

Judging (J) – Perceiving (P)

For example, someone who prefers introversion, intuition, thinking, and perceiving is labeled an INTP in the MBTI system, and many descriptions of that person’s preferences and behavior are built on the label.
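
To make the count of 16 concrete, here is a minimal sketch that builds every label by picking one letter from each of the four axes above. The names AXES and ALL_TYPES are illustrative and not part of the problem.

```python
from itertools import product

# One pair of letters per axis, in the order the letters appear in a label.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]

# Every type picks exactly one letter from each axis: 2 * 2 * 2 * 2 = 16.
ALL_TYPES = ["".join(letters) for letters in product(*AXES)]

print(len(ALL_TYPES))        # 16
print("INTP" in ALL_TYPES)   # True
```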

It is one of the most popular personality tests in the world, if not the most popular. It is used in business, online, for fun, for research, and more. A simple Google search shows the many ways the test has been used over time, and it remains widely used today.

Here, we are going to predict each person's four-letter MBTI label from that person's social media posts.


X_train: a list of strings, each containing the most recent posts by one person

Y_train: a list of four-letter MBTI codes, one for each entry in X_train

X_test: a list of strings in the same format as X_train. X_test is much smaller than X_train.


Y_test: the predictions for X_test, a list of four-letter MBTI codes, one per person

Complete the function MTBI_Prediction(X_train, Y_train, X_test), which returns Y_test as a list for the given X_test.


Use TfidfVectorizer() to vectorize the text

Use LogisticRegression(solver='liblinear') as the model

You can use LabelEncoder() to encode Y_train
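
The three hints above can be combined roughly as follows. This is a minimal sketch, not the reference solution: the helper name predict_mbti_sketch and the toy data are invented for illustration, and the toy data has only two classes to keep it small.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder


def predict_mbti_sketch(X_train, Y_train, X_test):
    """Hypothetical helper: TF-IDF features plus logistic regression."""
    # Learn vocabulary and IDF weights from the training posts only,
    # then reuse them to transform the test posts.
    vectorizer = TfidfVectorizer()
    train_features = vectorizer.fit_transform(X_train)
    test_features = vectorizer.transform(X_test)

    # Map the string labels to integer ids for the classifier.
    encoder = LabelEncoder()
    train_ids = encoder.fit_transform(Y_train)

    model = LogisticRegression(solver="liblinear")
    model.fit(train_features, train_ids)

    # Decode predicted ids back to MBTI codes as plain Python strings.
    predicted_ids = model.predict(test_features)
    return encoder.inverse_transform(predicted_ids).tolist()


# Toy data invented for illustration only (not from the problem's dataset).
toy_X_train = ["logic puzzle theory", "party friends dance",
               "theory logic proof", "dance party music"]
toy_Y_train = ["INTP", "ESFP", "INTP", "ESFP"]
toy_X_test = ["proof theory", "music friends"]

toy_predictions = predict_mbti_sketch(toy_X_train, toy_Y_train, toy_X_test)
print(toy_predictions)
```

Fitting the vectorizer on the training posts alone keeps test-set vocabulary out of the features, which is why transform (not fit_transform) is used on X_test.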



Sample Input:
<class 'list'>
X_train: ['intj moments sportscenter top ', 'cant draw nails haha done prof', 'im finding lack posts alarming', 'believed god life year ago mot', 'good one course say know thats', 'position actually let go perso', 'dear intp enjoyed conversation', 'science perfect scientist clai', 'youre firedthats another silly', 'im interested lazy go research', 'oi went break months ago toget', 'stuff like disthey longer beco', 'think agree personally dont co', 'thats normal happens also high', 'doesnt want go trip without st', 'hey enfps ive posted thread ph', 'paint without numbersid guess ', 'true sadly many felt like part', 'got ive read enneagram im thou', 'im mystery myselfhugs daughter', 'love feeling affectionate one ', 'im currently rooting around fo', 'newtons universal gravity law ', 'never understood long time peo', 'splinter cell blacklist xbox g', 'im night moved sc fl year ago ', 'edit forgot board oni currentl', 'im pretty sure theyve mistyped', 'catch im although im quite ter', 'intjisfpthey taught live lieha', 'notany esfjs originally mistyp', 'curious dont see many esfj esf']
<class 'list'>
Y_train: ['INFJ', 'INFJ', 'ENTP', 'ENTP', 'INTP', 'INTP', 'INTJ', 'INTJ', 'ENTJ', 'ENTJ', 'ENFJ', 'ENFJ', 'INFP', 'INFP', 'ENFP', 'ENFP', 'ISFP', 'ISFP', 'ISTP', 'ISTP', 'ISFJ', 'ISFJ', 'ISTJ', 'ISTJ', 'ESTP', 'ESTP', 'ESFP', 'ESFP', 'ESTJ', 'ESTJ', 'ESFJ', 'ESFJ']
<class 'list'>
X_test: ['one time parents fighting dads', 'know entj everything wanted fu', 'comment screams intj bro espec', 'fair enough thats want look li', 'hello working presentation typ', 'ive always thought tony stark ', 'personally thinking would sj t', 'learning say right way greates', 'ponyjoyride thankyou im sure s', 'nobody realistically impossibl', 'absolutely feel like running a', 'cant see keep secret facts fac', 'thank reading im sorry really ', 'intp youre paranoid logical be', 'answer question yes people cap', 'lol thats figured meant like w']

Expected Output:
<class 'list'>
['ISTJ', 'INTP', 'INFJ', 'ENFP', 'ISTJ', 'ISTP', 'INFP', 'INTP', 'ESFP', 'ISTJ', 'ENFJ', 'ESFJ', 'ISTP', 'INTJ', 'ISTJ', 'INFP']

uozcan12 • 4 months, 2 weeks ago


I think my answer is correct because I checked the output, but the test case did not pass. What's my mistake?

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

# Please do not change the below function name and parameters
def MTBI_Prediction(X_train, Y_train, X_test):
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(X_train)
    X_test = vectorizer.transform(X_test)
    le = preprocessing.LabelEncoder()
    Y_train = le.fit_transform(Y_train)

    clf = LogisticRegression(solver='liblinear').fit(X_train, Y_train)
    y_predict = clf.predict(X_test)
    return list(le.inverse_transform(y_predict))
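
The page does not say why this submission fails. One detail that may be worth ruling out (an assumption, not a confirmed cause) is the element type of the returned list: calling list(...) on a NumPy array keeps NumPy scalar elements, while .tolist() converts them to built-in Python strings. The sketch below uses a stand-in array in place of the real inverse_transform output.

```python
import numpy as np

# Stand-in for the array that LabelEncoder.inverse_transform returns.
decoded = np.array(["ISTJ", "INTP"])

as_numpy_scalars = list(decoded)      # elements are numpy.str_
as_python_strings = decoded.tolist()  # elements are built-in str

print(type(as_numpy_scalars[0]))
print(type(as_python_strings[0]))
print(as_numpy_scalars == as_python_strings)  # the values still compare equal
```

The two lists compare equal, so this only matters if the checker compares printed output, which can differ between the two forms on some NumPy versions.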


