
Customer Churn Prediction using Machine Learning

With the global pandemic squeezing end-customer spend and throttling revenues, it is imperative for businesses, especially subscription-based ones, to be able to predict likely customer churn (attrition) and plan corrective actions or raise early red flags.

At Citrus Consulting, we endeavor to help our customers solve business problems by leveraging modular, future-ready technologies. To that end, our team of advanced analysts and consultants has developed an out-of-the-box customer churn prediction model that helps customers across regions plan their marketing and outreach strategically, with quantifiable business output.

Customer churn, also known as customer attrition, refers to customers or subscribers ceasing to do business with a company or service. A business typically treats a customer as churned once a specific amount of time has passed since the customer's last interaction with the company or service.
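
To make the definition concrete, here is a minimal sketch (with hypothetical column names and an illustrative 90-day inactivity window, not part of the dataset used below) of how churned customers could be flagged from an interaction log:

import pandas as pd

#hypothetical interaction log
interactions = pd.DataFrame({
    'customerID': ['A', 'A', 'B', 'C'],
    'interaction_date': pd.to_datetime(['2020-01-05', '2020-03-01', '2019-11-20', '2020-04-10'])
})

as_of = pd.Timestamp('2020-05-01')  #reporting date
last_seen = interactions.groupby('customerID')['interaction_date'].max()  #last interaction per customer
days_inactive = (as_of - last_seen).dt.days
churned = days_inactive > 90  #90-day window is an assumed business rule
print(churned)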

Customer churn is one of the most important problems for large companies. In the telecom industry in particular, customers will not hesitate to leave if they don't find what they are looking for: competitive pricing, high-quality service and value for money. Churn is closely tied to customer satisfaction. The cost of churn includes both lost revenue and the marketing costs involved in replacing churned customers with new ones. It is well known that the cost of customer acquisition is far greater than the cost of customer retention, which makes retention a crucial business priority.

In this blog post, we will create a model that predicts whether a customer is likely to churn, using publicly available telecom data and Python.

Build your own Churn Prevention Model

To enable our customers and end users, here are a few steps to help you build your own churn prevention model. Reach out to us for further customization and use cases.

Step 1 - Import Libraries to Build Model

###Importing the required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import pickle
import seaborn as sns
import joblib  #joblib is now a standalone package (sklearn.externals.joblib is deprecated)
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn import model_selection
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score 
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import sklearn.metrics as metrics
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline

Step 2 - Get Data

The data is publicly available on many websites, such as Kaggle. Load the data and store it in a dataframe.

Telco_data=pd.read_csv('TelcoCustomerChurn.csv') #Load the data into a dataframe
Telco_data.head() #See the first five rows of the dataframe
[Output: first five rows of the dataframe]

Step 3 - Exploratory Data Analysis

Telco_data.shape #gives the number of rows and columns
[Output: dataframe shape]

The dataframe contains 7,040 rows and 21 columns.

Telco_data.columns  #show all the columns in the dataset
[Output: list of columns in the dataset]

Now check whether there are any missing values:

Telco_data.isna().sum() #gives the missing values
[Output: missing-value counts per column]

As there are no missing values, we will check the descriptive statistics of the numerical columns.

Telco_data.describe() #Give the statistics of the data
[Output: descriptive statistics of the numerical columns]

From the above table we can see that the average customer tenure is ~32 months and the average monthly charge is ~$65. Next, let us look at the distribution of the labels (churned vs. not churned).

# Plotting target feature distribution
count = Telco_data['Churn'].value_counts(sort = True)
colors = ["grey","orange"] 
labels=['No','Yes']
#plotting pie chart
plt.pie(count,labels=labels, colors=colors,autopct='%1.1f%%', shadow=True)
plt.title('Churn Percentage in data')
plt.show()
[Figure: pie chart of churn percentage in the data]

From the above chart, 73.5% of the customers are retained and 26.5% have churned. Next, we plot the relation of the target attribute to some of the independent variables:

sns.countplot(x ='TechSupport', hue = "Churn", data = Telco_data)
sns.countplot(x ='InternetService', hue = "Churn", data = Telco_data)
[Figure: churn counts by TechSupport]
[Figure: churn counts by InternetService]

The charts above tell us that customers without tech support churn far more than the other categories, and that the churn rate is high among customers with fiber-optic internet. Based on this, the company might improve (or reconsider offering) its fiber-optic service. Now let us see how some of the numerical attributes relate to churn:

Telco_data['TotalCharges']=pd.to_numeric(Telco_data['TotalCharges'],errors='coerce')  #Convert string to numeric
features = ['MonthlyCharges', 'tenure','TotalCharges']
fig, ax = plt.subplots(1, 3, figsize=(15, 4))
Telco_data[Telco_data.Churn == 'No'][features].hist(bins=20, color="grey", alpha=0.9, ax=ax)
Telco_data[Telco_data.Churn == 'Yes'][features].hist(bins=20, color="orange", alpha=0.9, ax=ax)
[Figure: histograms of MonthlyCharges, tenure and TotalCharges by churn status]

From the above charts we can see that customers paying monthly charges greater than $70 have a higher tendency to churn, so we can recommend that the company reduce monthly charges to retain customers.

From the tenure chart, customers with a tenure of 0 to 10 months churned the most, while most retained customers have a tenure greater than two years. So the company should try different offers/services to keep customers for at least two years.

Step 4 - Data Preprocessing

After the data analysis, we clean the data and prepare it for the algorithms we are going to use. The first step is to remove unnecessary columns from the dataset and then convert the datatypes to numeric.

Customer ID will not add any value to the model, so we can remove that column from the data for now. Later, based on feature importance, we can remove more features.

#create a copy of the dataset to clean the data
clean_df=Telco_data.copy()
#remove the customer ID column from the dataset
clean_df.drop('customerID',axis=1,inplace=True)
#check the datatypes
clean_df.dtypes
[Output: datatypes of each column]

Converting string features to numerical features

columns = clean_df.columns.values  #get all column names

#convert object (string) columns to numeric category codes
for column in columns:
    if clean_df[column].dtype == 'object':
        clean_df[column] = clean_df[column].astype('category')
        clean_df[column] = clean_df[column].cat.codes
    else:
        continue

clean_df.head()
[Output: first five rows of the encoded dataframe]

Create one variable for the independent features and another for the target:

features=clean_df.drop('Churn',axis=1)
Y=clean_df['Churn']

Now we need to normalize the data because the features in the dataset span very different magnitudes and ranges.

scaler = MinMaxScaler(feature_range=(0, 1))
features = scaler.fit_transform(features)

The data is now normalized and in a format the algorithms can use. Before training, we need to split the data into train and test datasets; we use an 80:20 split, 80% for training and 20% for testing.

X_train, X_test, y_train, y_test = train_test_split(features,Y, test_size=0.2)
print (X_train.shape, y_train.shape)
print (X_test.shape, y_test.shape)
[Output: shapes of the train and test splits]

Step 5 - Model Training

models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC()))

Next, we evaluate each of these algorithms on the training data using 10-fold cross-validation and compare their accuracies:

results = []
names = []
scoring = 'accuracy'
for name, model in models:
    kfold = KFold(n_splits=10, shuffle=True, random_state=42)
    cv_results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)
[Output: mean cross-validation accuracy and standard deviation for each model]

In the above block of code, we apply 10-fold cross-validation: the training data is passed to each algorithm, the accuracy from each fold is stored in the 'results' list, and finally the average accuracy (and standard deviation) of each model is printed.

fig = plt.figure()
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results) #results of 10 iterations of each model
ax.set_xticklabels(names)
plt.show()
[Figure: box plot comparing algorithm accuracies]

From the above figure we can see that Logistic Regression and LDA performed best on this dataset. For now, we will use Logistic Regression, check how it performs on the test dataset, and generate different accuracy metrics.

logreg = LogisticRegression() #Create the model
log_model=logreg.fit(X_train,y_train) #fit the training data to the model
print(log_model)
[Output: the fitted LogisticRegression model and its parameters]
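
As an optional aside, since joblib was imported in Step 1 but not yet used, here is a small sketch (the file name is illustrative) of how the fitted model could be persisted for later reuse or deployment:

joblib.dump(log_model, 'churn_logreg.pkl')  #save the fitted model to disk (file name is an assumption)
restored_model = joblib.load('churn_logreg.pkl')  #reload it later, e.g. in a scoring job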

Step 6 - Accuracy Metrics

In the output above you can see all the parameters used to train the model. You can change some of these and check whether the accuracy improves.

predicted_classes = log_model.predict(X_test) #predicts the test dataset class labels of each sample
predicted_prob=log_model.predict_proba(X_test) #predicts the probability of each sample in the test data
print(classification_report(y_test,predicted_classes)) #to get the accuracy metrics
[Output: classification report on the test set]

From the report above you can see that we have 90% recall, 85% precision and 81% accuracy. We can still increase model accuracy by tuning some of the hyperparameters and by removing or adding features.
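
As a quick illustration of hyperparameter tuning (a sketch, not part of the original workflow), we can grid-search the regularization strength C of Logistic Regression using cross-validation on the same training split; the candidate values below are assumptions:

from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.01, 0.1, 1, 10, 100]}  #candidate values are illustrative
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=10, scoring='accuracy')
grid.fit(X_train, y_train)  #uses the training split created earlier
print(grid.best_params_, grid.best_score_)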

We will plot the ROC curve to check at which threshold we get good recall and decide which threshold to choose.

y_pred = []
for row in predicted_prob:
    y_pred.append(row[1])  #probability of the positive (churn) class
y_pred = np.array(y_pred)
# calculate the fpr and tpr for all thresholds of the classification
fpr, tpr, threshold = metrics.roc_curve(y_test, y_pred)
roc_auc = metrics.auc(fpr, tpr)
#plotting the curve
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
[Figure: ROC curve with AUC]

From the above curve we can see that at a threshold of 0.6 we get a high true positive rate and a low false positive rate. Let's use this threshold, create a confusion matrix, and see what the recall and precision look like.

prediction = np.where(y_pred > 0.6, 1, 0) #using 0.6 as threshold to classify the label
f_mat = confusion_matrix(y_test, prediction)
df_cm = pd.DataFrame(f_mat, range(2),range(2))
sns.set(font_scale=1.4)#for label size
sns.heatmap(df_cm, annot=True, fmt='d')
[Figure: confusion matrix heatmap]

True Negatives: Actual Not Churn, Predicted Not Churn = 1006

False Positives: Actual Not Churn, Predicted Churn = 216

False Negatives: Actual Churn, Predicted Not Churn = 53

True Positives: Actual Churn, Predicted Churn = 133

Now we need to decide which errors matter most to us. In this use case false negatives are costly, so we need to make sure recall is high; to increase recall you can lower the classification threshold (at the cost of precision).
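
If a specific recall target matters, the threshold can also be chosen systematically rather than by reading the ROC curve. A minimal sketch, assuming an illustrative target recall of 90%, using scikit-learn's precision-recall curve:

from sklearn.metrics import precision_recall_curve

precisions, recalls, thresholds = precision_recall_curve(y_test, y_pred)
#walk from the highest threshold down and keep the largest one that still meets the recall target
for p, r, t in reversed(list(zip(precisions, recalls, thresholds))):
    if r >= 0.90:  #target recall is an assumption
        print("threshold=%.2f precision=%.2f recall=%.2f" % (t, p, r))
        break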

We can still increase the accuracy and recall by creating new features or deleting some of the correlated features, and by experimenting with hyperparameters.
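
As one example of such an experiment (a sketch, assuming a 0.8 absolute-correlation cut-off is acceptable), we can look for highly correlated feature pairs in the encoded data and drop one feature from each pair before retraining:

corr = clean_df.drop('Churn', axis=1).corr().abs()  #absolute pairwise correlations
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))  #keep the upper triangle only
to_drop = [col for col in upper.columns if (upper[col] > 0.8).any()]  #0.8 cut-off is an assumption
print("Candidate features to drop:", to_drop)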

Ashok is a Senior Consultant at Citrus Consulting Services based in Dubai, UAE. He is responsible for consulting, implementation and delivery of AI/ML projects across the Middle East and Africa. Ashok has executed multiple ML projects in the region, streamlining the end-to-end ML process from data collection to model deployment. His expertise spans AI/ML techniques such as time series forecasting, classification, regression, clustering, NLP, computer vision and transfer learning. He is a certified AWS Machine Learning Specialist with hands-on experience across AWS AI/ML services as well as other cloud providers.
