Employee-Attrition-in-Organisation

Project Overview

- Objective: predict employee attrition in an organisation
- Classification model
- Data cleaning and data preprocessing
- Exploratory data analysis
- Classification models used: Logistic Regression, KNN, SVM, Kernel SVM, Naive Bayes, Decision Tree, Random Forest, ANN, XGBoost and CatBoost
- Model performance comparison
- Conclusion: top reasons why employees leave an organisation
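
The model comparison listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses a synthetic imbalanced dataset in place of the HR data, covers only the scikit-learn models (ANN, XGBoost and CatBoost are left out to keep dependencies small), and assumes "SVM" means a linear kernel and "Kernel SVM" an RBF kernel. The split size and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR data (assumption): about 15% positive class.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with a scaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "Kernel SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit every model on the same split and record the F1 score of the positive class.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test), zero_division=0)

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} F1 (class 1) = {score:.2f}")
```

Ranking by the F1 score of the positive class rather than by accuracy is a deliberate choice here: with few leavers in the data, a model can reach high accuracy while missing most of them.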

About Project

We are developing classification models that help companies predict whether an employee is going to quit. The models are trained on an extensive dataset of the kind that HR departments have readily available. Anticipating attrition lets an organisation act to retain its current employees and avoid hiring replacements, which costs time, capital and skills.

Code and Resources used

Python version: 3.7.6
Packages: Pandas, NumPy, Seaborn, Matplotlib, scikit-learn, TensorFlow and CatBoost
Resources used:
- Analytics Vidhya
- Towards Data Science
- Kaggle

Dataset

Uncover the factors that lead to employee attrition and explore important questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. This is a fictional data set created by IBM data scientists.
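
The two example questions above map naturally onto pandas group-by operations. The sketch below uses a tiny made-up table; the column names (`JobRole`, `Education`, `Attrition`, `DistanceFromHome`, `MonthlyIncome`) are assumptions chosen for illustration and should be checked against the actual file.

```python
import pandas as pd

# Tiny made-up sample; column names and values are assumed for illustration.
df = pd.DataFrame({
    "JobRole": ["Sales", "Sales", "Research", "Research", "Sales", "Research"],
    "Education": [1, 2, 2, 3, 1, 3],
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
    "DistanceFromHome": [10, 4, 2, 20, 6, 8],
    "MonthlyIncome": [3000, 5000, 6000, 4000, 3500, 7000],
})

# "Show me a breakdown of distance from home by job role and attrition."
distance_breakdown = (
    df.groupby(["JobRole", "Attrition"])["DistanceFromHome"].describe()
)

# "Compare average monthly income by education and attrition."
income_by_education = (
    df.groupby(["Education", "Attrition"])["MonthlyIncome"].mean().unstack()
)

print(distance_breakdown)
print(income_by_education)
```

`unstack()` moves the attrition values into columns, so each education level reads as one row with a "No" and a "Yes" average side by side.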

Dataset URL: https://www.kaggle.com/c/rossmann-store-sales/data

Data fields

Features:

TV: advertising dollars spent on TV for a single product in a given market (in thousands of dollars)
Radio: advertising dollars spent on Radio
Newspaper: advertising dollars spent on Newspaper

Target:

Sales budg

Model Performance

Classification report for each model:

Logistic Regression

              precision    recall  f1-score   support

           0       0.91      0.97      0.94       255
           1       0.67      0.36      0.47        39

    accuracy                           0.89       294
   macro avg       0.79      0.67      0.70       294
weighted avg       0.88      0.89      0.88       294

KNN

              precision    recall  f1-score   support

           0       0.88      1.00      0.93       255
           1       0.80      0.10      0.18        39

    accuracy                           0.88       294
   macro avg       0.84      0.55      0.56       294
weighted avg       0.87      0.88      0.83       294

SVM

              precision    recall  f1-score   support

           0       0.92      0.97      0.94       255
           1       0.68      0.44      0.53        39

    accuracy                           0.90       294
   macro avg       0.80      0.70      0.74       294
weighted avg       0.89      0.90      0.89       294

Kernel SVM

              precision    recall  f1-score   support

           0       0.90      1.00      0.94       255
           1       0.91      0.26      0.40        39

    accuracy                           0.90       294
   macro avg       0.90      0.63      0.67       294
weighted avg       0.90      0.90      0.87       294

Naive Bayes

              precision    recall  f1-score   support

           0       0.92      0.68      0.78       255
           1       0.23      0.62      0.33        39

    accuracy                           0.67       294
   macro avg       0.57      0.65      0.56       294
weighted avg       0.83      0.67      0.72       294

Decision Tree

              precision    recall  f1-score   support

           0       0.87      0.86      0.87       255
           1       0.16      0.18      0.17        39

    accuracy                           0.77       294
   macro avg       0.52      0.52      0.52       294
weighted avg       0.78      0.77      0.77       294

Random Forest

              precision    recall  f1-score   support

           0       0.87      0.98      0.93       255
           1       0.43      0.08      0.13        39

    accuracy                           0.86       294
   macro avg       0.65      0.53      0.53       294
weighted avg       0.82      0.86      0.82       294

ANN

              precision    recall  f1-score   support

           0       0.91      0.95      0.93       255
           1       0.50      0.36      0.42        39

    accuracy                           0.87       294
   macro avg       0.70      0.65      0.67       294
weighted avg       0.85      0.87      0.86       294

XGBoost

              precision    recall  f1-score   support

           0       0.90      0.96      0.93       255
           1       0.55      0.28      0.37        39

    accuracy                           0.87       294
   macro avg       0.72      0.62      0.65       294
weighted avg       0.85      0.87      0.86       294

CatBoost

              precision    recall  f1-score   support

           0       0.88      0.99      0.93       255
           1       0.67      0.15      0.25        39

    accuracy                           0.88       294
   macro avg       0.78      0.57      0.59       294
weighted avg       0.86      0.88      0.84       294
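
To make the report columns concrete, the sketch below recomputes precision, recall, F1 and accuracy from confusion-matrix counts. The counts (248 true negatives, 7 false positives, 25 false negatives, 14 true positives) are not stated in this README; they are an assumption, chosen because they are consistent with the rounded Logistic Regression figures and the class supports of 255 and 39.

```python
def class_metrics(tp, fp, fn):
    """Return (precision, recall, f1) for one class from its raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed counts, inferred from the rounded Logistic Regression report.
tn, fp, fn, tp = 248, 7, 25, 14

# Class 1: the usual positive-class definitions.
p1, r1, f1_1 = class_metrics(tp, fp, fn)

# Class 0: the roles swap, so true negatives act as this class's "hits".
p0, r0, f1_0 = class_metrics(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)
macro_f1 = (f1_0 + f1_1) / 2

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")
print(f"accuracy={accuracy:.2f} macro f1={macro_f1:.2f}")
```

Under these counts the model finds only 14 of the 39 employees in class 1, which is why recall for that class is far lower than the overall accuracy suggests.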

Conclusion

Top reasons why employees leave the organisation:

- No Overtime: This was a surprise. Employees who do not work overtime are the most likely to leave the organisation. They may want a higher income, or they may feel underused.
- Monthly Income: As expected, income is a major factor; employees leave the organisation in search of a better salary.
- Age: This could also be expected, since people who are about to retire will leave the organisation.

Knowing the most likely reasons why employees leave can help the organisation take action and reduce its level of attrition.
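
This README does not say how the ranking of reasons was obtained. One common way to produce such a ranking is to inspect the feature importances of a tree ensemble; the sketch below shows the idea on made-up data. The feature names, the synthetic label rule and the choice of a random forest are all assumptions for illustration, so the printed ranking says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Made-up features; names are assumed, values are random.
X = pd.DataFrame({
    "OverTime": rng.integers(0, 2, n),
    "MonthlyIncome": rng.normal(6500, 2000, n),
    "Age": rng.integers(18, 60, n),
})

# Synthetic label (assumption): low income and young age mean "leaves".
y = ((X["MonthlyIncome"] < 5000) & (X["Age"] < 35)).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, sorted from most to least influential.
ranking = pd.Series(model.feature_importances_, index=X.columns)
ranking = ranking.sort_values(ascending=False)
print(ranking)
```

Impurity-based importances can favour features with many distinct values, so a permutation-based check on held-out data is a sensible complement before drawing conclusions.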

Further Improvements

To further improve the model, the following options can be considered:

- Balance the dataset (handle the class imbalance)
- Tune all classification models
- Use cross-validation and grid search
- Compare models on accuracy, recall, precision, F1 score and ROC curve/AUC
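
The improvements listed above could be combined along the following lines. This is a sketch under stated assumptions, not the project's code: the data is synthetic, logistic regression stands in for any of the classifiers, `class_weight="balanced"` is just one of several ways to handle imbalance, and the parameter grid is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the HR dataset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.85, 0.15], random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Arbitrary illustrative grid over the regularisation strength.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring=scoring, refit="f1", cv=cv)
search.fit(X, y)

best_c = search.best_params_["logisticregression__C"]
summary = {
    metric: search.cv_results_[f"mean_test_{metric}"][search.best_index_]
    for metric in scoring
}

print("best C:", best_c)
for metric, value in summary.items():
    print(f"{metric:10s} {value:.3f}")
```

Refitting on F1 instead of accuracy selects the setting that balances precision and recall on the minority class, while the other metrics stay available in `cv_results_` for a side-by-side comparison.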