Analysis of Factors Affecting Credit Scores Using a Random Forest Model
Problem Statement:
To build a Random Forest model to predict credit scores based on various financial and demographic factors.
Goals:
To identify the most significant features affecting credit scores using linear regression, and then to understand how to use Random Forests to build an accurate predictive model.
Methodology
- Data Processing: The data came from a financial dataset containing various attributes related to credit scores. It was preprocessed by converting categorical columns into dummy variables and scaling the features.
- Model Building: Three models were built: Logistic Regression, SVM, and Random Forest. The Random Forest model was chosen because it performed best of the three.
- Evaluation Metrics: Model performance was evaluated using accuracy, precision, recall, and F1-score.
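The Data Processing step (dummy variables plus feature scaling) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the author's code: the source does not say which scaler was used, so standardization (z-scores) is assumed, and full one-hot encoding is assumed rather than dropping one dummy per category. The `Region` column and the toy rows are hypothetical. In practice, `pandas.get_dummies` (which has a `drop_first` option) and scikit-learn's `StandardScaler` perform the same two steps.

```python
from statistics import mean, pstdev


def add_dummies(rows, column, categories=None):
    """Replace one categorical column with a 0/1 dummy column per category.

    Pass the training categories explicitly when encoding test rows so both
    splits get identical columns.
    """
    if categories is None:
        categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded


def fit_scaler(rows, columns):
    """Learn the mean and population standard deviation of each numeric column."""
    stats = {}
    for col in columns:
        values = [row[col] for row in rows]
        sd = pstdev(values)
        stats[col] = (mean(values), sd or 1.0)  # guard against constant columns
    return stats


def apply_scaler(rows, stats):
    """Convert the fitted numeric columns to z-scores: (x - mean) / std."""
    scaled = []
    for row in rows:
        new_row = dict(row)
        for col, (mu, sd) in stats.items():
            new_row[col] = (row[col] - mu) / sd
        scaled.append(new_row)
    return scaled


# Hypothetical toy rows; "Region" stands in for the dataset's categorical columns.
train = [
    {"Age": 30, "Balance": 0.0, "Region": "North"},
    {"Age": 40, "Balance": 50000.0, "Region": "South"},
    {"Age": 50, "Balance": 100000.0, "Region": "North"},
]
encoded = add_dummies(train, "Region")
stats = fit_scaler(encoded, ["Age", "Balance"])  # fit on training rows only
prepared = apply_scaler(encoded, stats)
print(prepared[0])
```

Fitting the scaling statistics on the training rows and reusing them for the test rows avoids leaking test information into preprocessing. Scaling matters mainly for Logistic Regression and SVM, which are sensitive to feature magnitude; a Random Forest splits on per-feature thresholds and is largely unaffected by it.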
Results
- The Random Forest model achieved an accuracy of 86.4% in predicting credit scores (1,728 of the 2,000 evaluated instances classified correctly).
- The most important features identified were Age, Estimated Salary, Balance, Credit Score, and Number of Products.
- The confusion matrix shows 1,520 true positives and 208 true negatives (correct predictions), and 197 false negatives and 75 false positives (errors).