Ledian K. Asks:
Logistic Regression for prediction
I would like to ask about the theoretical approach of using Logistic Regression for customer data, and more specifically for Churn Prediction (in BigQuery and Python).
I have customer data for an online shop, and I would like to predict whether a customer will churn based on some characteristics. I have created my dataset and the Churn label (based on the hypothesis that a customer who hasn't bought anything in the last year is assumed to have churned, since we are dealing with a non-contractual setting).
I am using 3 years of data (2019-2021), which includes ~3M customers and 43 features. As I said, a customer is considered churned if they didn't place an order in 2021.
- I checked the distribution of my label, which is roughly balanced.
- I checked for some Logistic Regression assumptions such as multicollinearity, outlier influence etc.
- I split the data into 80% training data, 10% evaluation data, 10% prediction data.
- I checked the model's performance by looking at the classification metrics (Accuracy, Recall etc.)
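For concreteness, the steps in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pipeline: the data is a synthetic toy set with two made-up features (`recency`, `frequency`) instead of the 43 real ones, and the logistic regression is a small hand-written gradient-descent fit standing in for whatever BigQuery or a Python library would provide. All function names here are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def split_80_10_10(rows, seed=0):
    """Shuffle rows and cut them into train / evaluation / prediction parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(0.8 * len(rows))
    n_eval = int(0.1 * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_eval], rows[n_train + n_eval:]


def fit_logistic(rows, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def churn_probability(model, x):
    """Predicted probability that the customer with features x has churned."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def accuracy(model, rows, threshold=0.5):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum((churn_probability(model, x) >= threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


# Synthetic toy data (an assumption, not the real customer table):
# churn is made more likely by high recency and less likely by high frequency.
rng = random.Random(42)
data = []
for _ in range(1000):
    recency, frequency = rng.random(), rng.random()
    p_churn = sigmoid(4.0 * recency - 3.0 * frequency)
    data.append(([recency, frequency], 1 if rng.random() < p_churn else 0))

train, evaluation, prediction = split_80_10_10(data)
model = fit_logistic(train)                      # train on the 80% split only
eval_accuracy = accuracy(model, evaluation)      # check metrics on the 10% evaluation split
print(len(train), len(evaluation), len(prediction))
print(round(eval_accuracy, 3))
```

In this sketch the model only ever sees the training split while fitting; the evaluation split is used for the metrics, and the prediction split is held back entirely.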
My question would be:
We have predictions for the 10% prediction split (i.e. the probabilities that a customer will churn). Could we also get the probabilities for all the other customers, those in the training dataset and in the evaluation dataset?
In other words, what would the next steps be once the model has been trained and we have checked that it is usable, if the final goal is to end up with a churn probability for every customer?
Thank you in advance for your help!