This project leverages data science to predict whether a customer is likely to purchase a travel package, empowering tourism businesses with actionable insights for marketing and product development.
Exploratory data analysis revealed the following:
I evaluated Logistic Regression, SVM, Decision Tree, Random Forest, and SVD. The table below reports the final metrics for each classifier. The SVM with an RBF kernel performed best, with the highest recall (0.69) and F1 score (0.61).
| Model | Precision | Recall | F1 Score |
|---|---|---|---|
| SVM (RBF Kernel) | 0.54 | 0.69 | 0.61 |
| Logistic Regression | 0.47 | 0.67 | 0.55 |
| Random Forest | 0.58 | 0.47 | 0.52 |
| Decision Tree (Tuned) | 0.52 | 0.62 | 0.57 |
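Each F1 score in the table is the harmonic mean of the same row's precision and recall, presumably computed for the positive class (a customer who purchases a package). The standard-library sketch below shows how these metrics follow from true positives, false positives, and false negatives, then recomputes F1 from the reported precision and recall as a consistency check. The function names are illustrative and are not taken from the project's `model_pipeline.py`.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class.

    Assumes binary labels where `positive` marks a purchase (hypothetical encoding).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Precision, recall and F1 exactly as reported in the table above.
reported = {
    "SVM (RBF Kernel)": (0.54, 0.69, 0.61),
    "Logistic Regression": (0.47, 0.67, 0.55),
    "Random Forest": (0.58, 0.47, 0.52),
    "Decision Tree (Tuned)": (0.52, 0.62, 0.57),
}

# Recompute F1 from each row's precision and recall and compare to the reported value.
for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_from(p, r):.2f} (reported {f1})")
```

All four recomputed values round to the reported F1 scores, so the table is internally consistent. Because F1 balances precision against recall, it favors the SVM's high recall over the Random Forest's higher precision.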
Feature importances from the tuned Decision Tree (chosen for interpretability) and the Random Forest (which aggregates importance across many trees for a more robust estimate) agree on the key predictors: Monthly Income, Age, and Number of Trips.
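The README does not say how importance was computed. Assuming scikit-learn's default `feature_importances_`, it is mean decrease in impurity (MDI). Each split credits its feature with the drop in Gini impurity it produces, weighted by the fraction of training samples reaching that node. A tree's credits are normalized to sum to 1, and a Random Forest averages those normalized scores across its trees. The standard-library sketch below illustrates the calculation on a toy tree. The split representation, function names, and data are hypothetical.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the share of root samples at the node."""
    n = len(parent)
    return (n / n_total) * (
        gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)
    )


def tree_importance(splits, n_total):
    """MDI importance for one tree.

    `splits` is a hypothetical list of (feature, parent_labels, left_labels,
    right_labels) tuples, one per internal node.
    """
    raw = defaultdict(float)
    for feature, parent, left, right in splits:
        raw[feature] += split_gain(parent, left, right, n_total)
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()} if total > 0 else dict(raw)


def forest_importance(per_tree):
    """Average normalized importances over trees; a feature a tree never used counts as 0."""
    features = set().union(*per_tree)
    return {f: sum(t.get(f, 0.0) for t in per_tree) / len(per_tree) for f in features}


# Toy tree on 8 hypothetical customers (1 = purchased): the root splits on income,
# then the impure left child splits on age.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
splits = [
    ("Monthly Income", labels, [1, 1, 1, 0], [0, 0, 0, 0]),
    ("Age", [1, 1, 1, 0], [1, 1, 1], [0]),
]
print(tree_importance(splits, n_total=len(labels)))  # {'Monthly Income': 0.6, 'Age': 0.4}
```

Averaging per-tree scores is what makes the forest's ranking steadier than a single tree's. One tree can credit a feature heavily because of a few early splits, while the average smooths that out across many bootstrapped trees.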
All code, data, and reports are available in the full GitHub repository.
```bash
# Install the project's dependencies
pip install -r requirements.txt

# Run the modeling pipeline
python scripts/model_pipeline.py
```