Tourism Package Purchase Prediction

This project leverages data science to predict whether a customer is likely to purchase a travel package, empowering tourism businesses with actionable insights for marketing and product development.

Executive Summary

Business Insights

Exploratory data analysis revealed the following:

Figure: Age Distribution
Figure: Monthly Income Distribution
Figure: Passport vs Purchase Rate
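As a rough illustration of how a comparison like Passport vs Purchase Rate can be computed, here is a minimal sketch in plain Python. The rows are synthetic and made up for the example; they are not the project's data, and the helper name `purchase_rate_by_group` is hypothetical.

```python
from collections import defaultdict

# Synthetic illustration rows: (has_passport, purchased). NOT project data.
rows = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 0), (0, 1), (0, 0), (1, 1)]


def purchase_rate_by_group(records):
    """Return {group: share of records in that group with purchased == 1}."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for group, purchased in records:
        totals[group] += 1
        buyers[group] += purchased
    return {group: buyers[group] / totals[group] for group in totals}


rates = purchase_rate_by_group(rows)
for group, rate in sorted(rates.items()):
    label = "passport" if group == 1 else "no passport"
    print(f"{label}: purchase rate = {rate:.2f}")
```

The same grouping logic applies to any binary or categorical feature; only the first element of each row changes.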

Models & Performance

I evaluated Logistic Regression, SVM, Decision Tree, Random Forest, and SVD. Here are the final tuned metrics for each classifier; SVM (RBF Kernel) achieved the highest recall and F1 score:

Model                   Precision   Recall   F1 Score   RMSE
SVM (RBF Kernel)        0.54        0.69     0.61       ~
Logistic Regression     0.47        0.67     0.55       ~
Random Forest           0.58        0.47     0.52       ~
Decision Tree (Tuned)   0.52        0.62     0.57       ~
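As a quick sanity check, the F1 scores above can be recomputed from the reported precision and recall, since F1 is their harmonic mean. This is a small sketch, not part of the project pipeline; the function name `f1_from_pr` is my own.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "SVM (RBF Kernel)": (0.54, 0.69, 0.61),
    "Logistic Regression": (0.47, 0.67, 0.55),
    "Random Forest": (0.58, 0.47, 0.52),
    "Decision Tree (Tuned)": (0.52, 0.62, 0.57),
}

for name, (p, r, f1) in reported.items():
    computed = f1_from_pr(p, r)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

All four recomputed values round to the reported F1 scores, so the table is internally consistent.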

Below are the feature importances from the tuned Decision Tree (for interpretability) and the Random Forest (for more robust, aggregated insights). Both models agree on the key features: Monthly Income, Age, and Number of Trips.

Figure: Decision Tree modeled feature importance
Figure: Random Forest modeled feature importance
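For readers who want to reproduce this kind of side-by-side comparison, here is a minimal sketch using scikit-learn's `feature_importances_` attribute. It assumes scikit-learn is installed and runs on synthetic data with generic feature names, so the rankings it prints say nothing about the real dataset. The hyperparameters are placeholders, not the tuned values from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; NOT the tourism dataset.
X, y = make_classification(
    n_samples=500,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Placeholder hyperparameters, chosen only for illustration.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def ranked_importances(model):
    """Return (feature name, importance) pairs, most important first."""
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


tree_rank = ranked_importances(tree)
forest_rank = ranked_importances(forest)

print("Decision Tree            Random Forest")
for (t_name, t_imp), (f_name, f_imp) in zip(tree_rank, forest_rank):
    print(f"{t_name}: {t_imp:.3f}        {f_name}: {f_imp:.3f}")
```

A single tree tends to concentrate importance on the few features it splits on, while the forest averages over many trees, which is why agreement between the two is a useful signal.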

Explore the Full Project

All code, data, and reports are available in the full GitHub repository.

How to Use


pip install -r requirements.txt
python scripts/model_pipeline.py