Welcome! This is where I showcase my data science journey, documenting projects that highlight my growth and curiosity in diverse fields like optimization methods, machine learning, statistics, and more. My adventure started in 2023 with online courses, an internship project at L'Oréal, and an Analytics Master's degree at Erasmus University Rotterdam. I've been learning and evolving ever since!
In this portfolio, you'll find a curated selection of my projects, with explanations of my approach, key insights, and reflections. Whether you're here for inspiration, feedback, or collaboration, I hope you find these projects insightful.
Feel free to reach out with any questions or feedback via email: felix-raphael@outlook.com.
Explore the full notebook here
In this project, I explored the impact of a new website feature rolled out by an online shopping platform through a controlled A/B experiment. Visitors were randomly assigned to one of two groups:
Exploratory Analysis
I began with an exploratory data analysis to investigate potential differences between the two groups. Using visualizations, summary statistics, and a correlation matrix, I uncovered early signs that the treatment might be influencing user behavior. For instance:
Statistical Hypothesis Testing
To go beyond observation and assess whether these patterns were statistically significant, I performed hypothesis testing on key performance indicators (KPIs). This allowed me to distinguish between real treatment effects and random noise.
Key Finding:
The completion rate (the proportion of items added to cart that were ultimately purchased) was significantly higher in the treatment group, providing strong evidence that the new feature improved this aspect of the customer journey.
No Significant Effects Found:
These results indicate that while the feature may not drive more overall purchases, it can help improve follow-through once users have shown interest in products.
This project highlights my ability to combine exploratory analysis, data visualization, and robust statistical testing to evaluate the real-world impact of product changes in an experimental setting.
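As an illustration of how a difference in completion rates like the one above can be tested, here is a minimal sketch of a two-sided two-proportion z-test. The notebook's actual test may differ. The function name and the counts are made up for illustration, and treating every cart item as an independent observation is a simplifying assumption.

```python
import math


def two_proportion_ztest(successes_a, total_a, successes_b, total_b):
    """Two-sided z-test for a difference between two proportions."""
    rate_a = successes_a / total_a
    rate_b = successes_b / total_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (successes_a + successes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: purchased items out of items added to cart.
# Group A is the control, group B the treatment.
z_stat, p_val = two_proportion_ztest(400, 1000, 460, 1000)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")
```

A small p-value suggests the gap between the two rates is unlikely to be random noise. With many KPIs tested at once, a multiple-comparison correction would also be worth considering.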
Currently working through the 100 Days of Code Python course on Udemy, learning through daily coding projects and challenges.
Building practical coding skills one day at a time!
Check out the course here: Course details on Udemy
Follow some of the projects I am building along the way in this notebook
Explore the full Jupyter Notebook
This project explores what makes hotels in Rome more popular on TripAdvisor, using a dataset of 4,599 hotels and 272 features including amenities, image counts, and web traffic data. Popularity is measured by the number of clicks each hotel receives.
Project Goals:
Tools & Libraries:
pandas, scikit-learn, XGBoost, matplotlib, seaborn, DirectLiNGAM, imblearn
Methodology:
Key Takeaways:
views, with a causal effect of 0.459.
This project reflects my growing interest in interpretable machine learning, and it helped me learn how to move from correlation to causation using real-world data. While this is a beginner's step into the challenging domain of causal inference, it shows how machine learning and causal discovery/inference can work hand in hand to untangle large amounts of data into a simple overview of causal relationships.
Read the full Master Thesis
Explore the Solution Code
My Master's thesis focused on improving route optimization for attended home deliveries and services. Companies like Picnic already use sophisticated a priori optimization methods to plan efficient delivery routes.
In my study, I explored strategies to balance efficiency, complexity, and customer service. After partitioning customers into groups and assigning appointment days, I optimized daily delivery routes to minimize travel distance. The study found that optimizing appointment-day offerings had significant impacts on route efficiency and profitability.
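To make the last step concrete, here is a minimal sketch of building one route per appointment day once customers have been grouped. It uses a simple nearest-neighbour heuristic with straight-line distances, which is an assumption for illustration and not the method from the thesis. All names, coordinates, and day groups are hypothetical.

```python
import math


def route_length(route, coords):
    """Total straight-line length of a route given as a list of location names."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


def nearest_neighbour_route(depot, customers, coords):
    """Greedy route: start at the depot, always visit the closest unvisited customer."""
    route = [depot]
    unvisited = set(customers)
    while unvisited:
        here = coords[route[-1]]
        # Ties are broken by name so the result is deterministic.
        nxt = min(unvisited, key=lambda c: (math.dist(here, coords[c]), c))
        route.append(nxt)
        unvisited.remove(nxt)
    route.append(depot)  # return to the depot at the end of the day
    return route


# Hypothetical coordinates and appointment-day groups.
coords = {"depot": (0, 0), "a": (1, 0), "b": (2, 0), "c": (0, 3)}
day_groups = {"Mon": ["a", "b"], "Tue": ["c"]}

routes = {day: nearest_neighbour_route("depot", group, coords)
          for day, group in day_groups.items()}
for day, route in routes.items():
    print(day, route, round(route_length(route, coords), 2))
```

The heuristic is fast but not optimal. It mainly shows how the assignment of customers to days drives the total distance travelled.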
View the Time Series Forecasting Notebook
In this project, I used Walmart sales data from 2010-2021 to explore various time series forecasting methods. I compared:
I employed exogenous variables (like CPI and unemployment rate) to see if they improved predictive accuracy. Results showed that while these variables may boost performance, they can also lead to overfitting if not handled carefully.
Performance was measured using RMSE and MAPE, providing actionable insights into sales forecasting.
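For reference, the two metrics can be computed as in the sketch below. The function names and the sample numbers are made up for illustration, and the notebook may rely on library implementations instead.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalises large misses more heavily."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical weekly sales and forecasts.
actual_sales = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(f"RMSE: {rmse(actual_sales, forecast):.2f}")
print(f"MAPE: {mape(actual_sales, forecast):.2f}%")
```

RMSE is expressed in the same units as sales, while MAPE is scale-free, which makes it easier to compare across stores with different sales volumes.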
For this project, I worked on optimizing delivery routes using a Hub and Spoke system. By applying optimization techniques with the Gurobi solver, I identified the most efficient hub location for deliveries across India, minimizing travel distance and improving logistics.
The dataset revealed delivery patterns that pointed to an optimal hub location in the south-eastern region of India, where shorter deliveries are more frequent.
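The idea behind choosing a hub can be sketched without a solver: score each candidate location by its total delivery-weighted distance to all destinations and keep the cheapest. This exhaustive search stands in for the Gurobi model used in the project, which is not reproduced here. The coordinates, weights, and function names are hypothetical, and straight-line distance is a simplifying assumption.

```python
import math


def total_weighted_distance(hub, demand_points):
    """Sum of distance from the hub to each destination, weighted by delivery count."""
    return sum(weight * math.dist(hub, point) for point, weight in demand_points)


def best_hub(candidates, demand_points):
    """Return the candidate hub with the lowest total weighted distance."""
    return min(candidates, key=lambda hub: total_weighted_distance(hub, demand_points))


# Hypothetical candidate hubs and (destination, number of deliveries) pairs.
candidates = [(0.0, 0.0), (10.0, 0.0)]
demand_points = [((1.0, 0.0), 5), ((9.0, 0.0), 1)]

hub = best_hub(candidates, demand_points)
print("Chosen hub:", hub)
```

Because distances are weighted by delivery frequency, the chosen hub is pulled toward the area where most deliveries happen.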
Explore the ML Model Notebook
In this project, I used a Kaggle dataset to predict whether a patient has abnormal biomechanical patterns indicative of conditions like Disk Hernia or Spondylolisthesis. I compared three ML models: KNN, Lasso, and Random Forest.
I dealt with small sample sizes and class imbalances, and evaluated each model on key performance metrics such as sensitivity (true positive rate) and specificity (true negative rate). Ultimately, the Random Forest performed best at identifying abnormal patients, with a sensitivity of 89.47%, meaning that the model detected almost 9 out of 10 patients with abnormal patterns!
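As a quick reference for these two metrics, here is a minimal sketch that computes them from true and predicted labels. The function name and the toy labels are made up for illustration and are not the project's data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # share of abnormal patients that were detected
    specificity = tn / (tn + fp)  # share of normal patients correctly cleared
    return sensitivity, specificity


# Toy labels: 1 = abnormal, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"Sensitivity: {sens:.2%}, Specificity: {spec:.2%}")
```

With imbalanced classes, reporting both numbers is more informative than accuracy alone, since a model can score high accuracy while missing most abnormal cases.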
Read the full report
Explore the SQL Queries
View the Python Data Cleaning Notebook
As part of my studies at RSM, I tackled a comprehensive analysis of Airbnb data to explore its impact on the city of Paris. The focus was on:
I used Python for data cleaning and preparation, then transitioned to SQL for database management and further analysis. My findings highlight important socio-economic considerations, including the impact of gentrification.
I hope you enjoy exploring these projects and insights! Stay tuned for more exciting additions in the future.