
PURPOSE
The purpose of this project is to provide a data-driven approach to tackle the challenge of unpredictable flight prices. By analyzing historical data and identifying key factors influencing ticket fares, the model aims to improve transparency in flight pricing, reduce uncertainty for travelers, and help users save money by making smarter booking decisions.
OVERVIEW
Air travel plays a crucial role in global tourism and economic growth, yet travelers often struggle with the uncertainty of fluctuating ticket prices. The Flight Fare Prediction Model leverages Machine Learning to predict flight prices based on various factors, such as airline type, route, travel dates, and number of stops. This predictive solution empowers users to make better-informed travel decisions by identifying price trends and recommending optimal times to purchase tickets.
OBJECTIVE
The primary objective is to build a robust predictive model that accurately forecasts flight fares using advanced Machine Learning techniques. The project focuses on feature selection, model optimization, and evaluation to achieve high accuracy and actionable insights. Additionally, the model seeks to categorize days as "good" or "bad" for purchasing tickets based on fare predictions relative to average prices.
Analysis Summary
-
The project took four months to complete, largely because of the repeated tests and modifications made to improve the model's accuracy.
-
The dataset, US Airline Flight Routes and Fares: 2019-2024, was sourced from Kaggle.
There are 14 key features in total:
Airline
Date of Journey
Source
Destination
Route
Departure Time
Arrival Time
Duration
Total Stops
Additional Info
Price
Cabin Type
In Flight Meal
Total Miles
-
We used various tools, libraries, and platforms for this project.
Programming Language: Python
Libraries: Pandas, NumPy, scikit-learn, Matplotlib, Seaborn
Platforms: Jupyter Notebook, Google Colab
-
I used a range of data science and machine learning techniques to build the flight price prediction model. First, I cleaned the data by handling missing values and removing outliers. Next, I conducted Exploratory Data Analysis (EDA) to identify trends and relationships among key features, such as how the number of stops and the departure time affect prices. After EDA, I engineered new variables and encoded categorical data for machine learning. I then tested several models, including Decision Tree Regressor, Random Forest Regressor, Ridge Regression, Lasso Regression, KNN Classifier, and KMeans Clustering, to find the one with the most accurate predictions. Finally, I used visualization libraries such as Matplotlib and Seaborn to present how various factors affect flight prices.
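The cleaning step described above could look roughly like the sketch below. It is only an illustration: the source does not say how missing values or outliers were actually handled, so dropping incomplete rows and applying the 1.5 x IQR rule to Price are assumptions, and `clean_fares` and the sample data are hypothetical.

```python
import pandas as pd


def clean_fares(df: pd.DataFrame, price_col: str = "Price") -> pd.DataFrame:
    """Drop rows with missing values, then remove price outliers.

    Assumption: outliers are prices outside 1.5 * IQR of the quartiles.
    The original project may have used a different rule.
    """
    cleaned = df.dropna().copy()
    q1 = cleaned[price_col].quantile(0.25)
    q3 = cleaned[price_col].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    in_range = cleaned[price_col].between(lower, upper)
    return cleaned[in_range].reset_index(drop=True)


# Hypothetical sample: one row with a missing airline, one extreme fare.
sample = pd.DataFrame({
    "Airline": ["A", "B", "A", None, "B", "A"],
    "Price": [100.0, 110.0, 105.0, 120.0, 95.0, 5000.0],
})
cleaned_sample = clean_fares(sample)
print(cleaned_sample)
```

In this sample, the row with the missing airline and the 5000.0 fare are both removed, leaving four rows.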
Model Results
-
I tested several machine learning models to predict flight prices as accurately as possible. Each model was chosen for its ability to handle particular data patterns. I assessed their performance using metrics such as R² Score, RMSE, and MAE to check how well they generalized to new data.
Models Used:
Ridge Regression
Lasso Regression
Decision Tree Regressor
Random Forest Regressor
KMeans Clustering
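As a rough illustration of how the regressors above might be compared on R², RMSE, and MAE, here is a minimal sketch using scikit-learn. The data is synthetic, and the hyperparameters, the 75/25 split, and the variable names are assumptions rather than the project's actual setup. KMeans is left out because it is unsupervised and is not scored with these regression metrics.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real dataset (assumption): price depends on
# duration in minutes and number of stops, plus noise.
rng = np.random.default_rng(0)
n = 400
duration = rng.uniform(60, 600, n)
stops = rng.integers(0, 3, n)
X = np.column_stack([duration, stops])
y = 50 + 0.4 * duration + 30 * stops + rng.normal(0, 10, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "DecisionTree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": mean_absolute_error(y_test, pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Scoring on a held-out test split, rather than on the training data, is what lets the metrics say something about performance on new data.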
-
At the start of the EDA process, we split some composite features into separate components, such as hour and minute for departure and arrival times, so they could be used more reliably. During feature engineering, we selected a mix of categorical and numerical features to train the model. The features were chosen for their importance in predicting flight prices and were encoded to make them suitable for machine learning algorithms.
Numerical Features:
Price - The target variable representing the flight fare.
Duration_Minutes - Total flight duration in minutes, which directly impacts ticket prices.
Day_of_Week - The day of the week the flight is scheduled (e.g., Monday, Tuesday).
Journey_Month - The month when the journey starts, accounting for seasonal pricing trends.
Dep_Hour - The hour of the day when the flight departs.
Dep_Min - The minute of the hour when the flight departs.
Arrival_Hour - The hour of the day when the flight arrives.
Arrival_Min - The minute of the hour when the flight arrives.
Total_Stops_Numeric - The number of stops on the flight route.
Categorical Features:
Airline - The airline operating the flight, accounting for different pricing strategies.
Source - The airport where the journey starts.
Destination - The airport where the journey ends.
Route - The flight path taken from the source to the destination.
Cabin_Type - The type of cabin booked (e.g., Economy, Business).
In_Flight_Meal - Indicates whether a meal is provided during the flight.
Why These Features Were Important:
Price was the target variable the model aimed to predict.
Duration_Minutes and Total_Stops_Numeric had the most impact on pricing, as longer flights and direct flights tend to be more expensive.
Airline played a significant role in capturing price variations due to different carrier pricing strategies.
Day_of_Week and Journey_Month were important in understanding seasonal trends and demand patterns.
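The sketch below shows one way that engineered columns such as Duration_Minutes, Dep_Hour, Dep_Min, Day_of_Week, Journey_Month, and Total_Stops_Numeric could be derived with Pandas. The raw column names and string formats ("2h 50m", "22:20", "non-stop", ISO dates) are assumptions made for illustration, as are the sample rows and the `duration_to_minutes` helper. The actual dataset may store these values differently.

```python
import pandas as pd

# Hypothetical raw rows; the formats are assumptions, not the real data.
raw = pd.DataFrame({
    "Airline": ["Delta", "United"],
    "Date_of_Journey": ["2019-03-24", "2019-05-01"],
    "Dep_Time": ["22:20", "05:50"],
    "Duration": ["2h 50m", "7h"],
    "Total_Stops": ["non-stop", "2 stops"],
})


def duration_to_minutes(text: str) -> int:
    """Convert a string like '2h 50m' or '7h' to total minutes."""
    hours = minutes = 0
    for part in text.split():
        if part.endswith("h"):
            hours = int(part[:-1])
        elif part.endswith("m"):
            minutes = int(part[:-1])
    return hours * 60 + minutes


features = raw.copy()

# Date parts: Monday is 0 and Sunday is 6 in pandas' dayofweek.
journey = pd.to_datetime(features["Date_of_Journey"], format="%Y-%m-%d")
features["Day_of_Week"] = journey.dt.dayofweek
features["Journey_Month"] = journey.dt.month

# Split the departure time into hour and minute-of-hour.
dep = pd.to_datetime(features["Dep_Time"], format="%H:%M")
features["Dep_Hour"] = dep.dt.hour
features["Dep_Min"] = dep.dt.minute

features["Duration_Minutes"] = features["Duration"].map(duration_to_minutes)
features["Total_Stops_Numeric"] = features["Total_Stops"].map(
    lambda s: 0 if s == "non-stop" else int(s.split()[0])
)

# One-hot encode the categorical column so models receive numeric input.
model_columns = [
    "Airline", "Day_of_Week", "Journey_Month", "Dep_Hour", "Dep_Min",
    "Duration_Minutes", "Total_Stops_Numeric",
]
encoded = pd.get_dummies(features[model_columns], columns=["Airline"])
print(encoded)
```

One-hot encoding is only one option for the categorical columns. Label encoding would also work for tree-based models, and the source does not state which encoding was used.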
-
Target Variable
Price, the target variable of the data frame, is broken down into three categories:
Around Avg
Below Avg
Above Avg
Any fare more than 10% above the average price is considered a bad day to buy, while any fare more than 10% below the average price is considered a good day.
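The 10% rule can be expressed as a small function. This is a sketch under stated assumptions: the function and constant names are hypothetical, the average is taken over whatever set of fares is being compared, and treating "Around Avg" as a neutral day is an assumption, since the source only labels the good and bad cases.

```python
from statistics import mean


def label_price(price: float, avg_price: float, tolerance: float = 0.10) -> str:
    """Categorize a fare relative to the average price.

    More than `tolerance` above the average -> "Above Avg",
    more than `tolerance` below the average -> "Below Avg",
    otherwise "Around Avg".
    """
    if price > avg_price * (1 + tolerance):
        return "Above Avg"
    if price < avg_price * (1 - tolerance):
        return "Below Avg"
    return "Around Avg"


# Mapping from category to purchase advice; "neutral" is an assumption.
DAY_QUALITY = {
    "Above Avg": "bad day",
    "Below Avg": "good day",
    "Around Avg": "neutral",
}

# Hypothetical predicted fares for the same route on different days.
predicted_fares = [170.0, 205.0, 230.0, 195.0]
avg_fare = mean(predicted_fares)  # 200.0
labels = [label_price(p, avg_fare) for p in predicted_fares]
advice = [DAY_QUALITY[label] for label in labels]
print(list(zip(predicted_fares, labels, advice)))
```

With an average of 200.0, the cutoffs are 180.0 and 220.0, so 170.0 falls in "Below Avg", 230.0 falls in "Above Avg", and the other two fares fall in "Around Avg".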
Decision Tree Results
KNN Classifier
Ridge Regression
Lasso Regression