Swati Kushwaha

Advertising Sales Prediction

Project Overview

This data science project analyzes the relationship between advertising expenditures across different media channels (TV, Radio, Newspaper) and product sales. Using a dataset of advertising budgets and corresponding sales figures, I developed a predictive model to quantify how advertising spending impacts sales performance.

Technical Specifications

Component | Technology
Programming Language | Python
Data Analysis | Pandas, NumPy
Data Visualization | Matplotlib, Seaborn
Machine Learning | Scikit-learn
Model | Linear Regression

Key Features

Exploratory Analysis

  • Comprehensive data quality assessment
  • Pairplot visualization of relationships
  • Correlation heatmap of advertising channels
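
A data quality assessment of this kind can be sketched in a few lines of pandas. The helper name `quality_report` and the tiny sample frame below are hypothetical and only illustrate the idea; they are not taken from the project's notebook or from Advertising.csv.

```python
import numpy as np
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Summarize basic data-quality signals (hypothetical helper)."""
    return {
        "n_rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "n_duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Made-up rows with one missing value and one exact duplicate
sample = pd.DataFrame({
    "TV": [230.1, 44.5, 44.5, 151.5],
    "Radio": [37.8, 39.3, 39.3, np.nan],
    "Newspaper": [69.2, 45.1, 45.1, 58.5],
    "Sales": [22.1, 10.4, 10.4, 18.5],
})

report = quality_report(sample)
print(report)
```

Running the same helper on the loaded dataset would show at a glance whether any rows need to be dropped or imputed before modeling.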

Feature Engineering

  • Data normalization with MinMaxScaler
  • Label encoding for target variable
  • Train-test split (80/20)

Predictive Modeling

  • Linear regression implementation
  • Model training on scaled data
  • Performance evaluation metrics

Business Insights

  • TV advertising effectiveness analysis
  • ROI comparison across channels
  • Budget allocation recommendations

Data Visualizations

Advertising Channels Pairplot

[Figure: Pairplot Visualization]

This visualization plots sales against each advertising channel (TV, Radio, Newspaper). The fitted regression lines indicate the direction and strength of each relationship, and TV advertising shows the strongest correlation with sales.

Advertising Channels Correlation Heatmap

[Figure: Correlation Heatmap]

The heatmap quantifies the correlation coefficients between variables. TV advertising shows the highest correlation with sales (0.78), followed by radio (0.58), while newspaper advertising shows the weakest relationship (0.23).
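
The coefficients behind a heatmap like this one come from a Pearson correlation matrix. The sketch below computes one with pandas on synthetic data; the generating coefficients are invented for illustration and are not the project's dataset, so the printed values will differ from those quoted above. Passing the resulting matrix to seaborn, for example `sns.heatmap(corr, annot=True)`, is one way to render it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the advertising data (assumed, not the real file)
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
newspaper = rng.uniform(0, 100, n)
sales = 3 + 0.045 * tv + 0.19 * radio + rng.normal(0, 1.5, n)

df = pd.DataFrame(
    {"TV": tv, "Radio": radio, "Newspaper": newspaper, "Sales": sales}
)

# Pairwise Pearson correlations between all columns
corr = df.corr()

# Correlation of each channel with Sales, strongest first
sales_corr = corr["Sales"].drop("Sales").sort_values(ascending=False)
print(sales_corr.round(2))
```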

Technical Implementation

Data Preparation

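# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

# Load the data; the unnamed first column of the CSV is a row index
df = pd.read_csv('Advertising.csv')
df.rename(columns={'Unnamed: 0': 'Index'}, inplace=True)

# Feature-target separation (the row index is not a predictor)
X = df.drop(columns=['Index', 'Sales'])
y = df['Sales']

# Label encoding: each distinct Sales value becomes an integer code in
# sorted order, so the error metrics are expressed in encoded units
le = LabelEncoder()
y = le.fit_transform(y)

# Train-test split (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit the scaler on the training set only, then apply it to both splits
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)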

Model Implementation

from sklearn.linear_model import LinearRegression
from sklearn import metrics

# Model training
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Predictions and evaluation
# (scikit-learn metrics take y_true first, then y_pred)
y_predictions = model.predict(X_test_scaled)
print('MAE:', metrics.mean_absolute_error(y_test, y_predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_predictions)))
print('R-Squared:', metrics.r2_score(y_test, y_predictions))

Results & Findings

Model Performance

  • Mean Absolute Error (MAE): 7.45 (in label-encoded sales units)
  • Root Mean Squared Error (RMSE): 10.22 (in label-encoded sales units)
  • R-Squared Score: 0.862 (86.2% of variance explained)
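
For reference, the three metrics can be written out directly in NumPy. This is a minimal sketch with made-up numbers, not the project's predictions; `regression_metrics` is a hypothetical helper name. It also shows that MAE and RMSE are symmetric in their arguments while R-squared is not, which is why the order `(y_true, y_pred)` matters when calling `r2_score`.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R-squared from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))

    # R-squared compares residual error with variance around the true mean
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up values purely for illustration
y_true = [10.0, 12.0, 14.0, 16.0]
y_pred = [11.0, 12.0, 13.0, 18.0]

m = regression_metrics(y_true, y_pred)
swapped = regression_metrics(y_pred, y_true)  # arguments in the wrong order

print(m)
print("R-squared with swapped arguments:", swapped["r2"])
```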

Key Insights

Advertising Channel | Correlation with Sales | Impact
TV | 0.78 | Strong positive relationship
Radio | 0.58 | Moderate positive relationship
Newspaper | 0.23 | Weak relationship

Challenges & Solutions

Challenge | Solution
Linear assumptions | Verified with pairplot visualizations
Feature scaling | Applied MinMaxScaler for normalization
Model evaluation | Used multiple metrics (MAE, RMSE, R²)
Feature importance | Analyzed correlation coefficients
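
Alongside correlation coefficients, the fitted regression coefficients offer a second view of feature importance. Because MinMaxScaler puts every channel on a 0 to 1 range, the coefficients become roughly comparable across channels. The sketch below is an illustration on synthetic data with invented effect sizes, and it swaps in a scikit-learn pipeline for convenience; it is not the project's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
n = 200

# Synthetic budgets and sales (assumed values, not Advertising.csv)
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
y = 3 + 0.045 * X["TV"] + 0.19 * X["Radio"] + rng.normal(0, 1.5, n)

# Scaling and regression in one estimator, so the scaler is always
# fitted on exactly the data the model is trained on
pipe = make_pipeline(MinMaxScaler(), LinearRegression())
pipe.fit(X, y)

# One coefficient per channel, on the scaled (0 to 1) feature range
coefs = pd.Series(
    pipe.named_steps["linearregression"].coef_, index=X.columns
)
print(coefs.sort_values(ascending=False).round(2))
```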

Business Applications

  • Budget Optimization: Reallocate spending to most effective channels
  • Campaign Planning: Focus creative resources on high-impact media
  • Performance Forecasting: Predict sales based on planned ad spend
  • ROI Analysis: Compare effectiveness across media types
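
As a rough illustration of the forecasting and budget-comparison use cases, a fitted model can score alternative spending plans with the same total budget. Everything below is assumed for the example: the synthetic training data, the two plans, and their figures are hypothetical and do not come from the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(3)
n = 200

# Synthetic training data (assumed, not the real dataset)
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
y = 3 + 0.045 * X["TV"] + 0.19 * X["Radio"] + rng.normal(0, 1.5, n)

model = make_pipeline(MinMaxScaler(), LinearRegression()).fit(X, y)

# Two hypothetical plans with the same total budget of 250
plans = pd.DataFrame(
    {"TV": [200.0, 150.0], "Radio": [20.0, 20.0], "Newspaper": [30.0, 80.0]},
    index=["tv_heavy", "newspaper_heavy"],
)

# Predicted sales for each plan
forecast = pd.Series(model.predict(plans), index=plans.index)
print(forecast.round(2))
```

A comparison like this only interpolates within the range of budgets seen in training; it says nothing reliable about spending levels far outside that range.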

Future Enhancements

  • Implement polynomial regression for non-linear relationships
  • Add interaction terms between advertising channels
  • Develop interactive dashboard for marketing team
  • Incorporate time-series analysis for seasonal trends
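
The interaction-term idea could be prototyped with scikit-learn's `PolynomialFeatures`. The following is a minimal sketch under stated assumptions: the data is synthetic, with a TV x Radio synergy deliberately built in, so the improvement it shows says nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
n = 400

# Columns: TV, Radio, Newspaper (synthetic, assumed ranges)
X = np.column_stack([
    rng.uniform(0, 300, n),
    rng.uniform(0, 50, n),
    rng.uniform(0, 100, n),
])
# Sales with an invented TV x Radio interaction effect
y = (3 + 0.02 * X[:, 0] + 0.03 * X[:, 1]
     + 0.001 * X[:, 0] * X[:, 1] + rng.normal(0, 1.0, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline: main effects only
plain = LinearRegression().fit(X_train, y_train)

# Main effects plus all pairwise products (no squared terms)
with_interactions = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LinearRegression(),
).fit(X_train, y_train)

r2_plain = plain.score(X_test, y_test)
r2_inter = with_interactions.score(X_test, y_test)
print("R-squared, main effects only:", round(r2_plain, 3))
print("R-squared, with interactions:", round(r2_inter, 3))
```

Setting `interaction_only=False` would add squared terms as well, which is one route to the polynomial regression listed above.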