A QUICK REVIEW OF A RETAIL SALES DATASET
- Adewoye Saheed Damilola
- Sep 6, 2024
- 3 min read

Zoho Analytics
Introduction
Real-life data are often messy, so proper preprocessing is essential to obtain clean data. Exploratory data analysis (EDA) helps with data cleaning and preparation before we extract valuable insights. Downstream tasks such as hypothesis generation, model selection and visualization are hard to do well without a clean dataset. Poor EDA leads to poor insights, models, or assumptions: garbage in, garbage out.
Without further ado, let’s dive in and let the data speak.
Our dataset consists of 2823 rows and 24 columns. There are both numerical and categorical data types in our features.
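A quick way to check the size and the mix of data types is sketched below. This is only an illustration on a tiny made-up DataFrame, not the real dataset, and the column names (`ORDERNUMBER`, `SALES`, `PRODUCTLINE`) are assumptions.

```python
import pandas as pd

# Toy stand-in for the retail dataset; values and column names are assumed.
df = pd.DataFrame({
    "ORDERNUMBER": [1, 2, 3],
    "SALES": [2500.0, 3100.0, 4200.0],
    "PRODUCTLINE": ["Motorcycles", "Motorcycles", "Classic Cars"],
})

# df.shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# Split the features into numerical and non-numerical (categorical) columns
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(f"{n_rows} rows, {n_cols} columns")
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

On the real data, the same calls should report the row and column counts quoted above.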

Let’s start by asking our dataset the following questions — hoping to find something interesting!
Sales trend over the years
Top-selling products and price impact
Regional Performance of products
Sales distribution and ordered quantity distribution
Observations
Note that before starting the analysis, I handled some anomalies, such as missing values, and dropped features that were not relevant to the analysis.
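The cleaning step could look roughly like the sketch below. The columns dropped and the fill value are assumptions for illustration on toy data; the actual choices are in the notebook.

```python
import pandas as pd

# Toy data with gaps; column names and values are assumed for illustration.
df = pd.DataFrame({
    "SALES": [2500.0, 3100.0, 4200.0],
    "STATE": ["NY", None, None],
    "ADDRESSLINE2": [None, None, "Suite 1"],
    "PHONE": ["555-0100", "555-0101", "555-0102"],
})

# Count the missing values in each column
missing = df.isna().sum()
print(missing)

# Drop columns assumed to be irrelevant to the analysis
cleaned = df.drop(columns=["ADDRESSLINE2", "PHONE"])

# Fill the remaining gaps in a categorical column with a placeholder label
cleaned["STATE"] = cleaned["STATE"].fillna("Unknown")
```

Whether to drop or fill a column depends on how much of it is missing and whether the analysis needs it.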
Q1. Sales Trend Over the Years
Sales trends help retailers understand broader market trends, consumer preferences, and seasonal variations. This insight informs strategic planning, such as how to allocate resources to stay competitive.
Here is the sales trend over time for our retail dataset. I found that 2005 has only five months of sales records, so comparing yearly totals would be misleading. Plotting monthly sales for each year gives a fairer picture.
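One way to build a month-by-year view is sketched here with pandas. The data are made up and the column names `ORDERDATE` and `SALES` are assumptions, so treat this as an outline rather than the notebook's exact code.

```python
import pandas as pd

# Toy order records; dates, amounts and column names are assumed.
df = pd.DataFrame({
    "ORDERDATE": ["2003-10-05", "2003-11-12", "2003-11-20",
                  "2004-10-02", "2004-11-15", "2005-05-01"],
    "SALES": [1000.0, 2000.0, 1500.0, 1200.0, 4000.0, 900.0],
})
df["ORDERDATE"] = pd.to_datetime(df["ORDERDATE"])

# Total sales per (year, month), then pivot so each year becomes a column
monthly = (
    df.groupby([df["ORDERDATE"].dt.year.rename("year"),
                df["ORDERDATE"].dt.month.rename("month")])["SALES"]
      .sum()
      .unstack("year")
)

# Number of months with any sales per year; a low count flags a partial year
months_recorded = monthly.notna().sum()

print(monthly)
print(months_recorded)
```

Calling `monthly.plot()` would then draw one line per year, which makes partial years easy to spot.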

From the author’s notebook
Sales rise every October and peak in November in both years. Several factors could explain this pattern.
Q2. Top-selling Products and Price Impact
Identifying top-selling products can help boost a retailer's revenue, because the retailer then knows which products sell best and what customers prefer.

From the author’s notebook
Of the seven product lines, Classic Cars recorded about 3.9 million in sales, more than Motorcycles, Planes and Ships combined.
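A ranking like this can be produced with a simple group-and-sum. The sketch below uses invented numbers and assumed column names (`PRODUCTLINE`, `SALES`), so the totals are not the real ones.

```python
import pandas as pd

# Toy sales records; values and column names are assumed for illustration.
df = pd.DataFrame({
    "PRODUCTLINE": ["Classic Cars", "Motorcycles", "Classic Cars",
                    "Planes", "Ships", "Motorcycles"],
    "SALES": [5000.0, 1500.0, 4000.0, 1000.0, 800.0, 1200.0],
})

# Total sales per product line, largest first
sales_by_product = (
    df.groupby("PRODUCTLINE")["SALES"].sum().sort_values(ascending=False)
)
top_product = sales_by_product.index[0]

# Combined total of a few other lines, for comparison with the leader
others = sales_by_product.loc[["Motorcycles", "Planes", "Ships"]].sum()

print(sales_by_product)
print(top_product, "vs. the three others combined:", others)
```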
Let’s look at how the price of each product affects the quantity ordered. Perhaps the law of demand, where a higher price means a lower quantity demanded, applies here.

From the author’s notebook
Oops! We got that wrong. There is a positive correlation (0.66) between the price of each product and sales. This suggests that higher prices do not put our customers off ordering. Since correlation doesn’t imply causation, let’s look at other factors.
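As a rough sketch, a correlation like this can be computed with `Series.corr`, which defaults to Pearson correlation; whether the notebook used Pearson is an assumption. The toy data below also assume that sales equal price times quantity, and the column names are hypothetical. It is worth checking price against quantity as well, because the law of demand is a statement about quantity rather than sales.

```python
import pandas as pd

# Toy data; values and column names are assumed for illustration.
df = pd.DataFrame({
    "PRICEEACH": [30.0, 50.0, 70.0, 90.0, 100.0],
    "QUANTITYORDERED": [40, 35, 30, 38, 25],
})
# Assumption for this toy example only: sales = price x quantity
df["SALES"] = df["PRICEEACH"] * df["QUANTITYORDERED"]

# Pearson correlation (the default method of Series.corr)
price_vs_sales = df["PRICEEACH"].corr(df["SALES"])
price_vs_qty = df["PRICEEACH"].corr(df["QUANTITYORDERED"])

print(f"price vs sales:    {price_vs_sales:.2f}")
print(f"price vs quantity: {price_vs_qty:.2f}")
```

In this made-up example the two correlations have opposite signs, which shows why it helps to look at both before drawing a conclusion.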
Q3. Regional Performance of Products
Demand tends to vary by region. Customers in stable, stronger economies tend to buy more than those in weaker economies.

From the author’s notebook
The map shows that most customers come from the USA, Spain, France, Australia and the UK. A few come from Ireland, the Philippines and Belgium, but none come from Africa.
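The numbers behind such a map can be summarised per country, as in this sketch. The data are invented and the column names (`COUNTRY`, `CUSTOMERNAME`, `SALES`) are assumptions.

```python
import pandas as pd

# Toy records; values and column names are assumed for illustration.
df = pd.DataFrame({
    "COUNTRY": ["USA", "USA", "Spain", "France", "USA", "Ireland"],
    "CUSTOMERNAME": ["A", "B", "C", "D", "A", "E"],
    "SALES": [3000.0, 2000.0, 2500.0, 1800.0, 1000.0, 500.0],
})

# Distinct customers and total sales per country, largest sales first
by_country = (
    df.groupby("COUNTRY")
      .agg(customers=("CUSTOMERNAME", "nunique"),
           total_sales=("SALES", "sum"))
      .sort_values("total_sales", ascending=False)
)

print(by_country)
```

Counting distinct customers rather than rows avoids counting a repeat buyer more than once.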
Q4. Sales Distribution and Ordered Quantity Distribution

From the author’s notebook
The sales distribution is right-skewed, with most sales falling between $1,500 and $5,000. This is useful input for the retailer's financial forecasting.
The quantity ordered distribution looks different: it is flatter, with fewer values in its tails. A fancy name for this type of distribution is platykurtic. For the retailer, this means order quantities are fairly consistent, with few unusually small or large orders.

From the author’s notebook
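Both shapes can be checked numerically as well as visually. The sketch below uses made-up series, not the real columns. Note that pandas' `kurt()` returns excess kurtosis, so a normal distribution scores about 0 and a platykurtic one scores below 0.

```python
import pandas as pd

# Toy series with a long right tail, standing in for sales (values assumed)
sales = pd.Series(
    [1500, 1800, 2000, 2200, 2500, 3000, 3500, 5000, 9000, 14000],
    dtype=float,
)
# Toy evenly spread series, standing in for quantity ordered (values assumed)
quantity = pd.Series([20, 25, 30, 35, 40, 45, 50], dtype=float)

# Skewness > 0 indicates a right-skewed distribution
sales_skew = sales.skew()

# Excess kurtosis < 0 indicates light tails (platykurtic)
qty_kurt = quantity.kurt()

print(f"sales skewness:           {sales_skew:.2f}")
print(f"quantity excess kurtosis: {qty_kurt:.2f}")
```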
Note: You can access the full notebook here.
Conclusion
This is just a quick overview of our dataset. We found some interesting insights about the monthly sales trend, top-selling products, and countries with high sales. However, we can perform more in-depth analysis such as RFM (Recency, Frequency, Monetary) analysis, modelling, and customer segmentation on our dataset.
Thanks for coming this far!
Stay curious, stay persistent, and enjoy the satisfaction of solving problems.
Happy learning!





