

SOURCE OF DATA: KAGGLE
SITUATION
The issue centers on the management of over 120,000 clothing orders handled by Amazon India from March to June 2022. It necessitates a thorough examination of total sales, peak sales periods, and regional sales trends within states and cities. Additionally, attention must be directed towards evaluating the accuracy of status labels, as potential inaccuracies in this system have the potential to lead to misleading calculations of sales due to unclear status labels in the order process, particularly concerning cancelled orders and their refund status.
TASK
Uncover top-selling products, calculate total sales, and assess the status labeling system to gain crucial insights into sales performance and enhance labeling accuracy.
ACTION
- Data Source
- CSV file details: 128,975 rows, 24 columns
- Missing data in these columns
- amount, qty, currency, courier_status, ship_city, ship_state, ship_country, b2b
- Data inconsistencies
- Varied letter cases, punctuation, numbers in 'ship_city' and 'ship_state'
- State abbreviations and misspelled names of states
- Initial data type
- TEXT

- Data Cleaning
- Excel steps
- Deleted 'index' and 'unnamed:22' columns
- SQL measures
- Created four tables for version control
- Standardization
- Corrected letter cases, misspellings, and punctuation
- Null handling
- Replaced eight rows of 'amount' nulls with the average of amount
- Replaced nulls in 'ship_city,' 'ship_state,' 'ship_postal_code,' 'ship_country,' 'currency'
- '0' for numeric, 'Unknown' for text types
- Column reduction
- Excluded 'promotion_ids'
- Data type adjustments for these columns
- date, qty, amount, ship_country
- Currency standardization
- 'INR'
- Excel steps
Here are a few of the SQL queries that I used for data cleaning:
Part 1
Part 2
- Data Analysis (SQL):
- Addressed queries on total sales, top 10 products by sales, average order value, peak sales (day, week, month), top locations for sales, Amazon versus Merchant fulfillment, popular product categories, average order quantity, and count of values under each order status.
Here are a few of the SQL queries that I used:
- Data Visualization
- Tableau
- Font
- Arial
- Color
- Dark grey for the title of the dashboard, the KPIs, and titles of charts
- Light grey for the font
- Grey warm for the charts
- Layout
- Title banner at the top
- KPIs underneath the banner
- Filter and parameter dropdown
- Grouped together in a vertical container on the left
- Borders for charts is a thin black line
- Title of Charts
- Used questions for the titles
- Font
- Tableau
RESULT
After analyzing the data, I found that the total sales for the selected clothing items reached USD1.4 million in four months. On average, each order amounted to USD11, with customers typically purchasing 1 item. Amazon fulfilled the majority of these orders, accounting for 70%, while the remaining 30% were fulfilled by the merchants.
Among nine distinct product categories, 'Set' emerged as the most popular, with 45,289 pieces sold, generating USD700K in sales. Following closely, 'Kurta' sold 45,045 pieces at USD383K, while 'Western Dress' recorded 13,943 pieces sold, totaling USD201K in sales.
My analysis revealed a peak sales day on May 4, with transactions nearing USD22K. Among the states, Maharashtra led with the highest total sales of USD240K, while Haryana recorded the lowest sales at nearly USD51K. Notably, the city of Bengaluru stood out as the top spender, contributing USD130K in sales.
The cumulative total for orders marked as 'Cancelled' with associated amounts reached USD124K.
To further boost sales and customer satisfaction, consider expanding product offerings within the popular 'Set', 'Kurta', and 'Western Dress' categories. Additionally, focus marketing efforts in cities like Bengaluru, Hyderabad, and Mumbai where there is significant potential for growth.
Status Clarity (System Labels): To enhance clarity within the order management system, consider labeling orders with 'Cancelled' status differently based on whether a refund was processed or refund is pending. For example, use 'Cancelled - Refunded' for orders with refunded amounts and 'Cancelled - Refund Pending'. These enhanced labels will ensure that your order system accurately reflects the order's status, reducing potential confusion for both the team and customers.
Courier Status Enhancement: Improve courier status labels to provide additional information. For instance, use 'Delivered' for orders that reached customers successfully, 'In Transit' for orders currently on the way, and 'Return in Progress' for returns being processed.
Here's the Amazon Tableau Dashboard: