E-Commerce Business Analytics using Python

Rohan Goel
5 min readOct 25, 2019

In this article, I have shown some analytics that you can perform on your business like how many people are completing a series of steps like Adding an item to cart and then buying it or which product is more popular in a particular state or people are most likely to buy if some offers or discounts are given to them.

For this Analytics Report, I had used the popular Brazilian E-Commerce Datasets which is available on Kaggle,

https://www.kaggle.com/olistbr/brazilian-ecommerce

Importing the Essential Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
%matplotlib inline
import seaborn as sns

Importing the Dataset

sellers=pd.read_csv('.../olist_sellers_dataset.csv')
customers=pd.read_csv('.../olist_customers_dataset.csv')
products=pd.read_csv('.../olist_products_dataset.csv')
payments=pd.read_csv('.../olist_order_payments_dataset.csv')
orders=pd.read_csv('.../olist_orders_dataset.csv')
order_items=pd.read_csv('.../olist_order_items_dataset.csv')
reviews=pd.read_csv('.../olist_order_reviews_dataset.csv')
english_names=pd.read_csv('.../product_category_name_translation.csv')

Visualizing the Imported Datasets

Now using the variable.head() function of pandas you can see what your data looks like and would be useful in further Data Cleaning and Visualizations.

sellers.head()
customers.head()
products.head()
payments.head()
orders.head()
order_items.head()
reviews.head()
english_names.head()

Now in the further steps, we will do some exploratory Data Analysis on our data and try to gain some insights that will help in some decision making of the business.

Number of Orders per State

This would help us to know as from which state how many orders are coming.

orders.shape
df1=pd.merge(orders, customers, on="customer_id", how="left")
df1=df1[["order_id", "customer_id", "customer_state"]]
count_states=df1['customer_state'].value_counts()
count_states['others']=count_states[count_states<1500].sum()
x=np.array(count_states[count_states>2000].index)
y=np.array(count_states[count_states>2000].values)
fig1, ax1 = plt.subplots(figsize=(15,7))
ax1.pie(y, labels=x, autopct='%1.1f%%', radius=2000, startangle=90)
ax1.axis('equal')
plt.title("No of Orders per State")
plt.legend(x)
plt.show()

Inference

  1. The above chart clearly shows that the majority of orders have come through the SP state.
  2. So advertising and promoting your product or services in that state will have a high impact on the sales of a product.

Number of Sales per Month

It will in determining as in which part of the year the Sales are the most thus will further help in making better business decisions.

df2=pd.merge(orders, order_items, on="order_id", how="right")
df2=df2[["order_id", "order_purchase_timestamp", "price"]]
df2['datetime'] = pd.to_datetime(df2['order_purchase_timestamp'])
value_month = df2[['datetime', 'price']].copy()
value_month.set_index('datetime', inplace=True)
value_month = value_month.groupby(pd.Grouper(freq="M"))['price'].sum()
fig, ax = plt.subplots(figsize=(15,7))
ax.plot(value_month.index, value_month.values)
fig.autofmt_xdate()
ax.fmt_xdata = mdates.DateFormatter('%Y-%m')
ax.set_title('Sales per Month')
plt.show()
Total Sales Revenue per month

Inference

  1. There is a huge spike on Nov 24 due to Black Friday.
  2. Sales are weak after Dec 20 (end-year holidays),
  3. In 2016, there are some sales in October, but almost any in the following months.

Payments Modes

This would help us in analyzing which payment modes do people usually prefer for their orders.

count_paymentMode=payments["payment_type"].value_counts()
count_paymentMode=count_paymentMode[count_paymentMode>100]
fig1, ax1 = plt.subplots(figsize=(15,7))
ax1.pie(count_paymentMode.values, autopct='%1.1f%%', radius=2000, startangle=90)
ax1.axis('equal')
plt.title("Mode of Transaction")
plt.legend(count_paymentMode.index)
plt.show()

As clearly visible, the vast majority of the people prefer credit card payments over the other modes.

Now we will try to further expand this graph and figure out the percentage of transaction modes per state.

df3=pd.merge(payments, df1, on="order_id", how="left")
a=df3["customer_state"].value_counts()
a["others"]=a[a<2000].sum()
a=a[a>2000]
df4=df3[df3["customer_state"].isin(a.index) & df3["payment_type"].isin(["credit_card", "boleto", "voucher", "debit_card"])]plt.figure(figsize=(15,7))
sns.countplot(x="customer_state", hue="payment_type", data=df4 )
plt.title("Payments mode per state")

Inference

  1. The bar graph shows the count of different types of payment used per state.
  2. With this, it can be inferred that the vast majority of people prefers credit card over other modes.
  3. So to improve the sales we can offer more offers on the credit cards, especially in the peak months as shown in the previous time series analysis graph.

Product Sales per State

In this, we will try to visualize the count of different products sold per state.

btoe=pd.Series(english_names.product_category_name_english.values, index=english_names.product_category_name).to_dict()
products["product_category_name"]=products["product_category_name"].map(btoe)
df5=pd.merge(order_items, products, on="product_id", how="left")
df5=df5[["order_id", "product_id", "product_category_name"]]
df6=pd.merge(orders, customers, on="customer_id", how="left")
df6=df6[["order_id", "customer_id", "customer_city", "customer_state" ]]
df7=pd.merge(df5, df6, on="order_id", how="left")df7=df7[df7["customer_state"].isin(["SP", "RJ", "MG"])]
b=df7["product_category_name"].value_counts()
b["others"]=b[b<2000].sum()
b=b[b>2000]
df7=df7[df7["product_category_name"].isin(b.index)]plt.figure(figsize=(15,7))
chart=sns.countplot(x="product_category_name", hue="customer_state", data=df7 )
chart.set_xticklabels( chart.get_xticklabels(), rotation=30 )
plt.title("Product Sales per State")
plt.show()

Inference

  1. People in SP mostly buy Bed_Bath_Table so more discounts and offers can be given on it to inc the sales.
  2. Also, the sales of health_beauty are more than computer_accessories so we can more offer on health_beauty rather than giving it to the computer accessories which most businesses do.

SUMMARY

So what we had tried to achieve,

  1. Which areas are the huge market for our products to better target the customers.
  2. What time is best for the launch of Big Sales events like Flipkart does on its Big Billion Days and Amazon on its Great Indian Festival that helps them to lure huge customers into their Bussiness.
  3. Which type of Transactions do people prefer for the payments and from my analysis it's clearly visible that people go over the online transactions rather than COD as this will also help in reducing black money and customers can get more discounts from their bank's cards which will, in turn, give them a good experience on the shopping.
  4. Which products do people commonly buy in a particular state as this would help us in understanding the customer behavior and then we can give offers and discounts only on the product that people will mostly prefer to buy rather than giving to it to the random products like electronics that most sellers usually give.

Further Analysis of Dataset,

There is much more analysis that I thought could be done like,

  1. Sales of the products have got the most reviews
  2. Sales of the products that have got a very nice Description
  3. Delivery time of the products
  4. Sales of the products that have got the most clear and detailed photos

But these factors depend on the Product and the better they are the better the sales and customer satisfaction would be.

So in this article, I had tried to do Customer Segmentation and predict their behavior for various products.

And don’t forget to give a like if you enjoyed this article and helped you gain some insights into the business and how you can do your own Data Analysis on various datasets.

--

--

Rohan Goel

Tech Blogger | Python/Java Developer | Data Science Enthusiast