
Create Description From Dataframe

descriptive analytics

Dataset

Load dataset

We use a simple, commonly used sales dataset from the internet and load only the six columns needed for this project.

import pandas as pd

# Load only the six columns used in this project
sales = pd.read_csv(
    r"..\dataset\sales_data_with_stores.csv",
    usecols=["store", "product_group", "product_code", "cost",
             "price", "last_week_sales"],
)
sales.head()
    store  product_group  product_code    cost    price  last_week_sales
0  Violet            PG2          4187  420.76   569.91               13
1    Rose            PG2          4195  545.64   712.41               16
2  Violet            PG2          4204  640.42   854.91               22
3   Daisy            PG2          4219  869.69  1034.55               14
4   Daisy            PG2          4718   12.54    26.59               50

The Code

Importing ursar_sidepandas makes it easier to build a descriptive analysis from a pandas DataFrame. After the import, every DataFrame gains an extra accessor, .stb, which is called on the DataFrame variable.

import ursar_sidepandas

df.stb.

First Example

How can we answer these questions from the dataset above?

  • How many product groups exist?
  • What is the size of each product group in terms of the number of products they contain?
  • What is the cumulative coverage of the entire portfolio?

Call sales.stb.freq with the column to analyze. The result lists each category in that column with its count, its percentage of the total, its cumulative count and its cumulative percentage.

sales.stb.freq(["product_group"])

The result

[Figure: output table of sales.stb.freq(["product_group"])]
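To make the columns of such a frequency table concrete, here is a minimal plain-pandas sketch that computes the same kind of figures by hand. It is not the library's implementation: freq_table and its column names are hypothetical, and the data is just the five preview rows from the Dataset section. Those rows all belong to PG2, so the helper is shown on the store column as well.

```python
import pandas as pd

# Stand-in data: the five preview rows shown in the Dataset section.
sales = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "product_group": ["PG2", "PG2", "PG2", "PG2", "PG2"],
})


def freq_table(df, column):
    """Hypothetical helper: count, percent and running totals per category."""
    table = (
        df.groupby(column).size().rename("count").reset_index()
        # Largest categories first; ties are broken alphabetically.
        .sort_values(["count", column], ascending=[False, True])
        .reset_index(drop=True)
    )
    total = table["count"].sum()
    table["percent"] = table["count"] * 100 / total
    table["cumulative_count"] = table["count"].cumsum()
    table["cumulative_percent"] = table["cumulative_count"] * 100 / total
    return table


store_freq = freq_table(sales, "store")
group_freq = freq_table(sales, "product_group")
print(store_freq)
print(group_freq)
```

The number of rows in the returned table answers "how many categories exist", the count column gives the size of each one, and the cumulative columns show how much of the whole is covered as you move down the table.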


Second Example

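How can we answer this question from the dataset above?

  • How are last week's sales distributed across the product groups?

Call sales.stb.freq with the column (or columns) to group by and a value argument naming a numeric column. The table is then built from the values in that numeric column instead of from row counts, which gives a deeper analysis that combines categorical and numerical data.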

sales.stb.freq(["product_group"], value="last_week_sales")

sales.stb.freq(["product_group", "store"], value="last_week_sales")

The result

[Figures: output tables of the two sales.stb.freq calls above]
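The idea of weighting categories by a numeric column can be sketched in plain pandas as a grouped sum followed by running totals. This is only an illustration under stated assumptions: value_freq and its output column names are hypothetical, and the data is the five preview rows from the Dataset section rather than the full file.

```python
import pandas as pd

# Stand-in data: the five preview rows shown in the Dataset section.
sales = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "product_group": ["PG2", "PG2", "PG2", "PG2", "PG2"],
    "last_week_sales": [13, 16, 22, 14, 50],
})


def value_freq(df, columns, value):
    """Hypothetical helper: sum `value` per group, largest group first."""
    table = (
        df.groupby(columns)[value].sum()
        .sort_values(ascending=False)
        .reset_index()
    )
    total = table[value].sum()
    table["percent"] = table[value] * 100 / total
    table["cumulative"] = table[value].cumsum()
    table["cumulative_percent"] = table["cumulative"] * 100 / total
    return table


# One grouping column, then two, mirroring the two calls above.
by_store = value_freq(sales, ["store"], "last_week_sales")
by_group_store = value_freq(sales, ["product_group", "store"], "last_week_sales")
print(by_store)
print(by_group_store)
```

Passing two grouping columns produces one row per combination, so the same helper shows how sales split across stores within each product group.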


Third Example

How can we answer these questions from the dataset above?

  • How many observations (i.e. rows)?
  • How many unique values?
  • The most frequent value?
  • How many observations for the most frequent value?
  • The least frequent value?
  • How many observations for the least frequent value?

Call sales.stb.counts to get a basic summary of each column in the dataset. We can also choose which data types to analyze; here exclude="number" leaves out the numeric columns.

sales.stb.counts(exclude="number")

The result

[Figure: output table of sales.stb.counts(exclude="number")]
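The six questions listed above map onto a handful of pandas calls. The sketch below is a hand-rolled approximation, not the library's code: count_summary and its column names are hypothetical, the data is the preview rows from the Dataset section, and every summarized column is assumed to contain at least one non-missing value.

```python
import pandas as pd

# Stand-in data: the five preview rows shown in the Dataset section.
sales = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "product_group": ["PG2", "PG2", "PG2", "PG2", "PG2"],
    "product_code": [4187, 4195, 4204, 4219, 4718],
})


def count_summary(df, exclude=None):
    """Hypothetical helper: per-column counts and most/least frequent values."""
    subset = df.select_dtypes(exclude=exclude) if exclude else df
    rows = {}
    for name in subset.columns:
        freq = subset[name].value_counts()  # sorted from most to least frequent
        rows[name] = {
            "count": int(subset[name].count()),
            "unique": int(subset[name].nunique()),
            "most_freq": freq.index[0],
            "most_freq_count": int(freq.iloc[0]),
            "least_freq": freq.index[-1],
            "least_freq_count": int(freq.iloc[-1]),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


summary = count_summary(sales, exclude="number")
print(summary)
```

When several values tie for most or least frequent, this sketch simply reports whichever one value_counts lists first or last.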


Fourth Example

Show the number of missing values in each column, together with the total number of rows and the percentage missing.

sales.stb.missing()

The result

[Figure: output table of sales.stb.missing()]
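A missing-value summary of this kind can be approximated with isna() and a little arithmetic. The sketch below is an assumption-laden illustration: missing_summary and its column names are hypothetical, and because the preview rows contain no gaps, one price is blanked out purely for demonstration.

```python
import pandas as pd

# Stand-in data: preview rows from the Dataset section, with one price
# deliberately set to None (illustrative only, not in the real data).
sales = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "cost": [420.76, 545.64, 640.42, 869.69, 12.54],
    "price": [569.91, None, 854.91, 1034.55, 26.59],
})


def missing_summary(df):
    """Hypothetical helper: missing count, row total and percent per column."""
    missing = df.isna().sum()
    table = pd.DataFrame({
        "missing": missing,
        "total": len(df),
        "percent": missing * 100 / len(df),
    })
    # Columns with the most gaps come first.
    return table.sort_values("missing", ascending=False)


missing_table = missing_summary(sales)
print(missing_table)
```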


Fifth Example

Select only the rows whose product group is PG1 or PG2.

sales[sales['product_group'].isin(['PG2','PG1'])]

The result

[Figure: the filtered rows]
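As a small illustration of how isin builds the row filter, the sketch below compares it with the equivalent chain of == tests. The preview rows from the Dataset section all belong to PG2, so the same pattern is shown on the store column instead; the variable names are illustrative.

```python
import pandas as pd

# Stand-in data: the five preview rows shown in the Dataset section.
sales = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "last_week_sales": [13, 16, 22, 14, 50],
})

wanted = ["Violet", "Rose"]

# isin returns a boolean Series that is True where the value is in the list.
by_isin = sales[sales["store"].isin(wanted)]

# The same filter written as explicit comparisons joined with | (or).
by_or = sales[(sales["store"] == "Violet") | (sales["store"] == "Rose")]

print(by_isin)
```

The isin form stays readable as the list of wanted values grows, whereas the explicit form needs one more comparison per value.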