Create Description From Dataframe
descriptive analytics
Dataset
Load dataset
We use a simple, common sales dataset from the internet and select the six columns relevant to this project.
import pandas as pd

# Load only the six columns used in this project
sales = pd.read_csv(
    r"..\dataset\sales_data_with_stores.csv",
    usecols=["store", "product_group", "product_code", "cost",
             "price", "last_week_sales"]
)
| | store | product_group | product_code | cost | price | last_week_sales |
|---|---|---|---|---|---|---|
| 0 | Violet | PG2 | 4187 | 420.76 | 569.91 | 13 |
| 1 | Rose | PG2 | 4195 | 545.64 | 712.41 | 16 |
| 2 | Violet | PG2 | 4204 | 640.42 | 854.91 | 22 |
| 3 | Daisy | PG2 | 4219 | 869.69 | 1034.55 | 14 |
| 4 | Daisy | PG2 | 4718 | 12.54 | 26.59 | 50 |
The Code
Importing ursar_sidepandas extends pandas DataFrames so that descriptive analysis becomes easier to produce. Every DataFrame gains an extra accessor, which is reached by appending .stb to the DataFrame variable.
import ursar_sidepandas
df.stb.
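To make the idea of an accessor concrete, here is a minimal sketch of how pandas lets a package attach its own namespace to every DataFrame. It uses the pandas function `register_dataframe_accessor`. The accessor name `demo` and the method `shape_summary` are hypothetical names invented for this illustration, and whether ursar_sidepandas registers `.stb` this way is an assumption.

```python
import pandas as pd


# "demo" is a hypothetical accessor name chosen for this sketch;
# it is not part of ursar_sidepandas.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes in the DataFrame the accessor was called on
        self._df = pandas_obj

    def shape_summary(self) -> str:
        """Describe the size of the wrapped DataFrame (hypothetical helper)."""
        rows, cols = self._df.shape
        return f"{rows} rows x {cols} columns"


df = pd.DataFrame({"store": ["Violet", "Rose"], "last_week_sales": [13, 16]})
summary = df.demo.shape_summary()
print(summary)
```

Once the class is registered, every DataFrame in the session exposes `df.demo`, in the same way that `df.stb` becomes available after the import.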
First Example
How can we answer these questions from the dataset above?
- How many product groups exist?
- What is the size of each product group in terms of the number of products they contain?
- What is the cumulative coverage of the entire portfolio?
Calling sales.stb.freq with the column we want to analyse returns, for each category in that column, its count, its percentage of the total, and the cumulative count and cumulative percentage.
sales.stb.freq(["product_group"])
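For readers who want to see what such a frequency table contains, the sketch below builds a comparable one with plain pandas. It is not the library's implementation. The helper `freq_table` and its output column names are assumptions made for this illustration. The data are the five sample rows shown earlier, and the `store` column is used because all five sample rows belong to the same product group.

```python
import pandas as pd

# The five sample rows shown in the dataset preview
sales_sample = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "product_group": ["PG2"] * 5,
    "last_week_sales": [13, 16, 22, 14, 50],
})


def freq_table(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: count, percent and running totals per category."""
    counts = df[column].value_counts()  # sorted from most to least frequent
    table = counts.rename("count").rename_axis(column).reset_index()
    total = table["count"].sum()
    table["percent"] = table["count"] * 100 / total
    table["cumulative_count"] = table["count"].cumsum()
    table["cumulative_percent"] = table["cumulative_count"] * 100 / total
    return table


store_freq = freq_table(sales_sample, "store")
print(store_freq)
```

The last row of the cumulative columns always reaches the full row count and 100 percent, which is what answers the "cumulative coverage" question.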
The result

Second Example
How can we answer this question from the dataset above?
- How are sales distributed across the product groups?
Calling sales.stb.freq with both a column and a value lets us analyse a categorical column against a numerical one, which gives a deeper view than row counts alone.
sales.stb.freq(["product_group"], value="last_week_sales")
sales.stb.freq(["product_group", "store"], value="last_week_sales")
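A plain-pandas sketch of the same idea follows. It assumes that passing `value` means summing that numeric column per category, which is an assumption about the library's behaviour rather than something confirmed here. The helper `weighted_freq` and its output column names are hypothetical, and the data are the five sample rows shown earlier, grouped by `store`.

```python
import pandas as pd

# The five sample rows shown in the dataset preview
sales_sample = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "product_group": ["PG2"] * 5,
    "last_week_sales": [13, 16, 22, 14, 50],
})


def weighted_freq(df: pd.DataFrame, columns: list, value: str) -> pd.DataFrame:
    """Hypothetical helper: sum `value` per group, largest first."""
    totals = df.groupby(columns)[value].sum().sort_values(ascending=False)
    table = totals.reset_index()
    grand_total = table[value].sum()
    table["percent"] = table[value] * 100 / grand_total
    table["cumulative_" + value] = table[value].cumsum()
    table["cumulative_percent"] = table["cumulative_" + value] * 100 / grand_total
    return table


sales_by_store = weighted_freq(sales_sample, ["store"], "last_week_sales")
print(sales_by_store)
```

Passing several names in `columns`, for example `["product_group", "store"]`, produces one row per combination of categories.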
The result

Third Example
How can we answer these questions from the dataset above?
- How many observations (i.e. rows)?
- How many unique values?
- The most frequent value?
- How many observations for the most frequent value?
- The least frequent value?
- How many observations for the least frequent value?
Calling sales.stb.counts computes these basic statistics for the dataset, and we can restrict it to specific data types, for example by excluding numeric columns with exclude="number".
sales.stb.counts(exclude="number")
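The questions listed above can also be answered with plain pandas, as in this sketch. The helper `counts_table` and its output column names are assumptions made for this illustration, not the library's own code. It runs on the non-numeric columns of the five sample rows shown earlier.

```python
import pandas as pd

# The five sample rows shown in the dataset preview
sales_sample = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "product_group": ["PG2"] * 5,
    "last_week_sales": [13, 16, 22, 14, 50],
})


def counts_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column summary for non-numeric columns."""
    rows = {}
    for name in df.select_dtypes(exclude="number").columns:
        vc = df[name].value_counts()  # most frequent value first
        rows[name] = {
            "count": int(df[name].count()),      # non-missing observations
            "unique": int(df[name].nunique()),   # distinct values
            "most_freq": vc.index[0],
            "most_freq_count": int(vc.iloc[0]),
            "least_freq": vc.index[-1],
            "least_freq_count": int(vc.iloc[-1]),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


summary = counts_table(sales_sample)
print(summary)
```

When two values are tied for most frequent, as two stores are in this sample, which one is reported depends on how `value_counts` orders ties.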
The result

Fourth Example
Show the number of missing values in each column, together with the total number of rows and the percentage missing.
sales.stb.missing()
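A comparable table can be sketched with plain pandas. The helper `missing_table` and its column names are assumptions made for this illustration. The sample reuses the rows shown earlier, with one `store` value deliberately blanked out, because the preview itself contains no missing values.

```python
import pandas as pd

# Sample rows with one store value blanked out on purpose (hypothetical gap)
sample_with_gap = pd.DataFrame({
    "store": ["Violet", "Rose", None, "Daisy", "Daisy"],
    "last_week_sales": [13, 16, 22, 14, 50],
})


def missing_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing count, row total and percent per column."""
    missing = df.isna().sum()
    table = pd.DataFrame({
        "missing": missing,
        "total": len(df),
        "percent": missing * 100 / len(df),
    })
    # Columns with the most missing values come first
    return table.sort_values("missing", ascending=False)


missing_summary = missing_table(sample_with_gap)
print(missing_summary)
```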
The result

Fifth Example
Select only the rows whose product_group is PG1 or PG2.
sales[sales['product_group'].isin(['PG2','PG1'])]
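The expression above works in two steps: `isin` builds a boolean mask with one True or False per row, and indexing the DataFrame with that mask keeps only the True rows. The sketch below shows both steps on the five sample rows shown earlier. It filters on `store` instead of `product_group`, because every sample row belongs to PG2 and would pass the original filter.

```python
import pandas as pd

# The five sample rows shown in the dataset preview
sales_sample = pd.DataFrame({
    "store": ["Violet", "Rose", "Violet", "Daisy", "Daisy"],
    "product_group": ["PG2"] * 5,
    "last_week_sales": [13, 16, 22, 14, 50],
})

# Step 1: True where the store is one of the listed values
mask = sales_sample["store"].isin(["Violet", "Daisy"])

# Step 2: keep only the rows where the mask is True
subset = sales_sample[mask]
print(subset)
```

The original row labels are preserved in the result, so the row for the Rose store simply disappears from the index.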
The result
