Create Profiling Data
descriptive analytics

Introduction
Data quality profiling and exploratory data analysis are crucial steps in the process of Data Science and Machine Learning development. This project is one of the tool that can be used as the first step in the data understanding step of the data science workflow.
This project heavly used ydata-profiling, which is a leading package for data profiling, that automates and standardizes the generation of detailed reports, complete with statistics and visualizations. The significance of the package lies in how it streamlines the process of understanding and preparing data for analysis
Dataset
Load dataset
we use simple and common titanic dataset from seaborn library.
df = sns.load_dataset("titanic")
| survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22 | 1 | 0 | 7.25 | S | Third | man | True | nan | Southampton | no | False |
| 1 | 1 | 1 | female | 38 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26 | 0 | 0 | 7.925 | S | Third | woman | False | nan | Southampton | yes | True |
| 3 | 1 | 1 | female | 35 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35 | 0 | 0 | 8.05 | S | Third | man | True | nan | Southampton | no | True |
The Code
By calling, describe.tableone we can generate extensive descriptive and analysis with our dataset.
describe.tableone(df, col_list, cat_col, non_normal, label_col)
This function requires the following parameters:
- main_data (
dataframe): Data Input - col_list (
list): selected columns - cat_col (
list): categorical columns - non_normal (
list): numerical columns - label_col (
string): targeted column
The result

The Code
By calling, describe.profiling we can generate HTML file that contain all profiling and exploratory data analysis
describe.profiling(main_data,title,output)
This function requires the following parameters:
- main_data (
dataframe): Data Input - title (
string): title name - output (
string): output file name
The result

The Code
By calling, describe.sweetviz we can generate HTML file that contain all profiling and exploratory data analysis
describe.sweetviz(main_data, target, name_file)
This function requires the following parameters:
- main_data (
dataframe): Data Input - target (
string): targeted column - name_file (
string): output file name
The result

The Code
By calling, describe.pandas_gui we can generate descriptive GUI with pandasgui
describe.pandas_gui(main_data)
This function requires the following parameters:
- main_data (
dataframe): Data Input
The result
