Link Search Menu Expand Document

Additional Technic for Description Analysis

descriptive analytics

Dataset

Load dataset

we use simple and common titanic dataset from seaborn library.

df = sns.load_dataset("titanic")
  survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22 1 0 7.25 S Third man True nan Southampton no False
1 1 1 female 38 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26 0 0 7.925 S Third woman False nan Southampton yes True
3 1 1 female 35 1 0 53.1 S First woman False C Southampton yes False
4 0 3 male 35 0 0 8.05 S Third man True nan Southampton no True

Support Value

Detail

Support value is well known in the context of association rule mining (like Apriori or market basket analysis).

Support measures how frequently a pattern appears in a dataset.

Support is used to:

  • Filter out rare combinations (noise)
  • Find “frequent itemsets”
  • Build stronger rules later (with confidence & lift)

Support answers:

“How common is this pattern in the entire dataset?”

Simple process:

    support(X) = freq(X)/N(# of feature)
    support(X->Y) = freq(X join Y)/N(# of feature)

The support value means that this combination of values can be observed xx% of the time in the dataset.

The Code

By calling, describe.freq_unique_com we can generate Support value.

describe.freq_unique_com(main_data,col_list)

This function requires the following parameters:

  • main_data (dataframe): Data Input
  • col_list (list): selected columns

The result

drawing