Statistical Tests
Solutions
- Statistical Tests
- Types of Data (naming)
- Statistical Analysis Methods
- Statistic Description
- Statistic Test Details
- Correlation vs Everybody
- Statistic Test Details
Types of Data (naming)
- Qualitative (Categorical)
- Nominal (identify)
- Ordinal (rank / order)
- Quantitative (Numerical)
- Discrete (count)
- Continuous (measure)
Statistical Analysis Methods
| Type | Variables | Dependent variable (y) | Independent variable (x) | Groups targeted | Method | Target stat | Details |
|---|---|---|---|---|---|---|---|
| Relationship | Categorical + Numerical | | | | Point-biserial correlation | | |
| Compare groups | Categorical + Numerical | Categorical | Ordinal, Interval | 2 groups, single variable | Student’s t-test for 2 paired samples | | paired samples, parametric, normal distribution |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | 2 groups, single variable | Wilcoxon signed-rank test | | paired samples, nonparametric, non-normal distribution |
| Compare groups | Categorical + Numerical | | | 2 groups, 2 variables | Student’s t-test for 2 independent samples | compare means | equal population variances, parametric, independent samples, large sample size, unknown and similar variances, normal distribution |
| Compare groups | Categorical + Numerical | Categorical | Numerical | 2 groups | Welch’s t-test for 2 independent samples | compare means | unequal population variances, parametric, independent samples, large sample size, unknown and dissimilar variances, normal distribution |
| Compare groups | Categorical + Numerical | | | | Z-test | compare means | large sample size, known variance |
| Compare groups | Categorical + Numerical | Categorical | Ordinal, Interval | 2 groups, 2 variables | Mann-Whitney U test | | nonparametric, independent samples, non-normal data, small sample size |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | 2 groups | Wilcoxon-Mann-Whitney test | | unpaired samples |
| Compare groups | Categorical + Numerical | Categorical | Numerical | > 2 groups, multivariable | One-way ANOVA | compare means | equal population variances, parametric, independent samples |
| Compare groups | Categorical + Numerical | | | > 2 groups | Welch’s ANOVA | compare means | unequal population variances, parametric, independent samples |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | > 2 groups | Kruskal-Wallis test | | nonparametric, independent samples, unpaired samples, non-normal data |
| Compare groups | Categorical + Numerical | Interval | | > 2 groups | Repeated measures ANOVA | | paired samples, parametric |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | > 2 groups, multivariable | Friedman test | | paired samples, nonparametric |
| Compare groups | Categorical + Numerical | Nominal | Ordinal / Categorical | 2 groups | Bowker test | | paired samples |
| Compare groups | Categorical + Numerical | Categorical (2 categories) | Ordinal / Categorical | >= 2 groups | Fisher’s exact test | | unpaired samples, small sample size |
| Compare groups | Categorical + Numerical | Nominal | Ordinal / Categorical | >= 2 groups | Fisher’s exact test | | unpaired samples, small sample size |
| Compare groups | Categorical + Numerical | Categorical (2 categories), Nominal | Ordinal / Categorical | >= 2 groups | Chi-squared test | compare proportions | unpaired samples, large sample size |
| Compare groups | 2 Categorical | Categorical (2 categories) | Ordinal / Categorical | 2 groups for each variable | McNemar’s test, sign test | | 2 paired samples |
| Compare groups | 2 Categorical | Categorical (2 categories) | Ordinal / Categorical | 2 groups for each variable | Cochran’s Q test | | > 2 paired samples |
| Relationship | 2 Categorical | | | 2 groups for each variable | Fisher’s exact test | | expected frequencies < 5, independent samples |
| Relationship | 2 Categorical | Categorical (2 categories) | | 2 groups for each variable | Chi-square test of independence | | expected frequencies >= 5, independent samples |
| Relationship | 2 Categorical | Categorical (2 categories) | | > 2 groups for at least one variable | Chi-square test of independence | | |
| Relationship | 2 Numerical | Normal | Normal / interval (ordinal) | | Pearson correlation | | linear relation, parametric |
| Relationship | 2 Numerical | Ordinal, Interval | Normal / interval (ordinal) | | Spearman correlation | | non-linear relation, nonparametric |
| Predict Outcomes | Multi-Numerical | Numerical | 1 numerical variable | | Simple linear regression | | |
| Predict Outcomes | Multi-Numerical | Numerical | multiple numerical variables | | Multiple linear regression | | |
| Predict Outcomes | Categorical + Numerical | Numerical | multiple categorical + numerical variables | | ANCOVA | | |
| Predict Outcomes | Multi-Categorical | | | 2 groups | Binary logistic regression | | |
| Predict Outcomes | Multi-Categorical | | | > 2 groups | Multinomial logistic regression | | |
| Compare groups | 1 Categorical | | | 2 groups | One-proportion test | | |
| Compare groups | 1 Categorical | Categorical | | > 2 groups | Chi-square goodness-of-fit test | | |
| Compare groups | 1 Numerical | Interval | | | One-sample Student’s t-test | | parametric |
| Compare groups | 1 Numerical | Ordinal, Interval | | | One-sample Wilcoxon test | | nonparametric |
| Compare groups | 1 Numerical | Nominal | | | Binomial test | | |
| Compare groups | 1 Numerical | Ordinal, Interval | | | Median test | | |
| Compare groups | 1 Numerical | Normal | | | t-test | | |
| Compare groups | Categorical + Numerical | Normal | Ordinal / Categorical | 2 groups | t-test (paired or unpaired) | | paired and unpaired samples |
| Compare groups | Categorical + Numerical | Normal | Ordinal / Categorical | > 2 groups | Linear model (ANOVA) | | paired and unpaired samples |
| Predict Outcomes | Categorical + Numerical | Categorical (2 categories) | Normal / interval (ordinal) | | (Conditional) logistic regression | | |
| Predict Outcomes | Categorical + Numerical | Nominal | Normal / interval (ordinal) | | Multinomial logistic regression | | |
| Predict Outcomes | More than 1 | Categorical (2 categories) | Combination | | Logistic regression | | |
| Predict Outcomes | More than 1 | Nominal, Categorical | Combination | | Multinomial logistic regression | | |
| Predict Outcomes | More than 1 | Ordinal | Combination | | Ordered logit | | |
| Predict Outcomes | More than 1 | Interval, Normal | Combination | | Multivariate linear model | | |
| Compare groups | Categorical + Numerical | Censored interval | Ordinal / Categorical | >= 2 groups | Log-rank test | | |
| Predict Outcomes | Categorical + Numerical | Censored interval | Ordinal / Categorical, Normal / interval (ordinal) | | Survival analysis, Cox proportional hazards regression | | |
| | Categorical + Numerical | Combination | Combination | | Clustering, factor analysis, PCA, canonical correlation | | |
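As a quick illustration of the table’s logic, here is a minimal sketch (synthetic data; scipy is assumed available) of choosing between a parametric and a nonparametric two-group test based on a normality check:

```python
# Sketch: pick a two-group test following the table above.
# The group data below are made up purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)   # e.g. treatment scores
group_b = rng.normal(loc=5.5, scale=1.0, size=40)   # e.g. control scores

# Check normality first (Shapiro-Wilk); the parametric tests assume it.
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    # Welch's t-test: parametric, independent samples, no equal-variance assumption
    result = stats.ttest_ind(group_a, group_b, equal_var=False)
else:
    # Mann-Whitney U: nonparametric alternative for independent samples
    result = stats.mannwhitneyu(group_a, group_b)

print(f"p-value = {result.pvalue:.4f}")
```

The same skeleton extends to the > 2 groups rows (ANOVA vs Kruskal-Wallis) by swapping the test calls.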
Statistic Description
| Type | Variables | Explains |
|---|---|---|
| Normality | | to check if your data has a Gaussian distribution |
| Normality | Shapiro-Wilk | - Tests whether a data sample has a Gaussian distribution. - It is used to determine whether or not a sample comes from a normal distribution. |
| Normality | D’Agostino’s K^2 | Tests whether a data sample has a Gaussian distribution. |
| Normality | Anderson-Darling | Tests whether a data sample has a Gaussian distribution. |
| Normality | Kolmogorov-Smirnov | - Performs the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit. - The Kolmogorov-Smirnov test is used to test whether or not a sample comes from a certain distribution. |
| Normality | Lilliefors’ test | - Tests an assumed normal or exponential distribution using Lilliefors’ test. - Lilliefors’ test is a Kolmogorov-Smirnov test with estimated parameters. |
| Normality | Jarque-Bera | The Jarque-Bera tests whether the sample data has the skewness and kurtosis matching a normal distribution. |
| Multivariate Normality | Henze-Zirkler test | - To test whether or not several variables are normally distributed as a group, we must perform a multivariate normality test. - The Henze-Zirkler multivariate normality test determines whether or not a group of variables follows a multivariate normal distribution. - The null and alternative hypotheses for the test are as follows: - H0 (null): The variables follow a multivariate normal distribution. - Ha (alternative): The variables do not follow a multivariate normal distribution. |
| Correlation | | to check if two samples are related |
| Correlation | Pearson’s r (pearsonr) | Tests whether two samples have a linear relationship. Used for numerical vs numerical. |
| Correlation | Spearman’s Rank | Tests whether two samples have a monotonic relationship; handles non-linear relationships and is robust to outliers. Used for categorical (ordinal) vs numerical. |
| Correlation | Kendall’s Rank | Tests whether two samples have a monotonic relationship |
| Correlation | Chi-Square Test of Independence | Tests whether two categorical variables are related or independent. Used for categorical (ordinal) vs categorical (ordinal). - rules: - Non-parametric; does not require assumptions about population parameters. - Compares the difference in population proportions between groups. - A contingency table of observed values is required. - The expected counts are calculated under the assumption that the two variables are independent. - The chi-square test is sensitive to sample size (i.e. asymptotic): - As the sample size increases, the absolute differences become a smaller proportion of the expected value. - The outcome is that a strong association may not surface if the sample size is small. - In large sample sizes, statistical significance may surface while the association is not substantial (i.e. very weak). - Using the chi-squared test on small samples may lead to a Type II error. |
| Correlation | Chi-Square Goodness of Fit Test | Used to determine whether or not a categorical variable follows a hypothesized distribution, i.e. whether the observed data are consistent with the hypothesized distribution. |
| Correlation | Fisher’s Exact Test | - is used to determine whether or not there is a significant association between two categorical variables. - It is typically used as an alternative to the Chi-Square Test of Independence - when one or more of the cell counts in a 2×2 table is less than 5. - example data: data = [[8, 4], [4, 9]] - The null and alternative hypotheses for the test are as follows: - H0: (null hypothesis) The two variables are independent. - H1: (alternative hypothesis) The two variables are not independent. |
| Correlation | McNemar’s Test | - Used to determine if there is a statistically significant difference in proportions between paired data. - example data (rows = after, columns = before; agree / not agree): data = [[30, 40], [12, 18]] - e.g. to determine if there was a statistically significant difference in the proportion of people who supported the law before and after viewing the video. - The question in the McNemar test is: do these two proportions, pA and pB, significantly differ? |
| Stationary | | to check if a time series is stationary or not |
| Stationary | Augmented Dickey-Fuller Unit Root | Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive |
| Stationary | Kwiatkowski-Phillips-Schmidt-Shin | Tests whether a time series is trend stationary or not |
| Parametric hypothesis (interval) | | to compare data samples (for normally distributed data) |
| Parametric hypothesis (interval) | Student’s t-test | - Tests whether the means of two independent samples are significantly different (unknown standard deviation, same variance) - example: - Is the difference between means statistically significant enough to conclude that the groups are different? - We have two sections of students: section A and section B. - The mean mathematics scores of the sections are 95 and 90 respectively, so the difference is 5. - The question is: does this difference of 5 provide enough evidence that the mean scores of the two sections are different? - assumptions: - Both samples are approximately normally distributed. - Both samples have approximately the same variance. - If the ratio of the larger variance to the smaller variance is less than 4, then we can assume the variances are approximately equal. - The observations in one sample are independent of the observations in the other sample. - Both samples were obtained using a random sampling method. |
| Parametric hypothesis (interval) | Paired Student’s t-test | - Tests whether the means of two paired samples are significantly different - is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample. - A paired samples t-test is commonly used in two scenarios: - A measurement is taken on a subject before and after some treatment - A measurement is taken under two different conditions - In both cases we are interested in comparing the mean measurement between two groups in which each observation - in one sample can be paired with an observation in the other sample. - assumptions: - The participants should be selected randomly from the population. - The differences between the pairs should be approximately normally distributed. - There should be no extreme outliers in the differences. |
| Parametric hypothesis (interval) | One proportion z-test | - Tests whether a population proportion is equal or not to some hypothesized population proportion. - is used to compare an observed proportion to a theoretical one. - example: - it’s virtually guaranteed that the proportion of residents in the sample who support the law will be at least a little different from the proportion of residents in the entire population who support the law. - The question is whether or not this difference is statistically significant. - assumption: the sample is large enough for the normal approximation (np0 and n(1 − p0) both at least 5) - A one proportion z-test always uses the following null hypothesis: - H0: p = p0 (population proportion is equal to some hypothesized population proportion p0) - The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed: - H1 (two-tailed): p ≠ p0 (population proportion is not equal to some hypothesized value p0) - H1 (left-tailed): p < p0 (population proportion is less than some hypothesized value p0) - H1 (right-tailed): p > p0 (population proportion is greater than some hypothesized value p0) |
| Parametric hypothesis (interval) | Two proportion z-test | - Tests whether one population proportion is equal or not to another population proportion. - example: - Suppose we want to know if there is a difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. - it’s virtually guaranteed that the proportion of residents who support the law will be at least a little different between the two samples. - The question is whether or not this difference is statistically significant. - assumption: both samples are large enough for the normal approximation - A two proportion z-test always uses the following null hypothesis: - H0: π1 = π2 (the two population proportions are equal) - The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed: - H1 (two-tailed): π1 ≠ π2 (the two population proportions are not equal) - H1 (left-tailed): π1 < π2 (population 1 proportion is less than population 2 proportion) - H1 (right-tailed): π1 > π2 (population 1 proportion is greater than population 2 proportion) |
| Parametric hypothesis (interval) | Welch’s t-test | - Tests whether the means of two independent samples are significantly different (does not assume they have the same variance) - this test assumes that both groups of data are sampled from populations that follow a normal distribution, but it does not assume that those two populations have the same variance. - There are two differences in how the Student’s t-test and Welch’s t-test are carried out: - The test statistic. - The degrees of freedom: the degrees of freedom for Welch’s t-test tend to be smaller than the degrees of freedom for Student’s t-test. - Some people argue that Welch’s t-test should be the default choice for comparing the means of two independent groups, since it performs better than Student’s t-test when sample sizes and variances are unequal between groups, and it gives identical results when sample sizes and variances are equal. - In practice, when you are comparing the means of two groups it’s unlikely that the standard deviations for each group will be identical. - This makes it a good idea to just always use Welch’s t-test, so that you don’t have to make any assumptions about equal variances. |
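Several of the tests described above are exposed directly by `scipy.stats`. The sketch below (synthetic data; seeds and thresholds are illustrative) runs a normality check, both correlation tests, and Fisher’s exact test on the 2×2 example data from the table:

```python
# A few of the tests described above, via scipy.stats. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)   # noisy linear relation

# Normality: Shapiro-Wilk and D'Agostino's K^2
shapiro_p = stats.shapiro(x).pvalue
dagostino_p = stats.normaltest(x).pvalue        # D'Agostino's K^2

# Correlation: Pearson (linear) vs Spearman (monotonic, rank-based)
pearson_r, _ = stats.pearsonr(x, y)
spearman_r, _ = stats.spearmanr(x, y)

# Fisher's exact test on the small-count 2x2 table from the table above
odds_ratio, fisher_p = stats.fisher_exact([[8, 4], [4, 9]])

print(shapiro_p, dagostino_p, pearson_r, spearman_r, fisher_p)
```

Each call returns a test statistic and a p-value, which are interpreted against a significance level as described in the next section.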
Statistic Test Details
Parameters
- Level of significance: the probability threshold at which we reject the null hypothesis.
- Type I error: when we reject the null hypothesis although it is true.
- Type II error: when we fail to reject the null hypothesis although it is false.
- One-tailed test:
- A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test.
- Example: a college has ≥ 4000 students, or ≤ 80% of organizations have adopted data science.
- Two-tailed test:
- A two-tailed test is a statistical test in which the critical area of a distribution is two-sided; it tests whether a sample is either greater than or less than a certain range of values.
- Example: a college has ≠ 4000 students, or ≠ 80% of organizations have adopted data science.
- P-value: or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true. The definition of ‘extreme’ depends on how the hypothesis is being tested.
- Degrees of freedom: the number of values in a calculation that are free to vary.
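The one-tailed vs two-tailed distinction maps directly onto scipy’s `alternative` parameter. In this sketch (illustrative data), the two-tailed p-value equals twice the smaller of the two one-tailed p-values, as expected for a symmetric test statistic:

```python
# One- vs two-tailed p-values for a one-sample t-test. Data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.3, scale=1.0, size=30)

# Two-tailed: Ha is "mean != 0"; one-tailed: Ha is "mean > 0"
two_tailed = stats.ttest_1samp(sample, popmean=0.0).pvalue
right_tailed = stats.ttest_1samp(sample, popmean=0.0, alternative='greater').pvalue

print(f"two-tailed p = {two_tailed:.4f}, right-tailed p = {right_tailed:.4f}")
```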
Hypothesis Testing
- Hypothesis Testing is a family of statistical methods used to decide whether a sample of observed data supports accepting or rejecting a predefined hypothesis.
- Hypothesis testing only allows for validation of hypotheses, not for developing hypotheses.
- The mathematics of a hypothesis test balance the observed effect in the sample measurements against the number of observations.
- The P-Value is the standard way to formulate the outcome of a hypothesis test and is interpreted in the same way for every possible test.
- The P-Value is a score between 0 and 1, which tells us whether the difference between our sample observations and our hypothesis is significant.
- The P-Value is compared against 0.05 as its conventional reference value.
Interpret a P-Value
- A p-value is the probability of observing a sample statistic that is at least as extreme as your sample statistic, given that the null hypothesis is true.
- A p-value tells you the strength of evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.
- If the p-value is less than the significance level, we reject the null hypothesis.
- So, when you get a p-value of 0.000, you should compare it to the significance level.
- Common significance levels include 0.1, 0.05, and 0.01. Since 0.000 is lower than all of these,
- we would reject the null hypothesis (H0) in each case and conclude that there is sufficient evidence for H1.
- A p-value indicates how compatible the sample data are with the null hypothesis.
- Specifically, assuming the null hypothesis is true, the p-value tells us the probability of obtaining an effect at least as large as the one we actually observed in the sample data.
- If the p-value of a hypothesis test is sufficiently low, we can reject the null hypothesis.
- Specifically, when we conduct a hypothesis test, we must choose a significance level at the outset. Common choices for significance levels are 0.01, 0.05, and 0.10.
- If the p-value is less than our significance level, then we can reject the null hypothesis.
- Otherwise, if the p-value is equal to or greater than our significance level, then we fail to reject the null hypothesis.
- Example:
- Suppose a factory claims that they produce tires that have a mean weight of 200 pounds.
- An auditor hypothesizes that the true mean weight of tires produced at this factory is different from 200 pounds.
- So he runs a hypothesis test and finds that the p-value of the test is 0.04.
- Here is how to interpret this p-value:
- If the factory does indeed produce tires that have a mean weight of 200 pounds, then 4% of all audits will obtain the effect observed in the sample, or larger, because of random sample error.
- This tells us that obtaining the sample data that the auditor did would be pretty rare if indeed the factory produced tires that have a mean weight of 200 pounds.
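The tire example can be reproduced as a one-sample t-test. In this sketch the audit sample is simulated; only the claimed mean of 200 pounds comes from the text:

```python
# The tire-factory example as a one-sample t-test (simulated audit data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
weights = rng.normal(loc=202.0, scale=5.0, size=35)  # hypothetical audit sample

result = stats.ttest_1samp(weights, popmean=200.0)
alpha = 0.05
if result.pvalue < alpha:
    print(f"p = {result.pvalue:.3f}: reject H0 (mean weight differs from 200)")
else:
    print(f"p = {result.pvalue:.3f}: fail to reject H0")
```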
How Not to Interpret a P-Value:
- The biggest misconception about p-values is that they are equivalent to the probability of making a mistake by rejecting a true null hypothesis (known as a Type I error).
- There are two primary reasons that p-values can’t be the error rate:
- P-values are calculated based on the assumption that the null hypothesis is true and that the difference between the sample data and the null hypothesis is simply caused by random chance. Thus, p-values can’t tell you the probability that the null is true or false, since from the perspective of the calculation it is assumed to be 100% true.
- Although a low p-value indicates that your sample data are unlikely assuming the null is true, a p-value still can’t tell you which of the following cases is more likely:
- The null is false
- The null is true but you obtained an odd sample
- In regards to the previous example, here is a correct and incorrect way to interpret the p-value:
- Correct Interpretation:
- Assuming the factory does produce tires with a mean weight of 200 pounds, you would obtain the observed difference that you did obtain in your sample or a more extreme difference in 4% of audits due to random sampling error.
- Incorrect Interpretation:
- If you reject the null hypothesis, there is a 4% chance that you are making a mistake.
Strength of stat value
- Perfect: values near ±1
- High degree: values between ±0.5 and ±1
- Moderate degree: values between ±0.3 and ±0.49
- Low degree: values below ±0.29
- No correlation: values close to 0
Type I error rate
- Which is defined by our significance level (alpha) and tells us the probability of rejecting a null hypothesis that is actually true.
- In other words, it’s the probability of getting a “false positive”, i.e. claiming there is a statistically significant difference among groups when there actually isn’t.
- When we perform one hypothesis test, the type I error rate is equal to the significance level, which is commonly chosen to be 0.01, 0.05, or 0.10.
- However, when we conduct multiple hypothesis tests at once, the probability of getting a false positive increases.
- If we conduct several hypothesis tests at once using a significance level of 0.05, the probability that we get a false positive increases beyond just 0.05.
Multiple Comparisons in ANOVA
- When we conduct an ANOVA, there are often three or more groups that we are comparing to one another.
- Thus, when we conduct a post hoc test to explore the differences between the group means, there are several pairwise comparisons we want to explore.
- If we have more than four groups, the number of pairwise comparisons we will want to look at will only increase even more. The family-wise error rate increases rapidly as the number of groups (and consequently the number of pairwise comparisons) increases.
- In fact, once we reach six groups, the probability of us getting a false positive is actually above 50%!
- This means we would have serious doubts about our results if we were to make this many pairwise comparisons, knowing that our family-wise error rate was so high.
- Fortunately, post hoc tests provide us with a way to make multiple comparisons between groups while controlling the family-wise error rate.
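A minimal post hoc sketch (synthetic groups; a Bonferroni correction is shown here purely for simplicity, where Tukey’s test would be the more usual choice):

```python
# One-way ANOVA followed by pairwise Welch t-tests with a Bonferroni
# correction to control the family-wise error rate. Data are synthetic.
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(3)
groups = {
    "A": rng.normal(10, 2, 30),
    "B": rng.normal(10, 2, 30),
    "C": rng.normal(13, 2, 30),   # deliberately shifted group
}

f_stat, p_overall = stats.f_oneway(*groups.values())
pairs = list(combinations(groups, 2))
alpha_per_test = 0.05 / len(pairs)     # Bonferroni: divide alpha by #comparisons

for g1, g2 in pairs:
    p = stats.ttest_ind(groups[g1], groups[g2], equal_var=False).pvalue
    print(f"{g1} vs {g2}: p = {p:.4f}, significant = {p < alpha_per_test}")
```

Dividing alpha by the number of comparisons is exactly the power tradeoff the next section describes: each individual comparison must clear a stricter bar.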
Post Hoc Tests & Statistical Power:
- Post hoc tests do a great job of controlling the family-wise error rate, but the tradeoff is that they reduce the statistical power of the comparisons.
- This is because the only way to lower the family-wise error rate is to use a lower significance level for all of the individual comparisons.
- For example, when we use Tukey’s Test for six pairwise comparisons and we want to maintain a family-wise error rate of 0.05, we must use a significance level of approximately 0.011 for each individual comparison. The more pairwise comparisons we have, the lower the significance level we must use for each one.
- The problem with this is that lower significance levels correspond to lower statistical power.
- This means that if a difference between group means actually does exist in the population, a study with lower power is less likely to detect it.
- One way to reduce the effects of this tradeoff is to simply reduce the number of pairwise comparisons we make.
- For example, in the previous examples we performed six pairwise comparisons for the four different groups.
- However, depending on the needs of your study, you may only be interested in making a few comparisons.
Data distribution VS Stats distribution:
- Tests that rely on the assumption of normally distributed test statistics can also be applied if the original data distribution is highly non-normal!
- Indeed, thanks to the Central Limit Theorem, the distribution of the test statistic is asymptotically normal as the sample size increases.
- This is very useful in the common case of A/B tests that produce observations that are zero-inflated and/or multimodal.
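A small simulation of this point, using a hypothetical zero-inflated “revenue” metric: the raw observations are far from normal, but the distribution of the sample mean is much closer to Gaussian, which is what the test statistic actually depends on.

```python
# CLT demonstration on zero-inflated data (70% exact zeros). All values synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def zero_inflated_sample(n):
    # 70% exact zeros, 30% exponential revenue-like values
    mask = rng.random(n) < 0.3
    return mask * rng.exponential(scale=10.0, size=n)

# Distribution of the mean over many repeated samples of size 500
means = np.array([zero_inflated_sample(500).mean() for _ in range(2000)])

raw_p = stats.shapiro(zero_inflated_sample(500)).pvalue   # raw data: clearly non-normal
means_p = stats.shapiro(means[:500]).pvalue               # sample means: far closer to normal
print(f"raw data normality p = {raw_p:.2e}, sample means p = {means_p:.3f}")
```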
Correlation vs Everybody
Correlation vs Causation
Correlation is a relationship or connection between two variables: whenever one changes, the other is likely to change as well. But that does not mean a change in one variable causes the other to change. That’s correlation, not causation.
Correlation vs Association
- Correlation: when we use the word correlation we’re typically talking about the Pearson Correlation Coefficient. This is a measure of the linear association between two random variables X and Y. It has a value between -1 and 1 where:
- -1 indicates a perfectly negative linear correlation between two variables
- 0 indicates no linear correlation between two variables
- 1 indicates a perfectly positive linear correlation between two variables
- We use two words to describe the correlation between two random variables:
- Direction
- Positive: Two random variables have a positive correlation if Y tends to increase as X increases.
- Negative: Two random variables have a negative correlation if Y tends to decrease as X increases.
- Strength
- Weak: Two random variables have a weak correlation if the points in a scatterplot are loosely scattered.
- Strong: Two random variables have a strong correlation if the points in a scatterplot are tightly packed together.
- Association: When statisticians use the word association they can be talking about any relationship between two variables, whether it’s linear or non-linear. However, just knowing that the correlation between two variables is zero can be misleading, because it hides the fact that a non-linear relationship may exist instead.
- Correlation can only tell us if two random variables have a linear relationship while association can tell us if two random variables have a linear or non-linear relationship.
- Correlation quantifies the relationship between two random variables by using a number between -1 and 1, but association does not use a specific number to quantify a relationship.
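A classic illustration of this difference: y = x² is perfectly determined by x (a strong association), yet on a symmetric range the Pearson correlation is essentially zero.

```python
# Zero linear correlation despite a deterministic nonlinear association.
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 201)   # symmetric around 0
y = x ** 2                    # y is fully determined by x

r, _ = stats.pearsonr(x, y)
print(f"Pearson r = {r:.6f}")  # ~0: the linear measure misses the association
```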
Correlation vs Regression
- Does a linear regression imply causation? No.
- Correlation: Correlation quantifies the degree to which two variables are related. In a correlation, both variables are on equal footing (the correlation coefficient is the same if they are swapped).
- Regression: Regression-based analysis tries to find the best-fitting line (or curve) to predict the value of a dependent variable Y from the known value of an independent variable X.
- In a regression, however, it does matter which variable is X and which is Y, since in general the function that best predicts Y from X does not match the function that best predicts X from Y.
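This asymmetry is easy to verify numerically (synthetic data below): the slope of y-on-x is not the reciprocal of the slope of x-on-y; instead the product of the two slopes equals r².

```python
# Regression asymmetry: b_yx * b_xy = r**2, not 1, so the fits don't invert.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)   # noisy linear relation

b_yx = stats.linregress(x, y).slope  # slope for predicting y from x
b_xy = stats.linregress(y, x).slope  # slope for predicting x from y
r = stats.pearsonr(x, y)[0]

print(f"b_yx = {b_yx:.3f}, 1/b_xy = {1 / b_xy:.3f}, b_yx * b_xy = {b_yx * b_xy:.3f}")
```

The identity follows from b_yx = r·(s_y/s_x) and b_xy = r·(s_x/s_y); only when |r| = 1 do the two regression lines coincide.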
Correlation does not imply causation
- Causality is very difficult to prove.
- Not only does it require a higher level of statistical rigor, it also requires a lot of carefully collected data.
- Using correlations in a business context can maximize our chances of making the “best” decision,
- and doing it in such a way that we can trust the insights gives us a reasonable expectation about how any given decision will impact the things we care about.
- After all, this is the goal of data.
- The solution: A heuristic approach for using correlations to inform decisions
- Be intentional when testing correlations. Don’t correlate random things. Search long enough and you’re bound to find a really “surprising” correlation. Instead, focus on correlating things that are already connected. A great way to do this is by focusing on the customer’s action.
- Correlate conversion rates (%), not totals.
- Ensure trends are consistent over longer periods of time. Oftentimes we’ll look at correlations in an aggregated way, stripping time from our analysis. But everything changes over time, so a correlation that existed in the past may have disappeared today, and you’ll never know it if you don’t analyze the data over time.
- Always monitor the results. The downfall of using a correlation is that we could be wrong. Albeit less likely when we follow the best practices above, it’s still a risk. But if we can act quickly on correlated findings and vigilantly monitor the results, we can significantly minimize the risk of any wrong decision becoming a catastrophe.
Does causation imply correlation?
- It seems it does. But again, the answer is that it does not have to. In the first place, the existence of causality does not imply that there is some kind of linear correlation (the way in which correlation between two variables is usually imagined).
- However, the correlation coefficient does not provide information about the slope of the relationship nor many other aspects of nonlinear relationship.
- Secondly, the existence of causality does not even imply that some kind of complex correlation between two variables can be measured.
- In probability theory and information theory, the concept of mutual information measures the dependence between two random variables. That is, it measures how much knowing one of these variables reduces uncertainty about the other (it is, therefore, closely linked to the concept of entropy).
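A rough histogram-based estimator of mutual information (bin count and data are illustrative; a plug-in estimate like this has a positive bias on finite samples) makes the point concrete: it detects the y = x² dependence that correlation misses.

```python
# Mutual information from a discretized joint histogram, numpy only.
import numpy as np

def mutual_information(x, y, bins=10):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                  # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(9)
x = rng.uniform(-3, 3, 5000)
dependent = x ** 2                                # nonlinear dependence on x
independent = rng.uniform(0, 9, 5000)             # no dependence on x

print(mutual_information(x, dependent), mutual_information(x, independent))
```

The MI estimate for the dependent pair is large, while for the independent pair it stays near zero (up to estimation bias), even though both pairs have near-zero Pearson correlation.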
Does no correlation imply no causation?
- This is not true in general, and any control system serves as a counterexample.
- Control is by definition impossible without causal relationships, but to control something means, roughly speaking, that some variable remains constant,
- which implies that this variable will not be correlated with other variables, including those that cause it to be constant.
Under what conditions does correlation imply causation?
-
A good way to clarify all this is to think of the structure of Bayesian network that may be generating the observable data. The key is to look for possible hidden variables.
-
If there is some hidden variable which the observed data depends on, then correlation would not imply causation (we would speak of a spurious relationship). If we are able to discard any hidden variable, then a causal relationship can be inferred.
-
In any case, the recommendation is to build a comprehensive list ranging all possible options and methodically review each of them to determine which one is most likely.
- Reichenbach’s Common Cause Principle (CCP) [3] states that if an improbable coincidence has occurred, there must exist a common cause.
- This means that strong correlations have causal explanations. For example, suppose that in a room two light bulbs suddenly go out. It is considered improbable that, by chance, both bulbs have blown at the same time, so we will look for the cause in a common burned fuse or in a general interruption of the electrical supply. Thereby, the improbable coincidence is explained as the result of a common cause.
Transitivity and causal bidirectionality
- If we have a probabilistic causal chain such as A → B → C, that is, where A causes B, and where B causes C, can we infer that A causes C?
- Again, intuition can play a trick on us, and the answer (at this point, expected) is: not necessarily. The formal explanation is that probabilistic causal relationships are guaranteed to be transitive only if the so-called Markov condition is met. This condition is related to the concept of conditional independence and states that, given the present, the future does not depend on the past.
- In general, causal intransitivity may be due to various reasons [1]. One of the most common is causal chunking: while each causal link in the chain may seem, to some extent, plausible, the overall causal connection between the first cause and the last effect seems too weak, which, from an analytical standpoint, leads to causal intransitivity.
- Besides that, causality is not necessarily one-way. One interesting aspect of causal relationships is the possibility of bidirectional or reciprocal causation, giving rise to feedback mechanisms.
Statistic Test Details
One-Tailed and Two-Tailed Tests
| Type | Details |
|---|---|
| one-tailed | - Involves making a “greater than” or “less than” statement. - For example, suppose we assume the mean height of a male in the U.S. is greater than or equal to 70 inches. - The null hypothesis would be H0: µ ≥ 70 inches and the alternative hypothesis would be Ha: µ < 70 inches. |
| two-tailed | - Involves making an “equal to” or “not equal to” statement. - For example, suppose we assume the mean height of a male in the U.S. is equal to 70 inches. - The null hypothesis would be H0: µ = 70 inches and the alternative hypothesis would be Ha: µ ≠ 70 inches. |
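Both examples from the table can be sketched with SciPy's `ttest_1samp`, whose `alternative` keyword (available in SciPy ≥ 1.6) selects the tail. The height sample below is made up for illustration:

```python
from scipy import stats

# Hypothetical heights (inches) of a small sample of U.S. males.
sample = [68, 70, 69, 71, 67, 72, 66, 69, 70, 68]

# Two-tailed: H0: mu = 70 vs Ha: mu != 70
two_tailed = stats.ttest_1samp(sample, popmean=70)

# One-tailed (left): H0: mu >= 70 vs Ha: mu < 70
left_tailed = stats.ttest_1samp(sample, popmean=70, alternative="less")
```

Note that when the test statistic falls on the hypothesized side, the one-tailed p-value is exactly half of the two-tailed one, so a one-tailed test rejects more easily in that direction.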
One Sample t-test
- The mean of a variable is different from a reference value.
- It is used to test whether or not the mean of a population is equal to some value.
- Suppose we want to know whether or not the mean weight of a certain species of turtle in Florida is equal to 310 pounds.
- Take a sample from the population and measure its mean.
- It’s virtually guaranteed that the mean weight of turtles in our sample will differ from 310 pounds.
- The question is whether or not this difference is statistically significant.
- The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
- H1 (two-tailed): μ ≠ μ0 (population mean is not equal to some hypothesized value μ0)
- H1 (left-tailed): μ < μ0 (population mean is less than some hypothesized value μ0)
- H1 (right-tailed): μ > μ0 (population mean is greater than some hypothesized value μ0)
- assumptions:
- The variable under study should be either an interval or ratio variable.
- The observations in the sample should be independent.
- The variable under study should be approximately normally distributed.
- The variable under study should have no outliers.
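The turtle example above can be sketched with SciPy's `ttest_1samp`. The weights here are simulated from a made-up population (mean 295 lb), so the test should tend to reject the hypothesized 310 lb:

```python
import numpy as np
from scipy import stats

# Hypothetical weights (pounds) for a sample of 40 turtles; the simulated
# population mean is 295, deliberately different from the hypothesized 310.
rng = np.random.default_rng(0)
weights = rng.normal(loc=295, scale=15, size=40)

# Two-tailed by default: H0: mu = 310 vs H1: mu != 310
result = stats.ttest_1samp(weights, popmean=310)
```

A negative test statistic indicates the sample mean fell below 310, and a p-value under the chosen significance level (e.g. 0.05) leads to rejecting H0.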
Two Sample t-test
- The means of two groups are different
- It is used to test whether or not the means of two populations are equal.
- Suppose we want to know whether or not the mean weight between two different species of turtles is equal.
- Take a sample from each population and measure their means.
- It’s virtually guaranteed that the mean weight between the two samples will be at least a little different.
- The question is whether or not this difference is statistically significant.
- The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
- H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)
- H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population 2 mean)
- H1 (right-tailed): μ1 > μ2 (population 1 mean is greater than population 2 mean)
- assumptions:
- The observations in one sample should be independent of the observations in the other sample.
- The data should be approximately normally distributed.
- The two samples should have approximately the same variance. If this assumption is not met, you should instead perform Welch’s t-test.
- The data in both samples was obtained using a random sampling method.
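As a sketch of the two-species turtle example, SciPy's `ttest_ind` runs Student's t-test by default and switches to Welch's t-test with `equal_var=False` (for when the equal-variance assumption fails). The samples are simulated from made-up populations:

```python
import numpy as np
from scipy import stats

# Hypothetical weights (pounds) for two turtle species, simulated with
# different means and equal variances.
rng = np.random.default_rng(1)
species_a = rng.normal(loc=300, scale=15, size=40)
species_b = rng.normal(loc=320, scale=15, size=40)

student = stats.ttest_ind(species_a, species_b)                  # Student's t-test
welch = stats.ttest_ind(species_a, species_b, equal_var=False)   # Welch's t-test
```

With the simulated 20-pound gap, both variants should reject H0: μ1 = μ2 at the 0.05 level; in real data with unequal spreads, only the Welch result is trustworthy.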
ANOVA
- The means of more than two groups are different
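A one-way ANOVA across more than two groups can be sketched with SciPy's `f_oneway`. The three groups below are simulated, with the third deliberately shifted so the test should detect a difference:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Three hypothetical groups; group_c has a clearly higher mean.
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=50, scale=5, size=30)
group_c = rng.normal(loc=60, scale=5, size=30)

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
```

A small p-value says only that at least one group mean differs; a post-hoc test (e.g. Tukey's HSD) is needed to identify which.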
Proportions z-test
- The percentage of successes in a variable with two outcomes differs from a reference value
Two proportions z-test
- The percentage of successes in a variable with two outcomes differs between two groups
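The two-proportions z-test is short enough to write out directly from its textbook formula, using the pooled proportion under H0: p1 = p2. The function name and success counts below are made up for illustration:

```python
import math
from scipy.stats import norm

def two_proportions_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions,
    using the pooled proportion estimate under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))   # two-sided tail probability
    return z, p_value

# Hypothetical counts: 60/100 successes in group 1 vs 45/100 in group 2.
z, p = two_proportions_ztest(60, 100, 45, 100)
```

With these counts the z statistic is about 2.1, so the 15-point gap in success rates is significant at the 0.05 level. The test relies on the normal approximation, so each cell count (successes and failures per group) should be reasonably large.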