Statistical Tests
Solutions
- Statistical Tests
- Types of Data (naming)
- Statistical Analysis Methods
- Statistic Description
- Statistic Test Details
- Correlation vs Everybody
- Statistic Test Details
Types of Data (naming)
- Qualitative (Categorical)
- Nominal (identify)
- Ordinal (rank / order)
- Quantitative (Numerical)
- Discrete (count)
- Continuous (measure)
Statistical Analysis Methods
| Type | Variables | Dependent variable (y) | Independent variable (x) | Groups targeted | Method | Target stat | Details |
|---|---|---|---|---|---|---|---|
| Relationship | Categorical + Numerical | | | | Point-biserial correlation | | |
| Compare groups | Categorical + Numerical | Categorical | Ordinal, Interval | 2 groups, single variable | Student’s t-test for 2 paired samples | | paired samples, parametric, normal distribution |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | 2 groups, single variable | Wilcoxon signed-rank test | | paired samples, nonparametric, non-normal distribution |
| Compare groups | Categorical + Numerical | | | 2 groups, 2 variables | Student’s t-test for 2 independent samples | compare means | equal population variances, parametric, independent samples, large sample size, unknown and similar variances, normal distribution |
| Compare groups | Categorical + Numerical | Categorical | Numerical | 2 groups | Welch’s t-test for 2 independent samples | compare means | unequal population variances, parametric, independent samples, large sample size, unknown and dissimilar variances, normal distribution |
| Compare groups | Categorical + Numerical | | | | Z-test | compare means | large sample size, known variance |
| Compare groups | Categorical + Numerical | Categorical | Ordinal, Interval | 2 groups, 2 variables | Mann-Whitney U test | | nonparametric, independent samples, non-normal data, small sample size |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | 2 groups | Wilcoxon-Mann-Whitney test | | unpaired samples |
| Compare groups | Categorical + Numerical | Categorical | Numerical | > 2 groups, multivariable | One-way ANOVA | compare means | equal population variances, parametric, independent samples |
| Compare groups | Categorical + Numerical | | | > 2 groups | Welch’s ANOVA | compare means | unequal population variances, parametric, independent samples |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | > 2 groups | Kruskal-Wallis test | | nonparametric, independent samples, unpaired samples, non-normal data |
| Compare groups | Categorical + Numerical | Interval | | > 2 groups | Repeated measures ANOVA | | paired samples, parametric |
| Compare groups | Categorical + Numerical | Ordinal, Interval | Ordinal / Categorical | > 2 groups, multivariable | Friedman test | | paired samples, nonparametric |
| Compare groups | Categorical + Numerical | Nominal | Ordinal / Categorical | 2 groups | Bowker test | | paired samples |
| Compare groups | Categorical + Numerical | Categorical (2 categories) | Ordinal / Categorical | >= 2 groups | Fisher’s exact test | | unpaired samples, small sample size |
| Compare groups | Categorical + Numerical | Nominal | Ordinal / Categorical | >= 2 groups | Fisher’s exact test | | unpaired samples, small sample size |
| Compare groups | Categorical + Numerical | Categorical (2 categories), Nominal | Ordinal / Categorical | >= 2 groups | Chi-squared test | compare proportions | unpaired samples, large sample size |
| Compare groups | 2 Categorical | Categorical (2 categories) | Ordinal / Categorical | 2 groups for each variable | McNemar’s test, sign test | | 2 paired samples |
| Compare groups | 2 Categorical | Categorical (2 categories) | Ordinal / Categorical | 2 groups for each variable | Cochran’s Q test | | > 2 paired samples |
| Relationship | 2 Categorical | | | 2 groups for each variable | Fisher’s exact test | | expected frequencies < 5, independent samples |
| Relationship | 2 Categorical | Categorical (2 categories) | | 2 groups for each variable | Chi-square test of independence | | expected frequencies >= 5, independent samples |
| Relationship | 2 Categorical | Categorical (2 categories) | | > 2 groups for at least one variable | Chi-square test of independence | | |
| Relationship | 2 Numerical | Normal | Normal / interval (ordinal) | | Pearson correlation | | linear relation, parametric |
| Relationship | 2 Numerical | Ordinal, Interval | Normal / interval (ordinal) | | Spearman correlation | | non-linear relation, nonparametric |
| Predict Outcomes | Multi-Numerical | Numerical | 1 numerical variable | | Simple linear regression | | |
| Predict Outcomes | Multi-Numerical | Numerical | multiple numerical variables | | Multiple linear regression | | |
| Predict Outcomes | Categorical + Numerical | Numerical | multiple categorical + numerical variables | | ANCOVA | | |
| Predict Outcomes | Multi-Categorical | | | 2 groups | Binary logistic regression | | |
| Predict Outcomes | Multi-Categorical | | | > 2 groups | Multinomial logistic regression | | |
| Compare groups | 1 Categorical | | | 2 groups | One-proportion test | | |
| Compare groups | 1 Categorical | Categorical | | > 2 groups | Chi-square goodness-of-fit test | | |
| Compare groups | 1 Numerical | Interval | | | One-sample Student’s t-test | | parametric |
| Compare groups | 1 Numerical | Ordinal, Interval | | | One-sample Wilcoxon test | | nonparametric |
| Compare groups | 1 Numerical | Nominal | | | Binomial test | | |
| Compare groups | 1 Numerical | Ordinal, Interval | | | Median test | | |
| Compare groups | 1 Numerical | Normal | | | t-test | | |
| Compare groups | Categorical + Numerical | Normal | Ordinal / Categorical | 2 groups | t-test (paired or unpaired) | | paired and unpaired samples |
| Compare groups | Categorical + Numerical | Normal | Ordinal / Categorical | > 2 groups | Linear model (ANOVA) | | paired and unpaired samples |
| Predict Outcomes | Categorical + Numerical | Categorical (2 categories) | Normal / interval (ordinal) | | (Conditional) logistic regression | | |
| Predict Outcomes | Categorical + Numerical | Nominal | Normal / interval (ordinal) | | Multinomial logistic regression | | |
| Predict Outcomes | More than 1 | Categorical (2 categories) | Combination | | Logistic regression | | |
| Predict Outcomes | More than 1 | Nominal, Categorical | Combination | | Multinomial logistic regression | | |
| Predict Outcomes | More than 1 | Ordinal | Combination | | Ordered logit | | |
| Predict Outcomes | More than 1 | Interval, Normal | Combination | | Multivariate linear model | | |
| Compare groups | Categorical + Numerical | Censored interval | Ordinal / Categorical | >= 2 groups | Log-rank test | | |
| Predict Outcomes | Categorical + Numerical | Censored interval | Ordinal / Categorical, Normal / interval (ordinal) | | Survival analysis, Cox proportional hazards regression | | |
| | Categorical + Numerical | Combination | Combination | | Clustering, factor analysis, PCA, canonical correlation | | |
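As a quick illustration of the table’s logic, here is a minimal sketch (synthetic data; scipy is assumed available) of choosing between a parametric and a nonparametric two-group test based on a normality check:

```python
# Sketch: pick a two-group test following the table above.
# The group data below are made up purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=40)   # e.g. treatment scores
group_b = rng.normal(loc=5.5, scale=1.0, size=40)   # e.g. control scores

# Check normality first (Shapiro-Wilk); the parametric tests assume it.
normal_a = stats.shapiro(group_a).pvalue > 0.05
normal_b = stats.shapiro(group_b).pvalue > 0.05

if normal_a and normal_b:
    # Welch's t-test: parametric, independent samples, no equal-variance assumption
    result = stats.ttest_ind(group_a, group_b, equal_var=False)
else:
    # Mann-Whitney U: nonparametric alternative for independent samples
    result = stats.mannwhitneyu(group_a, group_b)

print(f"p-value = {result.pvalue:.4f}")
```

The same skeleton extends to the > 2 groups rows (ANOVA vs Kruskal-Wallis) by swapping the test calls.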
Statistic Description
| Type | Variables | Explains |
|---|---|---|
| Normality | | to check if your data has a Gaussian distribution |
| Normality | Shapiro-Wilk | - Tests whether a data sample has a Gaussian distribution. - It is used to determine whether or not a sample comes from a normal distribution. |
| Normality | D’Agostino’s K^2 | Tests whether a data sample has a Gaussian distribution. |
| Normality | Anderson-Darling | Tests whether a data sample has a Gaussian distribution. |
| Normality | Kolmogorov-Smirnov | - Performs the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit. - The Kolmogorov-Smirnov test is used to test whether or not a sample comes from a certain distribution. |
| Normality | Lilliefors’ test | - Tests an assumed normal or exponential distribution using Lilliefors’ test. - Lilliefors’ test is a Kolmogorov-Smirnov test with estimated parameters. |
| Normality | Jarque-Bera | The Jarque-Bera tests whether the sample data has the skewness and kurtosis matching a normal distribution. |
| Multivariate Normality | Henze-Zirkler test | - To test whether or not several variables are normally distributed as a group, we must perform a multivariate normality test. - The Henze-Zirkler multivariate normality test determines whether or not a group of variables follows a multivariate normal distribution. - The null and alternative hypotheses for the test are as follows: - H0 (null): The variables follow a multivariate normal distribution. - Ha (alternative): The variables do not follow a multivariate normal distribution. |
| Correlation | | to check if two samples are related |
| Correlation | Pearson’s r (pearsonr) | Tests whether two samples have a linear relationship. Used for numerical vs numerical. |
| Correlation | Spearman’s Rank | Tests whether two samples have a monotonic relationship; handles non-linear relationships and is robust to outliers. Used for categorical (ordinal) vs numerical. |
| Correlation | Kendall’s Rank | Tests whether two samples have a monotonic relationship |
| Correlation | Chi-Square Test of Independence | Tests whether two categorical variables are related or independent. Used for categorical (ordinal) vs categorical (ordinal). - rules: - Non-parametric; does not require assumptions about population parameters. - Compares the difference in population proportions between groups. - A contingency table of observed values is required. - The expected counts are calculated under the assumption that the two variables are independent. - The chi-square test is sensitive to sample size (i.e. asymptotic): - As the sample size increases, the absolute differences become a smaller proportion of the expected value. - The outcome is that a strong association may not surface if the sample size is small. - In large sample sizes, statistical significance may surface while the association is not substantial (i.e. very weak). - Using the chi-squared test on small samples may lead to a Type II error. |
| Correlation | Chi-Square Goodness of Fit Test | Used to determine whether or not a categorical variable follows a hypothesized distribution, i.e. whether the observed data are consistent with the hypothesized distribution. |
| Correlation | Fisher’s Exact Test | - is used to determine whether or not there is a significant association between two categorical variables. - It is typically used as an alternative to the Chi-Square Test of Independence - when one or more of the cell counts in a 2×2 table is less than 5. - example data: data = [[8, 4], [4, 9]] - The null and alternative hypotheses for the test are as follows: - H0: (null hypothesis) The two variables are independent. - H1: (alternative hypothesis) The two variables are not independent. |
| Correlation | McNemar’s Test | - Used to determine if there is a statistically significant difference in proportions between paired data. - example data (rows = after, columns = before; agree / not agree): data = [[30, 40], [12, 18]] - e.g. to determine if there was a statistically significant difference in the proportion of people who supported the law before and after viewing the video. - The question in the McNemar test is: do these two proportions, pA and pB, significantly differ? |
| Stationary | | to check if a time series is stationary or not |
| Stationary | Augmented Dickey-Fuller Unit Root | Tests whether a time series has a unit root, e.g. has a trend or more generally is autoregressive |
| Stationary | Kwiatkowski-Phillips-Schmidt-Shin | Tests whether a time series is trend stationary or not |
| Parametric hypothesis (interval) | | to compare data samples (for normally distributed data) |
| Parametric hypothesis (interval) | Student’s t-test | - Tests whether the means of two independent samples are significantly different (unknown standard deviation, same variance) - example: - Is the difference between means statistically significant enough to conclude that the groups are different? - We have two sections of students: section A and section B. - The mean mathematics scores of the sections are 95 and 90 respectively, so the difference is 5. - The question is: does this difference of 5 provide enough evidence that the mean scores of the two sections are different? - assumptions: - Both samples are approximately normally distributed. - Both samples have approximately the same variance. - If the ratio of the larger variance to the smaller variance is less than 4, then we can assume the variances are approximately equal. - The observations in one sample are independent of the observations in the other sample. - Both samples were obtained using a random sampling method. |
| Parametric hypothesis (interval) | Paired Student’s t-test | - Tests whether the means of two paired samples are significantly different - is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample. - A paired samples t-test is commonly used in two scenarios: - A measurement is taken on a subject before and after some treatment - A measurement is taken under two different conditions - In both cases we are interested in comparing the mean measurement between two groups in which each observation - in one sample can be paired with an observation in the other sample. - assumptions: - The participants should be selected randomly from the population. - The differences between the pairs should be approximately normally distributed. - There should be no extreme outliers in the differences. |
| Parametric hypothesis (interval) | One proportion z-test | - Tests whether a population proportion is equal or not to some hypothesized population proportion. - is used to compare an observed proportion to a theoretical one. - example: - it’s virtually guaranteed that the proportion of residents in the sample who support the law will be at least a little different from the proportion of residents in the entire population who support the law. - The question is whether or not this difference is statistically significant. - assumption: the sample is large enough for the normal approximation (np0 and n(1 − p0) both at least 5) - A one proportion z-test always uses the following null hypothesis: - H0: p = p0 (population proportion is equal to some hypothesized population proportion p0) - The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed: - H1 (two-tailed): p ≠ p0 (population proportion is not equal to some hypothesized value p0) - H1 (left-tailed): p < p0 (population proportion is less than some hypothesized value p0) - H1 (right-tailed): p > p0 (population proportion is greater than some hypothesized value p0) |
| Parametric hypothesis (interval) | Two proportion z-test | - Tests whether one population proportion is equal or not to another population proportion. - example: - Suppose we want to know if there is a difference in the proportion of residents who support a certain law in county A compared to the proportion who support the law in county B. - it’s virtually guaranteed that the proportion of residents who support the law will be at least a little different between the two samples. - The question is whether or not this difference is statistically significant. - assumption: both samples are large enough for the normal approximation - A two proportion z-test always uses the following null hypothesis: - H0: π1 = π2 (the two population proportions are equal) - The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed: - H1 (two-tailed): π1 ≠ π2 (the two population proportions are not equal) - H1 (left-tailed): π1 < π2 (population 1 proportion is less than population 2 proportion) - H1 (right-tailed): π1 > π2 (population 1 proportion is greater than population 2 proportion) |
| Parametric hypothesis (interval) | Welch’s t-test | - Tests whether the means of two independent samples are significantly different (does not assume they have the same variance) - this test assumes that both groups of data are sampled from populations that follow a normal distribution, but it does not assume that those two populations have the same variance. - There are two differences in how the Student’s t-test and Welch’s t-test are carried out: - The test statistic. - The degrees of freedom: the degrees of freedom for Welch’s t-test tend to be smaller than the degrees of freedom for Student’s t-test. - Some people argue that Welch’s t-test should be the default choice for comparing the means of two independent groups, since it performs better than Student’s t-test when sample sizes and variances are unequal between groups, and it gives identical results when sample sizes and variances are equal. - In practice, when you are comparing the means of two groups it’s unlikely that the standard deviations for each group will be identical. - This makes it a good idea to just always use Welch’s t-test, so that you don’t have to make any assumptions about equal variances. |
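Several of the tests described above are exposed directly by `scipy.stats`. The sketch below (synthetic data; seeds and thresholds are illustrative) runs a normality check, both correlation tests, and Fisher’s exact test on the 2×2 example data from the table:

```python
# A few of the tests described above, via scipy.stats. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)   # noisy linear relation

# Normality: Shapiro-Wilk and D'Agostino's K^2
shapiro_p = stats.shapiro(x).pvalue
dagostino_p = stats.normaltest(x).pvalue        # D'Agostino's K^2

# Correlation: Pearson (linear) vs Spearman (monotonic, rank-based)
pearson_r, _ = stats.pearsonr(x, y)
spearman_r, _ = stats.spearmanr(x, y)

# Fisher's exact test on the small-count 2x2 table from the table above
odds_ratio, fisher_p = stats.fisher_exact([[8, 4], [4, 9]])

print(shapiro_p, dagostino_p, pearson_r, spearman_r, fisher_p)
```

Each call returns a test statistic and a p-value, which are interpreted against a significance level as described in the next section.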
Statistic Test Details
Parameters
- Level of significance: the probability threshold at which we reject the null hypothesis.
- Type I error: when we reject the null hypothesis although it is true.
- Type II error: when we fail to reject the null hypothesis although it is false.
- One-tailed test:
- A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test.
- Example: a college has ≥ 4000 students, or ≤ 80% of organizations have adopted data science.
- Two-tailed test:
- A two-tailed test is a statistical test in which the critical area of a distribution is two-sided; it tests whether a sample is either greater than or less than a certain range of values.
- Example: a college has ≠ 4000 students, or ≠ 80% of organizations have adopted data science.
- P-value: or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true. The definition of ‘extreme’ depends on how the hypothesis is being tested.
- Degrees of freedom: the number of values in a calculation that are free to vary.
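The one-tailed vs two-tailed distinction maps directly onto scipy’s `alternative` parameter. In this sketch (illustrative data), the two-tailed p-value equals twice the smaller of the two one-tailed p-values, as expected for a symmetric test statistic:

```python
# One- vs two-tailed p-values for a one-sample t-test. Data are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.3, scale=1.0, size=30)

# Two-tailed: Ha is "mean != 0"; one-tailed: Ha is "mean > 0"
two_tailed = stats.ttest_1samp(sample, popmean=0.0).pvalue
right_tailed = stats.ttest_1samp(sample, popmean=0.0, alternative='greater').pvalue

print(f"two-tailed p = {two_tailed:.4f}, right-tailed p = {right_tailed:.4f}")
```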
Hypothesis Testing
- Hypothesis Testing is a family of statistical methods used to decide whether a sample of observed data supports accepting or rejecting a predefined hypothesis.
- Hypothesis testing only allows for validation of hypotheses, not for developing hypotheses.
- The mathematics of a hypothesis test balance the observed effect in the sample measurements against the number of observations.
- The P-Value is the standard way to formulate the outcome of a hypothesis test and is interpreted in the same way for every possible test.
- The P-Value is a score between 0 and 1, which tells us whether the difference between our sample observations and our hypothesis is significant.
- The P-Value is compared against 0.05 as its conventional reference value.
Interpret a P-Value
- A p-value is the probability of observing a sample statistic that is at least as extreme as your sample statistic, given that the null hypothesis is true.
- A p-value tells you the strength of evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.
- If the p-value is less than the significance level, we reject the null hypothesis.
- So, when you get a p-value of 0.000, you should compare it to the significance level.
- Common significance levels include 0.1, 0.05, and 0.01. Since 0.000 is lower than all of these,
- we would reject the null hypothesis (H0) in each case and conclude that there is sufficient evidence for H1.
- A p-value indicates how compatible the sample data are with the null hypothesis.
- Specifically, assuming the null hypothesis is true, the p-value tells us the probability of obtaining an effect at least as large as the one we actually observed in the sample data.
- If the p-value of a hypothesis test is sufficiently low, we can reject the null hypothesis.
- Specifically, when we conduct a hypothesis test, we must choose a significance level at the outset. Common choices for significance levels are 0.01, 0.05, and 0.10.
- If the p-value is less than our significance level, then we can reject the null hypothesis.
- Otherwise, if the p-value is equal to or greater than our significance level, then we fail to reject the null hypothesis.
- Example:
- Suppose a factory claims that they produce tires that have a mean weight of 200 pounds.
- An auditor hypothesizes that the true mean weight of tires produced at this factory is different from 200 pounds.
- So he runs a hypothesis test and finds that the p-value of the test is 0.04.
- Here is how to interpret this p-value:
- If the factory does indeed produce tires that have a mean weight of 200 pounds, then 4% of all audits will obtain the effect observed in the sample, or larger, because of random sample error.
- This tells us that obtaining the sample data that the auditor did would be pretty rare if indeed the factory produced tires that have a mean weight of 200 pounds.
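The tire example can be reproduced as a one-sample t-test. In this sketch the audit sample is simulated; only the claimed mean of 200 pounds comes from the text:

```python
# The tire-factory example as a one-sample t-test (simulated audit data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
weights = rng.normal(loc=202.0, scale=5.0, size=35)  # hypothetical audit sample

result = stats.ttest_1samp(weights, popmean=200.0)
alpha = 0.05
if result.pvalue < alpha:
    print(f"p = {result.pvalue:.3f}: reject H0 (mean weight differs from 200)")
else:
    print(f"p = {result.pvalue:.3f}: fail to reject H0")
```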
How Not to Interpret a P-Value:
- The biggest misconception about p-values is that they are equivalent to the probability of making a mistake by rejecting a true null hypothesis (known as a Type I error).
- There are two primary reasons that p-values can’t be the error rate:
- P-values are calculated based on the assumption that the null hypothesis is true and that the difference between the sample data and the null hypothesis is simply caused by random chance. Thus, p-values can’t tell you the probability that the null is true or false, since from the perspective of the calculation it is assumed to be 100% true.
- Although a low p-value indicates that your sample data are unlikely assuming the null is true, a p-value still can’t tell you which of the following cases is more likely:
- The null is false
- The null is true but you obtained an odd sample
- In regards to the previous example, here is a correct and incorrect way to interpret the p-value:
- Correct Interpretation:
- Assuming the factory does produce tires with a mean weight of 200 pounds, you would obtain the observed difference that you did obtain in your sample or a more extreme difference in 4% of audits due to random sampling error.
- Incorrect Interpretation:
- If you reject the null hypothesis, there is a 4% chance that you are making a mistake.
Strength of stat value
- Perfect: values near ±1
- High degree: values between ±0.5 and ±1
- Moderate degree: values between ±0.3 and ±0.49
- Low degree: values below ±0.29
- No correlation: values close to 0
Type I error rate
- Which is defined by our significance level (alpha) and tells us the probability of rejecting a null hypothesis that is actually true.
- In other words, it’s the probability of getting a “false positive”, i.e. claiming there is a statistically significant difference among groups when there actually isn’t.
- When we perform one hypothesis test, the type I error rate is equal to the significance level, which is commonly chosen to be 0.01, 0.05, or 0.10.
- However, when we conduct multiple hypothesis tests at once, the probability of getting a false positive increases.
- If we conduct several hypothesis tests at once using a significance level of 0.05, the probability that we get a false positive increases beyond just 0.05.
Multiple Comparisons in ANOVA
- When we conduct an ANOVA, there are often three or more groups that we are comparing to one another.
- Thus, when we conduct a post hoc test to explore the differences between the group means, there are several pairwise comparisons we want to explore.
- If we have more than four groups, the number of pairwise comparisons we will want to look at will only increase even more. The family-wise error rate increases rapidly as the number of groups (and consequently the number of pairwise comparisons) increases.
- In fact, once we reach six groups, the probability of us getting a false positive is actually above 50%!
- This means we would have serious doubts about our results if we were to make this many pairwise comparisons, knowing that our family-wise error rate was so high.
- Fortunately, post hoc tests provide us with a way to make multiple comparisons between groups while controlling the family-wise error rate.
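A minimal post hoc sketch (synthetic groups; a Bonferroni correction is shown here purely for simplicity, where Tukey’s test would be the more usual choice):

```python
# One-way ANOVA followed by pairwise Welch t-tests with a Bonferroni
# correction to control the family-wise error rate. Data are synthetic.
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(3)
groups = {
    "A": rng.normal(10, 2, 30),
    "B": rng.normal(10, 2, 30),
    "C": rng.normal(13, 2, 30),   # deliberately shifted group
}

f_stat, p_overall = stats.f_oneway(*groups.values())
pairs = list(combinations(groups, 2))
alpha_per_test = 0.05 / len(pairs)     # Bonferroni: divide alpha by #comparisons

for g1, g2 in pairs:
    p = stats.ttest_ind(groups[g1], groups[g2], equal_var=False).pvalue
    print(f"{g1} vs {g2}: p = {p:.4f}, significant = {p < alpha_per_test}")
```

Dividing alpha by the number of comparisons is exactly the power tradeoff the next section describes: each individual comparison must clear a stricter bar.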
Post Hoc Tests & Statistical Power:
- Post hoc tests do a great job of controlling the family-wise error rate, but the tradeoff is that they reduce the statistical power of the comparisons.
- This is because the only way to lower the family-wise error rate is to use a lower significance level for all of the individual comparisons.
- For example, when we use Tukey’s Test for six pairwise comparisons and we want to maintain a family-wise error rate of 0.05, we must use a significance level of approximately 0.011 for each individual comparison. The more pairwise comparisons we have, the lower the significance level we must use for each one.
- The problem with this is that lower significance levels correspond to lower statistical power.
- This means that if a difference between group means actually does exist in the population, a study with lower power is less likely to detect it.
- One way to reduce the effects of this tradeoff is to simply reduce the number of pairwise comparisons we make.
- For example, in the previous examples we performed six pairwise comparisons for the four different groups.
- However, depending on the needs of your study, you may only be interested in making a few comparisons.
Data distribution VS Stats distribution:
- Tests that rely on the assumption of normally distributed test statistics can also be applied if the original data distribution is highly non-normal!
- Indeed, thanks to the Central Limit Theorem, the distribution of the test statistic is asymptotically normal as the sample size increases.
- This is very useful in the common case of A/B tests that produce observations that are zero-inflated and/or multimodal.
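A small simulation of this point, using a hypothetical zero-inflated “revenue” metric: the raw observations are far from normal, but the distribution of the sample mean is much closer to Gaussian, which is what the test statistic actually depends on.

```python
# CLT demonstration on zero-inflated data (70% exact zeros). All values synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def zero_inflated_sample(n):
    # 70% exact zeros, 30% exponential revenue-like values
    mask = rng.random(n) < 0.3
    return mask * rng.exponential(scale=10.0, size=n)

# Distribution of the mean over many repeated samples of size 500
means = np.array([zero_inflated_sample(500).mean() for _ in range(2000)])

raw_p = stats.shapiro(zero_inflated_sample(500)).pvalue   # raw data: clearly non-normal
means_p = stats.shapiro(means[:500]).pvalue               # sample means: far closer to normal
print(f"raw data normality p = {raw_p:.2e}, sample means p = {means_p:.3f}")
```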
Correlation vs Everybody
Correlation vs Causation
Correlation is a relationship or connection between two variables: whenever one changes, the other is likely to change as well. But that does not mean a change in one variable causes the other to change. That’s correlation, not causation.
Correlation vs Association
- Correlation: when we use the word correlation we’re typically talking about the Pearson Correlation Coefficient. This is a measure of the linear association between two random variables X and Y. It has a value between -1 and 1 where:
- -1 indicates a perfectly negative linear correlation between two variables
- 0 indicates no linear correlation between two variables
- 1 indicates a perfectly positive linear correlation between two variables
- We use two words to describe the correlation between two random variables:
- Direction
- Positive: Two random variables have a positive correlation if Y tends to increase as X increases.
- Negative: Two random variables have a negative correlation if Y tends to decrease as X increases.
- Strength
- Weak: Two random variables have a weak correlation if the points in a scatterplot are loosely scattered.
- Strong: Two random variables have a strong correlation if the points in a scatterplot are tightly packed together.
- Association: When statisticians use the word association they can be talking about any relationship between two variables, whether it’s linear or non-linear. However, just knowing that the correlation between two variables is zero can be misleading, because it hides the fact that a non-linear relationship may exist instead.
- Correlation can only tell us if two random variables have a linear relationship while association can tell us if two random variables have a linear or non-linear relationship.
- Correlation quantifies the relationship between two random variables by using a number between -1 and 1, but association does not use a specific number to quantify a relationship.
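A classic illustration of this difference: y = x² is perfectly determined by x (a strong association), yet on a symmetric range the Pearson correlation is essentially zero.

```python
# Zero linear correlation despite a deterministic nonlinear association.
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 201)   # symmetric around 0
y = x ** 2                    # y is fully determined by x

r, _ = stats.pearsonr(x, y)
print(f"Pearson r = {r:.6f}")  # ~0: the linear measure misses the association
```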
Correlation vs Regression
- Does a linear regression imply causation? No.
- Correlation: Correlation quantifies the degree to which two variables are related. In a correlation, both variables are on equal footing (the correlation coefficient is the same if they are swapped).
- Regression: Regression-based analysis tries to find the best-fitting line (or curve) to predict the value of a dependent variable Y from the known value of an independent variable X.
- In a regression, however, it does matter which variable is X and which is Y, since in general the function that best predicts Y from X does not match the function that best predicts X from Y.
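This asymmetry is easy to verify numerically (synthetic data below): the slope of y-on-x is not the reciprocal of the slope of x-on-y; instead the product of the two slopes equals r².

```python
# Regression asymmetry: b_yx * b_xy = r**2, not 1, so the fits don't invert.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)   # noisy linear relation

b_yx = stats.linregress(x, y).slope  # slope for predicting y from x
b_xy = stats.linregress(y, x).slope  # slope for predicting x from y
r = stats.pearsonr(x, y)[0]

print(f"b_yx = {b_yx:.3f}, 1/b_xy = {1 / b_xy:.3f}, b_yx * b_xy = {b_yx * b_xy:.3f}")
```

The identity follows from b_yx = r·(s_y/s_x) and b_xy = r·(s_x/s_y); only when |r| = 1 do the two regression lines coincide.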
Correlation does not imply causation
- Causality is very difficult to prove.
- Not only does it require a higher level of statistical rigor, it also requires a lot of carefully collected data.
- Using correlations in a business context can maximize our chances of making the “best” decision,
- and doing it in such a way that we can trust the insights gives us a reasonable expectation about how any given decision will impact the things we care about.
- After all, this is the goal of data.
- The solution: A heuristic approach for using correlations to inform decisions
- Be intentional when testing correlations. Don’t correlate random things. Search long enough and you’re bound to find a really “surprising” correlation. Instead, focus on correlating things that are already connected. A great way to do this is by focusing on the customer’s action.
- Correlate conversion rates (%), not totals.
- Ensure trends are consistent over longer periods of time. Oftentimes we’ll look at correlations in an aggregated way, stripping time from our analysis. But everything changes over time, so a correlation that existed in the past may have disappeared today, and you’ll never know it if you don’t analyze the data over time.
- Always monitor the results. The downfall of using a correlation is that we could be wrong. Albeit less likely when we follow the best practices above, it’s still a risk. But if we can act quickly on correlated findings and vigilantly monitor the results, we can significantly minimize the risk of any wrong decision becoming a catastrophe.
Does causation imply correlation?
- It seems it does. But again, the answer is that it does not have to. In the first place, the existence of causality does not imply that there is some kind of linear correlation (the way in which correlation between two variables is usually imagined).
- However, the correlation coefficient does not provide information about the slope of the relationship nor many other aspects of nonlinear relationship.
- Secondly, the existence of causality does not even imply that some kind of complex correlation between two variables can be measured.
- In probability theory and information theory, the concept of mutual information measures the dependence between two random variables. That is, it measures how much knowing one of these variables reduces uncertainty about the other (it is, therefore, closely linked to the concept of entropy).
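A rough histogram-based estimator of mutual information (bin count and data are illustrative; a plug-in estimate like this has a positive bias on finite samples) makes the point concrete: it detects the y = x² dependence that correlation misses.

```python
# Mutual information from a discretized joint histogram, numpy only.
import numpy as np

def mutual_information(x, y, bins=10):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0                                  # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(9)
x = rng.uniform(-3, 3, 5000)
dependent = x ** 2                                # nonlinear dependence on x
independent = rng.uniform(0, 9, 5000)             # no dependence on x

print(mutual_information(x, dependent), mutual_information(x, independent))
```

The MI estimate for the dependent pair is large, while for the independent pair it stays near zero (up to estimation bias), even though both pairs have near-zero Pearson correlation.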
Does no correlation imply no causation?
- This is not true in general, and any control system serves as a counterexample.
- Control is by definition impossible without causal relationships, but to control something means, roughly speaking, that some variable remains constant,
- which implies that this variable will not be correlated with other variables, including those that cause it to be constant.
Under what conditions does correlation imply causation?
-
A good way to clarify all this is to think of the structure of Bayesian network that may be generating the observable data. The key is to look for possible hidden variables.
-
If there is some hidden variable which the observed data depends on, then correlation would not imply causation (we would speak of a spurious relationship). If we are able to discard any hidden variable, then a causal relationship can be inferred.
-
In any case, the recommendation is to build a comprehensive list ranging all possible options and methodically review each of them to determine which one is most likely.
- Reichenbach’s Common Cause Principle (CCP) [3] states that if an improbable coincidence has occurred, there must exist a common cause.
- This means that strong correlations have causal explanations. For example, suppose that in a room two light bulbs suddenly go out. It is considered improbable that, by chance, both bulbs have blown at the same time, so we will look for the cause in a common burned fuse or in a general interruption of the electrical supply. Thereby, the improbable coincidence is explained as the result of a common cause.
Transitivity and causal bidirectionality
- If we have a probabilistic causal chain such as A → B → C, that is, where A causes B, and where B causes C, can we infer that A causes C?
- Again, intuition can play a trick on us, and the answer (at this point, expected) is: not necessarily. The formal explanation is that probabilistic causal relationships are guaranteed to be transitive only if the so-called Markov condition is met. This condition is related to the concept of conditional independence and states that, given the present, the future does not depend on the past.
- In general, causal intransitivity may be due to various reasons [1]. One of the most common is causal chunking: while each causal link in the chain may seem, to some extent, plausible, the overall causal connection between the first cause and the last effect seems too weak, which, from an analytical standpoint, leads to causal intransitivity.
- Besides that, causality is not necessarily one-way. One interesting aspect of causal relationships is the possibility of bidirectional or reciprocal causation, giving rise to feedback mechanisms.
Statistic Test Details
One-Tailed and Two-Tailed Tests
| Type | Details |
|---|---|
| one-tailed | - Involves making a “greater than” or “less than” statement. - For example, suppose we assume the mean height of a male in the U.S. is greater than or equal to 70 inches. - The null hypothesis would be H0: µ ≥ 70 inches and the alternative hypothesis would be Ha: µ < 70 inches. |
| two-tailed | - Involves making an “equal to” or “not equal to” statement. - For example, suppose we assume the mean height of a male in the U.S. is equal to 70 inches. - The null hypothesis would be H0: µ = 70 inches and the alternative hypothesis would be Ha: µ ≠ 70 inches. |
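Both examples from the table can be sketched with SciPy's `ttest_1samp`, whose `alternative` keyword (available in SciPy ≥ 1.6) selects the tail. The height sample below is made up for illustration:

```python
from scipy import stats

# Hypothetical heights (inches) of a small sample of U.S. males.
sample = [68, 70, 69, 71, 67, 72, 66, 69, 70, 68]

# Two-tailed: H0: mu = 70 vs Ha: mu != 70
two_tailed = stats.ttest_1samp(sample, popmean=70)

# One-tailed (left): H0: mu >= 70 vs Ha: mu < 70
left_tailed = stats.ttest_1samp(sample, popmean=70, alternative="less")
```

Note that when the test statistic falls on the hypothesized side, the one-tailed p-value is exactly half of the two-tailed one, so a one-tailed test rejects more easily in that direction.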
One Sample t-test
- The mean of a variable is different from a reference value.
- It is used to test whether or not the mean of a population is equal to some value.
- Suppose we want to know whether or not the mean weight of a certain species of turtle in Florida is equal to 310 pounds.
- Take a sample from the population and measure its mean.
- It’s virtually guaranteed that the mean weight of turtles in our sample will differ from 310 pounds.
- The question is whether or not this difference is statistically significant.
- The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
- H1 (two-tailed): μ ≠ μ0 (population mean is not equal to some hypothesized value μ0)
- H1 (left-tailed): μ < μ0 (population mean is less than some hypothesized value μ0)
- H1 (right-tailed): μ > μ0 (population mean is greater than some hypothesized value μ0)
- assumptions:
- The variable under study should be either an interval or ratio variable.
- The observations in the sample should be independent.
- The variable under study should be approximately normally distributed.
- The variable under study should have no outliers.
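The turtle example above can be sketched with SciPy's `ttest_1samp`. The weights here are simulated from a made-up population (mean 295 lb), so the test should tend to reject the hypothesized 310 lb:

```python
import numpy as np
from scipy import stats

# Hypothetical weights (pounds) for a sample of 40 turtles; the simulated
# population mean is 295, deliberately different from the hypothesized 310.
rng = np.random.default_rng(0)
weights = rng.normal(loc=295, scale=15, size=40)

# Two-tailed by default: H0: mu = 310 vs H1: mu != 310
result = stats.ttest_1samp(weights, popmean=310)
```

A negative test statistic indicates the sample mean fell below 310, and a p-value under the chosen significance level (e.g. 0.05) leads to rejecting H0.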
Two Sample t-test
- The means of two groups are different
- It is used to test whether or not the means of two populations are equal.
- Suppose we want to know whether or not the mean weight between two different species of turtles is equal.
- Take a sample from each population and measure their means.
- It’s virtually guaranteed that the mean weight between the two samples will be at least a little different.
- The question is whether or not this difference is statistically significant.
- The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:
- H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)
- H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population 2 mean)
- H1 (right-tailed): μ1 > μ2 (population 1 mean is greater than population 2 mean)
- assumptions:
- The observations in one sample should be independent of the observations in the other sample.
- The data should be approximately normally distributed.
- The two samples should have approximately the same variance. If this assumption is not met, you should instead perform Welch’s t-test.
- The data in both samples was obtained using a random sampling method.
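As a sketch of the two-species turtle example, SciPy's `ttest_ind` runs Student's t-test by default and switches to Welch's t-test with `equal_var=False` (for when the equal-variance assumption fails). The samples are simulated from made-up populations:

```python
import numpy as np
from scipy import stats

# Hypothetical weights (pounds) for two turtle species, simulated with
# different means and equal variances.
rng = np.random.default_rng(1)
species_a = rng.normal(loc=300, scale=15, size=40)
species_b = rng.normal(loc=320, scale=15, size=40)

student = stats.ttest_ind(species_a, species_b)                  # Student's t-test
welch = stats.ttest_ind(species_a, species_b, equal_var=False)   # Welch's t-test
```

With the simulated 20-pound gap, both variants should reject H0: μ1 = μ2 at the 0.05 level; in real data with unequal spreads, only the Welch result is trustworthy.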
ANOVA
- The means of more than two groups are different
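A one-way ANOVA across more than two groups can be sketched with SciPy's `f_oneway`. The three groups below are simulated, with the third deliberately shifted so the test should detect a difference:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Three hypothetical groups; group_c has a clearly higher mean.
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=50, scale=5, size=30)
group_c = rng.normal(loc=60, scale=5, size=30)

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
```

A small p-value says only that at least one group mean differs; a post-hoc test (e.g. Tukey's HSD) is needed to identify which.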
Proportions z-test
- The percentage of successes in a variable with two outcomes differs from a reference value
Two proportions z-test
- The percentage of successes in a variable with two outcomes differs between two groups
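The two-proportions z-test is short enough to write out directly from its textbook formula, using the pooled proportion under H0: p1 = p2. The function name and success counts below are made up for illustration:

```python
import math
from scipy.stats import norm

def two_proportions_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions,
    using the pooled proportion estimate under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))   # two-sided tail probability
    return z, p_value

# Hypothetical counts: 60/100 successes in group 1 vs 45/100 in group 2.
z, p = two_proportions_ztest(60, 100, 45, 100)
```

With these counts the z statistic is about 2.1, so the 15-point gap in success rates is significant at the 0.05 level. The test relies on the normal approximation, so each cell count (successes and failures per group) should be reasonably large.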