<sub>2025-04-09 Wednesday</sub> <sub>#biostatistics </sub> <sup>[[maps-of-content]] </sup>

# Statistical Hypothesis Testing: Making Decisions from Data

> [!success]- Concept Sketch: [[]]
> ![[]]

> [!abstract]- Quick Review
>
> **Core Essence**: Hypothesis testing is a systematic approach to making decisions from data that acknowledges the inevitable risk of errors, using p-values to measure evidence against a starting assumption (the null hypothesis).
>
> **Key Concepts**:
>
> - Null hypothesis (H₀) vs alternative hypothesis (H₁)
> - Type I errors (false positives) vs Type II errors (false negatives)
> - p-values as measures of evidence strength against the null
>
> **Must Remember**:
>
> - We never "prove" hypotheses true; we only decide whether to reject the null
> - p-values below 0.05 traditionally indicate enough evidence to reject the null
> - Statistical significance doesn't guarantee practical importance
>
> **Critical Relationships**:
>
> - Smaller p-values = stronger evidence against the null hypothesis
> - The Type I error rate (α) determines the threshold for rejecting the null
> - Statistical decisions always carry some risk of reaching incorrect conclusions

## Introduction to Hypothesis Testing

Hypothesis testing provides a structured framework for drawing conclusions from data while acknowledging the inherent uncertainty in sampling. At its core, this process involves formulating competing claims about a population parameter, collecting evidence through sampling, and using statistical methods to determine which claim is better supported by the data.

The beauty of hypothesis testing lies in its systematic approach to uncertainty—it doesn't eliminate the possibility of error, but rather quantifies and manages it in a rigorous way.

### The Foundation: Two Competing Hypotheses

**The null hypothesis (H₀)** is our starting assumption or status quo position.
It typically represents:

- No effect
- No difference
- A specific value for a parameter

**The alternative hypothesis (H₁ or H<sub>A</sub>)** represents all other possibilities not specified in the null hypothesis. For example:

- H₀: The mean number of cavities by age 6 in children living in poverty is 4
- H₁: The mean number of cavities by age 6 in children living in poverty is not 4

> [!note] Key Insight
> We never set out to prove the null hypothesis true. Rather, we attempt to gather enough evidence to reject it in favor of the alternative.

## The Court System Analogy

To understand hypothesis testing intuitively, consider the parallel with the U.S. criminal justice system:

| Criminal Justice System | Statistical Hypothesis Testing |
| --- | --- |
| Defendant is presumed innocent | Null hypothesis is presumed true |
| Prosecution must prove guilt | Data must provide evidence against the null |
| "Beyond reasonable doubt" standard | Significance level (α) threshold |
| Verdict: "guilty" or "not guilty" | Decision: "reject H₀" or "fail to reject H₀" |
| False conviction | Type I error |
| Letting a guilty person go free | Type II error |

> [!tip] Mental Framework
> Just as a jury never declares someone "innocent" (only "not guilty"), in statistics we never "accept" the null hypothesis—we only "fail to reject" it. This subtle distinction acknowledges that absence of evidence is not evidence of absence.

## The Two Types of Statistical Errors

When making decisions from data, there are two possible ways to be wrong:

### Type I Error (False Positive)

**Definition**: Rejecting the null hypothesis when it is actually true.

- Symbol: α (alpha)
- Traditionally acceptable rate: 0.05 (5%)
- Example: Concluding a medical treatment works when it actually doesn't
- Court analogy: Convicting an innocent person

### Type II Error (False Negative)

**Definition**: Failing to reject the null hypothesis when it is actually false.
- Symbol: β (beta)
- Traditionally acceptable rate: 0.20 (20%)
- Example: Concluding a medical treatment doesn't work when it actually does
- Court analogy: Letting a guilty person go free

> [!warning] Important Reality
> Error rates are never exactly zero. We are always potentially making incorrect conclusions from our data and can never prove anything with 100% certainty.

## p-values: Measuring Evidence Against the Null

A p-value is a fundamental concept that quantifies the strength of evidence against the null hypothesis.

### What is a p-value?

**Definition**: The probability of observing data as extreme as, or more extreme than, what we actually observed, _assuming the null hypothesis is true_.

- Range: always between 0 and 1
- Small p-values (typically ≤ 0.05): strong evidence against the null
- Large p-values (> 0.05): insufficient evidence against the null

### Interpreting p-values

- **p ≤ 0.05**: Traditionally considered statistically significant; we reject the null hypothesis
- **p > 0.05**: Not statistically significant; we fail to reject the null hypothesis
- **p between 0.01 and 0.05**: Strong evidence against the null
- **p < 0.01**: Very strong evidence against the null

> [!note] Continuum of Evidence
> p-values should be interpreted along a continuum of evidence strength, not just as a binary "significant/not significant" decision.

### Conceptual Calculation of p-values

While the exact calculation varies by test type, the general approach involves:

1. Calculate a test statistic (e.g., a t-statistic) from your sample data
2. Compare this statistic to what would be expected under the null hypothesis
3. Determine how unusual your result is under the null hypothesis

For example, with a t-test:

- The t-statistic measures the difference between your sample mean and the null hypothesis value, relative to the variability in your data
- If |t| > 2, the corresponding p-value is typically < 0.05

```mermaid
flowchart TD
    A[Collect Sample Data] --> B[Calculate Test Statistic]
    B --> C[Compare to Distribution Under H₀]
    C --> D[Calculate p-value]
    D --> E{p ≤ α?}
    E -->|Yes| F[Reject H₀]
    E -->|No| G[Fail to Reject H₀]
```

## Common Misconceptions about Hypothesis Testing

> [!warning] Beware of These Fallacies
> These misconceptions lead to serious errors in scientific interpretation and decision-making.

### Fallacy 1: Failing to reject means the null is true

**Reality**: We may simply lack sufficient evidence (e.g., due to a small sample size) to detect an effect.

### Fallacy 2: The p-value is the probability that the null is true

**Reality**: The p-value is the probability of observing your data (or more extreme) assuming the null is true, not the other way around.

### Fallacy 3: A p-value < 0.05 proves the alternative is true

**Reality**: A small p-value provides evidence against the null, but doesn't eliminate the possibility of a Type I error.

### Fallacy 4: Small p-values indicate large effects

**Reality**: p-values depend on both effect size AND sample size. A tiny effect can yield a small p-value with a large enough sample.

### Fallacy 5: Data can prove a theory true or false

**Reality**: Data can only support or refute a theory to a certain degree; 100% certainty is never achieved.

### Fallacy 6: Statistical significance equals practical importance

**Reality**: A result can be statistically significant but too small to matter in real-world applications.
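To make Fallacy 4 concrete, the following sketch computes a two-sided p-value for the same small observed mean difference at two sample sizes. The numbers (a 0.1 standard-deviation difference, n = 25 vs n = 10,000) are hypothetical, and a z-statistic with a normal approximation stands in for an exact t-based calculation:

```python
# Illustration of Fallacy 4: the same tiny observed effect yields very
# different p-values depending on sample size. Hypothetical numbers;
# a normal approximation is used for simplicity.
import math

def two_sided_p(diff: float, sd: float, n: int) -> float:
    """Two-sided p-value for an observed mean difference under H0: diff = 0."""
    z = diff / (sd / math.sqrt(n))                        # test statistic
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)

# Same tiny effect (0.1 standard deviations), two sample sizes:
p_small_n = two_sided_p(diff=0.1, sd=1.0, n=25)       # z = 0.5
p_large_n = two_sided_p(diff=0.1, sd=1.0, n=10_000)   # z = 10
print(f"n = 25:     p = {p_small_n:.3f}")   # far from significant
print(f"n = 10000:  p = {p_large_n:.2e}")   # extremely significant
```

The effect is identical in both cases; only the sample size changed, which is why a small p-value alone says nothing about the size of an effect.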
## Practical Application: Making Decisions from Data

> [!example]- Case Application: Testing a New Teaching Method
>
> **Scenario**: Researchers want to determine if a new teaching method improves test scores compared to the traditional method.
>
> **Hypotheses**:
>
> - H₀: The mean test score with the new method equals the mean with the traditional method
> - H₁: The mean test score with the new method differs from the mean with the traditional method
>
> **Study**: 40 students are randomly assigned to either the new or traditional method, and all take the same test afterward.
>
> **Results**: The new method group scored an average of 5 points higher. Statistical analysis yields p = 0.03.
>
> **Decision**: Since p < 0.05, we reject the null hypothesis and conclude there is a statistically significant difference between the teaching methods.
>
> **Proper Interpretation**: "The data provide strong evidence that the new teaching method produces different test scores compared to the traditional method (p = 0.03)."
>
> **Improper Interpretation**: "We've proven the new method is better" or "There's only a 3% chance that the traditional method is just as good."
>
> **Additional Considerations**:
>
> - Is a 5-point difference practically meaningful in an educational context?
> - What was the variability in scores within each group?
> - Would the effect persist in different student populations?

## Summary: The Essence of Hypothesis Testing

Hypothesis testing provides a structured framework for making decisions from data while acknowledging and quantifying the uncertainty inherent in sampling. The process involves:

1. Formulating null and alternative hypotheses
2. Collecting data from a sample
3. Calculating a test statistic and corresponding p-value
4. Making a decision based on the p-value and significance level
5. Interpreting results with an understanding of possible errors

> [!tip] Most Important Takeaway
> **Statistical hypothesis testing doesn't eliminate uncertainty—it quantifies and manages it.** Every statistical decision carries some risk of error, but the framework allows us to make those decisions in a systematic, transparent way while controlling the probability of different types of mistakes.

> [!visual]- Sketch idea
>
> **Core Concept**: The Complete Hypothesis Testing Framework
>
> **Full Description**: The full process of hypothesis testing, from formulating hypotheses to making and interpreting decisions. "The Statistical Decision Journey"
>
> 1. "Question" (starting point)
> 2. "Form H₀ and H₁" (fork in the road)
> 3. "Collect Data" (gathering symbols)
> 4. "Calculate Test Statistic & p-value" (mathematical symbols)
> 5. "Decision Point" (another fork: p ≤ α or p > α)
> 6. Two paths leading to "Reject H₀" or "Fail to Reject H₀"
> 7. Both paths converge to "Interpret with Caution" (finish line)
>
> Add small notes about potential errors at each decision point.

---

Reference:

- Biostats
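As an appendix, the five-step process in the summary can be sketched end to end in code. This is a minimal illustration using the cavity example (H₀: μ = 4) with an invented sample; a z-test with a normal approximation stands in for the exact t-based p-value, which is reasonable only for larger samples:

```python
# Hypothetical end-to-end walkthrough of the five-step framework.
# Step 1: H0: mu = 4 vs H1: mu != 4 (mean cavities by age 6).
# The data and alpha below are assumptions for illustration only.
import math
import statistics

def two_sided_p_from_z(z: float) -> float:
    """P(|Z| >= |z|) under a standard normal distribution."""
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

def one_sample_test(sample, mu0, alpha=0.05):
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)             # sample standard deviation
    z = (xbar - mu0) / (s / math.sqrt(n))    # Step 3: test statistic
    p = two_sided_p_from_z(z)                # Step 3: p-value
    decision = "Reject H0" if p <= alpha else "Fail to reject H0"  # Step 4
    return z, p, decision

# Step 2: collect a (hypothetical) sample of cavity counts
sample = [5, 6, 4, 7, 5, 6, 3, 5, 6, 7, 4, 5, 6, 5, 7, 6, 4, 5, 6, 5]
z, p, decision = one_sample_test(sample, mu0=4)
# Step 5: interpret, remembering the decision could still be a Type I error
print(f"z = {z:.2f}, p = {p:.4f} -> {decision}")
```

Even when the code says "Reject H0", the interpretation should stay hedged: the data provide evidence against the null, not proof of the alternative.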