<sub>2025-05-27</sub> <sub>#data-visualization #data-management #r-programming #hmp669</sub> <sup>[[maps-of-content|Maps of Content - All Notes]] </sup> <sup>Series: [[hmp669|HMP 669 - Data Management and Visualization]]</sup> <sup>Topic: [[hmp669#Data Visualization Using R|Data Visualization Using R]]</sup>

# R Markdown: Reproducible Reporting

> [!abstract]- Learning Overview
>
> R Markdown transforms scattered code and results into unified, professional reports where your analysis tells a complete, verifiable story.
>
> **Key Concepts**:
>
> - **Integration**: Code + output + narrative in one file
> - **Structure**: Header (YAML) + code chunks + free text
> - **Knitting**: The magic button that weaves everything together
>
> **Critical Connections**:
>
> - Code chunks execute R commands while free text provides context
> - The knitting process creates a fresh R session, ensuring reproducibility
> - YAML header controls the final report's appearance and format
>
> **Must Remember**:
>
> - R Markdown files end in `.Rmd` (not `.R`)
> - Knitting runs your entire file from scratch in a new session
> - Everything needed must exist within the R Markdown file itself

> [!info]- Required R Packages
>
> |Package|Purpose|Installation|
> |---|---|---|
> |`rmarkdown`|Core R Markdown functionality|`install.packages("rmarkdown")`|
> |`knitr`|Document knitting engine|`install.packages("knitr")`|
>
> **Setup Installation:**
> ```r
> install.packages(c("rmarkdown", "knitr"))
> install.packages("tidyverse") # Includes ggplot2 and dplyr
> ```
>
> **Load:**
> ```r
> library(rmarkdown)
> library(knitr)
> ```
>
> **Note**: Most RStudio installations include rmarkdown and knitr by default, but installing them explicitly ensures you have the latest versions.

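
Before knitting, it can help to confirm that the packages listed above are actually installed. The following is a minimal sketch rather than a required step; `missing_packages` is a hypothetical helper name introduced here for illustration, not a function from any package.

```r
# Hypothetical helper: return the names of packages that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("rmarkdown", "knitr"))
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install what is missing
}
```

Leaving the `install.packages()` call commented out keeps the check side-effect free; uncomment it if you want the script to install anything that is missing.
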
> [!code]- Syntax Reference
>
> | Element | Syntax | Purpose |
> |---------|--------|---------|
> | **File Management** |||
> | New R Markdown | File → New File → R Markdown | Create new document |
> | File extension | `.Rmd` | R Markdown document |
> | **Header/YAML** |||
> | Header boundaries | `---` | Start and end YAML |
> | Title | `title: "My Report"` | Document title |
> | Output format | `output: html_document` | Final format (html/pdf/word) |
> | **Code Chunks** |||
> | Basic chunk | `` ```{r} `` code here `` ``` `` | Execute R code |
> | Named chunk | `` ```{r chunk-name} `` | Chunk with identifier |
> | Hide code | `` ```{r echo=FALSE} `` | Run but don't show code |
> | Show only code | `` ```{r eval=FALSE} `` | Display but don't run |
> | **Text Formatting** |||
> | Heading 1 | `# Title` | Major section |
> | Heading 2 | `## Subtitle` | Subsection |
> | Bold text | `**bold**` | Emphasis |
> | Italic text | `*italic*` | Light emphasis |
> | Bullet list | `* item` | Unordered list |
> | **Execution** |||
> | Run chunk | Green play button | Execute single chunk |
> | Knit document | Knit button (yarn icon) | Generate full report |

---

## The Problem R Markdown Solves

Picture this familiar scenario: You've spent hours crafting brilliant R code, generating insights that could change everything. Your supervisor asks, "Can you share those results?" You scramble through multiple files: your `.R` script here, a saved plot there, some output you copied into a Word document last week.

**The disconnect is real and dangerous.** Traditional R scripts create "analytical amnesia": your code evolves, your results scatter, and suddenly no one (including you) can confidently say which version of code produced which results.

> [!warning] The Version Control Nightmare
>
> When code and results live separately, you risk:
>
> - **Accidental misalignment**: Code v4 paired with results from v2
> - **Lost reproducibility**: Can't recreate specific findings
> - **Communication chaos**: Explaining results without showing the work

R Markdown solves this by creating **analytical integrity**: a single document where code, output, and narrative live together in harmony.

---

## Understanding R Markdown's Architecture

Think of an R Markdown document like a well-organized research paper that can execute itself. It has three distinct but interconnected components:

### The Header (YAML): Your Document's DNA

```yaml
---
title: "My Analysis Report"
author: "Your Name"
date: "2024-01-15"
output: html_document
---
```

The header is your document's **control center**, wrapped in three dashes (`---`) at the very top. Here you define:

- **Identity**: Title, author, date
- **Appearance**: Output format (HTML, PDF, Word)
- **Behavior**: Themes, table of contents, figure sizes

> [!tip] YAML stands for "YAML Ain't Markup Language"
>
> The name originally expanded to "Yet Another Markup Language". Think of YAML as your document's blueprint: it tells R Markdown how to construct your final report before any content is processed.

### Code Chunks: Where the Magic Happens

````r
```{r analysis, echo=TRUE, warning=FALSE}
library(ggplot2)
summary(mtcars)
```
````

Code chunks are **gray-backgrounded islands** of R code within your document. They're delimited by three backticks (\`\`\`) and contain:

- **Language specification**: `{r}` tells R Markdown this is R code
- **Chunk name**: Optional but helpful for debugging (`analysis` in the example)
- **Options**: Control what appears in your final report

**Key chunk options:**

- `eval=TRUE/FALSE`: Run the code or just display it?
- `echo=TRUE/FALSE`: Show the code in the final report?
- `warning=TRUE/FALSE`: Display warnings?
- `error=TRUE/FALSE`: Show errors in the report and keep knitting, or stop at the first error?

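
The chunk options above can also be set once for the whole document rather than repeated in every chunk header. A minimal sketch, assuming it runs inside an early setup chunk; the particular values chosen here are illustrative, not recommendations from this note.

```r
library(knitr)

# Set document-wide defaults; individual chunks can still override them
opts_chunk$set(
  echo = FALSE,     # hide code in the report
  warning = FALSE,  # leave warnings out of the output
  message = FALSE   # leave package startup messages out of the output
)

# Inspect the current default for one option
opts_chunk$get("echo")
```

A chunk header such as `{r, echo=TRUE}` would still take precedence over the global default for that one chunk.
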
> [!note] The Green Play Button
>
> Each code chunk has a green play arrow in its top-right corner. Click it to run just that chunk, which is perfect for testing as you build.

| Option       | Default  | Effect                                                                      |
| ------------ | -------- | --------------------------------------------------------------------------- |
| `eval`       | TRUE     | Whether to evaluate the code and include its results                        |
| `echo`       | TRUE     | Whether to display code along with its results                              |
| `warning`    | TRUE     | Whether to display warnings                                                 |
| `error`      | FALSE    | Whether to display errors and keep knitting (FALSE stops at the first error) |
| `message`    | TRUE     | Whether to display messages                                                 |
| `tidy`       | FALSE    | Whether to reformat code in a tidy way when displaying it                   |
| `results`    | "markup" | Options: "markup", "asis", "hold", or "hide"                                |
| `cache`      | FALSE    | Whether to cache results for future renders                                 |
| `comment`    | "##"     | Comment character to preface results with                                   |
| `fig.width`  | 7        | Width in inches for plots created in chunk                                  |
| `fig.height` | 7        | Height in inches for plots created in chunk                                 |

### Free Text: Your Narrative Voice

Everything with a **white background** outside the header and code chunks is free text. This is where you:

- Explain your thinking
- Interpret results
- Provide context and conclusions

**Formatting superpowers:**

- `# Heading 1` creates large section headers
- `## Heading 2` creates subsections
- `* Item` creates bullet points
- `1. Item` creates numbered lists
- Embed images, hyperlinks, even citations

---

## The Knitting Process: From Source to Story

Clicking the **Knit button** (the yarn ball icon) triggers this sequence:

1. A fresh R session starts
2. The header (YAML) is read
3. Code chunks execute sequentially
4. Code, output, and text are combined
5. The final report is generated
6. The report is saved to your project folder

> [!warning] Fresh Session Reality
>
> **Critical insight**: Knitting creates a completely new R environment.
> This means:
>
> - No memory of your interactive R session
> - All packages must be loaded within the R Markdown file
> - All objects must be created within the document
> - This is a feature, not a bug: it ensures reproducibility!

**The knitting process ensures that your report is:**

- **Reproducible**: Anyone with the same data and packages can run your file and get the same results
- **Transparent**: Every result is directly traceable to specific code
- **Professional**: Clean, formatted output ready for sharing

---

## Real-World Application: Building Your First Report

> [!example]- Analyzing car efficiency
> File: `car_analysis.Rmd`
>
> ````markdown
> ---
> title: "Fuel Efficiency Analysis"
> author: "Data Analyst"
> date: "`r Sys.Date()`"
> output: html_document
> ---
>
> ## Executive Summary
>
> This analysis examines the relationship between car weight and fuel efficiency using the built-in `mtcars` dataset.
>
> ```{r setup, include=FALSE}
> library(ggplot2)
> library(dplyr)
> ```
>
> ## Data Overview
>
> ```{r data-overview}
> summary(mtcars[c("mpg", "wt")])
> ```
>
> ## Key Finding
>
> Cars with lower weight tend to have better fuel efficiency:
>
> ```{r weight-plot}
> ggplot(mtcars, aes(x = wt, y = mpg)) +
>   geom_point() +
>   geom_smooth(method = "lm") +
>   labs(title = "Weight vs. Fuel Efficiency",
>        x = "Weight (1000 lbs)",
>        y = "Miles per Gallon")
> ```
>
> ## Conclusion
>
> The analysis reveals a strong negative correlation between vehicle weight and fuel efficiency, suggesting that **lighter vehicles achieve better gas mileage**.
> ````
>
> When knitted, this becomes a polished report combining your analysis, visualizations, and insights.

## Creating Your R Markdown Workflow

### Getting Started

1. **Install requirements**: Ensure `rmarkdown` and `knitr` packages are installed
2. **Create new file**: File → New File → R Markdown
3. **Choose template**: Select output format and provide basic info
4. **Start building**: Add code chunks with Ctrl+Alt+I (or Cmd+Option+I on Mac)

### Best Practices

- **One concept per chunk**: Keep code chunks focused and manageable
- **Meaningful chunk names**: Use descriptive names without spaces
- **Load packages early**: Put `library()` calls in an early setup chunk
- **Test incrementally**: Run chunks individually before knitting the full document

> [!tip] Pro Tip: The Setup Chunk
>
> Create an early chunk named `setup` with `include=FALSE` to load all your packages and set global options. This keeps your report clean while ensuring everything is ready.

---

## Connecting the Dots: Why This Matters

R Markdown represents a **paradigm shift** from "code that produces results" to "stories told with code." This approach:

**Transforms communication**: Instead of emailing separate files, you share one comprehensive document that tells the complete analytical story.

**Builds trust**: Colleagues can see exactly how you arrived at your conclusions, building confidence in your findings.

**Saves time**: No more recreating analyses or hunting for the "right version" of results.

**Scales expertise**: Your documented process becomes a template others can learn from and build upon.
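
The Knit button described earlier has a console equivalent, `rmarkdown::render()`, which is useful when a report needs to be produced from a script. The following is a minimal sketch, assuming pandoc is available (RStudio bundles it); the file name and report contents are hypothetical. Unlike the Knit button, `render()` called this way runs in the current R session rather than a fresh one.

```r
library(rmarkdown)

# Build the chunk fence programmatically to avoid nesting literal backticks
fence <- strrep("`", 3)

# Write a tiny, hypothetical report to a temporary file
rmd_path <- file.path(tempdir(), "demo_report.Rmd")
writeLines(c(
  "---",
  'title: "Demo Report"',
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(mtcars$mpg)",
  fence
), rmd_path)

# render() needs pandoc; skip the rendering step if it is not available
if (pandoc_available()) {
  out_file <- render(rmd_path, quiet = TRUE)
  print(out_file)  # path to the generated HTML file
}
```

Because the session is shared, objects already in your workspace are visible to the document, so a clean knit from RStudio remains the better reproducibility check.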