<sub>2025-05-29</sub> <sub>#data-visualization #data-management #r-programming #statistical-analysis #hmp669</sub>
<sup>Series: [[hmp669|HMP 669 — Data Management and Visualization]]</sup>
<sup>Topic: [[hmp669#Data Visualization Using R|Data Visualization Using R]]</sup>
# R Plotting with ggplot2
> [!abstract]- Overview
>
> ggplot2 transforms data visualization through the "grammar of graphics" - a structured approach that builds plots layer by layer like constructing a sentence.
>
> **Key Concepts**:
>
> - **Grammar of Graphics**: The foundational framework organizing every plot into required and optional components
> - **Layered Construction**: Building complexity through adding components with the `+` operator
> - **Aesthetic Mapping**: Connecting your data variables to visual elements (position, color, size, shape)
>
> **Critical Connections**:
>
> - Data drives aesthetics, aesthetics drive geometry, geometry creates the visual story
> - Required components (data + aesthetics + geometry) create the foundation; optional layers add sophistication
> - Every visual element can be customized to serve your communication goals
>
> **Must Remember**:
>
> - All ggplot2 plots follow the same basic structure: `ggplot(data, aes()) + geom_type()`
> - The `+` sign connects layers - think of it as visual grammar for "and then add..."
> - Accessibility in color choices ensures your insights reach the widest possible audience
> [!info]- Required R Packages
>
>
> |Package|Purpose|Installation|
> |---|---|---|
> |`ggplot2`|Core plotting functionality|`install.packages("ggplot2")` or `install.packages("tidyverse")`|
> |`RColorBrewer`|Professional color palettes|`install.packages("RColorBrewer")`|
> |`viridis`|Accessibility-focused color palettes|`install.packages("viridis")`|
>
> **Setup code:**
>
>
> ```r
> library(ggplot2) # Core plotting
> library(RColorBrewer) # Color palettes (optional)
> library(viridis) # Accessible colors (optional)
> ```
> [!code]- Syntax Reference
>
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Core Structure**|||
> |`ggplot(data, aes())`|Initialize plot with data and aesthetics|`ggplot(df, aes(x = age, y = height))`|
> |`+ geom_point()`|Add scatter plot layer|`+ geom_point()`|
> |`+ geom_line()`|Add line plot layer|`+ geom_line()`|
> |**Aesthetics**|||
> |`aes(x = var, y = var)`|Map variables to axes|`aes(x = age, y = height)`|
> |`aes(color = var)`|Map variable to color|`aes(color = gender)`|
> |`aes(size = var)`|Map variable to size|`aes(size = population)`|
> |**Labels & Titles**|||
> |`labs(x = "text")`|Set x-axis label|`labs(x = "Age (years)")`|
> |`labs(y = "text")`|Set y-axis label|`labs(y = "Height (cm)")`|
> |`labs(title = "text")`|Set plot title|`labs(title = "Growth Over Time")`|
> |**Limits & Scales**|||
> |`xlim(min, max)`|Set x-axis limits|`xlim(0, 100)`|
> |`ylim(min, max)`|Set y-axis limits|`ylim(0, 200)`|
> |`scale_color_manual()`|Custom colors|`scale_color_manual(values = c("red", "blue"))`|
> |`scale_color_viridis_d()`|Viridis discrete colors|`scale_color_viridis_d()`|
> |**Themes**|||
> |`theme_grey()`|Default grey theme|`+ theme_grey()`|
> |`theme_classic()`|Clean white theme|`+ theme_classic()`|
> |`theme_minimal()`|Minimal theme|`+ theme_minimal()`|
> |**Point Customization**|||
> |`geom_point(size = n)`|Point size|`geom_point(size = 3)`|
> |`geom_point(shape = n)`|Point shape (0-25)|`geom_point(shape = 22)`|
> |`geom_point(alpha = n)`|Transparency (0-1)|`geom_point(alpha = 0.7)`|
> |`geom_point(color = "color")`|Point color (outline only for shapes 21-25)|`geom_point(color = "red")`|
> |`geom_point(fill = "color")`|Point fill color (shapes 21-25 only)|`geom_point(shape = 21, fill = "blue")`|
## Why Visualization Transforms Understanding
**Data visualization isn't just about making pretty pictures** - it's about pattern recognition that our brains can't achieve through numbers alone.
Think of raw data as a conversation in a foreign language. Visualization becomes your translator, revealing stories hidden in spreadsheet cells and making complex relationships instantly comprehensible to diverse audiences.
## The Grammar of Graphics: Your Visual Language Foundation
### Understanding the Framework
The **grammar of graphics** works like sentence construction - you need certain essential elements, and you can add optional components to create more sophisticated expression.
![[r-plotting-ggplot2-1748582983995.webp]]
**Required Components** (Every plot needs these three):
1. **Data** - Your dataset, the raw material for visualization
2. **Aesthetics (aes)** - Instructions for mapping data to visual properties
3. **Geometries (geom)** - The visual layer defining your plot type
**Optional Components** (Add sophistication and customization):
- **Statistical layers (stat)** - Calculated transformations (trend lines, summaries)
- **Scales** - Control color, size, shape mappings and legends
- **Coordinates (coord)** - Define plotting space (usually Cartesian)
- **Themes** - Non-data styling (fonts, backgrounds, grid lines)
- **Facets** - Split data into multiple related plots
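As a rough sketch of how the required and optional components listed above fit together in one plot, the code below uses `mtcars`, a dataset bundled with R, as a stand-in for your own data. The choice of dataset, variables, and axis range is an assumption made for illustration only.

```r
library(ggplot2)

# Assumption: mtcars (bundled with R) stands in for your own dataset
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +                  # data + aesthetics (required)
  geom_point() +                                             # geometry (required)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # statistical layer: fitted trend line
  scale_x_continuous(name = "Weight (1000 lbs)") +           # scale: controls the x-axis mapping
  coord_cartesian(ylim = c(10, 35)) +                        # coordinates: Cartesian, zoomed on y
  facet_wrap(~ cyl) +                                        # facets: one panel per cylinder count
  theme_minimal()                                            # theme: non-data styling

p  # printing the object draws the plot
```

Removing any of the optional lines still leaves a valid plot, which is one way to see which components are required.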
## Building Your First Plot: Step by Step
### The Basic Structure
Every ggplot2 plot follows this fundamental pattern:
```r
ggplot(dataset, aes(x = variable1, y = variable2)) +
  geom_type()
```
Basic Plot Code:
![[r-plotting-ggplot2-1748583118143.webp]]
Let's break this down:
**`ggplot(dataset, aes())`** - Establishes your foundation
- Tells R which data to use
- Maps variables to visual properties through aesthetics
**`+ geom_type()`** - Adds the visual layer
- The `+` operator connects components (essential ggplot2 syntax)
- `geom_point()` creates scatter plots
- `geom_line()` creates line plots
- Many other geometry options available
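To make the pattern concrete, here is a minimal sketch that swaps the `geom_type()` placeholder for real geometries. It assumes the `pressure` dataset bundled with R (columns `temperature` and `pressure`) in place of your own data.

```r
library(ggplot2)

# Assumption: `pressure` (bundled with R) stands in for your dataset
scatter <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_point()   # one point per row

# Same data and aesthetics, different geometry
line <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line()    # rows connected in order of x

# Geometries can also be stacked on a single plot
both <- ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line() +
  geom_point()

both  # printing the object draws the plot
```

Only the geometry line changes between the three plots; the data and aesthetic mapping stay the same.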
### Adding Depth with a Third Variable
```r
ggplot(dataset, aes(x = age, y = height, color = sex)) +
  geom_point()
```
**What happens here**: Points are automatically colored by sex, and ggplot2 adds a legend to help interpret the colors. This is the power of aesthetic mapping - you declare what you want, and ggplot2 handles the visual implementation.
Add color and legend:
![[r-plotting-ggplot2-1748583216600.webp]]
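A related distinction that often trips people up is mapping a color versus setting one. The sketch below, which assumes the bundled `mtcars` dataset as a stand-in, shows both side by side.

```r
library(ggplot2)

# Mapped: color depends on the data, so it goes inside aes() and gets a legend
mapped <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Set: one fixed color for every point, given outside aes(); no legend is drawn
fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue")

mapped
fixed
```

Here `factor(cyl)` tells ggplot2 to treat the cylinder count as categories rather than as a number.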
## Customization: Making Your Plots Communicate
### Labels and Titles
```r
ggplot(data, aes(x = age, y = height, color = sex)) +
  geom_point() +
  labs(
    x = "Age (years)",
    y = "Height (cm)",
    title = "Growth Patterns in Study Population",
    color = "Sex" # Legend title (requires a mapped color aesthetic)
  )
```
> [!note]
> Clear, descriptive labels are often the difference between a confusing graphic and an insightful one. Include units when relevant, and make titles specific enough to stand alone.
Adding Labels:
![[r-plotting-ggplot2-1748583316444.webp]]
### Focusing with Axis Limits
```r
ggplot(data, aes(x = age, y = height)) +
  geom_point() +
  xlim(0, 10) +  # Focus on children (0-10 years)
  ylim(50, 150)  # Relevant height range
```
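**Provide both a lower and an upper value** for each axis limit (pass `NA` for either one to let ggplot2 compute it from the data). Note that `xlim()` and `ylim()` discard observations outside the range rather than just cropping the view, which matters once you add statistical layers such as trend lines; `coord_cartesian()` zooms without dropping data.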
Modify the axis:
![[r-plotting-ggplot2-1748583371489.webp]]
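The difference between limiting a scale and zooming the view can be sketched as follows. This assumes the bundled `mtcars` dataset and an arbitrary window of 2 to 4 on the x-axis, both chosen only for illustration.

```r
library(ggplot2)

# Assumption: mtcars (bundled with R); wt is car weight in 1000 lbs
base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# xlim() sets scale limits: rows outside 2-4 are treated as missing
# and ggplot2 warns about removed rows when the plot is drawn
p_limits <- base + xlim(2, 4)

# coord_cartesian() only zooms the view; every row stays in the data
p_zoom <- base + coord_cartesian(xlim = c(2, 4))

# How many rows fall outside the window and would be dropped by xlim()
n_outside <- sum(mtcars$wt < 2 | mtcars$wt > 4)
n_outside
```

For a plain scatter plot the two versions look alike; the choice starts to matter when a layer such as `geom_smooth()` computes something from the data.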
### Color: Your Most Powerful Communication Tool
#### Color Specification Methods
**Default:** for discrete variables, ggplot2 picks hues evenly spaced around the color wheel, starting with a reddish hue.
**Simple color names:**
```r
geom_point(color = "red")
```
**Hex codes** (professional standard):
```r
geom_point(color = "#FF5733") # Vibrant orange-red
```
**Accessible color palettes** (recommended):
```r
scale_color_viridis_d() # For discrete variables
scale_color_viridis_c() # For continuous variables
```
> [!warning] Accessibility Matters
> The viridis color palette is specifically designed to be distinguishable by people with various types of color vision differences. Using accessible palettes ensures your insights reach the maximum number of people.
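A short sketch of matching the viridis scale to the variable type, assuming the bundled `mtcars` dataset as a stand-in. Both scale functions used here ship with ggplot2 itself, so this particular example does not need `library(viridis)`.

```r
library(ggplot2)

# Discrete variable (a factor) pairs with scale_color_viridis_d()
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  scale_color_viridis_d()

# Continuous variable (numeric) pairs with scale_color_viridis_c()
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point(size = 3) +
  scale_color_viridis_c()

p_discrete
p_continuous
```

Pairing a numeric variable with the `_d` variant is a common source of errors, so checking the variable type first is worthwhile.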
#### Professional Color Palette Options
```r
# Pick one and add it to a plot with `+`

# RColorBrewer palettes
scale_color_brewer(palette = "Set1")
# Manual color specification (one color per group)
scale_color_manual(values = c("red", "blue", "green"))
# Viridis (accessibility-focused)
scale_color_viridis_d(option = "plasma")
```
Other resources: build your own palette with Coolors (coolors.co) or I Want Hue.
## Advanced Point Customization
### Within the Geometry Layer
All point-specific customizations happen within `geom_point()`:
```r
geom_point(
  size = 3,            # Larger points (default = 1.5)
  shape = 22,          # Filled square
  color = "darkgreen", # Outline color
  fill = "lightblue",  # Fill color (shapes 21-25 only)
  alpha = 0.7          # 70% opacity
)
```
### Point Shape Reference
**Common shape numbers:**
- `19` - Filled circle (default); `16` is a near-identical filled circle
- `21` - Circle with separate fill and outline
- `22` - Square with separate fill and outline
- `23` - Diamond with separate fill and outline
- `1` - Open circle
- `2` - Open triangle
![[r-plotting-ggplot2-1748583770507.webp]]
> [!tip] Transparency for Overlapping Data
> When you have many data points that overlap, `alpha = 0.5` or similar creates transparency that reveals data density - darker areas show where more points cluster together.
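The sketch below illustrates both ideas from this section: transparency on heavily overlapping points, and the separate outline and fill of shapes 21-25. It assumes the `diamonds` dataset that ships with ggplot2 and the bundled `mtcars` dataset; the specific alpha value and colors are illustrative choices.

```r
library(ggplot2)

# diamonds ships with ggplot2 and has tens of thousands of rows,
# so points overlap heavily; a low alpha lets dense regions appear darker
p_alpha <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1)

# Shapes 21-25 take both an outline (color) and an interior (fill)
p_fill <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(shape = 21, size = 3, color = "darkgreen", fill = "lightblue")

p_alpha
p_fill
```

With the default shape, `fill` has no visible effect, which is why `shape = 21` is set explicitly in the second plot.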
## Themes: Styling Beyond the Data
### Preset Themes
```r
# Pick one and add it to a plot with `+`

# Default grey background
theme_grey()
# Clean white background, no gridlines
theme_classic()
# Minimal styling
theme_minimal()
```
### Custom Theme Adjustments
```r
# Add to a plot with `+`
theme(
  plot.background = element_rect(fill = "white"),
  plot.margin = margin(10, 10, 10, 10), # Example values in pt: top, right, bottom, left
  plot.title = element_text(size = 16, face = "bold"),
  axis.text = element_text(size = 12)
)
```
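One detail worth sketching is the order of preset themes and custom adjustments. A preset such as `theme_classic()` replaces the whole theme, so adjustments made with `theme()` should come after it. The example assumes the bundled `mtcars` dataset and an illustrative title.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel Economy by Weight")

# Preset first, then adjustments: the bold 16 pt title is kept
p_kept <- base +
  theme_classic() +
  theme(plot.title = element_text(size = 16, face = "bold"))

# Adjustments first, then preset: theme_classic() replaces the whole theme,
# so the custom title formatting is lost
p_lost <- base +
  theme(plot.title = element_text(size = 16, face = "bold")) +
  theme_classic()

p_kept
p_lost
```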
## Real-World Application: Building a Complete Visualization
> [!example]- Example
>
> ```r
> ggplot(health_data, aes(x = age, y = biomarker_level, color = treatment_group)) +
>   geom_point(size = 2.5, alpha = 0.8) +
>   labs(
>     x = "Age (years)",
>     y = "Biomarker Level (ng/mL)",
>     title = "Treatment Response by Age Group",
>     subtitle = "Higher biomarker levels indicate positive response",
>     color = "Treatment"
>   ) +
>   xlim(18, 85) +
>   scale_color_viridis_d(option = "plasma") +
>   theme_classic() +
>   theme(
>     plot.title = element_text(size = 14, face = "bold"),
>     legend.position = "bottom"
>   )
> ```
>
> **This code demonstrates:**
>
> - **Required components**: data, aesthetics (x, y, color), geometry (points)
> - **Labels**: Professional titles and axis labels with units
> - **Customization**: Point size, transparency, accessible colors
> - **Styling**: Clean theme with custom title formatting
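The example above assumes a `health_data` data frame that these notes do not define. To try the code, one option is to simulate a stand-in. The sketch below generates purely synthetic values; the group names, sample size, and distributions are assumptions made up for illustration and do not represent any real study.

```r
set.seed(669)  # fixed seed so the simulated values are reproducible
n <- 120       # hypothetical sample size

# Synthetic data for illustration only; not real study results
health_data <- data.frame(
  age = round(runif(n, min = 18, max = 85)),
  treatment_group = sample(c("Control", "Low dose", "High dose"), n, replace = TRUE)
)
health_data$biomarker_level <- rnorm(n, mean = 50, sd = 10)

str(health_data)  # confirm the three columns the plot expects
```

With this data frame in memory, the plotting code in the example should run as written, since the column names match the aesthetics it maps.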
## Connecting the Concepts: Your Visualization Workflow
### The Layered Approach
Think of ggplot2 construction as four kinds of layers:
1. **Foundation** (required): "Here's my data and how it maps to visual space"
2. **Visual layer** (required): "Here's how to represent each data point"
3. **Communication layers** (optional): "Here's how to interpret what you're seeing"
4. **Aesthetic layers** (optional): "Here's how to make it beautiful and accessible"
### From Simple to Sophisticated
**Start simple**: make sure your basic plot communicates the essential relationship, then add layers that enhance understanding:
```r
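# Simple foundation (store the plot so layers can be added step by step)
p <- ggplot(data, aes(x = var1, y = var2)) + geom_point()
# Add information
p <- p + aes(color = var3)
# Add clarity
p <- p + labs(title = "Clear Title", x = "X Label", y = "Y Label")
# Add accessibility (var3 must be discrete for the _d scale)
p <- p + scale_color_viridis_d()
# Add polish
p <- p + theme_classic()
p # Print the finished plot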
```
---
Reference
- HMP 669