<sub>2025-04-07 Monday</sub> <sub>#r-programming #rstudio #data-science </sub>
<sup>[[maps-of-content]] </sup>
# R Programming Fundamentals: Decision-Making & Reusable Code
> [!abstract]- Quick Review
>
> **Core Essence**: R is both a powerful data analysis environment and a programming language with conditional expressions, functions, and loops that form the foundation of efficient code.
>
> **Key Concepts**:
>
> - Conditional expressions (if-else, ifelse())
> - User-defined functions
> - For loops and their alternatives
> - Logical operations (any(), all())
>
> **Must Remember**:
>
> - The ifelse() function works element-wise on vectors
> - Variables created inside functions have local scope
> - For loops are fundamental but not always the most efficient choice in R
> - R's "apply family" functions often provide better alternatives to loops
>
> **Critical Relationships**:
>
> - if-else statements for single conditions vs. ifelse() for vectorized operations
> - Functions encapsulate reusable code and create local variable environments
> - Loops provide iteration but apply functions often offer more elegant solutions

> [!code]- Code Reference
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Conditional Expressions**|||
> |`if (condition) {...} else {...}`|Execute code based on a condition|`if (x > 0) { "positive" } else { "non-positive" }`|
> |`ifelse(test, yes, no)`|Vectorized conditional|`ifelse(x > 0, "positive", "non-positive")`|
> |`any(logical_vector)`|TRUE if any element is TRUE|`any(c(FALSE, TRUE, FALSE))` # TRUE|
> |`all(logical_vector)`|TRUE if all elements are TRUE|`all(c(TRUE, TRUE, FALSE))` # FALSE|
> |**Functions**|||
> |`function_name <- function(args) {...}`|Define a function|`square <- function(x) { x^2 }`|
> |**For Loops**|||
> |`for (var in sequence) {...}`|Iterate over a sequence|`for (i in 1:5) { print(i) }`|
> |**Apply Family**|||
> |`lapply(list, function)`|Apply function to each list element|`lapply(list(1:3, 4:6), sum)`|
> |`sapply(list, function)`|Same as lapply with simplified result|`sapply(list(1:3, 4:6), sum)`|
> |`apply(matrix, margin, function)`|Apply to rows (1) or columns (2)|`apply(matrix(1:6, 2, 3), 1, sum)`|
> |`tapply(vector, index, function)`|Apply to subsets of a vector|`tapply(1:10, rep(1:2, 5), sum)`|
> |`mapply(function, ...)`|Apply function with multiple arguments|`mapply(rep, 1:3, 3:1)`|
## Introduction to R's Dual Nature
R serves two complementary purposes that make it particularly valuable for data scientists and analysts. Understanding this dual nature helps frame how we approach learning the language.
### Data Analysis Environment
R excels as a tool for working with data, allowing users to:
- Perform exploratory data analysis efficiently
- Build robust data analysis pipelines
- Create compelling data visualizations
> [!note] While our primary focus is using R for data analysis, the programming concepts we're learning serve as essential building blocks for these analytical tasks.
### Programming Language
Beyond its analytical capabilities, R is a full-featured programming language that enables:
- Creation of custom functions tailored to specific needs
- Development of complex packages that extend R's capabilities
- Contribution to the language itself (for advanced users)
The programming concepts we'll explore—**conditionals**, **functions**, and **loops**—form the foundation of both basic data analysis and more advanced programming.
## Conditional Expressions: Making Decisions in Code
Conditional expressions allow your code to make decisions based on whether conditions are true or false. R provides two main approaches to conditional logic.
### If-Else Statements
The if-else statement is a fundamental construct that creates branching paths in your code.
The basic syntax:
```r
if (condition) {
  # Code to execute if condition is TRUE
} else {
  # Code to execute if condition is FALSE
}
```
For example, to print the reciprocal of a number unless that number is zero:
```r
a <- 5
if (a != 0) {
  print(1/a)
} else {
  print("No reciprocal for zero")
}
# Output: [1] 0.2
```
> [!tip] The else part is optional. You can use just `if` when you only need to execute code when the condition is true.
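A minimal sketch of an `if` without an `else`, followed by an `else if` chain for more than two branches. The variable `a`, the name `sign_label`, and the messages are illustrative and not part of the original notes.

```r
# Hypothetical value to classify
a <- -3

# `if` alone: nothing happens when the condition is FALSE
if (a < 0) {
  message("a is negative")
}

# `else if` chains handle more than two cases;
# the whole if-else expression returns the value of the branch that ran
sign_label <- if (a > 0) {
  "positive"
} else if (a < 0) {
  "negative"
} else {
  "zero"
}
sign_label
```

Because `if` is an expression in R, its result can be assigned directly, which avoids repeating the assignment in every branch.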
### The ifelse() Function
While an if-else statement tests a single logical value, the `ifelse()` function is vectorized: it applies the test to each element of a vector and returns a vector of the same length.
```r
ifelse(test, yes, no)
```
Where:
- `test` is a logical vector
- `yes` supplies the values returned for elements where `test` is TRUE
- `no` supplies the values returned for elements where `test` is FALSE
This is particularly useful for data wrangling operations.
```mermaid
graph TD
A[ifelse function] --> B[test: logical vector]
A --> C[yes: values if TRUE]
A --> D[no: values if FALSE]
B --> E[Returns vector with values from yes or no]
```
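As a small illustration of the element-wise behavior, the sketch below labels each element of a vector. The vector `x` and the name `labels` are made up for this example.

```r
# Illustrative input vector (not from the original notes)
x <- c(-2, 0, 3, 7)

# The test x > 0 is evaluated for every element, and the result
# has the same length as the test
labels <- ifelse(x > 0, "positive", "non-positive")
labels
# "non-positive" "non-positive" "positive" "positive"
```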
#### Example: Handling Missing Values
```r
data_vector <- c(10, NA, 30, NA, 50)
clean_data <- ifelse(is.na(data_vector), 0, data_vector)
# Result: c(10, 0, 30, 0, 50)
```
> [!warning] While `ifelse()` is convenient, it can be slower than direct logical indexing (e.g. `x[is.na(x)] <- 0`) on large vectors, and its result does not keep attributes such as the `Date` class.
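One common alternative to `ifelse()` for simple replacements is logical indexing. The sketch below reuses `data_vector` from the example above; the `clean_ifelse` and `clean_index` names are illustrative.

```r
data_vector <- c(10, NA, 30, NA, 50)

# Option 1: ifelse(), as in the example above
clean_ifelse <- ifelse(is.na(data_vector), 0, data_vector)

# Option 2: copy the vector, then overwrite only the NA positions
clean_index <- data_vector
clean_index[is.na(clean_index)] <- 0

# Both approaches give the same numeric vector
identical(clean_ifelse, clean_index)
```

Which one reads better is a judgment call: `ifelse()` expresses the rule in one line, while indexing touches only the elements that change.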
### Logical Aggregation: any() and all()
These functions help evaluate multiple logical conditions:
- **any()**: Returns TRUE if at least one element in the vector is TRUE
- **all()**: Returns TRUE only if every element in the vector is TRUE
```r
x <- c(TRUE, FALSE, TRUE)
any(x) # TRUE
all(x) # FALSE
```
> [!note] These functions are particularly useful for data validation and conditional filtering.
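A minimal validation sketch using `any()` and `all()`. The `readings` vector and the range limits are assumptions chosen for illustration.

```r
# Hypothetical temperature readings to validate
readings <- c(21.5, 19.0, NA, 23.2)

# Is at least one value missing?
has_missing <- any(is.na(readings))

# Are all non-missing values within a plausible range?
# na.rm = TRUE ignores the NA; without it the result would be NA
all_in_range <- all(readings > -50 & readings < 60, na.rm = TRUE)

if (has_missing) {
  message("Some readings are missing")
}
```

Both functions return a single logical value, which makes them a natural fit for the condition of an `if` statement.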
## Functions: Creating Reusable Code
Functions allow you to package code that performs specific tasks, making your work more organized and efficient.
### Anatomy of a Function
```r
function_name <- function(arg1, arg2, ...) {
  # Function body: code that performs operations
  # The last expression evaluated is returned
  result
}
```
### Key Function Characteristics
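1. **Functions are objects** in R. They are usually assigned to a name so they can be reused, but anonymous functions (e.g. passed directly to `lapply()`) are also allowed
2. **Local scope**: Variables defined inside a function exist only during function execution
3. **Arguments** can have default values
4. **Return values**: A function returns its last evaluated expression unless `return()` is called earlier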
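The characteristics listed above can be seen in a short sketch. The function `add_one` and its `step` argument are hypothetical names used only for this illustration.

```r
x <- 10  # variable in the global environment

add_one <- function(value, step = 1) {
  x <- value + step  # this x is local to the function call
  x                  # the last expression is the return value
}

result <- add_one(5)              # uses the default step = 1
result_step <- add_one(5, step = 2)

result       # 6
result_step  # 7
x            # still 10: the assignment inside the function did not change it
```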
### Example: Creating a Simple Function
```r
# Function to calculate the square of a number
square <- function(x) {
  x^2
}
# Using the function
square(4) # Returns 16
```
### Function with Multiple Arguments
```r
# Function with multiple arguments and a default value
calculate_area <- function(length, width = 1) {
  length * width
}
calculate_area(5) # Returns 5 (using default width)
calculate_area(5, 3) # Returns 15
```
> [!tip] Well-designed functions should do one thing well and have a clear purpose, making your code more readable and maintainable.
## For Loops: Repetitive Operations
For loops allow you to execute code repeatedly, with each iteration using a different value from a sequence.
### Basic Structure
```r
for (variable in sequence) {
  # Code to execute in each iteration
}
```
The loop variable takes on each value in the sequence during successive iterations.
### Example: Summing Integers
```r
# Calculate the sum of integers from 1 to n
sum_to_n <- function(n) {
  total <- 0
  # seq_len(n) is empty when n is 0, so the loop is skipped (1:0 would count down)
  for (i in seq_len(n)) {
    total <- total + i
  }
  return(total)
}
sum_to_n(5) # Returns 15 (1+2+3+4+5)
```
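As a sanity check, the loop result can be compared with R's vectorized `sum()` and with the closed-form formula n(n + 1)/2. This is a sketch for verification only; the choice of `n <- 100` is arbitrary.

```r
# Loop-based sum, written with seq_len() so that n = 0 yields 0
sum_to_n <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i
  }
  total
}

n <- 100
loop_result    <- sum_to_n(n)
vector_result  <- sum(seq_len(n))   # vectorized built-in
formula_result <- n * (n + 1) / 2   # closed form

c(loop_result, vector_result, formula_result)  # all three are 5050
```

In everyday R code the vectorized `sum()` is the idiomatic choice; the loop version is mainly useful for learning how iteration works.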
### Loop Behavior Notes
- After the loop completes, the loop variable retains its last value
- Loops can be nested inside each other for more complex iterations
- You can use `break` to exit a loop early or `next` to skip to the next iteration
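A short sketch of `next`, `break`, and the loop variable persisting after the loop. The name `odd_values` and the cutoff of 7 are illustrative.

```r
odd_values <- c()

for (i in 1:10) {
  if (i %% 2 == 0) next  # skip even numbers
  if (i > 7) break       # stop once an odd i exceeds 7
  odd_values <- c(odd_values, i)
}

odd_values  # 1 3 5 7
i           # 9: the value i held when break ended the loop
```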
> [!case]- Case Application: Data Transformation
>
> Imagine you have a list of data frames, each containing temperature readings for different cities, and you need to convert all temperatures from Celsius to Fahrenheit.
>```r
> # List of city temperature data frames
> city_temps <- list(
>   city1 = data.frame(day = 1:5, temp_c = c(22, 24, 21, 25, 23)),
>   city2 = data.frame(day = 1:5, temp_c = c(18, 20, 19, 17, 21)),
>   city3 = data.frame(day = 1:5, temp_c = c(27, 28, 30, 29, 26))
> )
>
> # Using a for loop (seq_along() is safe even if the list is empty)
> for (i in seq_along(city_temps)) {
>   city_temps[[i]]$temp_f <- city_temps[[i]]$temp_c * 9/5 + 32
> }
>
> # Alternative using an apply function (more R-like approach)
> city_temps <- lapply(city_temps, function(df) {
>   df$temp_f <- df$temp_c * 9/5 + 32
>   return(df)
> })
>
>```
> This example shows both a for loop approach and the more R-idiomatic apply function approach to solve the same problem.
## Beyond Loops: The Apply Family
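While for loops are fundamental to programming, R offers more concise alternatives through the "apply family" of functions. They are not inherently faster than a well-written loop, but they remove the bookkeeping of pre-allocating results and managing an index.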
### Key Apply Functions
- **apply()**: Apply a function to rows or columns of a matrix
- **lapply()**: Apply a function to each element of a list
- **sapply()**: Same as lapply but simplifies the result when possible
- **tapply()**: Apply a function to subsets of a vector
- **mapply()**: Multivariate version of sapply
```mermaid
graph LR
A[Apply Family] --> B[apply: matrices/arrays]
A --> C[lapply: lists]
A --> D[sapply: simplified lapply]
A --> E[tapply: grouped data]
A --> F[mapply: multiple arguments]
```
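The sketch below runs one small call for `apply()`, `tapply()`, and `mapply()`, using the same inputs as the Code Reference table. The result names (`row_sums`, `col_sums`, `group_sums`, `repeated`) are illustrative.

```r
# apply(): a 2 x 3 matrix filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
row_sums <- apply(m, 1, sum)  # margin 1 = rows:    9 12
col_sums <- apply(m, 2, sum)  # margin 2 = columns: 3 7 11

# tapply(): sum the values 1..10 within two alternating groups
group_sums <- tapply(1:10, rep(1:2, 5), sum)  # group 1: 25, group 2: 30

# mapply(): call rep(1, 3), rep(2, 2), rep(3, 1) in parallel
# The results have different lengths, so they are returned as a list
repeated <- mapply(rep, 1:3, 3:1)
```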
### Example: Comparing Loop vs. Apply
```r
# Data: List of numeric vectors
number_list <- list(a = 1:5, b = 6:10, c = 11:15)
# Using a for loop
results_loop <- numeric(length(number_list))
for (i in seq_along(number_list)) {
  results_loop[i] <- sum(number_list[[i]])
}
# Using sapply
results_apply <- sapply(number_list, sum)
# Both give the sums 15, 40, 65; sapply() also keeps the names a, b, c
```
> [!tip] The apply family of functions often leads to code that is:
>
> - More concise and readable
> - Less prone to errors
> - Potentially more efficient
> - More aligned with R's functional programming nature
## Summary: Putting It All Together
R's power comes from combining its data analysis capabilities with solid programming constructs:
1. **Conditional expressions** allow your code to make decisions based on data conditions
2. **Functions** help you organize, reuse, and modularize your code
3. **Loops and apply functions** provide ways to perform repetitive operations efficiently
These foundational programming concepts are essential not just for writing R code, but for building robust, efficient data analysis workflows.
> [!note] While we've covered the basics, these concepts become even more powerful when combined. For example, you might create a function that uses conditional logic inside a loop, or use an apply function with a custom function that has its own conditional expressions.
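One way to combine these pieces is a custom function with conditional logic, applied to every element with `sapply()`. The function `classify_temp`, its `threshold` default, and the sample data are hypothetical.

```r
# Hypothetical helper: classify a single temperature in Celsius
classify_temp <- function(temp_c, threshold = 25) {
  if (is.na(temp_c)) {
    "unknown"
  } else if (temp_c >= threshold) {
    "warm"
  } else {
    "mild"
  }
}

temps <- c(22, 27, NA, 25)

# sapply() calls classify_temp() once per element and simplifies
# the results to a character vector
temp_labels <- sapply(temps, classify_temp)
temp_labels
# "mild" "warm" "unknown" "warm"
```

Because `classify_temp()` uses `if`, it handles one value at a time; `sapply()` supplies the iteration that a for loop would otherwise provide.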
### The Most Important Takeaway
**R's programming features are practical tools that make everyday data analysis more efficient, reproducible, and maintainable. Master these basics to transform your approach to data problems.**
--
Reference:
- Data Science, HarvardX