<sub>2025-04-07 Monday</sub> <sub>#r-programming #rstudio #data-science </sub> <sup>[[maps-of-content]] </sup>

# R Programming Fundamentals: Decision-Making & Reusable Code

> [!success]- Concept Sketch: [[]]
> ![[]]

> [!abstract]- Quick Review
>
> **Core Essence**: R is both a powerful data analysis environment and a programming language with conditional expressions, functions, and loops that form the foundation of efficient code.
>
> **Key Concepts**:
>
> - Conditional expressions (if-else, ifelse())
> - User-defined functions
> - For loops and their alternatives
> - Logical operations (any(), all())
>
> **Must Remember**:
>
> - The ifelse() function works element-wise on vectors
> - Variables created inside functions have local scope
> - For loops are fundamental but not always the most efficient choice in R
> - R's "apply family" functions often provide better alternatives to loops
>
> **Critical Relationships**:
>
> - if-else statements for single conditions vs. ifelse() for vectorized operations
> - Functions encapsulate reusable code and create local variable environments
> - Loops provide iteration but apply functions often offer more elegant solutions

> [!code]- Code Reference
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Conditional Expressions**|||
> |`if (condition) {...} else {...}`|Execute code based on a condition|`if (x > 0) { "positive" } else { "non-positive" }`|
> |`ifelse(test, yes, no)`|Vectorized conditional|`ifelse(x > 0, "positive", "non-positive")`|
> |`any(logical_vector)`|TRUE if any element is TRUE|`any(c(FALSE, TRUE, FALSE))` # TRUE|
> |`all(logical_vector)`|TRUE if all elements are TRUE|`all(c(TRUE, TRUE, FALSE))` # FALSE|
> |**Functions**|||
> |`function_name <- function(args) {...}`|Define a function|`square <- function(x) { x^2 }`|
> |**For Loops**|||
> |`for (var in sequence) {...}`|Iterate over a sequence|`for (i in 1:5) { print(i) }`|
> |**Apply Family**|||
> |`lapply(list, function)`|Apply function to each list element|`lapply(list(1:3, 4:6), sum)`|
> |`sapply(list, function)`|Same as lapply with simplified result|`sapply(list(1:3, 4:6), sum)`|
> |`apply(matrix, margin, function)`|Apply to rows (1) or columns (2)|`apply(matrix(1:6, 2, 3), 1, sum)`|
> |`tapply(vector, index, function)`|Apply to subsets of a vector|`tapply(1:10, rep(1:2, 5), sum)`|
> |`mapply(function, ...)`|Apply function with multiple arguments|`mapply(rep, 1:3, 3:1)`|

## Introduction to R's Dual Nature

R serves two complementary purposes that make it particularly valuable for data scientists and analysts. Understanding this dual nature helps frame how we approach learning the language.

### Data Analysis Environment

R excels as a tool for working with data, allowing users to:

- Perform exploratory data analysis efficiently
- Build robust data analysis pipelines
- Create compelling data visualizations

> [!note] While our primary focus is using R for data analysis, the programming concepts we're learning serve as essential building blocks for these analytical tasks.

### Programming Language

Beyond its analytical capabilities, R is a full-featured programming language that enables:

- Creation of custom functions tailored to specific needs
- Development of complex packages that extend R's capabilities
- Contribution to the language itself (for advanced users)

The programming concepts we'll explore (**conditionals**, **functions**, and **loops**) form the foundation of both basic data analysis and more advanced programming.

## Conditional Expressions: Making Decisions in Code

Conditional expressions allow your code to make decisions based on whether conditions are true or false. R provides two main approaches to conditional logic.

### If-Else Statements

The if-else statement is a fundamental construct that creates branching paths in your code.
The basic syntax:

```r
if (condition) {
  # Code to execute if condition is TRUE
} else {
  # Code to execute if condition is FALSE
}
```

For example, to print the reciprocal of a number unless that number is zero:

```r
a <- 5
if (a != 0) {
  print(1/a)
} else {
  print("No reciprocal for zero")
}
# Prints: [1] 0.2
```

> [!tip] The else part is optional. You can use just `if` when you only need to execute code when the condition is true.

### The ifelse() Function

While if-else statements work for single values, the `ifelse()` function is designed to work with vectors, applying the condition to each element.

```r
ifelse(test, yes, no)
```

Where:

- `test` is a logical vector
- `yes` supplies the values returned where `test` is TRUE
- `no` supplies the values returned where `test` is FALSE

This is particularly useful for data wrangling operations.

```mermaid
graph TD
    A[ifelse function] --> B[test: logical vector]
    A --> C[yes: values if TRUE]
    A --> D[no: values if FALSE]
    B --> E[Returns vector with values from yes or no]
```

#### Example: Handling Missing Values

```r
data_vector <- c(10, NA, 30, NA, 50)
clean_data <- ifelse(is.na(data_vector), 0, data_vector)
# Result: c(10, 0, 30, 0, 50)
```

> [!warning] While `ifelse()` is convenient, on large vectors it can be slower than direct logical indexing (for example `x[is.na(x)] <- 0`).

### Logical Aggregation: any() and all()

These functions collapse a logical vector into a single TRUE or FALSE:

- **any()**: Returns TRUE if at least one element in the vector is TRUE
- **all()**: Returns TRUE only if every element in the vector is TRUE

```r
x <- c(TRUE, FALSE, TRUE)
any(x)  # TRUE
all(x)  # FALSE
```

> [!note] These functions are particularly useful for data validation and conditional filtering.

## Functions: Creating Reusable Code

Functions allow you to package code that performs specific tasks, making your work more organized and efficient.

### Anatomy of a Function

```r
function_name <- function(arg1, arg2, ...) {
  # Function body: code that performs operations
  # The last expression evaluated is returned
  result
}
```

### Key Function Characteristics

1. **Functions are objects** in R: assign one to a name to reuse it, or pass an unnamed (anonymous) function directly to another function
2. **Local scope**: Variables defined inside a function exist only during function execution
3. **Arguments** can have default values
4. **Return values** default to the last expression evaluated; `return()` can be used to return a value explicitly

### Example: Creating a Simple Function

```r
# Function to calculate the square of a number
square <- function(x) {
  x^2
}

# Using the function
square(4)  # Returns 16
```

### Function with Multiple Arguments

```r
# Function with multiple arguments and a default value
calculate_area <- function(length, width = 1) {
  length * width
}

calculate_area(5)     # Returns 5 (using default width)
calculate_area(5, 3)  # Returns 15
```

> [!tip] Well-designed functions should do one thing well and have a clear purpose, making your code more readable and maintainable.

## For Loops: Repetitive Operations

For loops allow you to execute code repeatedly, with each iteration using a different value from a sequence.

### Basic Structure

```r
for (variable in sequence) {
  # Code to execute in each iteration
}
```

The loop variable takes on each value in the sequence during successive iterations.

### Example: Summing Integers

```r
# Calculate the sum of integers from 1 to n
sum_to_n <- function(n) {
  total <- 0
  # seq_len(n) is empty when n is 0, whereas 1:0 would iterate over c(1, 0)
  for (i in seq_len(n)) {
    total <- total + i
  }
  return(total)
}

sum_to_n(5)  # Returns 15 (1+2+3+4+5)
```

### Loop Behavior Notes

- After the loop completes, the loop variable retains its last value
- Loops can be nested inside each other for more complex iterations
- You can use `break` to exit a loop early or `next` to skip to the next iteration

> [!case]- Case Application: Data Transformation
>
> Imagine you have a list of data frames, each containing temperature readings for different cities, and you need to convert all temperatures from Celsius to Fahrenheit.
>
> ```r
> # List of city temperature data frames
> city_temps <- list(
>   city1 = data.frame(day = 1:5, temp_c = c(22, 24, 21, 25, 23)),
>   city2 = data.frame(day = 1:5, temp_c = c(18, 20, 19, 17, 21)),
>   city3 = data.frame(day = 1:5, temp_c = c(27, 28, 30, 29, 26))
> )
>
> # Using a for loop (seq_along is safe even if the list is empty)
> for (i in seq_along(city_temps)) {
>   city_temps[[i]]$temp_f <- city_temps[[i]]$temp_c * 9/5 + 32
> }
>
> # Alternative using an apply function (more R-like approach)
> city_temps <- lapply(city_temps, function(df) {
>   df$temp_f <- df$temp_c * 9/5 + 32
>   return(df)
> })
> ```
>
> This example shows both a for loop approach and the more R-idiomatic apply function approach to solve the same problem.

## Beyond Loops: The Apply Family

While for loops are fundamental to programming, R offers more concise and sometimes more efficient alternatives through the "apply family" of functions.

### Key Apply Functions

- **apply()**: Apply a function to rows or columns of a matrix
- **lapply()**: Apply a function to each element of a list
- **sapply()**: Same as lapply but simplifies the result when possible
- **tapply()**: Apply a function to subsets of a vector
- **mapply()**: Multivariate version of sapply

```mermaid
graph LR
    A[Apply Family] --> B[apply: matrices/arrays]
    A --> C[lapply: lists]
    A --> D[sapply: simplified lapply]
    A --> E[tapply: grouped data]
    A --> F[mapply: multiple arguments]
```

### Example: Comparing Loop vs. Apply

```r
# Data: List of numeric vectors
number_list <- list(a = 1:5, b = 6:10, c = 11:15)

# Using a for loop
results_loop <- numeric(length(number_list))
for (i in seq_along(number_list)) {
  results_loop[i] <- sum(number_list[[i]])
}

# Using sapply
results_apply <- sapply(number_list, sum)

# Both hold the values 15, 40, 65; sapply also keeps the names a, b, c
```

> [!tip] The apply family of functions often leads to code that is:
>
> - More concise and readable
> - Less prone to errors
> - Potentially more efficient
> - More aligned with R's functional programming nature

## Summary: Putting It All Together

R's power comes from combining its data analysis capabilities with solid programming constructs:

1. **Conditional expressions** allow your code to make decisions based on data conditions
2. **Functions** help you organize, reuse, and modularize your code
3. **Loops and apply functions** provide ways to perform repetitive operations efficiently

These foundational programming concepts are essential not just for writing R code, but for building robust, efficient data analysis workflows.

> [!note] While we've covered the basics, these concepts become even more powerful when combined. For example, you might create a function that uses conditional logic inside a loop, or use an apply function with a custom function that has its own conditional expressions.

### The Most Important Takeaway

**R's programming features are practical tools that make everyday data analysis more efficient, reproducible, and maintainable. Master these basics to transform your approach to data problems.**

---

Reference:

- Data Science, HarvardX
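
As a rough sketch of how the constructs in this note can be combined, the example below puts a plain `if` check and a vectorized `ifelse()` inside one function, then applies that function across a list with `lapply()` and `sapply()`. The names `label_temps` and `readings`, the sample values, and the threshold of 25 are hypothetical, chosen only for illustration and not taken from the course material.

```r
# Hypothetical helper: label each temperature as "warm" or "cool".
# The default threshold of 25 is an illustrative assumption.
label_temps <- function(temp_c, threshold = 25) {
  # A single TRUE/FALSE condition, so a plain if statement fits
  if (!is.numeric(temp_c)) {
    stop("temp_c must be numeric")
  }
  # Element-wise decision over the whole vector
  ifelse(temp_c > threshold, "warm", "cool")
}

# Illustrative data, not real measurements
readings <- list(city1 = c(22, 26, 30), city2 = c(18, 20, 19))

# lapply returns a list with one character vector per city
labels <- lapply(readings, label_temps)

# An anonymous function counts warm days; sapply simplifies to a named vector
warm_days <- sapply(labels, function(x) sum(x == "warm"))

any(warm_days == 0)  # TRUE: city2 has no warm days
all(warm_days > 0)   # FALSE: not every city has a warm day
```

Keeping the threshold as an argument with a default value means the same function can be reused with a different cutoff, for example `label_temps(c(22, 26), threshold = 20)`.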