<sub>2025-04-07 Monday</sub> <sub>#r-programming #rstudio #data-science </sub> <sup>[[maps-of-content]] </sup>

> [!success]- Concept Sketch: [[]]
> ![[]]

> [!abstract]- Quick Review
>
> **Core Essence**: R is a statistical computing language designed for data analysis, with RStudio serving as a powerful integrated development environment that enhances productivity.
>
> **Key Concepts**:
>
> - R (language) vs. RStudio (IDE)
> - Objects, functions, and arguments
> - Data types (numeric, character, logical, factor)
> - Data frames as tabular data structures
> - Packages as extensions of functionality
>
> **Must Remember**:
>
> - Install R first, then RStudio
> - Use `<-` for assignment, not `=`
> - Access columns with `dataframe$column`
> - Load packages with `library(package)`
> - Use `?function` or `help(function)` when stuck
>
> **Critical Relationships**:
>
> - RStudio depends on R but enhances its usability
> - Functions operate on objects through arguments
> - Data frames contain multiple vectors (columns)
> - Packages extend base R's capabilities

> [!code]- Code Reference
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**RStudio Operations**|||
> |Ctrl+Shift+N / Cmd+Shift+N|Create new R script||
> |Ctrl+S / Cmd+S|Save script||
> |Ctrl+Enter / Cmd+Return|Run current line||
> |Ctrl+Shift+Enter / Cmd+Shift+Return|Run entire script||
> |**Base R Commands**|||
> |`<-`|Assignment operator|`x <- 10`|
> |`c()`|Combine values into vector|`c(1, 2, 3, 4, 5)`|
> |`class()`|Check object type|`class(x)`|
> |`str()`|View object structure|`str(data_frame)`|
> |`head()`|View first 6 rows|`head(data_frame)`|
> |`names()`|Get column names|`names(data_frame)`|
> |`length()`|Get vector length|`length(vector)`|
> |`levels()`|Get factor levels|`levels(factor_variable)`|
> |`ls()`|List workspace objects|`ls()`|
> |**Package Management**|||
> |`install.packages()`|Install R packages|`install.packages("dplyr")`|
> |`library()`|Load packages|`library(dplyr)`|
> |**Help Functions**|||
> |`help()` or `?`|Get function documentation|`?mean`|
> |`args()`|Check function arguments|`args(mean)`|

# Introduction to R and RStudio: Visual Note Guide

R is a powerful statistical computing language designed specifically for data analysis, while RStudio is an integrated development environment (IDE) that makes working with R more intuitive and productive. This guide will walk you through the essentials of both, organized for effective visual note-taking and learning.

## 1. R and RStudio: The Foundation

### What is R?

R is an open-source language and environment developed by statisticians for data analysis. Unlike general-purpose programming languages, R was created specifically for interactive data exploration and statistical computing.

> [!note] R's interactive nature is crucial for data science, allowing quick exploration of data patterns and relationships, an essential part of the analytical process.

### What is RStudio?

RStudio is not R itself, but rather an IDE that provides a more user-friendly interface for writing, testing, and debugging R code. It offers features like syntax highlighting, code completion, and integrated help that significantly enhance productivity.

### Installation Process

**Installation Order:**

1. Install R first (from CRAN)
2. Then install RStudio (from the RStudio website)

> [!warning] You cannot use RStudio without first installing R, as RStudio relies on R to execute code.

## 2. Getting Started with RStudio

### The RStudio Interface

When you first open RStudio, you'll see three primary panes:

- **Left pane**: R Console (where commands are executed)
- **Top-right pane**: Environment/History/Connections tabs
- **Bottom-right pane**: Files/Plots/Packages/Help/Viewer tabs

> [!tip] You can create a fourth pane (top-left) by opening a script file, which is highly recommended for saving your work.
```mermaid
graph TD
    subgraph "RStudio Interface"
        A[Script Editor<br>Top Left] --> B[R Console<br>Bottom Left]
        C[Environment/History<br>Top Right] --> D[Files/Plots/Packages/Help<br>Bottom Right]
    end
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#bfb,stroke:#333,stroke-width:2px
    style D fill:#fbb,stroke:#333,stroke-width:2px
```

### Creating and Saving Scripts

Scripts are critical for reproducible work in R, allowing you to save and rerun your analysis.

**Creating a new script**:

- Menu: File → New File → R Script
- Keyboard: Ctrl+Shift+N (Windows) / Command+Shift+N (Mac)

**Saving your script**:

- Menu: File → Save
- Keyboard: Ctrl+S (Windows) / Command+S (Mac)

> [!tip] Use meaningful filenames that indicate the purpose of the script, and include the date in YYYY-MM-DD format for chronological sorting.

### Running Code

There are multiple ways to execute R code from your script:

- **Run entire script**: Click "Source" or press Ctrl+Shift+Enter (Windows) / Command+Shift+Return (Mac)
- **Run a single line**: Place cursor on the line and press Ctrl+Enter (Windows) / Command+Return (Mac)
- **Run selected code**: Highlight code and press Ctrl+Enter (Windows) / Command+Return (Mac)

> [!note] The keyboard shortcuts (key bindings) dramatically speed up your workflow. The more you use them, the more efficient you'll become!

## 3. Core R Concepts

### Objects: The Building Blocks

In R, everything you work with is an object. Objects store values that can be accessed and manipulated.

**Creating objects with assignment**:

```r
# Preferred assignment operator
x <- 10

# Alternative (but not recommended for assignments)
y = 20
```

> [!warning] Always use `<-` for assignment rather than `=` to avoid confusion, as `=` is also used for specifying function arguments.
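The two roles of `=` mentioned in the warning can be seen in a short sketch. Inside a function call, `=` only names an argument and creates no object, whereas `<-` assigns wherever it appears. The object names `avg`, `avg2`, and `scores` are hypothetical, chosen only for illustration.

```r
# `=` inside a call names an argument; it does not create an object.
avg <- mean(c(4, 8, NA), na.rm = TRUE)   # avg is 6 (the NA is dropped)
na_rm_exists <- exists("na.rm")          # FALSE: no object called na.rm

# `<-` always assigns, even inside a call, so `scores` is created here.
avg2 <- mean(scores <- c(4, 8, 12))      # avg2 is 8
scores_exists <- exists("scores")        # TRUE
```

Assigning inside a call, as in the second half, is shown only to make the contrast visible; in everyday scripts it is usually clearer to assign on its own line first.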
**Viewing object contents**: Simply type the object name and press Enter, or use `print()`:

```r
x         # Displays: [1] 10
print(x)  # Displays: [1] 10
```

### Functions: Doing the Work

Functions are named operations that take inputs (arguments) and produce outputs.

**Basic syntax**:

```r
function_name(argument1, argument2, ...)
```

**Examples**:

```r
# Square root function
sqrt(16)  # Returns: [1] 4

# Logarithm function (natural log by default)
log(10)   # Returns: [1] 2.302585
```

**Nested functions**:

```r
# Functions can be nested (evaluated from inside out)
log(sqrt(16))  # Same as log(4), returns: [1] 1.386294
```

> [!tip] To see what arguments a function accepts, use the `args()` function:
> ```r
> args(log) # Displays: function (x, base = exp(1)) NULL
> ```

### Getting Help

R has comprehensive built-in documentation accessible through:

```r
# Both of these open the help page for the log function
help(log)
?log
```

> [!case] Learning by Exploration
>
> Maria is new to R and wants to understand how the `mean()` function works. She first checks the documentation:
> ```r
> ?mean
> ```
> This shows her that `mean()` takes arguments like `x` (the data), `trim` (for trimmed mean), and `na.rm` (for handling missing values).
>
> To see this in practice, she creates a vector and calculates its mean:
> ```r
> test_scores <- c(85, 92, 78, 90, NA, 88)
> mean(test_scores) # Returns NA due to missing value
> mean(test_scores, na.rm = TRUE) # Returns 86.6
> ```
> By exploring the function with real data, Maria quickly learns how to handle missing values in calculations.

## 4. Working with Packages

### Understanding Base R vs. Packages

- **Base R**: Core functionality available after installation
- **Packages**: Add-on extensions that provide additional functions and datasets

> [!note] The R ecosystem's strength comes from thousands of specialized packages developed by experts in various fields.
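As a rough illustration of the base R versus package distinction, the sketch below asks R where a function lives. It uses only `base` and `stats`, which ship with a standard R installation, so nothing needs to be installed. The variable names are hypothetical.

```r
# Packages currently attached to the session; base R is always among them.
attached <- search()
base_attached <- "package:base" %in% attached

# Ask R which package a function comes from.
mean_home   <- environmentName(environment(mean))           # "base"
median_home <- environmentName(environment(stats::median))  # "stats"

# The pkg::fun form names the package explicitly; it also works for
# installed packages that have not been loaded with library().
mid <- stats::median(c(3, 1, 2))                             # 2
```

The same `pkg::fun` form applies to add-on packages once they are installed, which can help when two packages define functions with the same name.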
### Package Workflow

**Installing packages** (done once):

```r
# Installing a single package
install.packages("dplyr")

# Installing multiple packages
install.packages(c("ggplot2", "tidyr", "readr"))
```

**Loading packages** (each session):

```r
# Makes package functions available in your session
library(dplyr)
```

> [!warning] If you try to load a package that isn't installed, you'll get an error: "there is no package called 'packagename'".

> [!tip] By default, `install.packages()` already installs the packages that the requested package requires. Adding `dependencies = TRUE` also installs the optional packages it suggests:
> ```r
> install.packages("tidyverse", dependencies = TRUE)
> ```

## 5. Data Types and Structures

### Basic Data Types

R has several fundamental data types:

1. **Numeric**: Numbers (`1`, `2.5`, `-3.14`)
2. **Character**: Text strings (`"hello"`, `"R"`, `"data"`)
3. **Logical**: Boolean values (`TRUE`, `FALSE`)
4. **Factor**: Categorical data with defined levels

```r
# Checking data types
class(42)       # "numeric"
class("hello")  # "character"
class(TRUE)     # "logical"
```

### Vectors: Series of Values

Vectors are the simplest data structure in R, containing multiple values of the same type.

```r
# Creating vectors with c() (combine)
numbers <- c(1, 2, 3, 4, 5)
words <- c("apple", "banana", "cherry")
```

> [!note] Even a single value in R is technically a vector of length 1.

### Data Frames: Tabular Data

Data frames are table-like structures that combine multiple vectors of potentially different types.

```r
# Example of a simple data frame
students <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(22, 21, 23),
  gpa = c(3.8, 3.2, 3.9)
)
```

**Working with data frames**:

```r
# View structure of a data frame
str(students)

# See the first 6 rows
head(students)

# Get column names
names(students)

# Access a column using $
students$name
```

### Factors: Categorical Data

Factors are special vectors used to represent categorical data with defined levels.
```r
# Creating a factor
sizes <- factor(c("small", "medium", "large", "medium", "small"))

# Viewing levels
levels(sizes)  # Returns: "large" "medium" "small"
```

> [!warning] Factors may look like character vectors when printed, but they behave differently! Use `class()` to check the type.

## Summary

R and RStudio provide a powerful environment for data analysis. R is the statistical computing language that performs calculations and manipulates data, while RStudio is the IDE that makes working with R more efficient.

The basic workflow involves:

1. Creating objects to store data
2. Using functions to perform operations on those objects
3. Organizing data into appropriate structures (vectors, data frames)
4. Extending functionality through packages

By understanding these fundamental concepts, you're well on your way to leveraging R for data analysis and visualization.

> [!important] The single most important takeaway is understanding the distinct roles of R and RStudio, and how they work together to create a powerful data analysis environment: R provides the statistical computing engine, while RStudio provides the user-friendly interface for interacting with that engine.

--

Reference:

- Data Science, HarvardX