<sub>2025-04-07 Monday</sub> <sub>#r-programming #rstudio #data-science </sub>
<sup>[[maps-of-content]] </sup>
# Understanding Vectors in R: The Building Blocks of Data
> [!abstract]- Quick Review
>
> **Core Essence**: Vectors are the fundamental data structure in R that store sequences of elements of the same type, forming the building blocks for more complex data structures.
>
> **Key Concepts**:
>
> - Vectors store sequences of elements of the same data type
> - Vector creation using c(), seq(), and : operator
> - Subsetting vectors with [] using indices or names
> - Vector coercion manages data type consistency
> - Element-wise arithmetic operations
>
> **Must Remember**:
>
> - R uses 1-based indexing (first position is 1, not 0)
> - Element-wise operations apply functions to each element
> - Vector coercion follows logical → integer → numeric → character
> - Names make vectors more readable and accessible
> - All elements in a vector must be the same type
>
> **Critical Relationships**:
>
> - Vector naming connects values with their meanings
> - sort() returns sorted values while order() returns indices
> - Vector arithmetic combines vectors element-by-element
> - Missing values (NA) result when coercion fails

> [!code]- Code Reference
>
> |Command/Syntax|Purpose|Example|
> |---|---|---|
> |**Vector Creation**|||
> |`c()`|Concatenate elements into a vector|`c(1, 2, 3)`|
> |`seq()`|Generate a sequence|`seq(1, 10, by=2)`|
> |`:`|Create integer sequence|`1:10`|
> |`names()`|Assign/retrieve names|`names(x) <- c("a", "b", "c")`|
> |**Vector Access**|||
> |`[]`|Subset vector by position|`x[2]` or `x[c(1,3)]`|
> |`[]`|Subset vector by name|`x["name"]`|
> |**Type Conversion**|||
> |`as.numeric()`|Convert to numeric|`as.numeric(c("1", "2"))`|
> |`as.character()`|Convert to character|`as.character(1:3)`|
> |`as.logical()`|Convert to logical|`as.logical(c(0, 1))`|
> |**Sorting Functions**|||
> |`sort()`|Sort values|`sort(x)`|
> |`order()`|Get indices that would sort|`order(x)`|
> |`rank()`|Get rank of elements|`rank(x)`|
> |`max()`|Find maximum value|`max(x)`|
> |`which.max()`|Find position of maximum|`which.max(x)`|
> |`min()`|Find minimum value|`min(x)`|
> |`which.min()`|Find position of minimum|`which.min(x)`|
## Introduction to Vectors
Vectors are R's most fundamental data structure - the basic building blocks upon which more complex data structures are built. At their core, vectors are ordered collections of elements, all sharing the same data type. Whether you're working with simple numerical calculations or complex data analysis, understanding vectors is essential.
> [!note] Foundation Concept
> Almost every data structure in R is built on vectors: a matrix is a vector with dimensions, a data frame is a list of equal-length vectors, and a list is itself a generic vector. Mastering vectors is your first step toward R proficiency.
## Creating Vectors in R
### The c() Function: Concatenating Elements
The most common way to create vectors is with the `c()` function, which stands for **concatenate**. This function combines individual elements into a single vector.
```r
# Creating a numeric vector
codes <- c(380, 124, 818)
# Creating a character vector
country <- c("italy", "canada", "egypt")
```
### Generating Sequences
R provides convenient ways to create sequential vectors:
**Using seq() function:**
```r
# Generate sequence from 1 to 10
seq(1, 10)
# Generate sequence with custom step
seq(1, 10, by=2) # Creates 1, 3, 5, 7, 9
```
**Using the colon operator:**
```r
# Shorthand for consecutive integers
1:10 # Equivalent to seq(1, 10)
```
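As a rough illustration of how these two approaches relate, the sketch below compares their results and types. The `length.out` argument is not covered above and is shown only as one possible next step; the variable names are illustrative.
```r
# The colon operator and seq() with only endpoints give the same integer vector
a <- 1:10
b <- seq(1, 10)
identical(a, b)   # TRUE

# With a numeric step, seq() returns a numeric (double) vector
odds <- seq(1, 10, by = 2)
class(a)          # "integer"
class(odds)       # "numeric"

# length.out asks for a fixed number of evenly spaced values instead of a step
quarters <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```
The integer versus numeric distinction rarely matters in arithmetic, but it can surface when comparing objects with `identical()`.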
### Naming Vector Elements
Vectors can have named elements, which enhances readability and accessibility:
**Assigning names during creation:**
```r
# Method 1: name = value syntax
codes <- c(italy = 380, canada = 124, egypt = 818)
# Method 2: "name" = value syntax
codes <- c("italy" = 380, "canada" = 124, "egypt" = 818)
```
**Assigning names after creation:**
```r
codes <- c(380, 124, 818)
country <- c("italy", "canada", "egypt")
names(codes) <- country
```
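A short sketch, reusing the `codes` and `country` vectors from above, suggesting that both naming routes end up with the same vector. `unname()` is not mentioned in the text and is included only for illustration, as are the names `inline` and `plain`.
```r
codes <- c(380, 124, 818)
country <- c("italy", "canada", "egypt")
names(codes) <- country

# Without an assignment, names() retrieves the names as a character vector
names(codes)               # "italy" "canada" "egypt"

# Naming at creation time produces the same named vector
inline <- c(italy = 380, canada = 124, egypt = 818)
identical(codes, inline)   # TRUE

# unname() drops the names and leaves the values untouched
plain <- unname(codes)     # 380 124 818
```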
## Accessing Vector Elements (Subsetting)
Subsetting allows you to extract specific elements from a vector. R offers several approaches:
### Using Numeric Indices
R uses **1-based indexing** (unlike many programming languages that start at 0):
```r
# Access the second element
codes[2]
# Access multiple elements
codes[c(1, 3)] # First and third elements
# Access a range of elements
codes[1:2] # First two elements
```
### Using Names
If your vector has named elements, you can access them by name:
```r
# Access by single name
codes["canada"]
# Access by multiple names
codes[c("egypt", "italy")]
```
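Beyond positions and names, `[]` also accepts negative indices and logical vectors. These forms are not covered above; the following is a minimal sketch of them, assuming the named `codes` vector from earlier, with illustrative variable names.
```r
codes <- c(italy = 380, canada = 124, egypt = 818)

# Negative indices drop positions instead of selecting them
without_second <- codes[-2]    # italy and egypt remain

# A logical vector keeps the elements where the condition is TRUE
large <- codes[codes > 300]    # italy and egypt

# Asking for a position past the end returns NA rather than an error
beyond <- codes[5]
```
Logical subsetting is the form most often combined with element-wise comparisons, since the comparison itself produces the logical vector.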
> [!tip] Meaningful Names
> Using descriptive names for vector elements makes your code more readable and helps prevent errors. Names connect the data to its real-world meaning.
## Vector Coercion: Managing Data Types
### Understanding Vector Homogeneity
A key characteristic of vectors is that all elements must be of the **same data type**. When you try to combine different types, R performs "coercion" to maintain this consistency.
### Implicit Coercion
When R encounters mixed data types in a vector, it automatically converts all elements to the most flexible type following this hierarchy:
- logical → integer → numeric → character
```r
# Mixing types results in character vector
x <- c(1, "canada", 3)
x # Results in: "1" "canada" "3"
class(x) # "character"
```
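The hierarchy can be checked one step at a time. This is a minimal sketch with illustrative variable names; the `L` suffix, which marks an integer literal, is not introduced in the text above.
```r
# logical + integer -> integer (TRUE becomes 1L)
step1 <- c(TRUE, 2L)
class(step1)    # "integer"

# integer + numeric -> numeric
step2 <- c(2L, 3.5)
class(step2)    # "numeric"

# numeric + character -> character
step3 <- c(3.5, "a")
class(step3)    # "character"

# The logical -> numeric step is why sum() can count TRUE values
n_true <- sum(c(TRUE, FALSE, TRUE))   # 2
```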
### Explicit Coercion
You can manually convert between types using the `as.*()` functions:
```r
# Convert to character
y <- as.character(1:5) # Results in: "1" "2" "3" "4" "5"
# Convert to numeric (with potential issues)
as.numeric(c("1", "b", "3")) # Results in: 1 NA 3 with a warning
```
### Handling Missing Values (NA)
When coercion fails, R introduces NA (Not Available) values:
```r
# Character "b" can't be converted to numeric
as.numeric(c("1", "b", "3")) # Results in: 1 NA 3
```
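One possible way to detect and work around the resulting NA values is sketched below. `is.na()`, `suppressWarnings()`, and the `na.rm` argument are not introduced in the text above and are shown only as an illustration; `raw` and `values` are hypothetical names.
```r
raw <- c("1", "b", "3")

# suppressWarnings() silences the "NAs introduced by coercion" warning
values <- suppressWarnings(as.numeric(raw))

# is.na() flags which elements failed to convert
is.na(values)                  # FALSE TRUE FALSE
n_missing <- sum(is.na(values))   # 1

# Most summaries return NA if any element is NA, unless told to skip them
mean(values)                   # NA
clean_mean <- mean(values, na.rm = TRUE)   # 2
```
Silencing the warning is only reasonable once you have decided how the NA values will be handled; otherwise the warning is a useful signal.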
> [!warning] Coercion Caution
> Implicit coercion can lead to unexpected results. Always check your data types when combining different values, especially when reading from external sources or performing calculations.
## Sorting and Ordering Vectors
### Basic Sorting with sort()
The `sort()` function arranges vector elements in ascending order (by default):
```r
# Sort in ascending order
sort(c(5, 2, 8, 1)) # Results in: 1 2 5 8
# Sort in descending order
sort(c(5, 2, 8, 1), decreasing = TRUE) # Results in: 8 5 2 1
```
### Finding Positions with order()
The `order()` function returns the **indices** that would sort the vector:
```r
x <- c(31, 4, 15, 92, 65)
order(x) # Results in: 2 3 1 5 4
x[order(x)] # Same as sort(x): 4 15 31 65 92
```
This is particularly useful when you need to sort multiple vectors based on one vector's order.
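A minimal sketch of that idea, reusing the `codes` and `country` vectors from earlier (the name `idx` is illustrative):
```r
codes <- c(380, 124, 818)
country <- c("italy", "canada", "egypt")

# Compute the ordering once, from the vector that defines the sort
idx <- order(codes)     # 2 1 3

# Apply the same index to both vectors so their elements stay paired
codes[idx]              # 124 380 818
country[idx]            # "canada" "italy" "egypt"
```
Calling `sort()` on each vector separately would break the pairing, because `country` would be sorted alphabetically rather than by code.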
### Determining Ranks with rank()
The `rank()` function returns, for each element, the position it would occupy if the vector were sorted in ascending order:
```r
x <- c(31, 4, 15, 92, 65)
rank(x) # Results in: 3 1 2 5 4
```
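One way to read this, sketched below under the assumption that the vector has no ties: the rank says where each element lands in the sorted vector, so indexing the sorted values by rank gives back the original. The tie example relies on `rank()`'s default of averaging tied ranks; variable names are illustrative.
```r
x <- c(31, 4, 15, 92, 65)

# Indexing the sorted values by rank rebuilds the original vector
rebuilt <- sort(x)[rank(x)]
identical(rebuilt, x)          # TRUE

# With ties, the default method assigns the average of the tied ranks
tied <- rank(c(10, 20, 10))    # 1.5 3.0 1.5
```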
### Finding Extremes
R provides functions to find maximum and minimum values and their positions:
```r
# Maximum value and its position
max(x) # 92
which.max(x) # 4 (4th position)
# Minimum value and its position
min(x) # 4
which.min(x) # 2 (2nd position)
```
## Vector Arithmetic: Element-wise Operations
### Basic Arithmetic
R performs arithmetic operations **element-wise** on vectors:
```r
# Element-wise addition
c(1, 2, 3) + c(10, 20, 30) # Results in: 11 22 33
# Element-wise multiplication
heights <- c(69, 62, 66, 70, 74)
heights * 2.54 # Convert inches to cm
```
When applying an operation between a vector and a single value (scalar), R applies the value to every element:
```r
# Subtract mean from each element
heights - mean(heights)
```
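The same mechanism, usually called recycling, also applies when the shorter vector has more than one element. The sketch below is illustrative only; apart from `heights`, the values and names are made up for the example.
```r
heights <- c(69, 62, 66, 70, 74)

# A single value is reused for every element
centered <- heights - mean(heights)

# A length-2 vector is repeated to match a length-4 vector: 10 20 10 20
recycled <- c(1, 2, 3, 4) + c(10, 20)   # 11 22 13 24
```
If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which usually points to a length mismatch worth investigating.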
### Practical Applications: Calculating Rates
Element-wise operations are perfect for calculating rates and ratios:
```r
# Calculate murder rate per 100,000 people (assumes the murders data from the dslabs package)
library(dslabs)
data(murders)
murder_rate <- murders$total / murders$population * 100000
```
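To see what the single vectorized line replaces, the sketch below computes the same rates with an explicit loop and compares the results. It uses the five-state figures from the case study in this note rather than the full `murders` data; `rate_vec` and `rate_loop` are illustrative names.
```r
population <- c(4779736, 710231, 6392017, 2915918, 37253956)
total_murders <- c(135, 19, 232, 93, 1257)

# Vectorized: one expression handles every element
rate_vec <- total_murders / population * 100000

# Loop: the same arithmetic, one element at a time
rate_loop <- numeric(length(population))
for (i in seq_along(population)) {
  rate_loop[i] <- total_murders[i] / population[i] * 100000
}

all.equal(rate_vec, rate_loop)   # TRUE
```
Both versions give the same numbers; the vectorized form is shorter and leaves no index bookkeeping to get wrong.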
> [!example]- Case Study: Analyzing US State Murder Rates
>
> Suppose we have data about US states' populations and total murders:
> ```r
> # Build the example data
> states <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California")
> population <- c(4779736, 710231, 6392017, 2915918, 37253956)
> total_murders <- c(135, 19, 232, 93, 1257)
>
> # Create named vectors
> names(population) <- states
> names(total_murders) <- states
>
> # Calculate murder rates per 100,000
> murder_rate <- total_murders / population * 100000
>
> # Find state with highest murder rate
> states[which.max(murder_rate)] # "Arizona"
>
> # Sort states by murder rate, highest first
> states[order(murder_rate, decreasing = TRUE)]
> ```
>
> This example demonstrates several vector concepts:
>
> 1. Creating and naming vectors
> 2. Performing element-wise calculations (murder rates)
> 3. Finding maximum values with which.max()
> 4. Sorting using order()
>
## Summary
Vectors are the foundational building blocks in R programming. They store sequences of elements of the same data type and serve as components for more complex data structures. Understanding how to create, manipulate, and perform operations on vectors is essential for effective data analysis in R.
Key takeaways include:
- Creating vectors using c(), seq(), and the colon operator
- Accessing elements through numeric indices or names
- Understanding how R handles data types through coercion
- Sorting and ordering vectors with sort(), order(), and rank()
- Performing element-wise arithmetic operations
> [!important] Most Important Takeaway
> The true power of vectors in R comes from their ability to perform operations across entire datasets at once through element-wise operations, enabling efficient data transformation and analysis without explicit loops.
---
Reference:
- Data Science, HarvardX