Exercise 4: Data Structure in R
The dataset we imported in Exercise 2: Importing data in R is a data frame. A data frame is the structure R uses to keep data in tabular form. If you run class(bodydata) on the data we imported earlier, you will see data.frame as its class. There are other data structures in R. Some of the basic structures are discussed below:
Vector
A vector is a one-dimensional object that stores elements of a single mode, such as “logical” (TRUE or FALSE), “integer”, “numeric” or “character”. All elements of a vector must be of the same mode. For example,
x <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
y <- c("TRUE", "FALSE", "Not Sure")
z <- c(2, 3, 5, 6, 10)
Here, x, y and z are of class logical, character and numeric respectively. Although the vector y contains TRUE and FALSE, they are stored as character strings. The function c is used to define a vector. However, functions that create sequences also return vectors. For example,
(a_sequence <- seq(from = 0, to = 10, by = 2))
[1] 0 2 4 6 8 10
(b_sequence <- 1:10)
[1] 1 2 3 4 5 6 7 8 9 10
Here both a_sequence and b_sequence are vectors. Pay special attention to the way we have created these sequences of numbers. It will be useful in many situations in future exercises.
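The single-mode rule can be checked directly. The following is a minimal sketch; the names mixed_num and mixed_chr are made up for illustration. It shows that c does not raise an error when modes are mixed, but silently converts every element to the most flexible mode present.

```r
# Mixing modes in c() coerces all elements to one common mode.
mixed_num <- c(TRUE, 2L, 3.5)      # logical and integer become numeric
mixed_chr <- c(TRUE, 2, "three")   # everything becomes character

class(mixed_num)   # "numeric"
class(mixed_chr)   # "character"
mixed_chr          # "TRUE" "2" "three"

# Sequences are ordinary vectors too.
is.vector(1:10)                          # TRUE
class(1:10)                              # "integer"
class(seq(from = 0, to = 10, by = 2))    # "numeric"
```

Checking class after building a vector is a quick way to notice an unintended coercion, for example numbers that were read in as text.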
Matrix
A matrix is a two-dimensional structure with rows and columns. As it is an extension of the vector structure, all elements of a matrix must be of the same mode, just as in a vector. For example:
(a_matrix <- matrix(1:25, nrow = 5, ncol = 5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
(b_matrix <- diag(1:5))
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    2    0    0    0
[3,]    0    0    3    0    0
[4,]    0    0    0    4    0
[5,]    0    0    0    0    5
Here, a_matrix is created from the sequence 1 to 25, filled column by column into 5 rows and 5 columns. We can also define a diagonal matrix, such as b_matrix, with the numbers 1 to 5 on its diagonal.
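As a small sketch of how the fill order affects the result, the example below builds the same six numbers into two matrices. The names m_by_col and m_by_row are hypothetical; byrow is a standard argument of matrix.

```r
# By default matrix() fills column by column; byrow = TRUE fills row by row.
m_by_col <- matrix(1:6, nrow = 2)
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

dim(m_by_col)     # 2 3  (rows, columns)
m_by_col[1, 2]    # 3    (row 1, column 2)
m_by_row[1, 2]    # 2

# Leaving an index empty selects a whole row or column.
m_by_col[, 3]     # 5 6
```

Elements are addressed as [row, column], so the same position can hold different values depending on the fill order.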
Array
An array is an extension of the matrix structure to three or more dimensions. We can define an array as:
(an_array <- array(1:24, dim = c(2, 4, 3)))
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

, , 2

     [,1] [,2] [,3] [,4]
[1,]    9   11   13   15
[2,]   10   12   14   16

, , 3

     [,1] [,2] [,3] [,4]
[1,]   17   19   21   23
[2,]   18   20   22   24
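To make the three dimensions concrete, here is a short sketch that indexes into the same array. The dim argument gives the number of rows, columns and slices, in that order; the name second_slice is introduced only for this illustration.

```r
an_array <- array(1:24, dim = c(2, 4, 3))

dim(an_array)        # 2 4 3  (rows, columns, slices)
an_array[2, 3, 1]    # 6      (row 2, column 3, first slice)
an_array[1, 1, 3]    # 17     (row 1, column 1, third slice)

# Fixing the third index returns an ordinary 2 x 4 matrix.
second_slice <- an_array[, , 2]
is.matrix(second_slice)   # TRUE
dim(second_slice)         # 2 4
```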
List
All the structures discussed above require their elements to be of the same mode, such as numeric, character or logical. Sometimes it is necessary to keep objects of different modes in the same place. A list is a structure that helps in such situations. A list can contain a list, matrix, vector, numeric or any other data structure as its elements. For example:
a_list <- list(
a_matrix = matrix(1:6, nrow = 2, ncol = 3),
a_vector = 2:7,
a_list = list(a = 1, b = 3:6),
a_logical = c(TRUE, FALSE, TRUE, NA)
)
a_list
$a_matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

$a_vector
[1] 2 3 4 5 6 7

$a_list
$a_list$a
[1] 1

$a_list$b
[1] 3 4 5 6


$a_logical
[1]  TRUE FALSE  TRUE    NA
In the above example, a_list contains a matrix, a numeric vector, a list and a logical vector.
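The elements of a list can be pulled out by name. The sketch below rebuilds a_list from the example above and shows the usual accessors: $ and [[ ]] return the element itself, while single brackets return a smaller list.

```r
a_list <- list(
  a_matrix = matrix(1:6, nrow = 2, ncol = 3),
  a_vector = 2:7,
  a_list = list(a = 1, b = 3:6),
  a_logical = c(TRUE, FALSE, TRUE, NA)
)

names(a_list)                 # "a_matrix" "a_vector" "a_list" "a_logical"
a_list$a_vector               # 2 3 4 5 6 7
a_list[["a_matrix"]][2, 3]    # 6, an element of the stored matrix
a_list$a_list$b               # 3 4 5 6, an element of the nested list

# Single brackets keep the list wrapper; double brackets drop it.
class(a_list["a_vector"])     # "list"
class(a_list[["a_vector"]])   # "integer"
```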
Data Frame
A data frame is a list kept in tabular structure: each element of the list is a column, and all columns have the same length. Every column of a data frame has a name assigned to it. The bodydata dataset we imported is an example of a data frame. The data frame is the most commonly used structure for keeping data in tabular format. Let's create a data frame:
a_dataframe <- data.frame(
character = c("a", "b", "c"),
numeric = 1:3,
logical = c(TRUE, FALSE, NA)
)
a_dataframe
  character numeric logical
1         a       1    TRUE
2         b       2   FALSE
3         c       3      NA
Every column of a data.frame is a vector. Different columns of a data frame can contain elements of different modes. For example, the first column can be a character vector while the second column is a numeric vector, as in the example above.
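The list-like nature of a data frame can be inspected directly. This is a minimal sketch using the a_dataframe defined above. One assumption to note: the character column stays character only in R 4.0.0 or later, because earlier versions convert strings to factors by default.

```r
a_dataframe <- data.frame(
  character = c("a", "b", "c"),
  numeric = 1:3,
  logical = c(TRUE, FALSE, NA)
)

is.list(a_dataframe)     # TRUE, a data frame is a special kind of list
dim(a_dataframe)         # 3 3  (rows, columns)
names(a_dataframe)       # "character" "numeric" "logical"

# Columns are accessed like list elements and come back as vectors.
a_dataframe$numeric          # 1 2 3
a_dataframe[["logical"]]     # TRUE FALSE NA

# str() gives a compact summary of each column and its type.
str(a_dataframe)
```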