Exercise 5: Exploring the data
Structure of an R-object
The first command you need to learn is the str function, which shows the structure of any object in R. Let's apply it to our bodydata:
str(bodydata)
'data.frame': 407 obs. of 4 variables:
$ Weight : num 65.6 80.7 72.6 78.8 74.8 86.4 78.4 62 81.6 76.6 ...
$ Height : num 174 194 186 187 182 ...
$ Age : num 21 28 23 22 21 26 27 23 21 23 ...
$ Circumference: num 71.5 83.2 77.8 80 82.5 82 76.8 68.5 77.5 81.9 ...
This output shows that bodydata is a data.frame with 407 rows (observations) and 4 numeric variables: Weight, Height, Age and Circumference.
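Since str works on any R object, not only data frames, it is worth trying it on objects you build yourself. The sketch below uses toy_body, a hypothetical three-row stand-in for bodydata with made-up values, and toy_list, a small hypothetical list.

```r
# Hypothetical stand-in for bodydata: three made-up rows, two numeric columns
toy_body <- data.frame(
  Weight = c(65.6, 80.7, 72.6),
  Height = c(174, 194, 186)
)

# Compact overview: class, dimensions, column types and first values
str(toy_body)

# str() also describes other objects, e.g. a named list with mixed contents
toy_list <- list(id = 1:3, label = "toy")
str(toy_list)
```

The first line of each str output names the kind of object ('data.frame' or List of 2 here), which is often the quickest way to find out what you are working with.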
Accessing elements from R-objects
Different data structures have different ways of accessing their elements.
Extracting elements from vector, matrix and array
For vectors, matrices and arrays we can use the [ operator to access their elements. Let's create a vector, a matrix and an array as follows:
a_vector <- c("one", "two", "three", "four", "five")
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)
an_array <- array(1:24, dim = c(2, 3, 4))
- Extracting the elements at positions 3 to 5 of a_vector
a_vector[3:5]
This gives "three", "four" and "five", the elements at position indices 3, 4 and 5. In R, position indices start from 1.
- Extracting the elements in rows 2 and 3 and columns 2, 4, 6 and 8 of a_matrix
- This is a two-dimensional structure, so we give the row index and the column index inside the [ operator, separated by a comma:
a_matrix[c(2, 3), c(2, 4, 6, 8)]
[,1] [,2] [,3] [,4]
[1,] 5 11 17 23
[2,] 6 12 18 24
We can also write this as,
a_matrix[2:3, seq(from = 2, to = 8, by = 2)]
[,1] [,2] [,3] [,4]
[1,] 5 11 17 23
[2,] 6 12 18 24
Here seq(from = 2, to = 8, by = 2) creates the sequence of even integers from 2 to 8, which is used as the column index for extracting elements from a_matrix.
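To check these indexing rules yourself, the sketch below recreates a_vector and a_matrix exactly as defined above and compares the two ways of writing the same matrix selection. The names cols_even, by_vector and by_seq are illustrative variables, not part of the original exercise.

```r
a_vector <- c("one", "two", "three", "four", "five")
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)

# Positions start at 1, so 3:5 picks the third, fourth and fifth elements
a_vector[3:5]            # "three" "four" "five"

# Even column numbers, written out by hand and generated with seq()
cols_even <- seq(from = 2, to = 8, by = 2)     # 2 4 6 8
by_vector <- a_matrix[c(2, 3), c(2, 4, 6, 8)]
by_seq    <- a_matrix[2:3, cols_even]

# Both forms select exactly the same 2 x 4 block
identical(by_vector, by_seq)   # TRUE
```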
- Extracting the first element of an_array
Here an_array is a three-dimensional array, so we have to supply three index vectors inside the [ operator to extract elements from it. For instance, an_array[1, 1, 1] gives 1, the first element of the array.
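A three-dimensional array takes one index per dimension: row, column and layer. The sketch below recreates an_array as above. Because R fills arrays column by column (the first index changes fastest), element [i, j, k] of this particular array equals i + 2 * (j - 1) + 6 * (k - 1); the variable sub_block is only an illustrative name.

```r
an_array <- array(1:24, dim = c(2, 3, 4))

# One index per dimension: [row, column, layer]
an_array[1, 1, 1]   # 1, the first element
an_array[2, 3, 4]   # 24, the last element
an_array[1, 2, 3]   # 15 = 1 + 2 * (2 - 1) + 6 * (3 - 1)

# Index vectors work in every dimension: rows 1-2, column 1, layers 1-2
sub_block <- an_array[1:2, 1, 1:2]
sub_block
```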
In all these structures we can also supply indices for only some of the dimensions; an empty index selects everything along that dimension. For example,
a_matrix[1:2,]
where we have only given the row index, returns the elements in the first and second rows from all columns:
a_matrix[1:2, ]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 4 7 10 13 16 19 22
[2,] 2 5 8 11 14 17 20 23
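Leaving an index empty works for columns and array layers too. A minimal sketch, recreating a_matrix and an_array from above: note that when a single row or column is selected, R simplifies the result to a plain vector by default, while the drop = FALSE argument of the [ operator keeps the matrix shape. The variable names are illustrative only.

```r
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)
an_array <- array(1:24, dim = c(2, 3, 4))

# Rows 1 and 2, all columns (the example above)
top_rows <- a_matrix[1:2, ]
dim(top_rows)                     # 2 8

# All rows, column 2 only: simplified to a plain vector
col_two <- a_matrix[, 2]
col_two                           # 4 5 6

# Same selection, but kept as a 3 x 1 matrix
col_two_matrix <- a_matrix[, 2, drop = FALSE]
dim(col_two_matrix)               # 3 1

# For the array, empty row and column indices give the whole first layer
first_layer <- an_array[, , 1]
dim(first_layer)                  # 2 3
```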
Extracting elements from data.frame and list
Let's create a data.frame and a list as follows:
a_dataframe <- data.frame(
fertilizer = c("Low", "Low", "High", "High"),
yield = c(12.5, 13.1, 15.3, 16.2),
stringsAsFactors = TRUE # store fertilizer as a factor (the default before R 4.0.0)
)
a_list <- list(
facebook = data.frame(
name = c("Gareth", "Raju", "Marek", "Franchisco"),
has_profile = c(TRUE, TRUE, FALSE, TRUE)
),
twitter = c("@gareth", "@raju", "@marek", "franchisco")
)
- Extracting the third and fourth rows of fertilizer from a_dataframe
- As with the matrix discussed above, we can use row and column indices: a_dataframe[3:4, 1]. We have used 1 as the column index since fertilizer is the first column. We can also use the column name instead: a_dataframe[3:4, "fertilizer"].
a_dataframe[3:4, "fertilizer"]
[1] High High
Levels: High Low
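Selecting a column by position and by name gives the same result, which is easy to confirm. This sketch recreates a_dataframe as above, with stringsAsFactors = TRUE set explicitly so that fertilizer is a factor (matching the Levels: line in the output) whatever the R version; by_position and by_name are illustrative names.

```r
a_dataframe <- data.frame(
  fertilizer = c("Low", "Low", "High", "High"),
  yield = c(12.5, 13.1, 15.3, 16.2),
  stringsAsFactors = TRUE   # store fertilizer as a factor
)

by_position <- a_dataframe[3:4, 1]             # column by number
by_name     <- a_dataframe[3:4, "fertilizer"]  # column by name
identical(by_position, by_name)                # TRUE

# Subsetting a factor keeps all of its levels, hence "Levels: High Low"
levels(by_name)

# Selecting two columns keeps the result as a data.frame
a_dataframe[3:4, c("fertilizer", "yield")]
```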
- Extracting the first element of a_list
- We can use [[ to extract elements from a_list. For example, a_list[[1]] gives the first element of the list. Our list has two elements, named facebook and twitter, so we can also use their names, as in a_list[["facebook"]]; this is not possible if the elements are unnamed.
a_list[["facebook"]]
name has_profile
1 Gareth TRUE
2 Raju TRUE
3 Marek FALSE
4 Franchisco TRUE
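A detail worth checking is the difference between [[ and the single bracket [ on a list: [[ returns the element itself, while [ returns a smaller list that still contains it. The sketch recreates a_list exactly as above; element and sublist are illustrative names.

```r
a_list <- list(
  facebook = data.frame(
    name = c("Gareth", "Raju", "Marek", "Franchisco"),
    has_profile = c(TRUE, TRUE, FALSE, TRUE)
  ),
  twitter = c("@gareth", "@raju", "@marek", "franchisco")
)

element <- a_list[[1]]   # the data frame itself
sublist <- a_list[1]     # a list of length 1 holding that data frame

class(element)           # "data.frame"
class(sublist)           # "list"
names(sublist)           # "facebook"

# By position or by name, [[ gives the same element
identical(a_list[[1]], a_list[["facebook"]])   # TRUE
```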
We can also use the $ operator to extract elements from a named list or a data frame. For example, bodydata$Weight extracts the Weight variable from the bodydata dataset.
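For a named element, $ behaves like [[ with the name. A minimal sketch, using toy_body, a hypothetical data frame with made-up values standing in for bodydata, and toy_list, a hypothetical named list.

```r
# Hypothetical stand-in for bodydata (made-up values)
toy_body <- data.frame(
  Weight = c(65.6, 80.7, 72.6),
  Height = c(174, 194, 186)
)

# $ and [[ with the column name extract the same numeric vector
w_dollar  <- toy_body$Weight
w_bracket <- toy_body[["Weight"]]
identical(w_dollar, w_bracket)   # TRUE

# The same shorthand works on a named list
toy_list <- list(twitter = c("@gareth", "@raju"))
toy_list$twitter                 # "@gareth" "@raju"
```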
View Data in RStudio
Newer versions of RStudio support viewing data in different structures. To view the bodydata we imported in Exercise 2: Importing data in R, we can use View(bodydata). If you have not imported the data yet, follow that exercise and import it first. We can also click on the data in the “Environment” tab to view it.
Summary of data
We can compute basic descriptive statistics using the summary function:
summary(bodydata)
Weight Height Age Circumference
Min. : 42.0 Min. :150 Min. :18.0 Min. : 57.9
1st Qu.: 58.5 1st Qu.:164 1st Qu.:23.0 1st Qu.: 68.0
Median : 68.6 Median :171 Median :27.0 Median : 75.6
Mean : 69.2 Mean :171 Mean :29.9 Mean : 76.9
3rd Qu.: 78.8 3rd Qu.:178 3rd Qu.:35.0 3rd Qu.: 84.3
Max. :108.6 Max. :198 Max. :67.0 Max. :113.2
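To see what each row of the summary output means, it helps to apply summary to a short vector whose statistics are easy to check by hand. The vector x below is made up for illustration; the quartiles come from quantile() with its default method.

```r
x <- c(2, 4, 6, 8, 10)

# Six values: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s <- summary(x)
s

# The same values computed one at a time: 2, 4, 6, 6, 8, 10
c(min = min(x), q1 = unname(quantile(x, 0.25)), median = median(x),
  mean = mean(x), q3 = unname(quantile(x, 0.75)), max = max(x))
```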
Dimension of data
To find the number of elements in a data structure such as a vector or a list, we can use the length function. For example, if we extract the Weight variable from bodydata, we get a numeric vector. The length of this vector is:
length(bodydata$Weight)
[1] 407
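What length counts depends on the object. A sketch with made-up objects (weights, small_list and toy_df are hypothetical): because a data frame is a list of columns, length returns its number of columns, not its number of rows.

```r
weights    <- c(65.6, 80.7, 72.6, 78.8)          # made-up numeric vector
small_list <- list(a = 1:10, b = "text")         # hypothetical list
toy_df     <- data.frame(x = 1:5, y = letters[1:5])

length(weights)      # 4 elements
length(small_list)   # 2: two elements, whatever their own sizes
length(toy_df)       # 2: the number of columns, not rows
nrow(toy_df)         # 5
```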
Multi-dimensional data structures such as matrices, arrays and data frames have dimensions. We can use the dim function to find them:
dim(bodydata)
[1] 407 4
Here, the first and second items are the number of rows and the number of columns of bodydata. Similarly, we can use nrow(bodydata) and ncol(bodydata) to obtain these numbers individually.
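The dim, nrow and ncol functions also work on the matrix and array created earlier. A sketch recreating them as above:

```r
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)
an_array <- array(1:24, dim = c(2, 3, 4))

dim(a_matrix)    # 3 8
nrow(a_matrix)   # 3
ncol(a_matrix)   # 8
dim(an_array)    # 2 3 4: rows, columns, layers

# A plain vector has no dim attribute, so use length() for it instead
dim(c(1, 2, 3))  # NULL
```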
Let's Practice
- Take a look at the top 5 rows of bodydata
- Take a look at the top 5 rows of the Height and Circumference variables of bodydata
- Apply the summary function to the Age variable of bodydata