Exercise 5: Exploring the data

Structure of an R-object

The first function to learn for exploring any object in R is str, which displays the object's structure. Let's apply it to our bodydata:

str(bodydata)
'data.frame':   407 obs. of  4 variables:
 $ Weight       : num  65.6 80.7 72.6 78.8 74.8 86.4 78.4 62 81.6 76.6 ...
 $ Height       : num  174 194 186 187 182 ...
 $ Age          : num  21 28 23 22 21 26 27 23 21 23 ...
 $ Circumference: num  71.5 83.2 77.8 80 82.5 82 76.8 68.5 77.5 81.9 ...

This output shows that bodydata is a data.frame with 407 rows (observations) and 4 numeric variables: Weight, Height, Age and Circumference.
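Since bodydata comes from an earlier exercise, the sketch below is self-contained. It applies str to a small data frame named toy (a hypothetical stand-in with made-up values) and then to a plain vector and a list, to illustrate that str works on any R object, not only on data frames.

```r
# Hypothetical stand-in for bodydata: 3 rows, 2 numeric variables (made-up values)
toy <- data.frame(
  Weight = c(65.6, 80.7, 72.6),
  Age    = c(21, 28, 23)
)
str(toy)

# str() also gives a compact view of vectors and lists
str(c(1.5, 2.5, 3.5))
str(list(id = 1, tags = c("a", "b")))
```

For each object, str prints its class or type followed by a compact preview of its contents.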

Accessing elements from R-objects

Different data structures have different ways of accessing their elements.

Extracting elements from vector, matrix and array

For vectors, matrices and arrays, we can use the [ operator to access their elements. Let's create a vector, a matrix and an array:

a_vector <- c("one", "two", "three", "four", "five")
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)
an_array <- array(1:24, dim = c(2, 3, 4))
Extracting elements at positions 3 to 5 in a_vector
a_vector[3:5] will give "three", "four" and "five", the elements at position indices 3, 4 and 5. In R, position indices start from 1.
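A short sketch of vector indexing with the same a_vector: a range, a single position, and positions that are not consecutive.

```r
a_vector <- c("one", "two", "three", "four", "five")

a_vector[3:5]       # "three" "four" "five": positions 3, 4 and 5
a_vector[1]         # "one": indexing starts at 1, not 0
a_vector[c(1, 5)]   # "one" "five": positions need not be consecutive
```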
Extracting elements in rows 2, 3 and columns 2, 4, 6, 8 from a_matrix
Since a matrix is a two-dimensional structure, we give the row index and the column index inside the [ operator, separated by a comma:
a_matrix[c(2, 3), c(2, 4, 6, 8)]
     [,1] [,2] [,3] [,4]
[1,]    5   11   17   23
[2,]    6   12   18   24

We can also write this as,

a_matrix[2:3, seq(from = 2, to = 8, by = 2)]
     [,1] [,2] [,3] [,4]
[1,]    5   11   17   23
[2,]    6   12   18   24

Here seq(from = 2, to = 8, by = 2) creates the sequence of even integers 2, 4, 6, 8, which is used as the column index for extracting elements from a_matrix.
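To check that the two forms really select the same block, the sketch below rebuilds a_matrix, stores both results (the names cols, by_vector and by_seq are illustrative) and compares them with identical.

```r
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)

cols <- seq(from = 2, to = 8, by = 2)            # 2 4 6 8
by_vector <- a_matrix[c(2, 3), c(2, 4, 6, 8)]    # explicit index vectors
by_seq    <- a_matrix[2:3, cols]                 # range and seq() instead

identical(by_vector, by_seq)   # TRUE: both select the same 2 x 4 block
```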

Extracting the first element of an_array: an_array is a three-dimensional array, so we need three indices inside the [ operator, one for each dimension. For instance, an_array[1, 1, 1] gives 1, the first element of the array.
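R fills an array in column-major order, with the first index varying fastest, so for dim = c(2, 3, 4) the element an_array[i, j, k] holds i + 2 * (j - 1) + 6 * (k - 1). The sketch below checks a few positions against that rule.

```r
an_array <- array(1:24, dim = c(2, 3, 4))

an_array[1, 1, 1]   # 1: row 1, column 1, layer 1
an_array[2, 3, 4]   # 24: the last element
an_array[1, 2, 2]   # 9 = 1 + 2 * (2 - 1) + 6 * (2 - 1)
```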

In all these structures we can also supply indices for only some of the dimensions; leaving an index empty selects everything along that dimension. For example, a_matrix[1:2, ], where only the row index is given, returns the first and second rows from all columns:

a_matrix[1:2, ]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    4    7   10   13   16   19   22
[2,]    2    5    8   11   14   17   20   23
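The same idea of leaving an index empty works for columns and for arrays. A minimal sketch with the objects defined earlier:

```r
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)
an_array <- array(1:24, dim = c(2, 3, 4))

a_matrix[, 2]      # column 2 from every row: 4 5 6
a_matrix[1:2, ]    # rows 1 and 2 from every column (a 2 x 8 matrix)
an_array[, , 1]    # the whole first layer of the array (a 2 x 3 matrix)
```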

Extracting elements from data.frame and list

Let's create a data.frame and a list:

a_dataframe <- data.frame(
  fertilizer = c("Low", "Low", "High", "High"),
  yield = c(12.5, 13.1, 15.3, 16.2)
)
a_list <- list(
  facebook = data.frame(
    name = c("Gareth", "Raju", "Marek", "Franchisco"),
    has_profile = c(TRUE, TRUE, FALSE, TRUE)
  ),
  twitter = c("@gareth", "@raju", "@marek", "@franchisco")
)
Extracting the third and fourth rows of fertilizer from a_dataframe
As with matrices, we can use a row index and a column index, as in a_dataframe[3:4, 1]. We use 1 as the column index because fertilizer is the first column. We can also use the column name instead: a_dataframe[3:4, "fertilizer"].
a_dataframe[3:4, "fertilizer"]
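[1] High High
Levels: High Low

The Levels line shows that fertilizer was stored as a factor, which was the default for data.frame() before R 4.0.0. In R 4.0.0 and later, character columns stay character unless stringsAsFactors = TRUE is given, so the same command prints [1] "High" "High".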
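A minimal sketch comparing the two ways of selecting the column, plus a row selection that keeps every column. The names by_position and by_name are illustrative only.

```r
a_dataframe <- data.frame(
  fertilizer = c("Low", "Low", "High", "High"),
  yield = c(12.5, 13.1, 15.3, 16.2)
)

by_position <- a_dataframe[3:4, 1]              # column 1 is fertilizer
by_name     <- a_dataframe[3:4, "fertilizer"]   # the same column, by name
identical(by_position, by_name)                 # TRUE

a_dataframe[3:4, ]   # empty column index: rows 3 and 4 with all columns
```

Selecting by name is usually the safer choice, because it keeps working if the column order changes.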
Extracting the first element of a_list
We can use [[ to extract an element from a_list. For example, a_list[[1]] gives the first element of the list. Our list has two elements, named facebook and twitter, so we can also use a name, as in a_list[["facebook"]]. Extraction by name is only possible when the elements have names.
a_list[["facebook"]]
        name has_profile
1     Gareth        TRUE
2       Raju        TRUE
3      Marek       FALSE
4 Franchisco        TRUE
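A related distinction, sketched with the same a_list: single brackets [ return a smaller list that still wraps the element, while [[ returns the element itself.

```r
a_list <- list(
  facebook = data.frame(
    name = c("Gareth", "Raju", "Marek", "Franchisco"),
    has_profile = c(TRUE, TRUE, FALSE, TRUE)
  ),
  twitter = c("@gareth", "@raju", "@marek", "@franchisco")
)

class(a_list[["twitter"]])   # "character": [[ returns the element itself
class(a_list["twitter"])     # "list": [ returns a list holding the element
identical(a_list[[1]], a_list[["facebook"]])   # TRUE: by position or by name
```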

We can also use the $ operator to extract elements from a named list or a data frame. For example, bodydata$Weight extracts the Weight variable from the bodydata dataset.
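For named elements, $ behaves like [[ with a name. The sketch below uses a hypothetical stand-in data frame toy (made-up values) in place of bodydata, plus a small named list.

```r
# Hypothetical stand-in for bodydata (made-up values)
toy <- data.frame(Weight = c(65.6, 80.7, 72.6), Age = c(21, 28, 23))
handles <- list(twitter = c("@gareth", "@raju"))   # hypothetical named list

toy$Weight                               # the Weight column as a numeric vector
identical(toy$Weight, toy[["Weight"]])   # TRUE
handles$twitter                          # same as handles[["twitter"]]
```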

View Data in RStudio

Newer versions of RStudio support viewing data of different structures. To view the bodydata we imported in Exercise 2: Importing data in R, we can use View(bodydata). If you have not imported it yet, follow that exercise to import the data first. We can also click the data in the “Environment” tab to view it.


Figure 6: View dataset using command and by clicking the data in Environment tab

Summary of data

We can compute basic descriptive summary statistics using the summary function:

summary(bodydata)
     Weight          Height         Age       Circumference  
 Min.   : 42.0   Min.   :150   Min.   :18.0   Min.   : 57.9  
 1st Qu.: 58.5   1st Qu.:164   1st Qu.:23.0   1st Qu.: 68.0  
 Median : 68.6   Median :171   Median :27.0   Median : 75.6  
 Mean   : 69.2   Mean   :171   Mean   :29.9   Mean   : 76.9  
 3rd Qu.: 78.8   3rd Qu.:178   3rd Qu.:35.0   3rd Qu.: 84.3  
 Max.   :108.6   Max.   :198   Max.   :67.0   Max.   :113.2  
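summary also works on a single numeric vector. The sketch below uses made-up values x chosen so that every statistic is easy to verify by hand; the quartiles agree with quantile(x) under its default settings.

```r
x <- c(2, 4, 6, 8, 10)   # hypothetical values
s <- summary(x)
s

s[["Median"]]   # 6
s[["Mean"]]     # 6
quantile(x)     # 0%, 25%, 50%, 75%, 100%: 2 4 6 8 10
```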

Dimension of data

To find the number of elements in a data structure such as a vector or a list, we can use the length function. For example, extracting the Weight variable from bodydata gives a numeric vector, and its length is:

length(bodydata$Weight)
[1] 407
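A self-contained sketch of length on different structures. The list and the data frame toy are hypothetical stand-ins. Note the last line: for a data frame, length counts columns, not rows.

```r
a_vector <- c("one", "two", "three", "four", "five")
a_list <- list(facebook = "profile", twitter = "@handle")   # hypothetical
toy <- data.frame(Weight = c(65.6, 80.7, 72.6), Age = c(21, 28, 23))

length(a_vector)     # 5 elements
length(a_list)       # 2 elements
length(toy$Weight)   # 3 values in the Weight column
length(toy)          # 2: for a data frame, length() counts columns
```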

Multi-dimensional data structures such as matrices, arrays and data frames have dimensions, which we can find with the dim function.

dim(bodydata)
[1] 407   4

Here, the first and second items are the number of rows and the number of columns of bodydata. Similarly, we can use nrow(bodydata) and ncol(bodydata) to obtain these numbers individually.
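A sketch applying dim, nrow and ncol to the matrix and array from earlier and to a hypothetical stand-in data frame toy. dim returns one number per dimension, and a plain vector has no dimensions at all.

```r
a_matrix <- matrix(1:24, nrow = 3, ncol = 8)
an_array <- array(1:24, dim = c(2, 3, 4))
toy <- data.frame(Weight = c(65.6, 80.7, 72.6), Age = c(21, 28, 23))

dim(a_matrix)             # 3 8
dim(an_array)             # 2 3 4: one entry per dimension
dim(toy)                  # 3 2: rows, then columns
c(nrow(toy), ncol(toy))   # the same numbers, obtained individually
dim(c(1, 2, 3))           # NULL: a plain vector has no dim attribute
```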

Let's Practice

  1. Take a look at the top 5 rows of bodydata
  2. Take a look at the top 5 rows of the Height and Circumference variables of bodydata
  3. Apply the summary function to the Age variable of bodydata
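One possible way to approach these tasks, sketched on a small data frame toy with made-up values and the same column names as bodydata; substitute bodydata once it is imported. head is a base R function that returns the first rows of an object; the other lines use only the indexing covered above.

```r
# Hypothetical stand-in with bodydata's column names (made-up values)
toy <- data.frame(
  Weight        = c(60, 65, 70, 75, 80, 85),
  Height        = c(160, 165, 170, 175, 180, 185),
  Age           = c(20, 25, 30, 35, 40, 45),
  Circumference = c(70, 72, 74, 76, 78, 80)
)

toy[1:5, ]                               # task 1 with row indexing
head(toy, 5)                             # task 1 with head()
toy[1:5, c("Height", "Circumference")]   # task 2
summary(toy$Age)                         # task 3
```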