Exercise 6: Subsets of data and logical operators
Logical vector and index vector
A lot of times we want to get a subset of data filtering rows or columns of a dataframe. For which we can perform logical test and get TRUE
or FALSE
as result. This vector of logical can then be used to subset the observations from a dataframe.
For example, Lets extract observation from bodydata
with Weight greater than 80. You might be wondering why following code does not work,
isHeavy <- Weight > 80
Error in eval(expr, envir, enclos): object 'Weight' not found
But remember that, the variable Weight
is a part of bodydata
. We have to extract Weight
from the bodydata
first. In R, with
and within
function helps you in this respect. In the following code, with
function goes inside bodydata
and execute the expression Weight > 80
.
isHeavy <- with(bodydata, Weight > 80)
Here the logical vector isHeavy
is computed by performing a logical operation on Weight
variable within bodydata
. The same operation can be done as,
isHeavy <- bodydata$Weight > 80
Take a look at this variable, what is it? :
head(isHeavy)
[1] FALSE TRUE FALSE FALSE FALSE TRUE
Yes, it is a vector of TRUE
and FALSE
with same length as Weight
. Here the condition has compared each element of Weight
results TRUE
if it is greater than 80 and FALSE
if it is less than 80.
- Identify the elements
We can identify which observations that are heavy by the
which()
functionHeavyId <- which(isHeavy)
This will return a vector of row index for the observations that are heavy, i.e. greater than 80. So how many are heavy? To find the size of a vector we can use
length
function.length(HeavyId)
[1] 94
Here, 94 observations have Weight larger than 80.
Exercise
Identify who are taller than 180 and save this logical vector as an object called
isTall
.Answer
isTall <- with(bodydata, Height > 180)
How many observations have height taller than 180?
Answer
TallId <- which(isTall) length(TallId)
[1] 76
How many observations are both tall and heavy? Here, you can use
length
function as above to find how many person are taller than 180.Answer
isBoth <- isHeavy * isTall
- How is this computation done?
- Here
isHeavy
andisTall
containsTRUE
andFALSE
. The multiplication of logical operator results a logical vector withTRUE
only if both the vectors areTRUE
elseFALSE
.
Alternatively :
isBoth <- which(isHeavy & isTall)
The
&
operator resultTRUE
if bothisHeavy
andisTall
areTRUE
else,FALSE
which is same as previous.
Subsetting data frame
Example 1
Lets create a subset of the data called bodydataTallAndHeavy
containing only the observations for tall and heavy persons as defined by isBoth.
bodydataTallAndHeavy <- bodydata[isBoth, ]
For other logical tests see help file ?Comparison
Example 2
Lets create a random subset of 50 observations. For this we first sample 50 row index randomly from all rows in bodydata
. The sample
function is used for the purpose. In the following code, nrow(bodydata)
return the number of rows in bodydata
. The sample
function takes two argument x
which can be a vector or a integer and size
which is the size of the sample to be drawn.
idx <- sample(x = nrow(bodydata), size = 50)
Here, 50 rows are sampled from the total number of rows and the index of the selected rows are saved on vector idx
.
Using this vector we can select the observations in bodydata
to create a new data set called bodydataRandom
as,
bodydataRandom <- bodydata[idx, ]
Here is the first five rows of bodydataRandom
dataset.
head(bodydataRandom, n = 5)
Weight Height Age Circumference
234 74.1 168 55 90.1
283 53.6 162 23 63.2
91 70.2 175 27 73.5
378 58.0 163 20 71.0
105 55.5 169 20 68.0
Exercise
Create a subset of dataset bodydata
including the observation with Age
larger an 55 and Circumference
larger than 80. Save this dataset named subdata
.
Answer
idx <- with(bodydata, Age > 55 & Circumference > 80)
subdata <- bodydata[idx, ]
subdata
Weight Height Age Circumference
136 76.4 185 62 94.8
189 73.6 175 60 90.5
207 66.8 168 62 81.5
231 80.0 174 65 98.6
For those who are interested in playing more with data, have a look at http://r4ds.had.co.nz/transform.html