3 L03: Basics II
3.1 Data Manipulation & Exploratory Analysis
3.2 Some comments on the previous lesson's homework
3.2.1 Mess with “attachments”
R automatically “attaches” a package when you load it with library(). This means you can call a command from that package by its bare name, for example glimpse(your_data_frame), which gives you a glimpse into your data frame. A potential problem with this approach is that you may later load another library that has a command with the same name. Because the package loaded later overrides (“masks”) the earlier attachment, you may not be aware of, or not pay attention to, the fact that you have lost some old connections. R will warn you about any masking, but the warning is easy to miss. It looks like the message shown below.
- How can this be fixed? An efficient way to avoid the problem is to use the double-colon operator, like this:
dplyr::glimpse(your_data_frame)
- If, all of a sudden, a command that worked before stops working and throws an error at you, this mess with attachments is likely to be the problem.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
- For more details, see: https://colinfay.me/intro-to-r/packages.html
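A minimal sketch of the masking described above, assuming dplyr is installed. The data frame df is a hypothetical example. find() (from the utils package, attached by default) lists every attached package that provides a given name in search order; the first entry is the one R uses when you type the bare name. The double-colon calls then reach each package's filter() explicitly, regardless of what is attached.

```r
library(dplyr)   # prints the masking message shown above

# Hypothetical example data
df <- data.frame(x = 1:5)

# Attached packages providing `filter`, in search order: the first one wins
find("filter")   # "package:dplyr" "package:stats"

# The double-colon operator picks a specific package explicitly
kept   <- dplyr::filter(df, x > 3)            # rows of df where x > 3
smooth <- stats::filter(1:5, rep(1/3, 3))     # 3-point moving average (a ts object)

print(kept)     # the rows with x = 4 and x = 5
print(smooth)   # values: NA 2 3 4 NA
```

If a bare filter() call suddenly starts failing, running find("filter") is a quick way to check which package currently "owns" the name.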
3.2.2 Vector operations
- adding two vectors
- multiplying two vectors
v1 <- c(1,2,3,4,5,6)
v2 <- c(1,10)
v3 <- c(100)
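A short sketch of what these operations produce. R's arithmetic operators work element by element, and when the vectors differ in length the shorter one is recycled to match the longer one. The vectors are redefined here so the block runs on its own.

```r
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(1, 10)
v3 <- c(100)

# Element-wise arithmetic: the shorter vector is recycled
v1 + v2   # 2 12 4 14 6 16   (v2 is reused as 1, 10, 1, 10, 1, 10)
v1 * v2   # 1 20 3 40 5 60
v1 + v3   # 101 102 103 104 105 106   (a single value is recycled everywhere)
v1 * v3   # 100 200 300 400 500 600

# If the longer length is not a multiple of the shorter one,
# R still recycles but issues a warning
c(1, 2, 3) + c(10, 20)   # 11 22 13, plus a warning
```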
3.2.3 NA value?
- Why do you think we need such a value?
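A small sketch to experiment with while thinking about the question. NA marks a value that is missing or unknown, which is different from 0 or an empty string. The vector scores is hypothetical example data.

```r
# Hypothetical data: one score is missing
scores <- c(90, NA, 70)

is.na(scores)                # FALSE TRUE FALSE: which values are missing
mean(scores)                 # NA: an unknown value makes the result unknown
mean(scores, na.rm = TRUE)   # 80: drop the missing values first
sum(is.na(scores))           # 1: how many values are missing

NA == NA                     # NA, not TRUE: use is.na() to test for missingness
```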
3.2.4 many vs. "many"
What is the difference? Consider the example below:
many <- c(1,2,3,4,5,6,7)
length("many")
## [1] 1
length(many)
## [1] 7
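A few more checks that make the difference visible. Without quotes, many is the name of an object (here a numeric vector of seven elements); with quotes, "many" is a literal piece of text, that is, a character vector holding a single string.

```r
many <- c(1, 2, 3, 4, 5, 6, 7)

class(many)     # "numeric": the object named many holds numbers
class("many")   # "character": a literal string
nchar("many")   # 4: the number of characters in that one string
many[3]         # 3: the third element of the vector
length("many")  # 1: one string, however many characters it has
```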
3.3 Goals
Getting to know the basics of working with data: manipulating data, basic techniques of exploratory analysis
3.4 Class
03_worksheets_data-manipulation-introduction.Rmd.zip
04_worksheets_data-manipulation-continued.Rmd.zip
NB: Original worksheets prepared by Lincoln Mullen, GMU (https://dh-r.lincolnmullen.com/worksheets.html)
3.5 Topics
- Selecting columns (select())
- Filtering rows (filter())
- Creating new columns (mutate())
- Sorting rows (arrange())
- Split-apply-combine (group_by())
- Summarizing or aggregating data (summarize())
- Data joining with two-table verbs (left_join() et al.)
- Data reshaping (spread() and gather())
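A compact sketch that runs each of the verbs listed above once, assuming dplyr and tidyr are installed. The data frames letters_df and authors are hypothetical examples invented for illustration. Note that spread() and gather() are superseded in current tidyr by pivot_wider() and pivot_longer(), but they are still available.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: letters written by three authors
letters_df <- data.frame(
  author = c("A", "A", "B", "B", "C"),
  year   = c(1850, 1851, 1850, 1852, 1851),
  place  = c("Paris", "Rome", "Paris", "Vienna", "Rome"),
  words  = c(120, 80, 200, 150, 60),
  stringsAsFactors = FALSE
)

# Single-table verbs chained with the pipe (%>%)
per_author <- letters_df %>%
  select(author, words) %>%            # keep only these columns
  filter(words > 70) %>%               # keep rows with more than 70 words
  mutate(long = words >= 150) %>%      # add a new logical column
  arrange(desc(words)) %>%             # sort rows, most words first
  group_by(author) %>%                 # split the rows by author ...
  summarize(n_letters = n(),           # ... apply a summary to each group ...
            n_long = sum(long))        # ... and combine: one row per author
print(per_author)

# Two-table verb: add author names from a second (hypothetical) table
authors <- data.frame(
  author = c("A", "B", "C"),
  name   = c("Author A", "Author B", "Author C"),
  stringsAsFactors = FALSE
)
joined <- left_join(per_author, authors, by = "author")
print(joined)

# Reshaping: long -> wide (one column per year) and back to long
wide <- letters_df %>%
  select(author, year, words) %>%
  spread(key = year, value = words)    # missing author/year pairs become NA
long_again <- wide %>%
  gather(key = "year", value = "words", -author)
print(wide)
```

Author C drops out of per_author because its only letter has fewer than 71 words; the left_join() then keeps exactly the rows of per_author.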
3.6 Reference materials
Consult relevant chapters from:
- Healy, Kieran. Data Visualization: A Practical Guide. Princeton University Press, 2018. ISBN: 978-0691181622. http://socviz.co/
- Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly, 2017. ISBN: 978-1491910399. https://r4ds.had.co.nz/
- Wickham, Hadley. Advanced R, Second Edition. Boca Raton: Chapman and Hall/CRC, 2019. http://adv-r.had.co.nz/
3.7 Homework
- Finish your worksheets and submit as described below.
- Additional: if you’d like more practice, you can use the swirl library:
- To install:
install.packages("swirl")
- To run:
library(swirl)
- Then:
swirl()
3.8 Submitting homework
- Homework assignments must be submitted by the beginning of the next class.
- Email your homework to the instructor as attachments.
- In the subject of your email, please add the following: 070184-LXX-HW-YourLastName-YourMatriculationNumber, where LXX is the number of the lesson for which you submit homework, YourLastName is your last name, and YourMatriculationNumber is your matriculation number.