3 L03: Basics II

3.1 Data Manipulation & Exploratory Analysis

3.2 Some comments on the previous lesson's homework

3.2.1 Mess with “attachments”

  • When you load a package with library(), R also “attaches” it. This means you can call its functions directly by name: for example, glimpse(your_data_frame) gives you a quick look at your data frame. A potential problem with this approach is that you may later load another package that has a function with the same name. The package attached later “masks” the earlier one, so the bare name now points to the newer function, and you may not notice that the old connection has been lost.
  • R prints a message about masked functions when a package is attached, but it is easy to miss. It looks like the output below.
  • How can this be fixed? An efficient way to avoid the problem is to name the package explicitly with the double-colon operator, like this: dplyr::glimpse(your_data_frame)
  • If a command that worked before suddenly stops working and throws an error at you, this mess with attachments is a likely cause.
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
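To check which attached package a bare function name currently resolves to, the utils function find() lists every search-path entry that defines that name, in search order; the first entry is the one a bare call will use. The sketch below assumes dplyr is installed, and the data frame df is made up for illustration.

```r
library(dplyr)

# Toy data frame, made up for illustration
df <- data.frame(x = 1:5)

# Attached packages that define "filter", in search order.
# The first entry is what a bare filter() call will use.
find("filter")

# The double-colon operator picks a specific package explicitly,
# regardless of what has been attached or in which order:
kept <- dplyr::filter(df, x > 2)              # rows where x > 2
smoothed <- stats::filter(1:5, rep(1/3, 3))   # 3-point moving average

dplyr::glimpse(df)
```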

3.2.2 Vector operations

  • Adding two vectors (e.g., v1 + v2, v1 + v3)
  • Multiplying two vectors (e.g., v1 * v2, v1 * v3)
v1 <- c(1,2,3,4,5,6)
v2 <- c(1,10)
v3 <- c(100)
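Arithmetic on vectors in R is element-wise. When the two vectors have different lengths, R recycles the shorter one until it matches the length of the longer one, which is what happens with v2 and v3 here. A minimal sketch using the same vectors (repeated so the block runs on its own):

```r
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(1, 10)
v3 <- c(100)

# v2 is recycled as 1, 10, 1, 10, 1, 10
sum_12  <- v1 + v2   # 2 12 4 14 6 16
prod_12 <- v1 * v2   # 1 20 3 40 5 60

# v3 (length 1) is recycled to every position
sum_13  <- v1 + v3   # 101 102 103 104 105 106
prod_13 <- v1 * v3   # 100 200 300 400 500 600

# If the longer length is not a multiple of the shorter one, R still
# recycles but warns: "longer object length is not a multiple of
# shorter object length", e.g. for c(1, 2, 3) + c(1, 10).
```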

3.2.3 NA Value?

  • Why do you think we need such a value?
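NA marks a value that is missing or unknown (for example, an unanswered survey question) while keeping its place in the vector. Because the value is unknown, most computations that involve it return NA as well, unless you explicitly ask R to drop missing values. A short illustration with a made-up vector:

```r
x <- c(4, NA, 10)   # made-up data with one missing value

total_all <- sum(x)                 # NA: the unknown value propagates
total_obs <- sum(x, na.rm = TRUE)   # 14: missing values dropped first
mean_obs  <- mean(x, na.rm = TRUE)  # 7
missing   <- is.na(x)               # FALSE TRUE FALSE

# Comparing with NA is also unknown, so test with is.na() rather than == NA
NA == NA                            # NA
```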

3.2.4 many vs. "many"

What is the difference between the two calls in the example below?

many <- c(1,2,3,4,5,6,7)

length("many")
## [1] 1
length(many)
## [1] 7
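Without quotes, many is the name of an object, so R looks up its value (the seven numbers). With quotes, "many" is a literal character string, i.e. a vector holding a single element, so its length is 1. A few more calls make the difference visible:

```r
many <- c(1, 2, 3, 4, 5, 6, 7)

class(many)     # "numeric": the value stored in the object
class("many")   # "character": a literal string
nchar("many")   # 4: number of characters in the string
get("many")     # looks up the object whose name is given as a string
```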

3.3 Goals

Getting to know the basics of working with data: data manipulation and basic techniques of exploratory analysis.

3.5 Topics

  • Selecting columns (select())
  • Filtering rows (filter())
  • Creating new columns (mutate())
  • Sorting rows (arrange())
  • Split-apply-combine (group_by())
  • Summarizing or aggregating data (summarize())
  • Data joining with two-table verbs (left_join() et al.)
  • Data reshaping (spread() and gather(); in current tidyr versions these are superseded by pivot_wider() and pivot_longer())
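The verbs listed above are designed to be chained with the pipe %>%. The sketch below runs each of them once on a small data set; the data frames grades and tutors and all of their columns are hypothetical, made up for illustration, and the code assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, made up for illustration only
grades <- data.frame(
  id      = 1:6,
  student = c("Ana", "Ben", "Cem", "Dana", "Eli", "Fay"),
  group   = c("A", "A", "B", "B", "B", "A"),
  hw      = c(8, 6, 9, 7, 5, 10),
  exam    = c(70, 55, 88, 64, 49, 91)
)

result <- grades %>%
  select(student, group, hw, exam) %>%   # keep these columns, drop id
  filter(exam >= 50) %>%                 # keep rows with exam of at least 50
  mutate(total = hw * 10 + exam) %>%     # new column computed from others
  arrange(desc(total))                   # sort rows, highest total first

# Split-apply-combine: split by group, summarize each piece, combine
by_group <- result %>%
  group_by(group) %>%
  summarize(n = n(), mean_total = mean(total))

# Two-table verb: add each group's tutor by matching on "group"
tutors <- data.frame(group = c("A", "B"), tutor = c("Kim", "Lee"))
joined <- left_join(result, tutors, by = "group")

# Reshaping: wide -> long (one row per student and measure) -> wide again
# (gather()/spread() are superseded in current tidyr but still available)
long <- gather(grades, key = "measure", value = "score", hw, exam)
wide <- spread(long, key = measure, value = score)
```

Each step produces a new data frame, so intermediate results such as result or by_group can be inspected with dplyr::glimpse() while exploring.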

3.6 Reference materials

Consult relevant chapters from:

  • Healy, Kieran. Data Visualization: A Practical Introduction. Princeton University Press, 2018. ISBN: 978-0691181622. http://socviz.co/
  • Wickham, Hadley, and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly, 2017. ISBN: 978-1491910399. https://r4ds.had.co.nz/
  • Wickham, Hadley. Advanced R, Second Edition. Boca Raton: Chapman and Hall/CRC, 2019. http://adv-r.had.co.nz/

3.7 Homework

  • Finish your worksheets and submit as described below.
  • Optional: if you would like more practice, you can use the swirl package:
    • To install: install.packages("swirl")
    • To load: library(swirl)
    • To start: swirl()

3.8 Submitting homework

  • Homework must be submitted before the beginning of the next class.
  • Email your homework to the instructor as attachments.
    • In the subject line of your email, please use the following format: 070184-LXX-HW-YourLastName-YourMatriculationNumber, where LXX is the number of the lesson for which you are submitting homework, YourLastName is your last name, and YourMatriculationNumber is your matriculation number.