9 L09: GIS III
9.1 Mapping Categorical and Continuous Data
library(tidyverse)
library(ggspatial)
library(ggplot2)
library(ggrepel)
library(rnaturalearth)
library(rnaturalearthdata)
<- ne_countries(scale = "medium", returnclass = "sf") world
9.1.1 Goals
Basic strategies for mapping data.
9.1.2 Software
- R
9.1.3 Data
library(historydata)
library(europop)
## Warning: package 'europop' was built under R version 4.0.2
#help(package = historydata)
#help(package = europop)
We will use data from packagess europop
and historydata
.
9.1.3.1 Some Vienna Data
Source: https://www.data.gv.at/katalog/dataset/stadt-wien_bezirksgrenzenwien. Download SHP file. You can load it, using the following code. (The file must be unzipped: BEZIRKSGRENZEOGD
is the default folder; BEZIRKSGRENZEOGDPolygon
is the name of most files in the unzipped folder)
- map of districts of Vienna (SHP/shape file);
- population of Vienna (csv file)
library(rgdal) # R wrapper around GDAL/OGR
library(ggmap) # for fortifying shapefiles (converting GIS files into data frames)
library(ggplot2) # ggplot2 library
Loading SHP file, like we did before. plot()
allows us to quickly assess whether data was loaded correctly.
<- readOGR("./data_temp/BEZIRKSGRENZEOGD", "BEZIRKSGRENZEOGDPolygon") wienBezirks
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/romanovienna/Dropbox/6_Teaching_New/_rgis_course/rgis_univie2021_draft/data_temp/BEZIRKSGRENZEOGD", layer: "BEZIRKSGRENZEOGDPolygon"
## with 23 features
## It has 15 fields
plot(wienBezirks)
Loading our data on the growth of Vienna…
library(readr)
<- read_csv("data_temp/wien_data/Wien_Stadt_Data.csv") wienBezirksData
## Parsed with column specification:
## cols(
## Bezirk_Nummer = col_character(),
## Bezirk_Name = col_character(),
## Kategorie = col_character(),
## Jahr = col_double(),
## Daten = col_double()
## )
head(wienBezirksData)
## # A tibble: 6 x 5
## Bezirk_Nummer Bezirk_Name Kategorie Jahr Daten
## <chr> <chr> <chr> <dbl> <dbl>
## 1 11 Simmering Häuser 1590 207
## 2 11 Simmering Häuser 1751 225
## 3 11 Simmering Häuser 1783 281
## 4 11 Simmering Häuser 1787 281
## 5 11 Simmering Häuser 1794 329
## 6 11 Simmering Häuser 1795 319
9.2 Maps
9.2.1 Choropleth: Population of Vienna over time
Choropleth map is very commonly used to visualize spatial data. Polygons are colored and the intensity of color (or change from one color to another with some color scale) is used to indicate continuous values Less frequently, this type of maps is also used for categorical data.
fortify
is a function that converts a coomplex SHP object into a dataframe that ggplot2
can understand; parameter region
is important — here you assign data from which column of the original shape file must be associated with the elements in new dataframe. We want to associate all polygons with districts (BEZ
contains district numbers — in the same format as we have in wienBezirksData
: 01
, 02
, 03
, etc.). The operation is quite simple, but crucial.
<- fortify(wienBezirks, region = "BEZ") wienBezirksMod
Now, we need to process somme data and then we can map it with ggplot
. In the chunk below we simply pick data for a specific year.
<- wienBezirksData %>%
wienBezirksDataTemp filter(Kategorie=="Einwohner" & Jahr == 2011)
Now, for adding labels to our map, we need to generate coordinates for where those labels will be added. The following chunk calculated average mean of all latitude and longitude values for each district.
<- wienBezirksMod %>%
distcenters group_by(id) %>%
::summarize(clat = mean(lat), clong = mean(long)) %>%
dplyrleft_join(wienBezirksData, by = c("id" = "Bezirk_Nummer")) %>%
select(id, clat, clong, Bezirk_Name) %>% unique %>%
mutate(label = paste0(id, ". ", Bezirk_Name))
Now we can put everything on the graph.
ggplot() +
geom_map(data = wienBezirksDataTemp, aes(map_id = Bezirk_Nummer, fill = Daten),
map = wienBezirksMod) + expand_limits(x = wienBezirksMod$long, y = wienBezirksMod$lat) +
geom_text_repel(data = distcenters, aes(x = clong, y = clat, label = label), size = 2) +
scale_fill_gradient(low = "white", high = "red", na.value = NA) +
labs(title="Population of Vienna, 2011", x ="", y = "")
As you already know, we can generate multiple maps to visualize different aspects of our data. With of Viennese data, we can generate graphs by years.
<- wienBezirksData %>%
wienBezirksDataTemp filter(Kategorie=="Einwohner" & Jahr >= 1857) %>%
filter(Jahr != 1864)
ggplot() +
geom_map(data = wienBezirksDataTemp, aes(map_id = Bezirk_Nummer, fill = Daten),
map = wienBezirksMod) + expand_limits(x = wienBezirksMod$long, y = wienBezirksMod$lat) +
scale_fill_gradient2(low = "white", mid="yellow", high = "red", na.value = NA) +
facet_wrap(~ Jahr) +
labs(title="Growth of Vienna, 1857-2011", x ="", y = "")
Advantages and drawbacks of choroplleth maps?
Homework: generate similar maps for the growth of the US (of the state level), using data from historydata
. You will need to preprocess data properly. Also, feel free to experiment with the visual parameters. Use help
or google for more details.
Additional Materials:
9.2.2 Dots: Population Growth - US population (by city)
In some (or, probably, most) cases, it might be better to avoid choropleth maps. Instead, we can use dots sized with specific parameters. But first, let’s prepare our base layer.
= "lightsteelblue2"
waterColor =c(-140, -50); ylim=c(20,55)
xlim
<- ggplot(data = world) +
baseplot geom_sf(fill="white", color="white") +
coord_sf(xlim = xlim, ylim = ylim, expand = FALSE)
<- theme(panel.background = element_rect(fill = waterColor),
themeParameters axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
panel.grid.major = element_line(color = waterColor, linetype = "dotted", size = 0.5))
+ themeParameters baseplot
Homework: update this map by adding lakes and rivers. (See the previous lesson to refresh how this can be done.)
Let’s try to add some historical data: us_cities_pop
dataset gives us data by decades.
<- historydata::us_cities_pop %>%
us_data filter(year >= 1790 & year <= 1900)
+
baseplot geom_point(data = us_data, aes(x=lon, y=lat, size=population), alpha = 0.02) +
scale_size_continuous(range = c(0.05, 3)) +
facet_wrap(~ year) +
labs(title="Growth of the USA, 1790-1900", x ="", y = "") +
themeParameters
<- historydata::us_cities_pop %>%
us_data filter(year >= 1900)
+
baseplot geom_point(data = us_data, aes(x=lon, y=lat, size=population), alpha = 0.02) +
scale_size_continuous(range = c(0.05, 3)) +
facet_wrap(~ year) +
labs(title="Growth of the USA, 1900-2010", x ="", y = "") +
themeParameters
This kind of maps give a nice insight into such processes as growth of territories. Dots, of course, can be used to map any quantitative parameter.
9.2.3 Graphs: Population Growth by State
Sometimes, however, we might opt out of using maps. Package geofacet
(more on this package) offers a differnt approach: it offers the use of suggestive tables instead of maps. Below you can find a code sample that visualizes the same data as we had above.
#install.packages("geofacet")
library(geofacet)
## Warning: package 'geofacet' was built under R version 4.0.2
<- historydata::us_cities_pop %>%
us_states_pop group_by(state, year) %>%
::summarize(population = sum(population)) dplyr
## `summarise()` has grouped output by 'state'. You can override using the `.groups` argument.
<- ggplot(us_states_pop, aes(year, population)) +
graph01 geom_line() +
facet_geo(~ state, grid = "us_state_grid2", label = "name") +
labs(title = "Growth of the US, 1790-2010",
caption = "Data Source: `historydata`",
x = "Year",
y = "Population") +
theme(axis.text.x = element_text(angle = 90))
## Some values in the specified facet_geo column 'state' do not match the
## 'code' column of the specified grid and will be removed: IT
ggsave("us_facet_map.png", plot=graph01, width = 400, height = 200, units = "mm", dpi = "retina")
Homework: check documentation on geofacet
(https://cran.r-project.org/web/packages/geofacet/vignettes/geofacet.html). The package has an interface to develop custom grid. Develop a grid for Viennese districts and plot the growth of their population using your custom grid.
Grid designer is available here: https://hafen.github.io/grid-designer/. You will need some data for an easy start:
<- distcenters %>%
wien_starting_grid mutate(row = seq(1,23), col=seq(1,23)) %>%
mutate(code = id, name = label) %>%
select(row,col,code,name)
You can start it with the following line of code:
grid_design(data = wien_starting_grid, img = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1f/Vienna%2C_administrative_divisions_-_Nmbrs.svg/2880px-Vienna%2C_administrative_divisions_-_Nmbrs.svg.png")
<- data.frame(
mygrid row = c(5, 4, 4, 4, 5, 5, 6, 6, 6, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23),
col = c(9, 9, 8, 7, 8, 7, 9, 8, 7, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23),
code = c("02", "09", "08", "07", "01", "06", "03", "04", "05", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23"),
name = c("02. Leopoldstadt", "09. Alsergrund", "08. Josefstadt", "07. Neubau", "01. Innere-Stadt", "06. Mariahilf", "03. Landstraße", "04. Wieden", "05. Margareten", "10. Favoriten", "11. Simmering", "12. Meidling", "13. Hietzing", "14. Penzing", "15. Rudolfsheim-Fünfhaus", "16. Ottakring", "17. Hernals", "18. Währing", "19. Döbling", "20. Brigittenau", "21. Floridsdorf", "22. Donaustadt", "23. Liesing"),
stringsAsFactors = FALSE
)::grid_preview(mygrid) geofacet
## Note: You provided a user-specified grid. If this is a generally-useful
## grid, please consider submitting it to become a part of the geofacet
## package. You can do this easily by calling:
## grid_submit(__grid_df_name__)
Optional: complete the same task for the europop
dataset (library europop
).
9.2.4 Many Maps: Population by city — growth of the US
You might need to generate multiple high-resolution maps, instead of a set of tiny faceted maps. For this purpose loops are very helpful. They are a powerful tool and one should know how to use them. Below is an example of a very simple for
loop. The our loop prints single letters—one after another—from a built-in vector/variable letters.
(as you might remember from the early worksheets, letters
is a built-in vector of English letters.)
<- "This is letter"
string
for(l in letters){
print(paste(string, l))
}
## [1] "This is letter a"
## [1] "This is letter b"
## [1] "This is letter c"
## [1] "This is letter d"
## [1] "This is letter e"
## [1] "This is letter f"
## [1] "This is letter g"
## [1] "This is letter h"
## [1] "This is letter i"
## [1] "This is letter j"
## [1] "This is letter k"
## [1] "This is letter l"
## [1] "This is letter m"
## [1] "This is letter n"
## [1] "This is letter o"
## [1] "This is letter p"
## [1] "This is letter q"
## [1] "This is letter r"
## [1] "This is letter s"
## [1] "This is letter t"
## [1] "This is letter u"
## [1] "This is letter v"
## [1] "This is letter w"
## [1] "This is letter x"
## [1] "This is letter y"
## [1] "This is letter z"
We can use the same strategy to loop over some table of our data and generate maps based on the changing parameter. For example, we want to generate maps of U.S. cities, where each map would show us data forr a specific decade. Let’s try a simple example first: let’s generate headers for our future maps.
<- historydata::us_cities_pop
us_data <- unique(us_data$year)
decades
for (d in decades){
<- paste0("U.S. cities in ", d)
header print(header)
}
## [1] "U.S. cities in 1790"
## [1] "U.S. cities in 1800"
## [1] "U.S. cities in 1810"
## [1] "U.S. cities in 1820"
## [1] "U.S. cities in 1830"
## [1] "U.S. cities in 1840"
## [1] "U.S. cities in 1850"
## [1] "U.S. cities in 1860"
## [1] "U.S. cities in 1870"
## [1] "U.S. cities in 1880"
## [1] "U.S. cities in 1890"
## [1] "U.S. cities in 1900"
## [1] "U.S. cities in 1910"
## [1] "U.S. cities in 1920"
## [1] "U.S. cities in 1930"
## [1] "U.S. cities in 1940"
## [1] "U.S. cities in 1950"
## [1] "U.S. cities in 1960"
## [1] "U.S. cities in 1970"
## [1] "U.S. cities in 1980"
## [1] "U.S. cities in 1990"
## [1] "U.S. cities in 2000"
## [1] "U.S. cities in 2010"
Now we can modify this code so that it generates maps:
<- historydata::us_cities_pop
us_data <- unique(us_data$year)
decades
for (d in decades){
<- paste0("U.S. cities in ", d)
header
<- us_data %>%
dataTemp filter(year == d)
<- baseplot +
mapTemp geom_point(data = dataTemp, aes(x=lon, y=lat, size=population), alpha = 0.5) +
scale_size_continuous(range = c(0.05, 5)) +
labs(title=header, x ="", y = "") +
themeParameters
ggsave(paste0("us_map_", d, ".png"), plot=mapTemp, width = 400, height = 200, units = "mm", dpi = "retina")
}
Here are some results:
Homework: generate similar maps from europop
data.
9.2.5 Animated Maps: Population by city — growth of the US
One last ting for today: animated maps. Package gganimate
is a great tool fo rcreating animated graphs and maps.
#install.packages('gganimate')
library(gganimate)
## Warning: package 'gganimate' was built under R version 4.0.2
<- historydata::us_cities_pop us_data
Let’s start with a static plot, but first we need to prepare data. If our data is not prepared properly, our results may be very weird. Like on the map below:
Essentially, to avoid that we need to prepare data for each frame of our future animation. Our data is given for every 10th year (1790, 1800, 1810, etc.). gganimiate
tries to fill in the blanks by creation transition frames. In other words, only every 10th frame shows the real data. There are multiple ways of how this issue can be solved. In the code below we create values for missing years — the value of each 10th year is assigned to years 11, 12, 13, etc. We use a for
-loop for this.
= "lightsteelblue2"
waterColor =c(-130, -60); ylim=c(23,52)
xlim
<- us_data %>%
us_data_filtered filter(between(lon, xlim[1], xlim[2])) %>%
filter(between(lat, ylim[1], ylim[2]))
<- us_data_filtered
us_data_filtered_enriched
for (n in seq(1,9)){
<- us_data_filtered %>%
temp mutate(year = year + n)
<- add_row(us_data_filtered_enriched, temp)
us_data_filtered_enriched
}
<- us_data_filtered_enriched %>%
us_data_filtered_enriched filter(year <= 2010)
Now, the plot:
<- ggplot(data = world) +
baseplot geom_sf(fill="white", color="white") +
coord_sf(xlim = xlim, ylim = ylim, expand = FALSE)
<- baseplot +
usMap geom_point(data = us_data_filtered_enriched, aes(x=lon, y=lat, size = population), alpha = 0.1, show.legend = FALSE) +
scale_size_continuous(range = c(0.05, 5)) +
themeParameters
usMap
Now, the following code animates our map. And the last line saves our animation into a file. Check documentation for gganimate
for many other options for saving animations. NB: Keep in mind that this process make take a while, so you do not want to rerun it every time you knit your notebook!
<- usMap +
usMapAnimated transition_time(year) +
labs(title = "Growth of the US cities: {frame_time}")
anim_save("usMap_Animated.gif", animation=usMapAnimated, end_pause = 10, rewind = FALSE, height = 5, width = 8, units = "in", res = 300)
This is our animated map, which we saved into a gif file.
Homework: Animate any of the maps that you are tasked to prepare above.
9.3 Homework
- Homework is described in the lesson.
Submit your knitted notebook.
9.4 Submitting homework
- Homework assignment must be submitted by the beginning of the next class;
- Email your homework to the instructor as attachments.
- In the subject of your email, please, add the following:
070184-LXX-HW-YourLastName-YourMatriculationNumber
, whereLXX
is the numnber of the lesson for which you submit homework;YourLastName
is your last name; andYourMatriculationNumber
is your matriculation number.
- In the subject of your email, please, add the following: