Introduction

This tutorial is going to demonstrate how to make gorgeous maps of cities using streets and other geographic features. This is enabled by the osmdata package.

I borrow heavily from the excellent tutorial at ggplot2tutor.com. I extend this tutorial in order to a) demonstrate deeper functionality of this package; and b) to provide additional instruction and application in ggplot, the tidyverse, and rvest.

I’ve given one of these maps as a gift, printed in a large poster format and framed. You’ll be able to make and customize your own maps and learn some new R functionality along the way.

Check out my website joshuamccrain.com for more code, tutorials, and my published research.


Loading in open streets data

First install the osmdata package. I recommend doing an install as listed here to get the most up to date version of this package.

library(remotes)
remotes::install_github("ropensci/osmdata")

Now load in the rest of the packages we’ll be using.

library(tidyverse)
library(osmdata) # package for working with streets
library(showtext) # for custom fonts
library(ggmap)
library(rvest)

The osmdata package has a ton of awesome features. It appears that only a fraction of them are available for most cities, and that seems to be primarily those that are related to streets. We’ll focus on those today, but I hope they continue extending this package.

The avilable_tags() function shows you the available features of the different classifications of streets and other map features this package works with. E.g.:

available_tags("highway")
##  [1] "bridleway"              "bus guideway"          
##  [3] "bus stop"               "construction"          
##  [5] "corridor"               "crossing"              
##  [7] "cycleway"               "elevator"              
##  [9] "emergency access point" "escape"                
## [11] "footway"                "give way"              
## [13] "living street"          "milestone"             
## [15] "mini roundabout"        "motorway"              
## [17] "motorway junction"      "motorway link"         
## [19] "passing place"          "path"                  
## [21] "pedestrian"             "platform"              
## [23] "primary"                "primary link"          
## [25] "proposed"               "raceway"               
## [27] "residential"            "rest area"             
## [29] "road"                   "secondary"             
## [31] "secondary link"         "service"               
## [33] "services"               "speed camera"          
## [35] "steps"                  "stop"                  
## [37] "street lamp"            "tertiary"              
## [39] "tertiary link"          "toll gantry"           
## [41] "track"                  "traffic mirror"        
## [43] "traffic signals"        "trailhead"             
## [45] "trunk"                  "trunk link"            
## [47] "turning circle"         "turning loop"          
## [49] "unclassified"

We can then use getbb() to return the geographic location of a city simply by passing it a string.

getbb("Atlanta Georgia")
##         min       max
## x -84.55107 -84.28956
## y  33.64781  33.88682

Now we’re going to grab certain categories of streets from this package. You’ll notice that the key function here, add_osm_feature() takes two arguments, key and value. Since we’re grabbing only street features, and all streets fall under the key “highway”, that’s what we use. I have then arbitrarily divided the values into different categories based on size of the road. You’ll see how this comes into play later when we plot them, but feel free to play with this.

Finally, the osmdata_sf() function returns the pulled data in a format that ggmap knows how to work with, making plotting very simple. You can take a look at the dataframe it creates to get a sense of the data format.

big_streets <- getbb("Asheville United States")%>%
  opq()%>%
  add_osm_feature(key = "highway", 
                  value = c("motorway", "primary", "motorway_link", "primary_link")) %>%
  osmdata_sf()

big_streets
## Object of class 'osmdata' with:
##                  $bbox : 35.4164574,-82.671024,35.6561213,-82.459938
##         $overpass_call : The call submitted to the overpass API
##                  $meta : metadata including timestamp and version numbers
##            $osm_points : 'sf' Simple Features Collection with 5973 points
##             $osm_lines : 'sf' Simple Features Collection with 845 linestrings
##          $osm_polygons : 'sf' Simple Features Collection with 2 polygons
##        $osm_multilines : NULL
##     $osm_multipolygons : NULL
View(big_streets[["osm_lines"]])

We’ll now do the same and grab two more dataframes of streets: medium streets and small streets. Again, these classifications are arbitrary and can be changed or divided even further depending on your preference. This is the fun part!

med_streets <- getbb("Asheville United States")%>%
  opq()%>%
  add_osm_feature(key = "highway", 
                  value = c("secondary", "tertiary", "secondary_link", "tertiary_link")) %>%
  osmdata_sf()


small_streets <- getbb("Asheville United States")%>%
  opq()%>%
  add_osm_feature(key = "highway", 
                  value = c("residential", "living_street",
                            "unclassified",
                            "service", "footway"
                  )) %>%
  osmdata_sf()

Now I want to grab two more features that osmdata captures: rivers and railways. You can see the key in the add_osm_feature() function has now changed to “waterway” and “railway”. We’re only going to use “river” and “rail” respectively.

river <- getbb("Asheville United States")%>%
  opq()%>%
  add_osm_feature(key = "waterway", value = "river") %>%
  osmdata_sf()

railway <- getbb("Asheville United States")%>%
  opq()%>%
  add_osm_feature(key = "railway", value="rail") %>%
  osmdata_sf()
available_tags("railway")
##  [1] "abandoned"        "buffer stop"      "construction"    
##  [4] "crossing"         "derail"           "disused"         
##  [7] "funicular"        "halt"             "level crossing"  
## [10] "light rail"       "miniature"        "monorail"        
## [13] "narrow gauge"     "platform"         "preserved"       
## [16] "rail"             "railway crossing" "roundhouse"      
## [19] "signal"           "station"          "subway"          
## [22] "subway entrance"  "switch"           "tram"            
## [25] "tram stop"        "traverser"        "turntable"       
## [28] "wash"

Plotting

Now to the fun part: plotting. This is very straightforward because of ggmaps, which gives us the geom_sf() function. All we’re doing here is plotting the “big” streets without any other formatting.

ggplot() +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black")

And now the river:

ggplot() +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "black")

This already gives us something nice, but let’s add in the other streets and features as additional layers and see what it looks like.

ggplot() +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black") +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "black")

One immediate problem: the axes are far too large because of the river. Let’s limit them using coord_sf(). This takes playing around with depending on your preference over what part of the city you want to zoom in on.

ggplot() +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black") +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "black") +
   coord_sf(xlim = c(-82.6, -82.5), 
           ylim = c(35.52, 35.62),
           expand = FALSE)

A bit too zoomed in. Let’s readjust.

ggplot() +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black") +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "black") +
   coord_sf(xlim = c(-82.65, -82.48), 
           ylim = c(35.5, 35.65),
           expand = FALSE)

Much better. Let’s add the remaining features as well as some coloring, size, and alpha aesthetics.

ggplot() +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "steelblue",
          size = .8,
          alpha = .3) +
  geom_sf(data = railway$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .2,
          linetype="dotdash",
          alpha = .5) +
  geom_sf(data = med_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .3,
          alpha = .5) +
  geom_sf(data = small_streets$osm_lines,
          inherit.aes = FALSE,
          color = "#666666",
          size = .2,
          alpha = .3) +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .5,
          alpha = .6) +
  coord_sf(xlim = c(-82.65, -82.48), 
           ylim = c(35.5, 35.65),
           expand = FALSE)

This looks pretty good. NB: the order in which you layer the features (e.g., river before the streets) really makes a difference and can be altered.

Now I’m going to add some custom fonts and label the map to something that looks good printed. You can also save these as images first and open in a text editor, but I wanted to keep it vectorized to make printing easier.

font_add_google(name = "Lato", family = "lato") # add custom fonts
showtext_auto()

ggplot() +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "steelblue",
          size = .8,
          alpha = .3) +
  geom_sf(data = railway$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .2,
          linetype="dotdash",
          alpha = .5) +
  geom_sf(data = med_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .3,
          alpha = .5) +
  geom_sf(data = small_streets$osm_lines,
          inherit.aes = FALSE,
          color = "#666666",
          size = .2,
          alpha = .3) +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .5,
          alpha = .6) +
  coord_sf(xlim = c(-82.65, -82.48), 
           ylim = c(35.5, 35.65),
           expand = FALSE)  +
  theme_void() + # get rid of background color, grid lines, etc.
  theme(plot.title = element_text(size = 20, family = "lato", face="bold", hjust=.5),
        plot.subtitle = element_text(family = "lato", size = 8, hjust=.5, margin=margin(2, 0, 5, 0))) +
  labs(title = "ASHEVILLE", subtitle = "35.595°N / 82.552°W")

You can save it by adding one more line to the ggplot call:

ggsave(file="asheville.pdf", units="in", width = 6, height=7)

That’s all there is to it. Each city will look different based on its features, you’ll especially have to play around with how zoomed in you want the map to be. Play around with various cities and see what works!

You can also obviously change the background colors, street colors, etc. based on personal preferences. I like this clean look, personally.


Extensions

There are quite a few interesting extensions using the data available by Open Streets.

Highlighting individual streets

First, let’s say there’s a particular street you want to highlight on your map. Maybe its’ the one you grew up on, or the main street in your college town. This is pretty easy to do – simply find which “category” of street it falls under (small, medium, big) and create a separate dataframe that is filtered based on its name.

In this case, I’ll extract the famous Blue Ridge Parkway:

blue_ridge <- med_streets[["osm_lines"]] %>% 
  filter(name=="Blue Ridge Parkway")

Now all we have to do is add it as a separate layer on our existing map:

ggplot() +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "steelblue",
          size = .8,
          alpha = .3) +
  geom_sf(data = railway$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .2,
          linetype="dotdash",
          alpha = .5) +
  geom_sf(data = med_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .3,
          alpha = .5) +
  geom_sf(data = small_streets$osm_lines,
          inherit.aes = FALSE,
          color = "#666666",
          size = .2,
          alpha = .3) +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .5,
          alpha = .6) +
  geom_sf(data = blue_ridge,
          inherit.aes = FALSE,
          color = "orange",
          size = 1,
          alpha = 1) +
  coord_sf(xlim = c(-82.65, -82.45), 
           ylim = c(35.5, 35.65),
           expand = FALSE)  +
  theme_void()

Notice that the call to the blue_ridge dataframe in ggplot passes the entire dataframe instead of a vector of $osm_lines – this is because geom_sf() is smart and knows what to do with properly formatted data.


Adding breweries (or other external geographic data):

Since we’re dealing with a standard mapping situation with ggplot, it’s easy to add in additional features to the map as long as you have their latitude in longitude. In this case, I’ll add in individual breweries as points on the map.

First, I am going to scrape the brewery names from a separate website using rvest and clean it up a little.

breweries <- read_html("http://ashevilleblog.com/asheville-brewery-list/") %>% 
  html_nodes(".entry strong") %>% 
  html_text %>% 
  .[str_detect(., "https|[|]")==F]
  
breweries <- breweries[breweries != " beer category" & str_length(breweries) > 3]

head(breweries)
## [1] "12 Bones South Brewing "          "Asheville Brewing Company "      
## [3] "Archetype Brewing"                "Ben’s Tune Up "                  
## [5] "Blue Ghost Brewing Company"       "Blue Mountain Pizza and Brew Pub"

Then I use Google Maps’ API to extract latitude and longitude for each brewery and put it into a dataframe. This uses the register_google() function which requires a free Google API key, which you can register for.

I then pass each of the breweries’ name as a string to the geocode() function (documentation here) – you can treat this similarly to how you would search for a location in Google Maps. If you provide it a specific enough string it works really well.

#register_google(key = "") #uncomment and run with your API key

breweries.add <- geocode(paste(breweries, "asheville", sep=" "), output="all")

breweries.add[[1]][["results"]][[1]][["geometry"]]
## $location
## $location$lat
## [1] 35.46621
## 
## $location$lng
## [1] -82.51643
## 
## 
## $location_type
## [1] "ROOFTOP"
## 
## $viewport
## $viewport$northeast
## $viewport$northeast$lat
## [1] 35.46756
## 
## $viewport$northeast$lng
## [1] -82.51508
## 
## 
## $viewport$southwest
## $viewport$southwest$lat
## [1] 35.46486
## 
## $viewport$southwest$lng
## [1] -82.51778

The geocode() function returns a large list of lists, which we then need to extract additional information from. You can do this with the purrr package, but since the lists are so nested I found that to be overly complicated and opted for a loop instead.

Having extracted the latitude and longitude of each brewery we can now plot it on a map.

breweries <- data.frame(lat = rep(NA, 37),
                        lng = rep(NA, 37))

for(i in 1:37){
  breweries$lat[i] <- breweries.add[[i]][["results"]][[1]][["geometry"]][["location"]][["lat"]]
  breweries$lng[i] <- breweries.add[[i]][["results"]][[1]][["geometry"]][["location"]][["lng"]]
}

head(breweries)
##        lat       lng
## 1 35.46621 -82.51643
## 2 35.59175 -82.55521
## 3 35.57818 -82.57507
## 4 35.59135 -82.55577
## 5 35.43685 -82.53055
## 6 35.69823 -82.56023

Now, overlay the breweries as individual points and format:

ggplot() +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "steelblue",
          size = .8,
          alpha = .3) +
  geom_sf(data = railway$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .2,
          linetype="dotdash",
          alpha = .5) +
  geom_sf(data = med_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .3,
          alpha = .5) +
  geom_sf(data = small_streets$osm_lines,
          inherit.aes = FALSE,
          color = "#666666",
          size = .2,
          alpha = .3) +
  geom_sf(data = big_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .5,
          alpha = .6) +
  coord_sf(xlim = c(-82.65, -82.48), 
           ylim = c(35.5, 35.65),
           expand = FALSE)  +
  theme_void() +
  geom_point(data=breweries, aes(x=lng, y=lat), size = 2, alpha=.8, fill="firebrick4", color="white", pch=21, inherit.aes = F) + # add in breweries!
  theme(plot.title = element_text(size = 20, family = "lato", face="bold", hjust=.5),
        plot.subtitle = element_text(family = "lato", size = 8, hjust=.5, margin=margin(2, 0, 5, 0))) +
  labs(title = "ASHEVILLE", subtitle = "35.595°N / 82.552°W") 

You can see the final, vectorized output here.

Conclusion

This tutorial has extended the tutorial available at ggplot2tutor.com, which I still recommend checking out since it covers some more aesthetic options. However, I have shown how to extend the functionality of this package, how to add in additional geographic features such as rivers and railways, and how to overlay other geographic data. Since this is all done within ggplot, it produces a vectorized output that can be printed at any size without loss in quality.

This package is still relatively early on in its development. I imagine there are lots of applications for social scientific research given how easy it is to work with and combine with other geographic data.