Data Visualization

Capital Bikeshare Data

The Capital Bikeshare Data is a dataset that holds the number of capital bikeshare rides taken on each day between January 1st, 2015 and December 31st, 2022. The first plot below is an interactive time series plot that indicates how many rides were taken on each day. By clicking and dragging you can adjust to smaller or larger time periods, and hovering on a certain point will show you how many rides were taken that day. There are clearly seasonal patterns with many more rides taken in the warmer months than in the winter months. The year over year trend is fairly consistent, however 2020 clearly has lower ridership than other years due to the covid 19 pandemic.

Code
fig = plot_ly(cabi, type = "scatter", mode = "lines") %>% 
  add_trace(x = ~date, y = ~count, name = "Total Riders", line = list(color = "red", width = .75)) %>% 
  layout(title = "Capital Bikeshare Daily Ridership", xaxis = list(title = "Date"), yaxis = list(title = "Number of Rides"),showlegend = F)
fig

The next plot also shows the capital bikeshare rider data over time. In an effort to highlight some important events a limited time frame of the first plot is displayed. The low ride total days are typically fairly consistent. There aren’t many extreme low days and those that do exist are typically due to low temperatures or rain/snow. However, extreme high ride total days do exist. The plot below shows events that occurred on some of the more extreme days. As you can see ridership increases significantly when large events occur in DC.

Code
arrows =
  tibble(
    x1 = c(as.Date("2017-09-01"),
           as.Date("2018-09-03"),
           as.Date("2019-11-15"),
           as.Date("2020-09-01"),
           as.Date("2021-07-15"),
           as.Date("2022-06-15"),
           as.Date("2022-06-15")),
    x2 =c(as.Date("2018-04-18"),
          as.Date("2018-12-03"),
          as.Date("2019-04-14"), 
          as.Date("2020-06-13"), 
          as.Date("2020-11-07"),
          as.Date("2022-07-04"),
          as.Date("2021-07-15")),
    y1 = c(21400,
           22000,
           21500,
           18200,
           23200,
           23000,
           23000), 
    y2 = c(19400,
           16700,
           18300,
           15400,
           14800,
           19700,
           16800)
  )
fig = ggplot(cabi, 
             aes(x = date,
                 y = count)) + 
  geom_line(col = "red")+
  labs(
    title = "Capital Bikeshare Ridership Peak Days",
    x = "Date",
    y = "Number of Rides"
  ) +
  ylim(0, 25000) +
  xlim(as.Date("2016-06-01"), as.Date("2023-01-01"))+
  geom_curve(data = arrows, 
             aes(x = x1, y =y1, xend=x2, yend = y2),
             arrow = arrow(length = unit(0.08, "inch")), 
             size = 0.25,
             curvature = -0.3)+
  theme_classic()+
  theme(axis.title.x = element_text(size = 12,margin = margin(t = 10)),
        axis.title.y = element_text(size = 12, margin = margin(t = 20)),
        plot.title = element_text(hjust = 0.5, vjust = 2, size = 16) )
fig = fig + annotate("text", 
               x = as.Date("2017-06-01"), 
               y = 23000, 
               label = "Emancipation Day\nParade", 
               angle = 20)+
  annotate("text", 
           x = as.Date("2019-11-01"), 
           y = 23000, 
           label = "Cherry Blossom\nPeak", 
           angle = 20)+
  annotate("text", 
           x = as.Date("2021-07-01"), 
           y = 24700, 
           label = "Election Day", 
           angle = 20)+
  annotate("text", x = 
             as.Date("2020-08-15"), 
           y = 20000, 
           label = "Policy Brutality\nProtest", 
           angle = 20)+
  annotate("text", 
           x = as.Date("2022-05-15"), 
           y = 24000, 
           label = "July 4th", 
           angle = 20)+
  annotate("text", 
           x = as.Date("2018-09-03"), 
           y = 23500, 
           label = "George Bush\nMemorial", 
           angle = 20)
  
fig

Metro Rail Data

This plot shows metro monthly riders between 2011 and 2022. There is clearly a large change in the trend in the spring of 2020, this is due to the covid 19 pandemic impacting the number of people commuting and travelling in Washington DC. Before the pandemic there was a clear seasonal trend with the winter months having lower total ridership and the summer months, particularly July, having the highest ridership. The overall trend before the pandemic was slightly downwards. Post pandemic shutdowns there is still a clear seasonal trend, however the overall trend is significantly upwards.

Code
fig = plot_ly(metro, type = "scatter", mode = "lines") %>% 
  add_trace(x = ~Date, y = ~Riders_Total, line = list(color = "#038fff"), name = "Number of Riders") %>% 
  add_segments(x = as.Date("2020-02-01"), xend =as.Date("2020-02-01"), y = 0, yend = 25000000, line = list(color = "gray", dash = "dash"), name = "Pandemic Start") %>% 
  layout(title = "Monthly Metro Riders",
         xaxis = list(title = "Date"), 
         yaxis = list(title = "Number of Riders"),legend = list(orientation = 'h'))
fig

Metro Bus Data

This plot shows monthly bus riders between 2011 and 2022. There is clearly a large change in the trend in the spring of 2020, this is due to the covid 19 pandemic impacting the number of people commuting and travelling in Washington DC. Before the pandemic there was a clear seasonal trend with the winter months having lower total ridership and the warmer months, particularly May and October, having the highest ridership. The overall trend before the pandemic was slightly downwards. Post pandemic shutdowns there is still somewhat of a seasonal trend, however the overall trend is significantly upwards.

Code
fig = plot_ly(bus, type = "scatter", mode = "lines") %>% 
  add_trace(x = ~Date, y = ~Riders_Total, line = list(color = "#088d31"), name = "Number of Riders") %>% 
  add_segments(x = as.Date("2020-02-01"), xend =as.Date("2020-02-01"), y = 0, yend = 15000000, line = list(color = "gray", dash = "dash"), name = "Pandemic Start") %>% 
  layout(title = "Monthly Bus Riders",
         xaxis = list(title = "Date"), 
         yaxis = list(title = "Number of Riders"),legend = list(orientation = 'h'))
fig

NOAA Weather Data

The plots below show temperature and precipitation data for washington DC. This can be used in conjunction with transportation data to better understand transportation patterns.

Code
par(mfrow = c(2,1))
ggplot(noaa, x = aes(x = DATE)) + 
  geom_line(aes(x  =DATE, 
                y = TMAX),
            col = "red")+ 
  labs(
    title = "Daily Maximum Temperature Washington DC",
    x = "Date",
    y = "Temperature (F)"
  )+
  theme_classic()+
  theme(axis.title.x = element_text(size = 12,margin = margin(t = 10)),
        axis.title.y = element_text(size = 12, margin = margin(t = 20)),
        plot.title = element_text(hjust = 0.5, vjust = 2, size = 16) )

Code
ggplot(noaa, x = aes(x = DATE)) +
  geom_line(aes(x  =DATE, 
                y = PRCP),
            col = "blue")+ 
  labs(
    title = "Daily Precipitation Washington DC",
    x = "Date",
    y = "Precipitation (Inches)"
  )+
  theme_classic()+
  theme(axis.title.x = element_text(size = 12,margin = margin(t = 10)),
        axis.title.y = element_text(size = 12, margin = margin(t = 20)),
        plot.title = element_text(hjust = 0.5, vjust = 2, size = 16) )