Mini Project 01

Author

Christian Montano

Published

September 25, 2024

Introduction

This report provides an analysis of transit systems’ financials, ridership, and efficiency metrics. The data is sourced from ridership, fare revenue, and expenses datasets for 2022.

Loading Libraries and Data

Code
required_packages <- c("tidyverse", "readxl", "DT", "lubridate", "dplyr")

for (package in required_packages) {
  if (!require(package, character.only = TRUE)) {
    install.packages(package, dependencies = TRUE)
    library(package, character.only = TRUE)
  } else {
    library(package, character.only = TRUE)
  }
}
Loading required package: tidyverse
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: readxl

Loading required package: DT

Data Preparation

Code
FARES <- readxl::read_xlsx("2022_fare_revenue.xlsx") |>
  select(-`State/Parent NTD ID`, 
         -`Reporter Type`,
         -`Reporting Module`,
         -`TOS`,
         -`Passenger Paid Fares`,
         -`Organization Paid Fares`) |>
  filter(`Expense Type` == "Funds Earned During Period") |>
  select(-`Expense Type`) |>
  group_by(`NTD ID`,       
           `Agency Name`,  
           `Mode`) |>      
  summarize(`Total Fares` = sum(`Total Fares`)) |>
  ungroup()
`summarise()` has grouped output by 'NTD ID', 'Agency Name'. You can override
using the `.groups` argument.
Code
EXPENSES <- readr::read_csv("2022_expenses.csv") |>
  select(`NTD ID`, 
         `Agency`,
         `Total`, 
         `Mode`) |>
  mutate(`NTD ID` = as.integer(`NTD ID`)) |>
  rename(Expenses = Total) |>
  group_by(`NTD ID`, `Mode`) |>
  summarize(Expenses = sum(Expenses)) |>
  ungroup()
Rows: 3744 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Agency, City, State, NTD ID, Organization Type, Reporter Type, UZA...
dbl  (2): Report Year, UACE Code
num (10): Primary UZA Population, Agency VOMS, Mode VOMS, Vehicle Operations...
lgl  (7): Vehicle Operations Questionable, Vehicle Maintenance Questionable,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
`summarise()` has grouped output by 'NTD ID'. You can override using the `.groups` argument.
Code
FINANCIALS <- inner_join(FARES, EXPENSES, join_by(`NTD ID`, `Mode`))

TRIPS <- readxl::read_xlsx("ridership.xlsx", sheet="UPT") |>
  filter(`Mode/Type of Service Status` == "Active") |>
  select(-`Legacy NTD ID`, 
         -`Reporter Type`, 
         -`Mode/Type of Service Status`, 
         -`UACE CD`, 
         -`TOS`) |>
  pivot_longer(-c(`NTD ID`:`3 Mode`), 
               names_to="month", 
               values_to="UPT") |>
  drop_na() |>
  mutate(month=my(month)) 
MILES <- readxl::read_xlsx("ridership.xlsx", sheet="VRM") |>
  filter(`Mode/Type of Service Status` == "Active") |>
  select(-`Legacy NTD ID`, 
         -`Reporter Type`, 
         -`Mode/Type of Service Status`, 
         -`UACE CD`, 
         -`TOS`) |>
  pivot_longer(-c(`NTD ID`:`3 Mode`), 
               names_to="month", 
               values_to="VRM") |>
  drop_na() |>
  group_by(`NTD ID`, `Agency`, `UZA Name`, 
           `Mode`, `3 Mode`, month) |>
  summarize(VRM = sum(VRM)) |>
  ungroup() |>
  mutate(month=my(month))
`summarise()` has grouped output by 'NTD ID', 'Agency', 'UZA Name', 'Mode', '3
Mode'. You can override using the `.groups` argument.
Code
USAGE <- inner_join(TRIPS, MILES) |>
  mutate(`NTD ID` = as.integer(`NTD ID`))
Joining with `by = join_by(`NTD ID`, Agency, `UZA Name`, Mode, `3 Mode`,
month)`
Code
sample_n(USAGE, 1000) |> 
  mutate(month=as.character(month)) |> 
  DT::datatable()

Task 1

Rename a column: UZA Name to metro_area.

Code
USAGE <- USAGE %>%
  rename(
    metro_area = `UZA Name`,
    Unlinked_Passenger_Trips = UPT,
    Vehicle_Revenue_Miles = VRM
  )

Task 2

Recoding the Mode column

Code
unique_modes <- USAGE %>% distinct(Mode)
print(unique_modes)
# A tibble: 18 × 1
   Mode 
   <chr>
 1 DR   
 2 FB   
 3 MB   
 4 SR   
 5 TB   
 6 VP   
 7 CB   
 8 RB   
 9 LR   
10 YR   
11 MG   
12 CR   
13 AR   
14 TR   
15 HR   
16 IP   
17 PB   
18 CC   
Code
USAGE <- USAGE |>
  mutate(Mode=case_when(
    Mode == "HR" ~ "Heavy Rail", 
    Mode == "DR" ~ "Demand Response",
    Mode == "FB" ~ "Ferry Boat",
    Mode == "MB" ~ "Bus",
    Mode == "SR" ~ "Streetcar Rail",
    Mode == "TB" ~ "Trolleybus",
    Mode == "VP" ~ "Vanpool",
    Mode == "CB" ~ "Commuter Bus",
    Mode == "RB" ~ "Rapid Bus",
    Mode == "LR" ~ "Light Rail",
    Mode == "YR" ~ "Hybrid Rail",
    Mode == "MG" ~ "Monorail/Automated Guideway",
    Mode == "CR" ~ "Commuter Rail",
    Mode == "AR" ~ "Alaska Railroad",
    Mode == "TR" ~ "Tram",
    Mode == "IP" ~ "Inclined Plane",
    Mode == "PB" ~ "Public Bike",
    Mode == "CC" ~ "Cable Car",
    TRUE ~ "Unknown"))

sample_n(USAGE, 1000) %>%
  select(-`NTD ID`, -`3 Mode`) %>%
  mutate(month = as.character(month)) %>%
  DT::datatable(options = list(pageLength = 25, lengthMenu = c(25, 50, 100), scrollX = TRUE))

Data Analysis

Task 3: Analysis of Transit Data

1. Which Transit Agency Had the Most Total VRM?

The MTA New York City transportation is the transportation company with the most total Vehicle Revenue Miles (VRM) in our dataset. This agency has a total VRM of 1,083,285,350 on file. This suggests that MTA New York City Transit is a major participant in terms of operational reach since it offers a wide range of vehicle services.

Code
most_vrm_agency <- USAGE %>%
  group_by(Agency) %>%
  summarize(total_VRM = sum(Vehicle_Revenue_Miles, na.rm = TRUE)) %>%
  arrange(desc(total_VRM)) %>%
  slice(1)

2. What transit mode had the most total VRM in our data set?

The ‘Bus’ mode is the transportation mode with the largest total Vehicle Revenue Miles (VRM) in our dataset, with 4,944,449,088 VRM. This shows how many service kilometers buses go, underscoring their important function in the transportation network.

Code
most_vrm_mode <- USAGE %>%
  group_by(Mode) %>%
  summarize(total_VRM = sum(Vehicle_Revenue_Miles, na.rm = TRUE)) %>%
  arrange(desc(total_VRM)) %>%
  slice(1) 

3. How many trips were taken on the NYC Subway (Heavy Rail) in May 2024?

1,804,588,19 unlinked passenger journeys were made on the MTA New York City Transit-operated NYC Subway (Heavy Rail) in May 2024. This graph illustrates how frequently the subway system was used in that particular month.

Code
nyc_subway_may_2024 <- USAGE %>%
  filter(Agency == "MTA New York City Transit", Mode == "Heavy Rail", month == as.Date("2024-05-01")) %>%
  summarize(total_trips = sum(Unlinked_Passenger_Trips, na.rm = TRUE))

5. How much did NYC subway ridership fall between April 2019 and April 2020?

There was a notable decline in the number of riders of the NYC subway system from April 2019 to April 2020. There was a 2,119,696,600 reduction in unlinked passenger journeys overall. The COVID-19 epidemic, which resulted in significant cutbacks in public transportation utilization during that time, is blamed for this steep fall.

Code
nyc_subway_ridership_fall <- USAGE %>%
  filter(Agency == "MTA New York City Transit", Mode == "Heavy Rail", month %in% as.Date(c("2019-04-01", "2020-04-01"))) %>%
  group_by(month) %>%
  summarize(total_trips = sum(Unlinked_Passenger_Trips, na.rm = TRUE))

fall_in_ridership <- diff(nyc_subway_ridership_fall$total_trips)

Task 4

Which agency recorded the highest number of unlinked passenger trips (UPT) in 2022.

With 2,278,853,098 unlinked passenger journeys (UPT) in total, the MTA New York City Transit was the agency that logged the most UPT in 2022. In a heavily populated urban region, this emphasizes the critical role the NYC transportation system plays in facilitating passenger movement.

Code
top_agency_2022 <- USAGE %>%
  filter(year(month) == 2022) %>%
  group_by(Agency) %>%
  summarize(total_trips = sum(Unlinked_Passenger_Trips, na.rm = TRUE)) %>%
  arrange(desc(total_trips)) %>%
  slice(1)

Identify the mode of transport that showed the highest percentage growth in unlinked passenger trips (UPT) from 2021 to 2022.

According to the available statistics, Cable Car demonstrated the greatest percentage growth in unlinked passenger trips (UPT) between 2021 and 2022, growing at a pace of almost 221.15%. When compared to other forms of transportation, this suggests a notable comeback or rise in the use of cable cars during this time.

Code
growth_mode <- USAGE %>%
  filter(year(month) %in% c(2021, 2022)) %>%
  group_by(Mode, year = year(month)) %>%
  summarize(total_trips = sum(Unlinked_Passenger_Trips, na.rm = TRUE)) %>%
  spread(year, total_trips) %>%
  mutate(growth = (`2022` - `2021`) / `2021` * 100) %>%
  arrange(desc(growth)) %>%
  slice(1)
`summarise()` has grouped output by 'Mode'. You can override using the
`.groups` argument.
Code
print(growth_mode)
# A tibble: 18 × 4
# Groups:   Mode [18]
   Mode                            `2021`     `2022` growth
   <chr>                            <dbl>      <dbl>  <dbl>
 1 Alaska Railroad                 166263     219757  32.2 
 2 Bus                         2361047121 2852306101  20.8 
 3 Cable Car                       861664    2767193 221.  
 4 Commuter Bus                  21532192   33158947  54.0 
 5 Commuter Rail                169819029  279827042  64.8 
 6 Demand Response               61230716   74498027  21.7 
 7 Ferry Boat                    44537213   56512182  26.9 
 8 Heavy Rail                  1676806531 2304880039  37.5 
 9 Hybrid Rail                    2892052    4608930  59.4 
10 Inclined Plane                  744211     671491  -9.77
11 Light Rail                   206934482  285653028  38.0 
12 Monorail/Automated Guideway    6076407    9807444  61.4 
13 Public Bike                    4284974    3223453 -24.8 
14 Rapid Bus                     38592406   44597792  15.6 
15 Streetcar Rail                21211918   31665602  49.3 
16 Tram                                 0     687131 Inf   
17 Trolleybus                    36366924   48144006  32.4 
18 Vanpool                       15649743   19220466  22.8 

Which transit mode was the most common in terms of vehicle revenue miles (VRM) in 2022.

With a total of 2,178,902,024 vehicle revenue miles in 2022, bus travel was the most popular mode of transportation (VRM). This suggests that, in terms of distance traveled in revenue-generating services, buses were the most popular form of transportation.

Code
common_mode_2022 <- USAGE %>%
filter(year(month) == 2022) %>%
  group_by(Mode) %>%
  summarize(total_vrm = sum(Vehicle_Revenue_Miles, na.rm = TRUE)) %>%
  arrange(desc(total_vrm)) %>%
  slice(1)

Table Summization

Task 5

Creating a new table from USAGE that has annual total (sum) UPT and VRM for 2022.

Code
USAGE_2022_ANNUAL <- USAGE %>%
  mutate(year = year(as.Date(month))) %>% 
  filter(year == 2022) %>%  
  group_by(`NTD ID`, Agency, metro_area, Mode) %>%  
  summarize(
    UPT = sum(Unlinked_Passenger_Trips, na.rm = TRUE),  
    VRM = sum(Vehicle_Revenue_Miles, na.rm = TRUE)  
  ) %>%
  ungroup()
`summarise()` has grouped output by 'NTD ID', 'Agency', 'metro_area'. You can
override using the `.groups` argument.
Code
USAGE_2022_ANNUAL <- USAGE_2022_ANNUAL %>%
  mutate(
    Mode = case_when(
      Mode == "Bus" ~ "MB",
      Mode == "Heavy Rail" ~ "HR",
      Mode == "Commuter Rail" ~ "CR",
      Mode == "Ferry Boat" ~ "FB",
      Mode == "Light Rail" ~ "LR",
      Mode == "Trolleybus" ~ "TB",
      Mode == "Vanpool" ~ "VP",
      Mode == "Demand Response" ~ "DR",
      Mode == "Streetcar Rail" ~ "SR",
      Mode == "Hybrid Rail" ~ "YR",
      Mode == "Monorail/Automated Guideway" ~ "MG",
      Mode == "Alaska Railroad" ~ "AR",
      Mode == "Tram" ~ "TR",
      Mode == "Inclined Plane" ~ "IP",
      Mode == "Public Bike" ~ "PB",
      Mode == "Cable Car" ~ "CC",
      
      TRUE ~ "Unknown"
    )
  )

USAGE_AND_FINANCIALS <- left_join(USAGE_2022_ANNUAL, FINANCIALS, by = c("NTD ID", "Mode")) %>%
  drop_na()


print(USAGE_AND_FINANCIALS)
# A tibble: 1,043 × 9
   `NTD ID` Agency    metro_area Mode     UPT    VRM `Agency Name` `Total Fares`
      <dbl> <chr>     <chr>      <chr>  <dbl>  <dbl> <chr>                 <dbl>
 1        1 King Cou… Seattle--… MB    5.40e7 6.16e7 King County …      56846337
 2        1 King Cou… Seattle--… DR    6.63e5 1.29e7 King County …        832327
 3        1 King Cou… Seattle--… FB    4.00e5 5.12e4 King County …       1715265
 4        1 King Cou… Seattle--… SR    1.12e6 1.80e5 King County …        588495
 5        1 King Cou… Seattle--… TB    9.58e6 2.64e6 King County …      10123486
 6        1 King Cou… Seattle--… VP    7.03e5 4.41e6 King County …       5484481
 7        2 Spokane … Spokane, … MB    6.60e6 6.49e6 Spokane Tran…       6135110
 8        2 Spokane … Spokane, … DR    3.10e5 4.04e6 Spokane Tran…        531284
 9        2 Spokane … Spokane, … VP    9.06e4 9.06e5 Spokane Tran…        247718
10        3 Pierce C… Seattle--… MB    4.95e6 4.23e6 Pierce Count…       4356831
# ℹ 1,033 more rows
# ℹ 1 more variable: Expenses <dbl>

Farebox Recovery Among Major Systems

Task 6

Which transit system (agency and mode) had the most UPT in 2022?

With a total of 1,793,073,801 journeys, the MTA New York City Transit operating in the Heavy Rail (HR) mode was the transit system with the most unlinked passenger trips (UPT) in 2022. This suggests that New York City’s subway system is used rather frequently.

Code
most_upt <- USAGE_AND_FINANCIALS %>%
  arrange(desc(UPT)) %>%
  slice(1) %>%
  select(Agency, Mode, UPT)

Which transit system (agency and mode) had the highest farebox recovery, defined as the highest ratio of Total Fares to Expenses?

With a farebox recovery ratio of 2.38—the ratio of total fares to expenses—the Transit Authority, which operated the Vanpool (VP) mode, was the transit system with the greatest farebox recovery. This implies that the vanpool service is quite effective at using fare income to pay for its operating costs.

Code
highest_farebox_recovery <- USAGE_AND_FINANCIALS %>%
  mutate(farebox_recovery = `Total Fares` / Expenses) %>%
  arrange(desc(farebox_recovery)) %>%
  slice(1) %>%
  select(Agency, Mode, farebox_recovery)

Which transit system (agency and mode) has the lowest expenses per UPT?

With costs of around 1.18 per trip, the North Carolina State Agency, which runs the Bus (MB) mode, is the transportation system with the lowest expenses per unlinked passenger trip (UPT). This suggests a very economical bus service.

Code
lowest_expenses_per_upt <- USAGE_AND_FINANCIALS %>%
  mutate(expenses_per_upt = Expenses / UPT) %>%
  arrange(expenses_per_upt) %>%
  slice(1) %>%
  select(Agency, Mode, expenses_per_upt)

Which transit system (agency and mode) has the highest total fares per UPT?

With an average fee of 660.12 per trip, Altoona Metro Transit, which operates in the Demand Response (DR) mode, is the transit system with the highest total charges per UPT. This points to a demand-responsive fare structure that is expensive or premium services.

Code
highest_fares_per_upt <- USAGE_AND_FINANCIALS %>%
  mutate(fares_per_upt = `Total Fares` / UPT) %>%
  arrange(desc(fares_per_upt)) %>%
  slice(1) %>%
  select(Agency, Mode, fares_per_upt)

Which transit system (agency and mode) has the lowest expenses per VRM?

The New Mexico Department of Transportation, which uses the Vanpool (VP) mode, is the transportation system with the lowest costs per vehicle revenue mile (VRM), spending only 0.34 per VRM. This suggests a particularly economical operation in relation to the miles of the vehicle.

Code
lowest_expenses_per_vrm <- USAGE_AND_FINANCIALS %>%
  mutate(expenses_per_vrm = Expenses / VRM) %>%
  arrange(expenses_per_vrm) %>%
  slice(1) %>%
  select(Agency, Mode, expenses_per_vrm)

Which transit system (agency and mode) has the highest total fares per VRM?

The Chicago Water Taxi, which uses the Ferry Boat (FB) mode and generates around 237.46 in revenues per VRM, is the transportation system with the highest total fares per VRM. This implies a high income generation per mile traveled, which might be brought about by a high amount of passengers or premium services.

Code
highest_fares_per_vrm <- USAGE_AND_FINANCIALS %>%
  mutate(fares_per_vrm = `Total Fares` / VRM) %>%
  arrange(desc(fares_per_vrm)) %>%
  slice(1) %>%
  select(Agency, Mode, fares_per_vrm)

Conclusion

Based on all of this, what do you believe to be the most efficient transit system in the country? (Your answer may differ depending on which form of ‘efficiency’ you care most about)?

Farebox Recovery (income Efficiency): If you were to rank the transit systems based on how much revenue they generate relative to their expenses, the system with the largest farebox recovery would be considered the most efficient. This illustrates how effectively fare income offsets system expenses. The system with the lowest expenses per Vehicle Revenue Mile (VRM) or Unlinked is the one with the most operational efficiency (Cost per Vehicle Revenue Mile, or UPT). If operational efficiency is more important, passenger trip (UPT). This demonstrates which system can continue to provide services with low operational expenses. If the objective is to maximize fare income per trip or mile, the system with the greatest total fares per UPT or per VRM may be examined. This is known as user efficiency (total fares per UPT or per VRM).