Project link: https://github.com/eric-mc2/DNCTransit
In a previous post I proposed a statistical model to estimate whether the 2024 Chicago DNC affected public transit usage across Chicago.
The learning objective was to try to construct a believable model from a real-life event, or to convince myself that the model is too flawed to be trustworthy.
I gathered ridership data for trains, bikes, and rideshares, plus the dates and locations of the DNC. Then I estimated two regression models: a fixed-effects model and a difference-in-differences model. The results indicate that the DNC caused ridership to increase near the convention venues, while mobility across the rest of Chicago declined during that week.
Do we believe this design? I’ll list several issues that I thought of.
(Key: [!!] = dealbreaker; [!] = bad but maybe fixable; [~] = not a dealbreaker)
Control Mis-Specification - [!!]
Selection Bias - [!!]
Substitutability of Transit - [!]
Spillover into control - [!]
Both of these would bias the estimated effect toward zero, so since the effects were still statistically significant, I am not worried about these attenuation biases.
Spillover into treatment - [!]
This concern can be mitigated (for the tract-level model) with a robustness check: vary whether the buffer must contain 100% of a tract’s land area, 75%, 50%, etc. A sketch of that check is below.
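A minimal sketch of that robustness check with geopandas, assuming tract and buffer files are on hand (the file names, CRS, and thresholds are placeholders, not the project’s actual layout):

```python
# For each tract, compute the share of its area covered by the DNC buffer,
# then vary the coverage threshold used to define "treated" tracts.
import geopandas as gpd

tracts = gpd.read_file("tracts.geojson").to_crs(epsg=26971)   # assumed path; projected CRS in meters
dnc = gpd.read_file("dnc_buffer.geojson").to_crs(epsg=26971)  # assumed DNC buffer polygon(s)

buffer_geom = dnc.unary_union
tracts["pct_in_buffer"] = tracts.geometry.intersection(buffer_geom).area / tracts.geometry.area

for threshold in (1.00, 0.75, 0.50, 0.25):
    tracts[f"treated_{int(threshold * 100)}"] = tracts["pct_in_buffer"] >= threshold
    # ...re-estimate the model under each treatment definition and compare coefficients
```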
Confounding Variables - [!]
It will be hard to disentangle the security effect (-) from the DNC effect (+). One way might be to test the sensitivity of the model to varying buffer sizes (smaller than the perimeter, equal to the perimeter, larger than the perimeter), as in the sketch below.
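Roughly, that sensitivity check could look like the following; the radii, file names, and column names (near_dnc, during_dnc, log_rides, log_bus_dist) are illustrative assumptions:

```python
# Re-define "near DNC" with buffers smaller than, roughly equal to, and larger
# than the security perimeter, then re-estimate the model at each radius.
import geopandas as gpd
import pandas as pd
import statsmodels.formula.api as smf

stations = gpd.read_file("stations.geojson").to_crs(epsg=26971)  # assumed unit geometries
venues = gpd.read_file("dnc_venues.geojson").to_crs(epsg=26971)  # United Center + McCormick Place
panel = pd.read_csv("rides_panel.csv")                           # assumed unit-day ridership panel

venue_geom = venues.unary_union
for radius_m in (500, 1000, 2000):  # smaller than / near / beyond the perimeter (illustrative)
    flags = stations.assign(near_dnc=(stations.distance(venue_geom) < radius_m).astype(int))
    df = panel.merge(flags[["station_id", "near_dnc"]], on="station_id")
    fit = smf.ols("log_rides ~ near_dnc * during_dnc + log_bus_dist", data=df).fit()
    print(radius_m, fit.params["near_dnc:during_dnc"])
```

Comparing how the interaction changes as the buffer crosses the security perimeter would help separate the two effects.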
Gravity and Catchment Models - [~]
One way to mitigate selection bias is to choose control units that are “like” treatment units at baseline. I can model

$$ P(\text{near convention} \mid X) $$

and then find other units with high predicted probabilities that were in fact not near the DNC. Unfortunately, I don’t have a strong theoretical definition of “likeness”, nor the data to measure it. Ideally I’d like to operationalize “ability to handle large crowds”. I didn’t see maximum fire-code capacity in Chicago’s building footprint dataset. But I can measure attendance at crowded events.
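As a sketch of what that could look like, assuming a unit-level table of baseline covariates (the file and column names here are hypothetical):

```python
# Fit P(near convention | X) with a logistic regression, then rank units that
# were NOT near the DNC by predicted probability to find "look-alike" controls.
import pandas as pd
import statsmodels.formula.api as smf

units = pd.read_csv("units_baseline.csv")  # assumed: one row per station/dock/tract
fit = smf.logit(
    "near_dnc ~ log_baseline_rides + bus_distance + bike_distance + sqrt_area",
    data=units,
).fit()

units["p_near"] = fit.predict(units)
lookalikes = units.query("near_dnc == 0").nlargest(20, "p_near")  # high-propensity controls
```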
I’ll compare the DNC locations (United Center and McCormick Place) to Chicago’s other major event venues. I chose Wrigley Field, Guaranteed Rate Field, and Soldier Field because per-game sports attendance data is readily available[^1]. I also pull in conference event data for McCormick Place[^2].
Now the control group is more “similar” in terms of its transit patterns. I’ll include event attendance as a variable in the regression.
One drawback of this method is that I can now only compare “event days”, drastically reducing the sample size. Worse, Soldier Field and Guaranteed Rate Field did not host games during the DNC, reducing the active control group to just the transit options near Wrigley Field.
The sample sizes are just too small.
On second thought, using attendance as a regression variable makes it hard to interpret the treatment effect. Ceteris paribus, I’d be modeling the effect of the DNC in excess of the DNC attendees. That’s not at all what I want.
Why not drop the attendance term but keep the reduced sample of transit near stadiums, on all game and non-game days? Now the treatment and control groups are much more similar in terms of transit density:
| Mode | Variable | Not Near DNC | Near DNC | P-Value |
|---|---|---|---|---|
| train | stations | 12 | 8 | |
| | station-days | 732 | 468 | |
| | daily rides, mean (SD) | 3098.3 (2692.9) | 2093.5 (1985.5) | <0.001 |
| | log(daily rides), mean (SD) | 7.7 (0.9) | 7.0 (1.8) | <0.001 |
| | bus_distance, mean (SD) | 230.0 (340.2) | 165.3 (167.7) | 0.580 |
| | bike_distance, mean (SD) | 232.1 (164.3) | 288.5 (281.6) | 0.620 |
| | sqrt(area), mean (SD) | 1919.2 (1141.3) | 3123.8 (3564.8) | 0.382 |
| | lat, mean (SD) | 0.4 (1.2) | -0.3 (0.3) | 0.073 |
| | long, mean (SD) | -0.1 (0.9) | -0.8 (1.1) | 0.134 |
| bike | docks | 75 | 47 | |
| | dock-days | 3986 | 2846 | |
| | daily rides, mean (SD) | 84.1 (64.9) | 66.7 (55.9) | <0.001 |
| | log(daily rides), mean (SD) | 4.0 (1.2) | 3.9 (0.9) | <0.001 |
| | train_distance, mean (SD) | 1232.8 (1050.7) | 1056.5 (1240.4) | 0.420 |
| | bus_distance, mean (SD) | 140.0 (145.4) | 105.6 (73.5) | 0.087 |
| | sqrt(area), mean (SD) | 603.4 (215.9) | 734.0 (711.5) | 0.227 |
| | lat, mean (SD) | 0.3 (1.3) | -0.4 (0.4) | <0.001 |
| | long, mean (SD) | 0.0 (0.8) | -0.2 (1.2) | 0.179 |
| uber | tracts | 64 | 40 | |
| | tract-days | 3877 | 2428 | |
| | daily rides, mean (SD) | 746.7 (2018.7) | 927.5 (1896.5) | <0.001 |
| | log(daily rides), mean (SD) | 5.5 (1.5) | 5.7 (1.5) | <0.001 |
| | train_distance, mean (SD) | 2010.3 (1149.6) | 2293.7 (1027.7) | 0.195 |
| | bus_distance, mean (SD) | 567.4 (324.9) | 503.0 (313.2) | 0.317 |
| | bike_distance, mean (SD) | 840.2 (357.8) | 889.5 (504.9) | 0.592 |
| | sqrt(area), mean (SD) | 608.4 (254.8) | 784.4 (320.0) | 0.004 |
| | lat, mean (SD) | 0.5 (1.2) | -0.3 (0.4) | <0.001 |
| | long, mean (SD) | -0.0 (0.7) | -0.5 (1.2) | 0.028 |
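For reference, the group comparisons above can be reproduced with group-wise summaries and a two-sample test per covariate. A minimal sketch, with assumed file and column names and a Welch t-test standing in for whatever exact test the table uses:

```python
# Group-wise means/SDs and a two-sample t-test per covariate.
import pandas as pd
from scipy.stats import ttest_ind

panel = pd.read_csv("train_panel.csv")  # assumed station-day panel near stadiums
far, near = panel[panel.near_dnc == 0], panel[panel.near_dnc == 1]

for col in ["log_rides", "bus_distance", "bike_distance", "sqrt_area"]:
    _, pval = ttest_ind(far[col], near[col], equal_var=False)  # Welch's t-test
    print(f"{col}: {far[col].mean():.1f} ({far[col].std():.1f}) vs "
          f"{near[col].mean():.1f} ({near[col].std():.1f}), p = {pval:.3f}")
```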
I estimate the same difference-in-differences model as before:

$$ \log(\text{rides}_{it}) \sim \beta_0 + \beta_1 \text{DNC}_t + \beta_2 \text{near DNC}_i + \beta_3 \, \text{DNC}_t \times \text{near DNC}_i + X_{it} + u_{it} $$
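For concreteness, here is a minimal sketch of how this specification could be estimated with statsmodels. The file and column names (log_rides, near_dnc, during_dnc, and the log-distance covariates) are placeholders, not the project’s actual schema:

```python
# Difference-in-differences via OLS: the near_dnc * during_dnc term expands to
# both main effects plus their interaction (the DiD estimate, beta_3).
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("uber_panel.csv")  # assumed tract-day rideshare panel
fit = smf.ols(
    "log_rides ~ near_dnc * during_dnc"
    " + log_train_dist + log_bike_dist + log_bus_dist",
    data=panel,
).fit(cov_type="HC1")                  # heteroskedasticity-robust standard errors
print(fit.summary())
```

The estimates for the three modes: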
| | DiD (Uber) | DiD (Train) | DiD (Bike) |
|---|---|---|---|
| Near DNC | 0.6760* | 0.1569 | -0.0082 |
| | (0.3960) | (0.4565) | (0.1781) |
| During DNC | -0.1139*** | -0.1447 | -0.0957 |
| | (0.0412) | (0.0980) | (0.0737) |
| Near DNC:During DNC | 0.2746*** | 0.8982 | 0.4492*** |
| | (0.0854) | (0.6198) | (0.1048) |
| log(dist to train) | -0.2957 | | 0.1952*** |
| | (0.1995) | | (0.0526) |
| log(dist to bike) | 0.1554 | 0.5530 | |
| | (0.3104) | (0.3475) | |
| log(dist to bus) | 0.2676* | -0.0356 | -0.0679 |
| | (0.1437) | (0.1728) | (0.1009) |
| R-squared | 0.5182 | 0.4228 | 0.4733 |
| R-squared Adj. | 0.5170 | 0.4159 | 0.4722 |
| N | 6718 | 1280 | 7026 |
In the control group we observe (non-causal) changes of -10.8% in rideshare rides, -13.5% (NS) in train rides, and -9.1% (NS) in bike rides, which agrees directionally with the un-subsetted data. The causal effect of the DNC on rideshare, train, and bike rides near the DNC is a +31.6%, +145.5% (NS), and +56.7% change respectively, larger in magnitude than before.
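Since the outcomes are logged, each coefficient converts to a percent change via $100 \times (e^{\beta} - 1)$; for example, the rideshare interaction term works out to

$$ 100 \times (e^{0.2746} - 1) \approx +31.6\% $$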
As a robustness check on this model, I plot the parallel trends and run a placebo test, shifting the simulated treatment period back across eight 4-day windows (a sketch of the placebo loop is below). 38% of the simulated models returned statistically significant main effects, but the actual DNC effects (x’s) were much larger than the simulated effects (box & whisker).
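A minimal sketch of that placebo loop, again with assumed file and column names (the DNC ran August 19-22, 2024):

```python
# Shift a fake 4-day "treatment" window back through the pre-DNC period and
# re-estimate the interaction each time; the real effect should stand out.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("uber_panel.csv", parse_dates=["date"])  # assumed unit-day panel
dnc_start = pd.Timestamp("2024-08-19")

placebo_effects = []
for k in range(1, 9):                                        # 8 placebo windows, 4 days each
    start = dnc_start - pd.Timedelta(days=4 * k)
    fake = panel[panel["date"] < dnc_start].copy()           # drop the real treatment period
    fake["during"] = fake["date"].between(start, start + pd.Timedelta(days=3)).astype(int)
    fit = smf.ols("log_rides ~ near_dnc * during + log_bus_dist", data=fake).fit()
    placebo_effects.append(fit.params["near_dnc:during"])
```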
I attempted to mitigate selection bias with a pseudo-matched-pairs approach, comparing transit ridership near Chicago’s large event venues. The results agree directionally with the original model and lend credence to the original experimental design.
[^1]: CSVs downloaded from the sports-reference family of websites.
[^2]: Scraped from events listed on tradefest.io.