Posted by Curt on 2 December, 2020 at 8:49 am.


By Carl Bell

Identifying Voter Fraud through Suspicious Birthday Distributions in Pennsylvania Voter Registration Data, and the Effect on the 2020 Pennsylvania Presidential Election

Short Summary:

We construct a new metric of potential voter fraud using suspicious distributions of birthdays in Pennsylvania voter registration data. The basic idea is that people inventing fake birthdays will make predictable, non-random choices, such as picking round numbers for the day of the month, and will not know what true birth-month distributions look like.

Under this metric, a number of counties in Pennsylvania have extremely unlikely distributions of voter birthdays. Seven counties representing almost 1.4 million votes total (Northumberland, Delaware, Montgomery, Lawrence, Dauphin, Lehigh, and Luzerne) have levels of suspicious birthdays above the 99.5th percentile of plausible distributions, even when using conservative assumptions about what these distributions should look like.

These suspicious birthdays also matter significantly for election outcomes. While some suspicious counties vote Republican overall, in general more suspicious birthdays in a county are strongly associated with a larger Biden vote share, and with a higher Biden vote share relative to all Democratic presidential candidates since 2000. More suspicious birthdays are also associated with a higher vote share for Jorgensen relative to Trump (consistent with a fraud scheme aiming to get Biden high but not “too high”, while simultaneously giving as few votes to Trump as possible).

Finally, we quantify how much this potential fraud may have affected the election. Even a small reduction in the number of suspicious birthdays (to the 98th percentile of the conservative distribution) would be predicted to have resulted in Trump winning the state by 71,500 votes. This suggests that whatever is driving the anomalous patterns in birthdays is large enough to affect the statewide election result.

Executive Summary:

We use a largely ignored data source to identify suspicious voter registrations by county, one that is independent of the actual vote outcomes. In other words, we construct metrics that identify counties showing indications of potential voter fraud regardless of whom a county is voting for. Then we show how these measures correlate with vote outcomes.

Our key insight is that someone making up fake birthdays for voter registrations is unlikely to be able to do so in a truly random manner. Instead, we identify several likely hallmarks of fake birthdays:

-They are likely to excessively cluster on round number days of the month (1, 10, 15, 20, 30, 31), since people generally overweight round numbers. 

-They are likely to excessively cluster on January and December for the same reason.

-They are likely to excessively cluster on months of the year that generally have few births in overall demographic data (i.e., fake birthdays will be drawn roughly evenly across months, subject to the round-number effect above, while true birthdays tend to cluster more in certain months, like July and August, and less in months like February and November).
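The three hallmarks above can be turned into simple county-level statistics. The sketch below is one illustrative way to do so, not the author's code: the set of round days is copied from the list above, while the month-based score (the average benchmark share of each registrant's birth month, where a lower value means more registrants in normally light months), the function name and the `month_share` input are assumptions made for this example.

```python
from datetime import date

# Round days of the month as listed in the text (treated here as an assumption).
ROUND_DAYS = {1, 10, 15, 20, 30, 31}
# January and December, the months named in the text.
EDGE_MONTHS = {1, 12}


def suspicious_birthday_stats(birthdays, month_share):
    """Compute three illustrative statistics for one county (hypothetical helper).

    birthdays: list of datetime.date objects, one per registered voter.
    month_share: hypothetical dict mapping month (1-12) to the benchmark share
        of births in that month (shares sum to 1).
    Returns (round_day_share, edge_month_share, mean_month_share).
    """
    n = len(birthdays)
    if n == 0:
        raise ValueError("need at least one birthday")
    # Fraction of registrants born on a "round" day of the month.
    round_day_share = sum(b.day in ROUND_DAYS for b in birthdays) / n
    # Fraction of registrants born in January or December.
    edge_month_share = sum(b.month in EDGE_MONTHS for b in birthdays) / n
    # Average benchmark share of each registrant's birth month; a low value means
    # registrants are concentrated in months that normally have few births.
    mean_month_share = sum(month_share[b.month] for b in birthdays) / n
    return round_day_share, edge_month_share, mean_month_share


# Example usage with a uniform placeholder benchmark (not real data).
example = [date(1980, 1, 1), date(1975, 6, 14), date(1990, 12, 20), date(1965, 7, 3)]
example_stats = suspicious_birthday_stats(example, {m: 1 / 12 for m in range(1, 13)})
```

In the example, two of the four placeholder birthdays fall on round days (the 1st and the 20th) and two fall in January or December, so both shares come out at 0.5.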

We call these “suspicious birthdays”: individually, any one person can easily have any of the traits above, but having too many of them overall in a county suggests that fake birthdays have been added to the pool. We take these three measures of suspicious birthdays and evaluate them against a combination of two types of benchmarks of what might be expected in the absence of fraud. The benchmarks are designed to ensure that any unusual patterns are not coming from other causes (e.g., births generally avoiding holidays, or people generally having sex more at certain times of the year):

-A “best guess” benchmark, where we compare each county to overall demographic data:
  -historical day-of-the-month birthday distributions from the Social Security Death Master File, and
  -historical month-of-the-year birthday distributions for that county from the National Center for Health Statistics.

-A “conservative” benchmark, where we also add in measures of each county relative to the distribution of all voter birthdays in the state of Pennsylvania. This measures how unusual each county looks compared only with other counties, and so effectively strips out the average level of fraud across all counties.
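To place a county's count of suspicious birthdays on a percentile scale, one needs the distribution of that count under a no-fraud benchmark. The text does not say exactly how its percentiles were computed; the sketch below assumes that each registrant independently has a fixed benchmark probability `p_benchmark` of a given suspicious trait (taken either from historical birth data or, for the conservative variant, from the statewide voter file) and uses a normal approximation to the resulting binomial distribution. The function names and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Round days of the month named in the text (assumed definition of "round").
ROUND_DAYS = {1, 10, 15, 20, 30, 31}


def round_day_probability(day_share):
    """Benchmark probability that a birthday falls on a round day.

    day_share: hypothetical dict mapping day of month (1-31) to its share of
        birthdays in the benchmark data (historical or statewide).
    """
    return sum(share for day, share in day_share.items() if day in ROUND_DAYS)


def benchmark_percentile(observed_count, n, p_benchmark):
    """Percentile of an observed count of suspicious birthdays among n registrants.

    Assumes each of the n registrants is independently suspicious with
    probability p_benchmark, so the count is Binomial(n, p_benchmark);
    the binomial is approximated here by a normal distribution.
    """
    if not 0 < p_benchmark < 1:
        raise ValueError("p_benchmark must be strictly between 0 and 1")
    mean = n * p_benchmark
    sd = sqrt(n * p_benchmark * (1 - p_benchmark))
    return 100 * NormalDist(mean, sd).cdf(observed_count)
```

Under this assumption, a county at the 99.5th percentile has an observed count about 2.58 standard deviations above the benchmark mean. The normal approximation is chosen only for speed on large counties; an exact binomial tail or a Monte Carlo simulation would serve the same purpose.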

We find that even under the conservative distribution, seven counties representing almost 1.4 million votes total (Northumberland, Delaware, Montgomery, Lawrence, Dauphin, Lehigh, and Luzerne) have numbers of suspicious birthdays above the 99.5th percentile of plausible distributions. This ranking is based on the average abnormality metric across six different ways of measuring suspicious birthdays. In other words, these counties are not abnormal on just one or two measures, but across the whole range of them. The three worst offenders, Northumberland, Delaware and Montgomery, are above the 99.97th, 99.91st, and 99.74th percentiles respectively, a result extremely unlikely to occur by chance. Montgomery also shows significant evidence of voter fraud on entirely separate measures. Meanwhile, 15 counties score above the 95th percentile of abnormal birthdays on average, and these represent almost 3.5 million votes (the additional eight are Berks, Northampton, Cumberland, Bucks, Philadelphia, Monroe, Lancaster and Erie). Recall that these results use the conservative benchmark; under the best-guess benchmark, the deviations look even more extreme.
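The county ranking described above averages percentiles across six measures and compares the average with fixed cut-offs (the 99.5th and 95th percentiles). A minimal sketch of that aggregation step, assuming each county's six percentiles have already been computed; the dictionary layout and the function name are hypothetical:

```python
from statistics import mean


def flag_counties(percentiles_by_county, thresholds=(99.5, 95.0)):
    """Average each county's percentiles and list the counties above each threshold.

    percentiles_by_county: hypothetical dict mapping county name to a list of
        percentiles (0-100), one per suspicious-birthday measure.
    Returns (averages, flagged), where flagged maps each threshold to the
    sorted list of counties whose average percentile exceeds it.
    """
    averages = {county: mean(ps) for county, ps in percentiles_by_county.items()}
    flagged = {
        t: sorted(county for county, avg in averages.items() if avg > t)
        for t in thresholds
    }
    return averages, flagged
```

Averaging favors counties that are unusual on many measures at once, which is the property the text emphasizes; a single extreme measure is diluted by the other five.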
